Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:118843 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 56171 invoked from network); 18 Oct 2022 17:48:57 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 18 Oct 2022 17:48:57 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 705231804A9 for ; Tue, 18 Oct 2022 10:48:56 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS15169 209.85.128.0/17 X-Spam-Virus: No X-Envelope-From: Received: from mail-wr1-f51.google.com (mail-wr1-f51.google.com [209.85.221.51]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 18 Oct 2022 10:48:55 -0700 (PDT) Received: by mail-wr1-f51.google.com with SMTP id a3so24862751wrt.0 for ; Tue, 18 Oct 2022 10:48:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=3hnqsFf8uvQ51Gowv0aPKgNLul82jmgJWKveNjAyy3Q=; b=jIxBiZL6xAK7St8god3AFvCzg11265qeOjFUQE/12hK76K/FW2eJ4yTIy3rx45WR8S 5Y1EWk/G0+qdgAD86HDmnm1zA3ebrTNq5rRD8Zicm7vEbu1GyF57kFw3azHihy3uSTh4 rjXdDnRg8HVNci2izNQ6YJP2gWPHLoJzH1s26PFC6y6/c3hXfH7LOrv8MLjsW8Zs9nyP mRhwRqxicphJOn7NvR6VhTyBRjGX4oOBt7zTQaQVE2axgtw29ELgPbcXkQX1e2oIk9sb 7OyzXFbFIV0fUxnxxQWdNCLxJ+E78Y+Td3y/4tmIlN7PXLLXYq6NQb2bFS5+TkRehQKE 6Kjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=3hnqsFf8uvQ51Gowv0aPKgNLul82jmgJWKveNjAyy3Q=; b=udUPICtRWaKJ4vzs+iBLnOgabRFd0gq3cjyWkGiDNDc10Pi3m322iRN2oxDfD8T8cV To4Ep2z3UxVMwzqvSqPAbOyhkrPZMlC1C1X6bD7WiUTysus9Rs0qeTltttZtjDdJ167G NMHfoOHi+7zT4fjkUQwtF6ZJfatP8NsPjK2J8C1UFCxskB9FO9EZqyGXlmpYpRu16jkh KYeP3df8ZSESk9zFvRZvuYCIDSlVgqPT3LnX2kKJ1mUNAM8T4kq2dk/8XGUTyx1JscvD j3TsrhSg3lJFO2FbVNPkW20C7ARyOATSVbJdYCnqa3pb4+ho+wjmT2nz67rDbWUSwkDd gA/w== X-Gm-Message-State: ACrzQf1iZ8r6vNp45xZRKIU6HXfBHRm26wrZvsjRgAF5DU7vntg9bbaK dsPxewl1eFC6ThPPrUJA29/2OH8UBQsbhqgtK/p0buuymCo= X-Google-Smtp-Source: AMsMyM46zdpry4soTK70ziWoHoQPwFGrJ+vscPQ2P7FtYAbJ2XpYnE+j09Fy1cD+UvQIxWbBuh7b2dshSKFkNRRwpd8= X-Received: by 2002:a5d:6485:0:b0:230:b6db:d41c with SMTP id o5-20020a5d6485000000b00230b6dbd41cmr2561672wri.709.1666115334275; Tue, 18 Oct 2022 10:48:54 -0700 (PDT) MIME-Version: 1.0 References: <22177032-fe72-c39b-63fe-fa4368a70852@bastelstu.be> <0d950416-372b-1ff5-21c2-0c6720d47a07@bastelstu.be> In-Reply-To: <0d950416-372b-1ff5-21c2-0c6720d47a07@bastelstu.be> Date: Tue, 18 Oct 2022 19:48:42 +0200 Message-ID: To: =?UTF-8?Q?Tim_D=C3=BCsterhus?= Cc: PHP internals Content-Type: multipart/alternative; boundary="000000000000a7871f05eb52b471" Subject: Re: [PHP-DEV] [VOTE] Improve unserialize() error handling From: nicolas.grekas+php@gmail.com (Nicolas Grekas) --000000000000a7871f05eb52b471 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable > > The other failing tests involve "unserialize_callback_func" to graceful= ly > > detect class-not-found, and they are all plain broken by the RFC. > > > > I created this patch/PR to show the changes that would be required on > > Symfony to work around the BC break: > > https://github.com/symfony/symfony/pull/47882 > > > > Note to readers: in this whole discussion, Symfony is just an example o= f > > affected code. In the end, Symfony will adapt to the RFC if it passes. > The > > point being made is that PHP scripts should not have to be patched to r= un > > on newer minor versions of PHP. That's what "keeping BC" means. > > Scripts will not need to be changed, unless they are already relying on > observed behavior instead of the documented behavior. When I use Symfony's 'PhpSerializer' and perform the following call: > > > $serializer->decode([ > > 'body' =3D> '{"message": "bar"}', > > ]); > > Then I can expect the input to be gracefully rejected with a > MessageDecodingFailedException. This behavior is tested in the > 'testDecodingFailsWithBadFormat' test. > > But *under no circumstances* must I perform the following method call: > > > $serializer->decode([ > > 'body' =3D> 'O:8:"DateTime":0:{}', > > ]); > > Because this payload is invalid and thus must not exist in practice. > That's the current behavior, yes. It allows graceful errors when one misconfigures their workers and sends jsons to a worker that expects php-ser. Under no circumstances do we need to guard against adversary payloads. We could handle the situation you describe of course, but that'd be just dead code in practice. =C2=AF\_(=E3=83=84)_/=C2=AF > > Breaking BC in the name of such invalid payloads doesn't make an argume= nt > > in favor of the RFC. > > I demonstrated how to adjust 'PhpSerializer' to improve the behavior for > what I consider a reasonable use case without breaking any existing > tests in step3b.diff and step3c.diff. At the same time the fixed version > will automatically handle the change this RFC proposes. I even offered > to adjust the RFC implementation to preserve the original Exception > message and code. > > >> Failed asserting that exception message 'Could not decode message usin= g > >> PHP serialization: O:13:"ReceivedSt0mp":0:{}.' matches '/class > >> "ReceivedSt0mp" not found/'. > >> > >> Okay, now the Exception message changed. Personally I do not consider > >> this a BC break: I believe Exception messages are meant for human > >> consumption, not for programs. Otherwise fixing a typo in the message > >> would be a BC break. If the code wants to learn about the cause, it > >> should either use the '$code' or different types of Exception should b= e > >> thrown to clarify the cause by entering a different catch() block. > >> > > > > Yes, the specific error message should be part of the BC promise. This > > allows building test suites that can assert the message in a stable way= . > > I'm not talking about test suites here. I believe makes sense to verify > the error message to ensure a specific error message is emitted to the > human observer in the error log. > Breaking test suites is the BC issue I wanted to highlight. > I was talking about code that does something like this, which I consider > to be inherently unsafe: > > try { =E2=80=A6 } > catch (SomeException $e) { > if ($e->getMessage() =3D=3D=3D 'Foobar') doSomething(); > else doSomethingElse(); > } > > As a library author I want to be able to provide the best possible > Exception message to ease debugging for the user. This is not possible > if I am locked into a bad choice forever. > > > This is also why we don't change the output of > var_dump/print_r/var_export: > > they're written now, in the same of BC, for the best of stability. (I'v= e > > barely read any PHP code where exception's code is used in any useful > BTW - > > that can't be a solution.) > > PDO is an example where the $code is useful, because it exposes the > standardized SQL error code. Similarly a HTTP client could expose the > status code within $code. > > For other libraries simply using a different Exception class within the > same hierarchy is often sufficient to differentiate between different > errors, because programmatically I often don't care *why exactly* an > operation failed. > > > More importantly, the type of thrown exceptions should undoubtedly be > part > > of the BC promise. > > I agree. > > > Wrapping exceptions inside UnserializationFailedException breaks existi= ng > > catch/instanceof checks (see my PR above.) > > > > unserialize() only promises you that a Throwable might be thrown: > > > https://www.php.net/manual/en/function.unserialize.php#refsect1-function.= unserialize-errors Breaking BC at will unless it's properly documented looks like a terrible policy. It'd make developing in PHP really risky. Especially here: expecting that an exception would bubble down is just what everybody would expect from unserialize(), like everything else in the engine. It always did, so ppl built on that. Nothing fancy here. > Symfony is relying on undocumented behavior. > > > >> All green! We **did not have to change anything** after fixing the > >> implementation in step3a.diff for PHP 8.2 and earlier! > >> > > > > We did change the exception hierarchy, both in terms of types and in > terms > > of stack of "previous". In this specific case, we can discuss the merit= s > of > > If not even the $previous value may change, then we are deep in > https://xkcd.com/1172/ territory. The whole purpose of $previous is > wrapping exceptions that come up as an implementation detail and then > exposing a well-defined Exception at the border of your library / > component and at the same time exposing the underlying cause to the > developer to simplify debugging. > > > wrapping all throwables or not. But in doubt the other failing cases I > > reported clearly highlight the BC break. > > > > As I've mentioned at the beginning of this email, the other locations > are also broken with the current PHP versions. > > -------------- > > Let me work through ResourceCheckerConfigCache as another example. As > with the 'PhpSerializer' example, I'm working on the assumption that the > input to 'unserialize' might be broken. If it would be impossible for it > to be broken, then we do not need error handling in the first place and > the RFC would not change anything, because it only concerns itself with > error handling. > > So let's add a new test case to 'ResourceCheckerConfigCacheTest'. > Specifically I am implementing a new resource that implements > 'SelfCheckingResourceInterface': > > > final class MyResource implements > \Symfony\Component\Config\Resource\SelfCheckingResourceInterface > > { > > public function __construct( > > private \DateTimeImmutable $dti > > ) {} > > > > public function isFresh(int $timestamp): bool { return false; } > > > > public function __toString(): string { return ''; } > > } > > This resource stores a DTI internally, but the specific object does not > really matter. Anything that implements __unserialize() would work. > > Now let me add a test that uses this resource. Unfortunately my meta > data file got corrupted on disk, because some unrelated process stomped > over it and replaced all '-' by an 'x'. > > > public function testCacheIsNotFreshWhenUnserializeFails2() > > { > > $checker =3D $this->createMock(ResourceCheckerInterface::class)= ; > > $cache =3D new ResourceCheckerConfigCache($this->cacheFile, > [$checker]); > > $cache->write('foo', [new MyResource(new > \DateTimeImmutable('now'))]); > > > > $metaFile =3D "{$this->cacheFile}.meta"; > > file_put_contents($metaFile, str_replace('-', 'x', > file_get_contents($metaFile))); > > > > $this->assertFalse($cache->isFresh()); > > } > > Similarly to the test case without the '2' suffix, I expect the cache to > be ignored, because it can simply be recreated. Let's test this: > > > 1) > Symfony\Component\Config\Tests\ResourceCheckerConfigCacheTest::testCacheI= sNotFreshWhenUnserializeFails2 > > Error: Invalid serialization data for DateTimeImmutable object > > > > /app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:159 > > /app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:77 > > > /app/src/Symfony/Component/Config/Tests/ResourceCheckerConfigCacheTest.ph= p:152 > > Oh no. The test failed. To fix this we just need remove everything from > the body of the catch in > ResourceCheckerConfigCache::safelyUnserialize(). No matter why the > unserialization fails, the cache will be ignored if it contains invalid > data. It goes without saying that this will correctly handle the > UnserializationFailedException that the RFC proposes. > > Full patch attached. > > -------------- > > For DefaultMarshaller I consider the test to be wrong / to be overly > specific. The 'MarshallerInterface' defines that: > > > * @throws \Exception Whenever unserialization fails > > UnserializationFailedException is a subclass of Exception, thus the > external contract is not violated. > > -------------- > > For ContextListenerTest we can just add 'O:8:"DateTime":0:{}' to > 'provideInvalidToken' and the same reasoning as with PhpSerializer and > ResourceCheckerConfigCache applies. I will not work through the steps > again to keep this email succinct. > > -------------- > > On VarExporterTest I can't make a clear statement, because frankly I > don't understand what the code is supposed to do. > > However: > > a) 'Registry' constructs payloads by hand, something that you admitted > yourself does not exist in practice: "constructed payloads like the ones > involving DateTime do not exist in practice". > It's quite common practice to create objects without constructors by using constructed payloads. That's what the code you read does. While ReflectionClass::newInstanceWithoutConstructor() works for userland classes, many internal classes do require this trick. b) Registry is marked @internal, as such it does not affect any > downstream users of Symfony. > > c) What Registry does is definitely nothing your average application > does, as such the caveat "this is not just about Symfony" does not apply. > All the above cases are variations of the same topic: When unserializing a payload, said payload might reference a class that was removed since the payload was created. For ResourceChecker, this happens all the time during dev, when one refactors their code. For Cache, VarExporter and Security, this happens for similar reasons - renamed classes, etc, but also in prod. By default, PHP creates weird __PHP_Incomplete_Class objects in places where not found classes are referenced. This creates invalid object graphs that we want to reject early on (everybody should most of the time I guess.) The only way to do so reliably is to use the mechanism I mentioned in the discussion thread: ini_set(unserialize_callback_func), documented here: https://www.php.net/manual/en/var.configuration.php#ini.unserialize-callbac= k-func and then throw an exception from this callback. Here is the typical code: function throwingCallback($class) { throw new MyClassNotFoundException($class); } $previousCallback =3D ini_set('unserialize_callback_func', 'throwingCallbac= k' ); try { $data =3D unserialize($payload); } catch (MyClassNotFoundException $e) { // deal with $e specifically } catch (Throwable $e) { // deal with other type of throwables differently } The RFC forces rewriting such logics to accomodate for the new wrapper. That's the BC break that is highlighted by the PR I linked. Nicolas --000000000000a7871f05eb52b471--