Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:118833 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 83072 invoked from network); 17 Oct 2022 17:33:41 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 17 Oct 2022 17:33:41 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 38E8D180211 for ; Mon, 17 Oct 2022 10:33:38 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS24940 176.9.0.0/16 X-Spam-Virus: No X-Envelope-From: Received: from chrono.xqk7.com (chrono.xqk7.com [176.9.45.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Mon, 17 Oct 2022 10:33:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bastelstu.be; s=mail20171119; t=1666028015; bh=+7FpdfPE8B4pAngzS+NhvRrZBXP7Z1/RlhpeWlF5fMk=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=Gk49hIdLPmNrLInUty6rgIsemqu2Iqk3+eQIBg91HX8y3+bDk9kRXcvR6oijWrL9b pqmtRQCU6b4j4HGJ0pkDVBtuFvOiKiJGw3wBnokcS70lWdFX5u/5P21QLvLOiLOn6i 2d0IVLRmqPLXFYJU64mbf80qBxA4ys/wXKoKgzDDPTFSeV63kcG4tb5FcMrzWdBJzL riehkemxXYn2ue/xRWuw7SZBNeaRwYReeEasgoAFJJxYq0EsXY7WoNZNGzvYXHjEPp gc2A0nHy+gb6ZKHlVbIHADvYPy1Iozy/rAzfb9P1R1avYCU59Fw1KkpPVaqWOICKDp 8ATkzlpB1BDRg== Content-Type: multipart/mixed; boundary="------------Qrd0rK7nXaJjNyWN5GXFxu6C" Message-ID: <0d950416-372b-1ff5-21c2-0c6720d47a07@bastelstu.be> Date: Mon, 17 Oct 2022 19:33:34 +0200 MIME-Version: 1.0 Content-Language: en-US To: Nicolas Grekas Cc: PHP internals References: <22177032-fe72-c39b-63fe-fa4368a70852@bastelstu.be> In-Reply-To: Subject: Re: [PHP-DEV] [VOTE] Improve unserialize() error handling From: tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=) --------------Qrd0rK7nXaJjNyWN5GXFxu6C Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Hi On 10/17/22 10:57, Nicolas Grekas wrote: > Thanks for taking the time to have a closer look. Unfortunately, you picked > the one failing test where there could be a discussion about whether the > original code needs improvement or not. I disagree. The other locations are equally broken with current versions of PHP. > The other failing tests involve "unserialize_callback_func" to gracefully > detect class-not-found, and they are all plain broken by the RFC. > > I created this patch/PR to show the changes that would be required on > Symfony to work around the BC break: > https://github.com/symfony/symfony/pull/47882 > > Note to readers: in this whole discussion, Symfony is just an example of > affected code. In the end, Symfony will adapt to the RFC if it passes. The > point being made is that PHP scripts should not have to be patched to run > on newer minor versions of PHP. That's what "keeping BC" means. Scripts will not need to be changed, unless they are already relying on observed behavior instead of the documented behavior. >> Let's fix this using test driven development. We first add the failing >> test cases. This is step1.diff (patches are in >> https://gist.github.com/TimWolla/376dd162f7684daef38f76a07254871c). When >> running the tests we get the following output: >> > > I didn't answer the question because we ruled out such payloads in the > discussion thread: constructed payloads like the ones involving DateTime do > not exist in practice. Or if they do, it means the source of the payload is > not trusted, and then we are in much bigger trouble (security issues for > which the RFC can't do anything.) Okay, just to make sure I understand this correctly, because I don't want to draw incorrect conclusions or make false statements due to my lack of knowledge about Symfony's ecosystem: When I use Symfony's 'PhpSerializer' and perform the following call: > $serializer->decode([ > 'body' => '{"message": "bar"}', > ]); Then I can expect the input to be gracefully rejected with a MessageDecodingFailedException. This behavior is tested in the 'testDecodingFailsWithBadFormat' test. But *under no circumstances* must I perform the following method call: > $serializer->decode([ > 'body' => 'O:8:"DateTime":0:{}', > ]); Because this payload is invalid and thus must not exist in practice. Did I understand this correctly? If I did, is this limitation documented somewhere? > Breaking BC in the name of such invalid payloads doesn't make an argument > in favor of the RFC. I demonstrated how to adjust 'PhpSerializer' to improve the behavior for what I consider a reasonable use case without breaking any existing tests in step3b.diff and step3c.diff. At the same time the fixed version will automatically handle the change this RFC proposes. I even offered to adjust the RFC implementation to preserve the original Exception message and code. >> Failed asserting that exception message 'Could not decode message using >> PHP serialization: O:13:"ReceivedSt0mp":0:{}.' matches '/class >> "ReceivedSt0mp" not found/'. >> >> Okay, now the Exception message changed. Personally I do not consider >> this a BC break: I believe Exception messages are meant for human >> consumption, not for programs. Otherwise fixing a typo in the message >> would be a BC break. If the code wants to learn about the cause, it >> should either use the '$code' or different types of Exception should be >> thrown to clarify the cause by entering a different catch() block. >> > > Yes, the specific error message should be part of the BC promise. This > allows building test suites that can assert the message in a stable way. I'm not talking about test suites here. I believe makes sense to verify the error message to ensure a specific error message is emitted to the human observer in the error log. I was talking about code that does something like this, which I consider to be inherently unsafe: try { … } catch (SomeException $e) { if ($e->getMessage() === 'Foobar') doSomething(); else doSomethingElse(); } As a library author I want to be able to provide the best possible Exception message to ease debugging for the user. This is not possible if I am locked into a bad choice forever. > This is also why we don't change the output of var_dump/print_r/var_export: > they're written now, in the same of BC, for the best of stability. (I've > barely read any PHP code where exception's code is used in any useful BTW - > that can't be a solution.) PDO is an example where the $code is useful, because it exposes the standardized SQL error code. Similarly a HTTP client could expose the status code within $code. For other libraries simply using a different Exception class within the same hierarchy is often sufficient to differentiate between different errors, because programmatically I often don't care *why exactly* an operation failed. > More importantly, the type of thrown exceptions should undoubtedly be part > of the BC promise. I agree. > Wrapping exceptions inside UnserializationFailedException breaks existing > catch/instanceof checks (see my PR above.) > unserialize() only promises you that a Throwable might be thrown: https://www.php.net/manual/en/function.unserialize.php#refsect1-function.unserialize-errors Symfony is relying on undocumented behavior. > >> All green! We **did not have to change anything** after fixing the >> implementation in step3a.diff for PHP 8.2 and earlier! >> > > We did change the exception hierarchy, both in terms of types and in terms > of stack of "previous". In this specific case, we can discuss the merits of If not even the $previous value may change, then we are deep in https://xkcd.com/1172/ territory. The whole purpose of $previous is wrapping exceptions that come up as an implementation detail and then exposing a well-defined Exception at the border of your library / component and at the same time exposing the underlying cause to the developer to simplify debugging. > wrapping all throwables or not. But in doubt the other failing cases I > reported clearly highlight the BC break. > As I've mentioned at the beginning of this email, the other locations are also broken with the current PHP versions. -------------- Let me work through ResourceCheckerConfigCache as another example. As with the 'PhpSerializer' example, I'm working on the assumption that the input to 'unserialize' might be broken. If it would be impossible for it to be broken, then we do not need error handling in the first place and the RFC would not change anything, because it only concerns itself with error handling. So let's add a new test case to 'ResourceCheckerConfigCacheTest'. Specifically I am implementing a new resource that implements 'SelfCheckingResourceInterface': > final class MyResource implements \Symfony\Component\Config\Resource\SelfCheckingResourceInterface > { > public function __construct( > private \DateTimeImmutable $dti > ) {} > > public function isFresh(int $timestamp): bool { return false; } > > public function __toString(): string { return ''; } > } This resource stores a DTI internally, but the specific object does not really matter. Anything that implements __unserialize() would work. Now let me add a test that uses this resource. Unfortunately my meta data file got corrupted on disk, because some unrelated process stomped over it and replaced all '-' by an 'x'. > public function testCacheIsNotFreshWhenUnserializeFails2() > { > $checker = $this->createMock(ResourceCheckerInterface::class); > $cache = new ResourceCheckerConfigCache($this->cacheFile, [$checker]); > $cache->write('foo', [new MyResource(new \DateTimeImmutable('now'))]); > > $metaFile = "{$this->cacheFile}.meta"; > file_put_contents($metaFile, str_replace('-', 'x', file_get_contents($metaFile))); > > $this->assertFalse($cache->isFresh()); > } Similarly to the test case without the '2' suffix, I expect the cache to be ignored, because it can simply be recreated. Let's test this: > 1) Symfony\Component\Config\Tests\ResourceCheckerConfigCacheTest::testCacheIsNotFreshWhenUnserializeFails2 > Error: Invalid serialization data for DateTimeImmutable object > > /app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:159 > /app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:77 > /app/src/Symfony/Component/Config/Tests/ResourceCheckerConfigCacheTest.php:152 Oh no. The test failed. To fix this we just need remove everything from the body of the catch in ResourceCheckerConfigCache::safelyUnserialize(). No matter why the unserialization fails, the cache will be ignored if it contains invalid data. It goes without saying that this will correctly handle the UnserializationFailedException that the RFC proposes. Full patch attached. -------------- For DefaultMarshaller I consider the test to be wrong / to be overly specific. The 'MarshallerInterface' defines that: > * @throws \Exception Whenever unserialization fails UnserializationFailedException is a subclass of Exception, thus the external contract is not violated. -------------- For ContextListenerTest we can just add 'O:8:"DateTime":0:{}' to 'provideInvalidToken' and the same reasoning as with PhpSerializer and ResourceCheckerConfigCache applies. I will not work through the steps again to keep this email succinct. -------------- On VarExporterTest I can't make a clear statement, because frankly I don't understand what the code is supposed to do. However: a) 'Registry' constructs payloads by hand, something that you admitted yourself does not exist in practice: "constructed payloads like the ones involving DateTime do not exist in practice". b) Registry is marked @internal, as such it does not affect any downstream users of Symfony. c) What Registry does is definitely nothing your average application does, as such the caveat "this is not just about Symfony" does not apply. Best regards Tim Düsterhus --------------Qrd0rK7nXaJjNyWN5GXFxu6C Content-Type: text/x-patch; charset=UTF-8; name="resource-checker.diff" Content-Disposition: attachment; filename="resource-checker.diff" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBpL3NyYy9TeW1mb255L0NvbXBvbmVudC9Db25maWcvUmVzb3VyY2VDaGVj a2VyQ29uZmlnQ2FjaGUucGhwIHcvc3JjL1N5bWZvbnkvQ29tcG9uZW50L0NvbmZpZy9SZXNv dXJjZUNoZWNrZXJDb25maWdDYWNoZS5waHAKaW5kZXggM2EzZTZjNTJiYi4uNTg0MWUzMDI5 ZiAxMDA2NDQKLS0tIGkvc3JjL1N5bWZvbnkvQ29tcG9uZW50L0NvbmZpZy9SZXNvdXJjZUNo ZWNrZXJDb25maWdDYWNoZS5waHAKKysrIHcvc3JjL1N5bWZvbnkvQ29tcG9uZW50L0NvbmZp Zy9SZXNvdXJjZUNoZWNrZXJDb25maWdDYWNoZS5waHAKQEAgLTE1NywxMCArMTU3LDggQEAg Y2xhc3MgUmVzb3VyY2VDaGVja2VyQ29uZmlnQ2FjaGUgaW1wbGVtZW50cyBDb25maWdDYWNo ZUludGVyZmFjZQogCiAgICAgICAgIHRyeSB7CiAgICAgICAgICAgICAkbWV0YSA9IHVuc2Vy aWFsaXplKCRjb250ZW50KTsKLSAgICAgICAgfSBjYXRjaCAoXFRocm93YWJsZSAkZSkgewot ICAgICAgICAgICAgaWYgKCRlICE9PSAkc2lnbmFsaW5nRXhjZXB0aW9uKSB7Ci0gICAgICAg ICAgICAgICAgdGhyb3cgJGU7Ci0gICAgICAgICAgICB9CisgICAgICAgIH0gY2F0Y2ggKFxU aHJvd2FibGUpIHsKKyAgICAgICAgICAgIC8vIG5vdGhpbmcgdG8gZG8KICAgICAgICAgfSBm aW5hbGx5IHsKICAgICAgICAgICAgIHJlc3RvcmVfZXJyb3JfaGFuZGxlcigpOwogICAgICAg ICAgICAgaW5pX3NldCgndW5zZXJpYWxpemVfY2FsbGJhY2tfZnVuYycsICRwcmV2VW5zZXJp YWxpemVIYW5kbGVyKTsKZGlmZiAtLWdpdCBpL3NyYy9TeW1mb255L0NvbXBvbmVudC9Db25m aWcvVGVzdHMvUmVzb3VyY2VDaGVja2VyQ29uZmlnQ2FjaGVUZXN0LnBocCB3L3NyYy9TeW1m b255L0NvbXBvbmVudC9Db25maWcvVGVzdHMvUmVzb3VyY2VDaGVja2VyQ29uZmlnQ2FjaGVU ZXN0LnBocAppbmRleCA4NzQ3Y2JlY2Y0Li4xYTVmNzMxNjY5IDEwMDY0NAotLS0gaS9zcmMv U3ltZm9ueS9Db21wb25lbnQvQ29uZmlnL1Rlc3RzL1Jlc291cmNlQ2hlY2tlckNvbmZpZ0Nh Y2hlVGVzdC5waHAKKysrIHcvc3JjL1N5bWZvbnkvQ29tcG9uZW50L0NvbmZpZy9UZXN0cy9S ZXNvdXJjZUNoZWNrZXJDb25maWdDYWNoZVRlc3QucGhwCkBAIC0xNyw2ICsxNywxNyBAQCB1 c2UgU3ltZm9ueVxDb21wb25lbnRcQ29uZmlnXFJlc291cmNlQ2hlY2tlckNvbmZpZ0NhY2hl OwogdXNlIFN5bWZvbnlcQ29tcG9uZW50XENvbmZpZ1xSZXNvdXJjZUNoZWNrZXJJbnRlcmZh Y2U7CiB1c2UgU3ltZm9ueVxDb21wb25lbnRcQ29uZmlnXFRlc3RzXFJlc291cmNlXFJlc291 cmNlU3R1YjsKIAorZmluYWwgY2xhc3MgTXlSZXNvdXJjZSBpbXBsZW1lbnRzIFxTeW1mb255 XENvbXBvbmVudFxDb25maWdcUmVzb3VyY2VcU2VsZkNoZWNraW5nUmVzb3VyY2VJbnRlcmZh Y2UKK3sKKyAgICBwdWJsaWMgZnVuY3Rpb24gX19jb25zdHJ1Y3QoCisgICAgICAgIHByaXZh dGUgXERhdGVUaW1lSW1tdXRhYmxlICRkdGkKKyAgICApIHt9CisKKyAgICBwdWJsaWMgZnVu Y3Rpb24gaXNGcmVzaChpbnQgJHRpbWVzdGFtcCk6IGJvb2wgeyByZXR1cm4gZmFsc2U7IH0K KworICAgIHB1YmxpYyBmdW5jdGlvbiBfX3RvU3RyaW5nKCk6IHN0cmluZyB7IHJldHVybiAn JzsgfQorfQorCiBjbGFzcyBSZXNvdXJjZUNoZWNrZXJDb25maWdDYWNoZVRlc3QgZXh0ZW5k cyBUZXN0Q2FzZQogewogICAgIHByaXZhdGUgJGNhY2hlRmlsZSA9IG51bGw7CkBAIC0xMjks NiArMTQwLDE4IEBAIGNsYXNzIFJlc291cmNlQ2hlY2tlckNvbmZpZ0NhY2hlVGVzdCBleHRl bmRzIFRlc3RDYXNlCiAgICAgICAgICR0aGlzLT5hc3NlcnRGYWxzZSgkY2FjaGUtPmlzRnJl c2goKSk7CiAgICAgfQogCisgICAgcHVibGljIGZ1bmN0aW9uIHRlc3RDYWNoZUlzTm90RnJl c2hXaGVuVW5zZXJpYWxpemVGYWlsczIoKQorICAgIHsKKyAgICAgICAgJGNoZWNrZXIgPSAk dGhpcy0+Y3JlYXRlTW9jayhSZXNvdXJjZUNoZWNrZXJJbnRlcmZhY2U6OmNsYXNzKTsKKyAg ICAgICAgJGNhY2hlID0gbmV3IFJlc291cmNlQ2hlY2tlckNvbmZpZ0NhY2hlKCR0aGlzLT5j YWNoZUZpbGUsIFskY2hlY2tlcl0pOworICAgICAgICAkY2FjaGUtPndyaXRlKCdmb28nLCBb bmV3IE15UmVzb3VyY2UobmV3IFxEYXRlVGltZUltbXV0YWJsZSgnbm93JykpXSk7CisKKyAg ICAgICAgJG1ldGFGaWxlID0gInskdGhpcy0+Y2FjaGVGaWxlfS5tZXRhIjsKKyAgICAgICAg ZmlsZV9wdXRfY29udGVudHMoJG1ldGFGaWxlLCBzdHJfcmVwbGFjZSgnLScsICd4JywgZmls ZV9nZXRfY29udGVudHMoJG1ldGFGaWxlKSkpOworCisgICAgICAgICR0aGlzLT5hc3NlcnRG YWxzZSgkY2FjaGUtPmlzRnJlc2goKSk7CisgICAgfQorCiAgICAgcHVibGljIGZ1bmN0aW9u IHRlc3RDYWNoZUtlZXBzQ29udGVudCgpCiAgICAgewogICAgICAgICAkY2FjaGUgPSBuZXcg UmVzb3VyY2VDaGVja2VyQ29uZmlnQ2FjaGUoJHRoaXMtPmNhY2hlRmlsZSk7Cg== --------------Qrd0rK7nXaJjNyWN5GXFxu6C--