Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:118613 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 23890 invoked from network); 13 Sep 2022 17:34:21 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 13 Sep 2022 17:34:21 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id A077F180084 for ; Tue, 13 Sep 2022 10:34:19 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS24940 176.9.0.0/16 X-Spam-Virus: No X-Envelope-From: Received: from chrono.xqk7.com (chrono.xqk7.com [176.9.45.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 13 Sep 2022 10:34:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bastelstu.be; s=mail20171119; t=1663090456; bh=peFgVCQ7Rk2QtBXaDXGV7v/bHPH3Q7vreZW+lLwXvNI=; h=Date:From:Subject:To:Cc:References:In-Reply-To:From; b=FfUA19+/aLv3q0PiLEHh3o1nIaN9HuHe5HKm3znkozVVtpBqg+OKpoKCqz8Inc8tT RDR4hZ6XnWwiUn0WT3cIuozm/013wDUHAu4yEgZaVHETScEzSfYw9RrYuLvqjZsQQe UYx8o3C5RhOVXRBC0l2TjtXD8bchUz8qHNIuFbzHNpNpsT9ZlTksUFEPaz28WIru25 WwRy2QsCQHon8nYkPhBl+6F+LpdItoi/FBdwehtD0p+vJFqAtHResdrQ5/8LX4bpYs VDSRBmE5Z9YOe6RGroxWDxEajZDl1LHxdCUnuN68DXsQGsXPovNCIlhAeZV3KoYOA0 g9g+04ILbFnjA== Content-Type: multipart/mixed; boundary="------------O0hOgaiacTiGE01ngVSVzsiq" Message-ID: <81f1ceb4-1086-3f1a-942d-aebbb5c2712f@bastelstu.be> Date: Tue, 13 Sep 2022 19:34:15 +0200 MIME-Version: 1.0 To: Nicolas Grekas Cc: internals@lists.php.net References: <530b3a9d-0ee4-6061-8c69-df672d238032@bastelstu.be> <51a57a4d-ccb5-a581-205e-e2684e4db8c8@bastelstu.be> Content-Language: en-US In-Reply-To: Subject: Re: [PHP-DEV] RFC [Discussion]: Improve unserialize() error handling From: tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=) --------------O0hOgaiacTiGE01ngVSVzsiq Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Hi On 9/12/22 21:46, Nicolas Grekas wrote:>> unserialize() is a generic function that will also call arbitrary >> callbacks. You already have no guarantees whatsoever about the kind of >> exception that is thrown from it. I am unable to think of a situation >> where the input data is well-defined enough to reliably throw a specific >> type of exception, but not well-defined enough to successfully unserialize. >> >> So on the contrary wrapping any Exceptions with a well-defined Exception >> allows you to rely on a specific and stable Exception to catch in the >> first place. >> >> As you specifically mention catch(Throwable), I'd like to point out that >> this will continue to work, because UnserializationFailedException >> implements Throwable. >> > > You might have misunderstood my point so let me give two examples: > > Example 1. > set_error_handler(fn ($t, $m) => throw new ErrorException($m)); > try { > $ser = serialize($value); > catch (ErrorException $e) { > // deal with $e > finally { > restore_error_handler(); > } > > Example 2. > try { > $ser = @serialize($value); > catch (Exception $e) { > // deal with $e but don't catch Error on purpose, to let them bubble down > } > > Changing serialize to wrap any throwables into an > UnserializationFailedException would break both examples. That's what I > mean when I say this breaks BC. As any BC-break, this might cause big > troubles for the community and this should be avoided. The release process > policy I mentioned above is a wise document. > First I'd like to note that these examples use 'serialize()' (which is not touched by my RFC, because serialize rarely fails). I consider this a typo and will answer as if you used 'unserialize()'. If that was not a typo, then this might explain the confusion. Are those two examples based on real-world use cases or did you craft them specifically to point out how the proposal would introduce a behavioral change? I'm asking, because the examples fail to explain the "why". Why is the code written like this? I fail to see how catching "Exception", but not catching "Error" during unserialization is ever going to be useful. Likewise I don't see how catching ErrorException specifically is useful when '__unserialize()' might throw arbitrary Throwables. Specifically see the list in this email: https://externals.io/message/118311. Code outside of the function that actually calls unserialize() is not going to be prepared to usefully handle unserialization failure. This effectively means that the Exception will bubble up to the global Exception handler (or the Exception handler middleware), resulting in an aborted request. Quoting myself, because the examples didn't answer the question: "I am unable to think of a situation where the input data is well-defined enough to reliably throw a specific type of exception, but not well-defined enough to successfully unserialize". And don't get me wrong: I'm not attempting to say that it is appropriate to break logic that I personally consider wrong without justification. I believe the benefits of having the unified Exception outweigh the drawbacks of introducing a behavioral change in code that is already subtly broken depending on the exact input value passed to unserialize(). This is similar to the attribute syntax breaking hash comments that start with a square bracket without any space in the middle. Or the '@@' attribute syntax that was also proposed. It would have broken code containing duplicated error suppression operators. Similarly to the attribute breakage, it's also reasonably easy to find possibly affected logic by grepping for 'unserialize('. >>> About 2., unserialize() accepts a second argument to give it options. Did >>> you consider adding a 'throw_on_error' option to opt-in into the behavior >>> you're looking for? I think that would be a nice replacement for the >>> current logics based on set_error_handler(). If we want to make PHP 9 >> throw >>> by default, we could then decide to deprecate *not* passing this option. >> >> I did not consider this when writing the RFC. I now considered it, and I >> do not believe adding a flag to control this a good thing. >> >> 1. "No one" is going to set that flag, because it requires additional >> effort. I strongly believe that the easiest solution should also the >> correct solution for the majority of use cases. The flag fails this >> "test", because the correct solution should be "don't fail silently". >> > > Unless we deprecate calling the method without the argument as I suggested. > I agree, it might not be ideal to ask the community to rewrite every call > to serialize($v) to serialize($v, ['throw_on_error' => true/false]) but > that's still way better than a BC break. > Maybe there's another way that doesn't break BC. I'm looking forward to one. See (4) and above regarding my opinion on the BC break. > 2. If you actually want to set that option in e.g. a library, then you >> break compatibility with older PHP versions for no real gain. If you go >> all the way and remember to add that extra flag, then you can also write >> an unserialize wrapper that does the set_error_handler dance I've shown >> in the introduction of the RFC. Similar effort to adding the flag, but >> better compatibility. >> > > I think my proposal provides BC with older versions of php: passing extra > (unused) options is allowed and polyfilling UnserializationFailedException > is trivial, so one could set_error_handler(fn (...) => throw new > UnserializationFailedException(...)) and be both BC and FC. > > About writing a wrapper, that's what we all do right now, each with our own > wrapping logic. You're looking for unifying those logics and I support that > goal. I come from a userland developer background, but without the well-known frameworks in the back. I must admit that I am not currently using a wrapper around unserialize(). Partly because the application I work on uses a global throwing error handler (see (3)), partly because the non-E_SOMETHING error cases were not obvious to me before researching this RFC which came to life because of my work on PHP 8.2's ext/random. So to clarify: What I'm looking for is making unserialize() safe and well-defined to use, so that wrapping logic is no longer required. > But this leads to another idea : what about adding a new function, let's > name it unserialize2() for now, and deprecate unserialize()? That'd provide > the upgrade path we need. > This sounds like another instance of mysql_real_escape_string() to me. Also the default choice and the one documented all around the Internet is still going to be 'unserialize()', because it's the more obvious one. >> 3. It does literally nothing for users that use a throwing error >> handler, which to my understanding includes the vast majority of >> framework code. >> > > I'm not sure what you mean here. The flag would allow removing those error > handlers, which is what you want to achieve in the end I believe (well, > except when BC with older versions is desired of course, see my previous > comment.) > I likely wasn't clear here: I am talking about the global throwing error handler that is added by frameworks during the bootstrapping of the request. I've tested with Symfony (controller attached): When I just call `unserialize('asdf')` within a controller, then an ErrorException comes out of it, because of the global error handler. If I don't catch that one then Symfony's Exception handler will show me a pretty error page with a ghost. The same is true for Laravel the last time I tested and I suspect this also is true for other frameworks. For all users that use such a global error handler setting the option or not setting the option will result in 100% identical behavior. >> 4. Even for users that do not use a throwing error handler, omitting the >> option literally does nothing, because unserialize() might already throw >> depending on what the unserialize handlers of unserialized objects do. >> > > Sure. I don't see how that's a problem. Throwing is what almost any lines > of PHP code can do. We're used to dealing with that and I don't see why > serialize() needs special care. I might have missed the point you want to > make here though. > The difference is that for the majority of functions the possible failure cases are known and can be ruled out statically. As an example I can be sure that count() will not throw if I pass it an array and I don't change the $mode. This is not true for unserialize() which internally might call arbitrary __unserialize() callbacks. >>> Lastly I'd like to add a 3. to your proposal, because there is one more >>> thing that makes unserialization annoying: the unserialize_callback_func >>> f the results are copies of php-src that match the .phpt tests. > > PHP creates a __PHP_Incomplete_Class object, but I regularly use this > callback to throw instead when I don't want those strange incomplete > objects (note that I also have seen use cases where generating a > __PHP_Incomplete_Class *is* the desired behavior.) > I see. So you are not so much interested in the callback specifically, but a "throw_for_unknown_classes" flag would also solve your use case. Personally I think such having such a flag would be clearer with regard to the intent. > I proposed that because in many situations unserializing a payload to > something that contains __PHP_Incomplete_Class objects is not desired and > should be leveled up to an error instead. That's the relation with error > handling and thus with the RFC. > That said, I believe adding a flag to *generate* additional errors (be it via a callback or via my proposal) is orthogonal to *handling* errors and thus is out of scope for *this* RFC. Your suggestion likely has plenty of bikeshedding potential on its own. I think that being able to cleanly make unknown classes an error would be a useful addition, though. I recommend that you create a separate RFC for that once mine went through the vote, so that your's can build on the results of mine. Best regards Tim Düsterhus --------------O0hOgaiacTiGE01ngVSVzsiq Content-Type: application/x-php; name="DemoController.php" Content-Disposition: attachment; filename="DemoController.php" Content-Transfer-Encoding: base64 PD9waHAKCm5hbWVzcGFjZSBBcHBcQ29udHJvbGxlcjsKCnVzZSBTeW1mb255XEJ1bmRsZVxG cmFtZXdvcmtCdW5kbGVcQ29udHJvbGxlclxBYnN0cmFjdENvbnRyb2xsZXI7CnVzZSBTeW1m b255XENvbXBvbmVudFxIdHRwRm91bmRhdGlvblxSZXNwb25zZTsKdXNlIFN5bWZvbnlcQ29t cG9uZW50XFJvdXRpbmdcQW5ub3RhdGlvblxSb3V0ZTsKCmNsYXNzIERlbW9Db250cm9sbGVy IGV4dGVuZHMgQWJzdHJhY3RDb250cm9sbGVyCnsKICAgICNbUm91dGUoJy8nKV0KICAgIHB1 YmxpYyBmdW5jdGlvbiBpbmRleCgpOiBSZXNwb25zZQogICAgewogICAgICAgIFx1bnNlcmlh bGl6ZSgnYXNkZicpOwoKICAgICAgICByZXR1cm4gbmV3IFJlc3BvbnNlKCk7CiAgICB9Cn0K --------------O0hOgaiacTiGE01ngVSVzsiq--