Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:118672 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 18189 invoked from network); 20 Sep 2022 17:41:01 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 20 Sep 2022 17:41:01 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 05E0A180384 for ; Tue, 20 Sep 2022 10:41:00 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS15169 209.85.128.0/17 X-Spam-Virus: No X-Envelope-From: Received: from mail-yw1-f180.google.com (mail-yw1-f180.google.com [209.85.128.180]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 20 Sep 2022 10:40:59 -0700 (PDT) Received: by mail-yw1-f180.google.com with SMTP id 00721157ae682-3450990b0aeso35539547b3.12 for ; Tue, 20 Sep 2022 10:40:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=+WX0pvwH5I+vethv2CoKIyTAzqf2Vx/7nUGwHNAi0DE=; b=j14EXpJwluy5Xn/KuQ0nQAZy83JlDGKY+1HRV7MDGUiBvd4uUqcg2/DtuZjanVbMCL nt/2AEzcQQRHIZFe+oFjZ2Gxt198TbPVVkiYfML8WZkTj2UHMVsER41BU4kPANpWGELr 4mGeV4o8ODQchNzTtfAsW/Vc5Ch+RShVTFfPACL/dnqpR/OsveUNZh/d0qZCD0x1SMEK 05T481TQ5d9gE+myQYrbnxRPDRcGJnW5QtmuBOXEU6YrqDQblK4EXlHQ3AoKib5iDWpd zbeMCozNfoAjjaUyBe0QY02Q/3K4Rdxtp1VWtFOEf9rbhOthTzhqsiztt75aNTH936ni 24cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=+WX0pvwH5I+vethv2CoKIyTAzqf2Vx/7nUGwHNAi0DE=; b=qJlLZP+YMq/BcxmN2+LZZ/5o5mCAaPObCKMkPdy96m4sypqvz5MR9wJdwCTHZJdPnn JegRQHTVYha5XHmRGow9ma9DBkz338nOnI8jf4HNC5Z0RS/IQAIOph9KXKMXnww+U2O6 JTu1ID91PiWZLt+FqoaUziEfEsE+lKhhZ3E2OHdEHEF0sqM6P8IWGlBgs7gs4lokX19G BkpRhN6vBoxi5PVUZEKlRiZSXEfYm1A16MRovmGagXL0NRNSoGxfqpQAxCfJZpn1qnx1 QJiENbvuket9kOXGRD2zoHtPrPOOMmoHVlTudew3SmrmqEtEda5W3typb2VLfgSrCt3m Haqw== X-Gm-Message-State: ACrzQf3TRNl10AnRd8ZqOd/6ZTd2gebUccoai2LreRF8YgsXgB44nUVC 5rQcL4iDYW3Ybn0RomU0xWmRp4u/ZES9WwRqosBuCpP+Wyk= X-Google-Smtp-Source: AMsMyM5+rWaLSfRZe88x9VyixvTtUZOj9At9HcCB33nlgfPYU5h9gReOE5l08Jads2/YRmS8+xk3WnVYHS2j2BYyNbA= X-Received: by 2002:a81:5790:0:b0:348:9584:bf4b with SMTP id l138-20020a815790000000b003489584bf4bmr20883474ywb.483.1663695658747; Tue, 20 Sep 2022 10:40:58 -0700 (PDT) MIME-Version: 1.0 References: <530b3a9d-0ee4-6061-8c69-df672d238032@bastelstu.be> <51a57a4d-ccb5-a581-205e-e2684e4db8c8@bastelstu.be> <81f1ceb4-1086-3f1a-942d-aebbb5c2712f@bastelstu.be> In-Reply-To: <81f1ceb4-1086-3f1a-942d-aebbb5c2712f@bastelstu.be> Date: Tue, 20 Sep 2022 19:40:46 +0200 Message-ID: To: =?UTF-8?Q?Tim_D=C3=BCsterhus?= Cc: internals@lists.php.net Content-Type: multipart/alternative; boundary="000000000000c10a9805e91f54c7" Subject: Re: [PHP-DEV] RFC [Discussion]: Improve unserialize() error handling From: nicolas.grekas+php@gmail.com (Nicolas Grekas) --000000000000c10a9805e91f54c7 Content-Type: text/plain; charset="UTF-8" Hi Tim, I'm a bit busy with conferences these days... On 9/12/22 21:46, Nicolas Grekas wrote:>> unserialize() is a generic > function that will also call arbitrary > >> callbacks. You already have no guarantees whatsoever about the kind of > >> exception that is thrown from it. I am unable to think of a situation > >> where the input data is well-defined enough to reliably throw a specific > >> type of exception, but not well-defined enough to successfully > unserialize. > >> > >> So on the contrary wrapping any Exceptions with a well-defined Exception > >> allows you to rely on a specific and stable Exception to catch in the > >> first place. > >> > >> As you specifically mention catch(Throwable), I'd like to point out that > >> this will continue to work, because UnserializationFailedException > >> implements Throwable. > >> > > > > You might have misunderstood my point so let me give two examples: > > > > Example 1. > > set_error_handler(fn ($t, $m) => throw new ErrorException($m)); > > try { > > $ser = serialize($value); > > catch (ErrorException $e) { > > // deal with $e > > finally { > > restore_error_handler(); > > } > > > > Example 2. > > try { > > $ser = @serialize($value); > > catch (Exception $e) { > > // deal with $e but don't catch Error on purpose, to let them bubble > down > > } > > > > Changing serialize to wrap any throwables into an > > UnserializationFailedException would break both examples. That's what I > > mean when I say this breaks BC. As any BC-break, this might cause big > > troubles for the community and this should be avoided. The release > process > > policy I mentioned above is a wise document. > > > > First I'd like to note that these examples use 'serialize()' (which is > not touched by my RFC, because serialize rarely fails). I consider this > a typo and will answer as if you used 'unserialize()'. If that was not a > typo, then this might explain the confusion. > It was a typo! I meant unserialize() yes. > Are those two examples based on real-world use cases or did you craft > them specifically to point out how the proposal would introduce a > behavioral change? > They were inspired by what I found in the Symfony codebase. I just conducted a quick lookup at this code base: most call to unserialize() are not checked at all, but I hghlighted the checked ones in this gist: https://gist.github.com/nicolas-grekas/5dd3169f94ed3b4576152605330824fe > I'm asking, because the examples fail to explain the "why". Why is the > code written like this? I fail to see how catching "Exception", but not > catching "Error" during unserialization is ever going to be useful. > It's useful because these two roots mean very different things. That's why Error has been introduced in the first place: instances of Error are not just exceptions, they're really programming mistakes. As such the best practice to me is to *not* catch them in domain layers, and let them bubble down to a generic Error catch layer instead. > Likewise I don't see how catching ErrorException specifically is useful > when '__unserialize()' might throw arbitrary Throwables. It's useful when all you need is eg to care about the exceptions thrown by a custom error handler, and want to ignore the other ones (because other exceptions aren't part of the current domain layer.) Specifically > see the list in this email: https://externals.io/message/118311. Code > outside of the function that actually calls unserialize() is not going > to be prepared to usefully handle unserialization failure. This > effectively means that the Exception will bubble up to the global > Exception handler (or the Exception handler middleware), resulting in an > aborted request. > Yep, this is fine in many cases. > Quoting myself, because the examples didn't answer the question: "I am > unable to think of a situation where the input data is well-defined > enough to reliably throw a specific type of exception, but not > well-defined enough to successfully unserialize". > > And don't get me wrong: I'm not attempting to say that it is appropriate > to break logic that I personally consider wrong without justification. I > believe the benefits of having the unified Exception outweigh the > drawbacks of introducing a behavioral change in code that is already > subtly broken depending on the exact input value passed to unserialize(). > The thoughts you describe might make sense as a rule of thumb principle when you write your code. It might even be upgraded to a best practice. Generally turning an Error to an Exception is not by my book, but I'm not arguing about this. My point is : /!\ BC break ahead, do not cross /!\ If you look at the gist I linked before, wrapping all throwables would break most if not all the cases I listed. That's bad numbers, and I don't think this is specific in any way to the Symfony codebase. > This is similar to the attribute syntax breaking hash comments that > start with a square bracket without any space in the middle. Or the '@@' > attribute syntax that was also proposed. It would have broken code > containing duplicated error suppression operators. > > Similarly to the attribute breakage, it's also reasonably easy to find > possibly affected logic by grepping for 'unserialize('. > Sorry to disagree, that's a very different sort of break. One that changes the behavior of perfectly fine apps. >>> About 2., unserialize() accepts a second argument to give it options. > Did > >>> you consider adding a 'throw_on_error' option to opt-in into the > behavior > >>> you're looking for? I think that would be a nice replacement for the > >>> current logics based on set_error_handler(). If we want to make PHP 9 > >> throw > >>> by default, we could then decide to deprecate *not* passing this > option. > >> > >> I did not consider this when writing the RFC. I now considered it, and I > >> do not believe adding a flag to control this a good thing. > >> > >> 1. "No one" is going to set that flag, because it requires additional > >> effort. I strongly believe that the easiest solution should also the > >> correct solution for the majority of use cases. The flag fails this > >> "test", because the correct solution should be "don't fail silently". > >> > > > > Unless we deprecate calling the method without the argument as I > suggested. > > I agree, it might not be ideal to ask the community to rewrite every call > > to serialize($v) to serialize($v, ['throw_on_error' => true/false]) but > > that's still way better than a BC break. > > Maybe there's another way that doesn't break BC. I'm looking forward to > one. > > See (4) and above regarding my opinion on the BC break. > > > 2. If you actually want to set that option in e.g. a library, then you > >> break compatibility with older PHP versions for no real gain. If you go > >> all the way and remember to add that extra flag, then you can also write > >> an unserialize wrapper that does the set_error_handler dance I've shown > >> in the introduction of the RFC. Similar effort to adding the flag, but > >> better compatibility. > >> > > > > I think my proposal provides BC with older versions of php: passing extra > > (unused) options is allowed and polyfilling > UnserializationFailedException > > is trivial, so one could set_error_handler(fn (...) => throw new > > UnserializationFailedException(...)) and be both BC and FC. > > > > About writing a wrapper, that's what we all do right now, each with our > own > > wrapping logic. You're looking for unifying those logics and I support > that > > goal. > > I come from a userland developer background, but without the well-known > frameworks in the back. I must admit that I am not currently using a > wrapper around unserialize(). Partly because the application I work on > uses a global throwing error handler (see (3)), partly because the > non-E_SOMETHING error cases were not obvious to me before researching > this RFC which came to life because of my work on PHP 8.2's ext/random. > > So to clarify: What I'm looking for is making unserialize() safe and > well-defined to use, so that wrapping logic is no longer required. > > > But this leads to another idea : what about adding a new function, let's > > name it unserialize2() for now, and deprecate unserialize()? That'd > provide > > the upgrade path we need. > > > > This sounds like another instance of mysql_real_escape_string() to me. > Also the default choice and the one documented all around the Internet > is still going to be 'unserialize()', because it's the more obvious one. > > >> 3. It does literally nothing for users that use a throwing error > >> handler, which to my understanding includes the vast majority of > >> framework code. > >> > > > > I'm not sure what you mean here. The flag would allow removing those > error > > handlers, which is what you want to achieve in the end I believe (well, > > except when BC with older versions is desired of course, see my previous > > comment.) > > > > I likely wasn't clear here: I am talking about the global throwing error > handler that is added by frameworks during the bootstrapping of the > request. > > I've tested with Symfony (controller attached): When I just call > `unserialize('asdf')` within a controller, then an ErrorException comes > out of it, because of the global error handler. If I don't catch that > one then Symfony's Exception handler will show me a pretty error page > with a ghost. > > The same is true for Laravel the last time I tested and I suspect this > also is true for other frameworks. > > For all users that use such a global error handler setting the option or > not setting the option will result in 100% identical behavior. > > >> 4. Even for users that do not use a throwing error handler, omitting the > >> option literally does nothing, because unserialize() might already throw > >> depending on what the unserialize handlers of unserialized objects do. > >> > > > > Sure. I don't see how that's a problem. Throwing is what almost any lines > > of PHP code can do. We're used to dealing with that and I don't see why > > serialize() needs special care. I might have missed the point you want to > > make here though. > > > > The difference is that for the majority of functions the possible > failure cases are known and can be ruled out statically. As an example I > can be sure that count() will not throw if I pass it an array and I > don't change the $mode. This is not true for unserialize() which > internally might call arbitrary __unserialize() callbacks. > True, that's expected to me given what unserialize does. I don't see the need to wrap sorry (even ignoring the BC break.) Turning Error into Exception would be especially bad to me, eg a ParseError should remain a ParseError so that it can follow the usual path for reporting parse errors. > >>> Lastly I'd like to add a 3. to your proposal, because there is one more > >>> thing that makes unserialization annoying: the > unserialize_callback_func > >>> f the results are copies of php-src that match the .phpt tests. > > > > PHP creates a __PHP_Incomplete_Class object, but I regularly use this > > callback to throw instead when I don't want those strange incomplete > > objects (note that I also have seen use cases where generating a > > __PHP_Incomplete_Class *is* the desired behavior.) > > > > I see. So you are not so much interested in the callback specifically, > but a "throw_for_unknown_classes" flag would also solve your use case. > Personally I think such having such a flag would be clearer with regard > to the intent. > I checked at the code and we use varying domain exceptions + messages so a flag wouldn't be enough. There's also a case where we call trigger_error() to connect missing classes to the regular error handler. > I proposed that because in many situations unserializing a payload to > > something that contains __PHP_Incomplete_Class objects is not desired and > > should be leveled up to an error instead. That's the relation with error > > handling and thus with the RFC. > > > > That said, I believe adding a flag to *generate* additional errors (be > it via a callback or via my proposal) is orthogonal to *handling* errors > and thus is out of scope for *this* RFC. Your suggestion likely has > plenty of bikeshedding potential on its own. > > I think that being able to cleanly make unknown classes an error would > be a useful addition, though. I recommend that you create a separate RFC > for that once mine went through the vote, so that your's can build on > the results of mine. > Thanks, I hoped I could slip in this idea but I'll keep it on my todo list for php-src :) Nicolas --000000000000c10a9805e91f54c7--