Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:118676 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 32094 invoked from network); 20 Sep 2022 19:24:04 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 20 Sep 2022 19:24:04 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 57F5B1804BA for ; Tue, 20 Sep 2022 12:24:03 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS24940 176.9.0.0/16 X-Spam-Virus: No X-Envelope-From: Received: from chrono.xqk7.com (chrono.xqk7.com [176.9.45.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 20 Sep 2022 12:24:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bastelstu.be; s=mail20171119; t=1663701840; bh=7DsL/m7O6juCpFmGlROlaq3lY/8tCMgwjMt60PNijQ4=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=fkvUH1puw2c2FKJNuqXq2h95z658TY2+dV6z6GIeoA5avu0wkQADjqi82J2/zHJ+l IdZIEmAcd/I5kqYEbhZtxnGAMv9FREtXnoLJx2tBD68g64MUD8qyIZBmDvS2S1kkRI KUM1pgc/0o1U0tVPV/KUUdQswVTziamp7XJkh5VU3gvZdiJDX3uN76cWhG2dbWBdhx SAGIqLn/tPS0MrJrPKeWq+iRYOoUK7zU0sInBipLlLZ4AWgQL6PKifyvMNw1UvfGLQ Ieybi+Vn44uhM6QF6eu5LqfNbxfvXfC+X3COb2EB2FiAPY25qFkEiPF/WD7Xg7+8vp hx1WEkzyvEtKg== Message-ID: <94b9c2ab-70d5-48e1-50f8-409a3073023a@bastelstu.be> Date: Tue, 20 Sep 2022 21:24:00 +0200 MIME-Version: 1.0 Content-Language: en-US To: Nicolas Grekas Cc: internals@lists.php.net References: <530b3a9d-0ee4-6061-8c69-df672d238032@bastelstu.be> <51a57a4d-ccb5-a581-205e-e2684e4db8c8@bastelstu.be> <81f1ceb4-1086-3f1a-942d-aebbb5c2712f@bastelstu.be> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: [PHP-DEV] RFC [Discussion]: Improve unserialize() error handling From: tim@bastelstu.be (=?UTF-8?Q?Tim_D=c3=bcsterhus?=) Hi Nicolas On 9/20/22 19:40, Nicolas Grekas wrote: > I'm a bit busy with conferences these days... Understood. Enjoy! >> Are those two examples based on real-world use cases or did you craft >> them specifically to point out how the proposal would introduce a >> behavioral change? >> > > They were inspired by what I found in the Symfony codebase. > > I just conducted a quick lookup at this code base: most call to > unserialize() are not checked at all, but I hghlighted the checked ones in "unserialize()" is generally not checked matches my experience with the code I work with. However I would attribute at least some of that to the fact that the failure cases are not entirely clear - something I'd like to fix with this RFC. > this gist: > https://gist.github.com/nicolas-grekas/5dd3169f94ed3b4576152605330824fe > Thank you for that. I've looked through the gist and also cloned symfony/symfony to take a look at the broader context. Symfony/Bridge/PhpUnit/DeprecationErrorHandler/Deprecation.php: =============================================================== My understanding is that this attempts to unserialize the input message which might or might not be serialized data and fail gracefully. I'd like to point out that the logic is unsafe if the message happens to be syntactically correct serialized string that contains broken object data, because the object will throw. However in this case an attacker-specified value likely cannot reach that location. - It will continue to work as is for "wrap Throwables", because it does not have a catch() block. - It will break if E_FOO is upgraded to Exception. The fix would be trivial: Add an empty catch(UnserializationFailedException). src/Symfony/Component/Cache/Adapter/ArrayAdapter.php: ===================================================== My understanding is that this attempts to gracefully handle broken cache entries and force a rebuild if the payload is invalid. This will fail if the serialized payload contains a malformed DateTimeImmutable, because DateTimeImmutable will throw an Error, instead of Exception: https://3v4l.org/eB5ZE. The 'catch' should likely be adjusted to catch '\Throwable'. - Behavior will change for "wrap Throwables", because Errors will now be caught. In this case I'd say that this is intended behavior. - It will continue to work as is for E_FOO to Exception, because the catch block will catch the Exception and set the value to 'false', just like unserialize() returns for syntax errors. src/Symfony/Component/Cache/Marshaller/DefaultMarshaller.php: ============================================================= - Behavior will technically change for "wrap Throwables", because the catch(Error) will no longer catch anything. However as the catch's purpose is to wrap Error into an Exception and MarshallerInterface::unmarshall() is documented to '@throws Exception', the new behavior is equally correct. - Behavior will technically change for E_FOO to Exception, but the externally observable behavior will still be correct. src/Symfony/Component/Security/Http/Firewall/ContextListener.php: ================================================================= Frankly I have no idea what all that logic is supposed to do just from reading the code. It's already slightly broken if a third-partyException uses the code 0x37313BC for its own purpose. Checking the git commit history reveals this: https://github.com/symfony/symfony/pull/25669 I'd say that this code is already broken, similarly to ArrayAdapter. All this complicated logic with checking the error code can likely be simplified to catch(Throwable). - Behavior will change for "wrap Throwables", because the catch block will no longer catch. However I'd say that the code will be much clearer going forward with catch(UnserializationFailedException). In the PR discussion you point out "robustness" and the unified Exception type will provide robustness. - Behavior will change for E_FOO to Exception, because the type of Exception will change and thus the catch block no longer catches. Same as "wrap Throwables". src/Symfony/Component/VarDumper/Server/DumpServer.php: ====================================================== - Behavior will not change for "wrap Throwables", because this will not throw due to allowed_classes. - Behavior will change for E_FOO to Exception, because the '@' no longer works. A catch block needs to be added. src/Symfony/Component/VarExporter/Internal/Registry.php: ======================================================== The first unserialize: - It will continue to work as is for "wrap Throwables", because it does not have a catch() block. - It will likely also continue to work as is for E_FOO to Exception, because no explicit error handler is set. However I see that 'getClassReflector' throws specific types of Exception, so that might no longer work. To fix this: catch (UnserializationFailedException $e) { throw $e->getPrevious(); } would need to be added to restore the original behavior. The second unserialize is already broken for DateTimeImmutable, similar to ArrayAdapter: https://3v4l.org/8ZmOS - Behavior will change for "wrap Throwables", because Errors will now be caught. In this case I'd say that this is intended behavior. - It will continue to work as is for E_FOO to Exception, because the code already throws an Exception when false is returned and the same type of Exception is thrown in the catch(). ========================================================================= > >> I'm asking, because the examples fail to explain the "why". Why is the >> code written like this? I fail to see how catching "Exception", but not >> catching "Error" during unserialization is ever going to be useful. >> > > It's useful because these two roots mean very different things. I understand the distinction between Error and Exception in general. However I've specifically asked about the distinction in the context of unserialization. An Exception during unserialization is not more or less a programmer error compared to an Error. You generally put in some opaque string into unserialize and you hopefully get out some useful value. If the string is broken you either get an E_FOO or some random Throwable you can't control. You can't magically fix the input string and you can't really prevent the errors happening either. As pointed out in the list above: DateTimeImmutable will already throw an Error, whereas other classes will throw an Exception. That does not mean that an error during unserialization of DateTimeImmutable is worse, because the choice is pretty arbitrary. The documentation of unserialize() (https://www.php.net/manual/en/function.unserialize.php) also indicates that: > Objects may throw Throwables in their unserialization handlers. Further indicating that you are not guaranteed to only get Exception or only get Error out of it. > >> Likewise I don't see how catching ErrorException specifically is useful >> when '__unserialize()' might throw arbitrary Throwables. > > > It's useful when all you need is eg to care about the exceptions thrown by > a custom error handler, and want to ignore the other ones (because other > exceptions aren't part of the current domain layer.) > Basically see response to previous quote. >> Quoting myself, because the examples didn't answer the question: "I am >> unable to think of a situation where the input data is well-defined >> enough to reliably throw a specific type of exception, but not >> well-defined enough to successfully unserialize". >> >> And don't get me wrong: I'm not attempting to say that it is appropriate >> to break logic that I personally consider wrong without justification. I >> believe the benefits of having the unified Exception outweigh the >> drawbacks of introducing a behavioral change in code that is already >> subtly broken depending on the exact input value passed to unserialize(). >> > > The thoughts you describe might make sense as a rule of thumb principle > when you write your code. It might even be upgraded to a best practice. > Generally turning an Error to an Exception is not by my book, but I'm not > arguing about this. My point is : /!\ BC break ahead, do not cross /!\ > > If you look at the gist I linked before, wrapping all throwables would > break most if not all the cases I listed. That's bad numbers, and I don't > think this is specific in any way to the Symfony codebase. > Based on my analysis and understanding of the linked examples, the only one that *might* break (I don't fully understand it), is the one in src/Symfony/Component/VarExporter/Internal/Registry.php. The others will either no change or change to be more correct. > >> This is similar to the attribute syntax breaking hash comments that >> start with a square bracket without any space in the middle. Or the '@@' >> attribute syntax that was also proposed. It would have broken code >> containing duplicated error suppression operators. >> >> Similarly to the attribute breakage, it's also reasonably easy to find >> possibly affected logic by grepping for 'unserialize('. >> > > Sorry to disagree, that's a very different sort of break. One that changes > the behavior of perfectly fine apps. > Disagreeing is fine. Without disagreement we would not need the RFC process. However I disagree on the "perfectly fine" part. I believe that some of the Symfony examples are already subtly broken (the DateTimeImmutable part). However I also have another RFC as an example of a behavioral change that is much harder to detect and check statically than my proposed unserialize changes where it suffices to check the catch block and '@' symbol. "RFC: Static variables in inherited methods" - https://wiki.php.net/rfc/static_variable_inheritance That RFC actually changed the behavior of the application I maintain. You voted in favor of the RFC, but did not participate in the discussion. The fix was simple and I agree that the new behavior is better, so no hard feelings. I'd be curious though how the breakage from that RFC is more acceptable to you than the anticipated breakage from this RFC? Honest question to understand your PoV better. >> >> The difference is that for the majority of functions the possible >> failure cases are known and can be ruled out statically. As an example I >> can be sure that count() will not throw if I pass it an array and I >> don't change the $mode. This is not true for unserialize() which >> internally might call arbitrary __unserialize() callbacks. >> > > True, that's expected to me given what unserialize does. I don't see the > need to wrap sorry (even ignoring the BC break.) Turning Error into > Exception would be especially bad to me, eg a ParseError should remain a > ParseError so that it can follow the usual path for reporting parse errors. > See above how Error vs Exception IMO is meaningless for unserialize(). > > Thanks, I hoped I could slip in this idea but I'll keep it on my todo list > for php-src :) > I've had a quick look at the implementation. Making the necessary changes to the C code shouldn't be too much effort. The hard part is writing the RFC and agreeing on a behavior :-) Best regards Tim Düsterhus