Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:118698 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 94997 invoked from network); 28 Sep 2022 17:53:47 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 28 Sep 2022 17:53:47 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 192791804AC for ; Wed, 28 Sep 2022 10:53:46 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2 X-Spam-ASN: AS15169 209.85.128.0/17 X-Spam-Virus: No X-Envelope-From: Received: from mail-oi1-f181.google.com (mail-oi1-f181.google.com [209.85.167.181]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Wed, 28 Sep 2022 10:53:45 -0700 (PDT) Received: by mail-oi1-f181.google.com with SMTP id d64so16198525oia.9 for ; Wed, 28 Sep 2022 10:53:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=b/zzOS46GAxKChaZlqT9Ez3neSY3fAEjl747fFOjadQ=; b=QacUX86GgLlWlRM9M+meNaskWi9Pt0SnvMvV+T4Z+UvydBWoh3A/le1Jg374hgcy0x U1KmxoUpyO8dOa0oh3poaGRLUQLxOBpxCi/ZaqNwDT7jABn8VSgyW6/RoBaJFtjgsbOg q8Tf9CcdsrNHm7gattrnv/Rgqy94bJ9y3Kkel5YqtXJp6M2nz1+Q5ODbgxvdw+WVhgCh OmU+mV0olu3TckvNrIpQ6Ptpb324YVSc3qBPyBPYc16YTQsRvmV7nUCXYofWgFpxtoh8 8lz+VWEecnnov5qiMsIKh6FZkBsauPqXzllciZtRhq7oPywalcIb1yoAqsdCKfJgIMFq AiKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=b/zzOS46GAxKChaZlqT9Ez3neSY3fAEjl747fFOjadQ=; b=ab5m4CeLpCdNy1125P7iANwqwlyRFH4eIy/Z43kBLsZhVP3XzxXCXq7QLQoVeFxNJe nVjI8z7nfjNdIzWuXGLYtaRtomAx/fmoJbAlEO3c4/vTSHI0djPqUvyzGVBZtxtW3+o5 iEdCNVxBf9r7Nmy4Nvx/5bXFqN/1GPeNcl+Th5d+jsm72Mbz/G2nJRZV9mAzXl6OHCoH yBP6+ERoYq9gidR3IXs/5209I2L4MSH83QVrzne5LetDq5xXnxf2bRRczMbOoHEFQp/+ FAbJ9KUHbvgc0Bm3EwP+hEcHMmX0+4aNXgeUo0884xWkmIuS1W54q+qPxSRyMwBN1qNQ KNAQ== X-Gm-Message-State: ACrzQf1pIJ63hz3H62S3LzkzaAFXIzUi9lG/8m0WBkZksBnRCDW2fA4E O3we3GzCtDLDvD/w+eibcpjJUmcLK7RaDtUgXu2/2ixZfDs= X-Google-Smtp-Source: AMsMyM66GsMjZWcunrFwBh16/u5Ez7RALeWxj6N3ZM0YTKbJahCayMehA2hz7hIQb6+tYX8gGvZ+UqEq6JR7kQ+rQL0= X-Received: by 2002:a05:6808:2012:b0:34f:ca73:ee55 with SMTP id q18-20020a056808201200b0034fca73ee55mr5196097oiw.229.1664387624184; Wed, 28 Sep 2022 10:53:44 -0700 (PDT) MIME-Version: 1.0 References: <530b3a9d-0ee4-6061-8c69-df672d238032@bastelstu.be> <51a57a4d-ccb5-a581-205e-e2684e4db8c8@bastelstu.be> <81f1ceb4-1086-3f1a-942d-aebbb5c2712f@bastelstu.be> <94b9c2ab-70d5-48e1-50f8-409a3073023a@bastelstu.be> In-Reply-To: <94b9c2ab-70d5-48e1-50f8-409a3073023a@bastelstu.be> Date: Wed, 28 Sep 2022 19:53:32 +0200 Message-ID: To: =?UTF-8?Q?Tim_D=C3=BCsterhus?= Cc: internals@lists.php.net Content-Type: multipart/alternative; boundary="0000000000001bb1ad05e9c07183" Subject: Re: [PHP-DEV] RFC [Discussion]: Improve unserialize() error handling From: nicolas.grekas+php@gmail.com (Nicolas Grekas) --0000000000001bb1ad05e9c07183 Content-Type: text/plain; charset="UTF-8" Hi Tim, I'm not quoting the full text of the discussion because the php ML server refused the message - too long it said. > this gist: > > https://gist.github.com/nicolas-grekas/5dd3169f94ed3b4576152605330824fe > > > > > > [...] (see previous messages on the thread for this missing part) > > >> I'm asking, because the examples fail to explain the "why". Why is the > >> code written like this? I fail to see how catching "Exception", but not > >> catching "Error" during unserialization is ever going to be useful. > >> > > > > It's useful because these two roots mean very different things. > > I understand the distinction between Error and Exception in general. > > However I've specifically asked about the distinction in the context of > unserialization. An Exception during unserialization is not more or less > a programmer error compared to an Error. You generally put in some > opaque string into unserialize and you hopefully get out some useful > value. If the string is broken you either get an E_FOO or some random > Throwable you can't control. You can't magically fix the input string > and you can't really prevent the errors happening either. > > As pointed out in the list above: DateTimeImmutable will already throw > an Error, whereas other classes will throw an Exception. That does not > mean that an error during unserialization of DateTimeImmutable is worse, > because the choice is pretty arbitrary. > > The documentation of unserialize() > (https://www.php.net/manual/en/function.unserialize.php) also indicates > that: > > > Objects may throw Throwables in their unserialization handlers. > > Further indicating that you are not guaranteed to only get Exception or > only get Error out of it. > > > > >> Likewise I don't see how catching ErrorException specifically is useful > >> when '__unserialize()' might throw arbitrary Throwables. > > > > > > It's useful when all you need is eg to care about the exceptions thrown > by > > a custom error handler, and want to ignore the other ones (because other > > exceptions aren't part of the current domain layer.) > > > > Basically see response to previous quote. > > >> Quoting myself, because the examples didn't answer the question: "I am > >> unable to think of a situation where the input data is well-defined > >> enough to reliably throw a specific type of exception, but not > >> well-defined enough to successfully unserialize". > >> > >> And don't get me wrong: I'm not attempting to say that it is appropriate > >> to break logic that I personally consider wrong without justification. I > >> believe the benefits of having the unified Exception outweigh the > >> drawbacks of introducing a behavioral change in code that is already > >> subtly broken depending on the exact input value passed to > unserialize(). > >> > > > > The thoughts you describe might make sense as a rule of thumb principle > > when you write your code. It might even be upgraded to a best practice. > > Generally turning an Error to an Exception is not by my book, but I'm not > > arguing about this. My point is : /!\ BC break ahead, do not cross /!\ > > > > If you look at the gist I linked before, wrapping all throwables would > > break most if not all the cases I listed. That's bad numbers, and I don't > > think this is specific in any way to the Symfony codebase. > > > > Based on my analysis and understanding of the linked examples, the only > one that *might* break (I don't fully understand it), is the one in > src/Symfony/Component/VarExporter/Internal/Registry.php. The others will > either no change or change to be more correct. > > > > >> This is similar to the attribute syntax breaking hash comments that > >> start with a square bracket without any space in the middle. Or the '@@' > >> attribute syntax that was also proposed. It would have broken code > >> containing duplicated error suppression operators. > >> > >> Similarly to the attribute breakage, it's also reasonably easy to find > >> possibly affected logic by grepping for 'unserialize('. > >> > > > > Sorry to disagree, that's a very different sort of break. One that > changes > > the behavior of perfectly fine apps. > > > > Disagreeing is fine. Without disagreement we would not need the RFC > process. However I disagree on the "perfectly fine" part. I believe that > some of the Symfony examples are already subtly broken (the > DateTimeImmutable part). > As you saw, the BC break is real. That's what I wanted to highlight and what worries me for the community at large. This is not some niche BC break. There is also an argument that is missing in this conversation and that Yasuo made (in another thread by mistake I guess?): In general, unserializing external serialised data is the same as "Read and > execute remote PHP code" > (Executing arbitrary code via memory corruption is trivial task with > unserialize()) > > Therefore, developers must validate serialized data integrity at least > with HMAC always, so that serialized data is generated by trustworthy > systems > (your servers usually. Since HMAC does not have forward security, HKDF > with expiration info is recommended.) > > If these restrictions are for security, then these restrictions won't > protect anything. If these are for misuse protections, > then these would be "nice to have". > In all valid use cases of unserialize(), we do trust the backend where we get serialized payloads from. This means constructed broken payloads are not something we need to deal with (the DateTimeImmutable example you gave). The thing we need to guard against is much narrower. That makes the problem this PR aims to solve much smaller. Truncated payloads are a thing because of race conditions or unreliable network services. FC/BC of serialized payloads is another thing that can only be dealt with by implementing __unserialize() or the likes. And that's mostly it. Other errors (e.g. parse error) are not caused by unserialize itself, but by regular user code. However I also have another RFC as an example of a behavioral change > that is much harder to detect and check statically than my proposed > unserialize changes where it suffices to check the catch block and '@' > symbol. > > "RFC: Static variables in inherited methods" - > https://wiki.php.net/rfc/static_variable_inheritance > > That RFC actually changed the behavior of the application I maintain. > You voted in favor of the RFC, but did not participate in the > discussion. The fix was simple and I agree that the new behavior is > better, so no hard feelings. I'd be curious though how the breakage from > that RFC is more acceptable to you than the anticipated breakage from > this RFC? Honest question to understand your PoV better. > Rereading the RFC, Nikita described a niche and obscure behavior of the engine. There are no numbers to support that understanding (how "niche" this was) and that's missing in the RFC. I remember other RFCs which did have numbers to quantify the impact. I guess it was hard to get that number. And that's also why I made this gist: so that the impact of the BC break could be evaluated somehow. Many spots in Symfony likely means many more spots in the PHP community. I'm not going to be able to conduct a larger analysis of this impact, but that can still give an intuition of it to readers. Mine is: it's likely going to be impactful, not in a good way. > >> The difference is that for the majority of functions the possible > >> failure cases are known and can be ruled out statically. As an example I > >> can be sure that count() will not throw if I pass it an array and I > >> don't change the $mode. This is not true for unserialize() which > >> internally might call arbitrary __unserialize() callbacks. > >> > > > > True, that's expected to me given what unserialize does. I don't see the > > need to wrap sorry (even ignoring the BC break.) Turning Error into > > Exception would be especially bad to me, eg a ParseError should remain a > > ParseError so that it can follow the usual path for reporting parse > errors. > > > > See above how Error vs Exception IMO is meaningless for unserialize(). > > > > > Thanks, I hoped I could slip in this idea but I'll keep it on my todo > list > > for php-src :) > > > > I've had a quick look at the implementation. Making the necessary > changes to the C code shouldn't be too much effort. The hard part is > writing the RFC and agreeing on a behavior :-) > ACK :) Cheers, Nicolas --0000000000001bb1ad05e9c07183--