Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:118672
MIME-Version: 1.0
References: <530b3a9d-0ee4-6061-8c69-df672d238032@bastelstu.be>
 <CAOWwgpnQuRMv4Uv=bM+QNBPeUC8BCk4f1kYOcg8EUrYD1WoRzQ@mail.gmail.com>
 <51a57a4d-ccb5-a581-205e-e2684e4db8c8@bastelstu.be> <CAOWwgp=W94c4Vzkn7HZkzWESjaE6bm8yCUGTD1+X1eiHQsCTjw@mail.gmail.com>
 <81f1ceb4-1086-3f1a-942d-aebbb5c2712f@bastelstu.be>
In-Reply-To: <81f1ceb4-1086-3f1a-942d-aebbb5c2712f@bastelstu.be>
Date: Tue, 20 Sep 2022 19:40:46 +0200
Message-ID: <CAOWwgpko4i=Af6ybAy+p5N7zEjGnWhh+ZCxYy6=nJ7J7k+KJVw@mail.gmail.com>
To: =?UTF-8?Q?Tim_D=C3=BCsterhus?= <tim@bastelstu.be>
Cc: internals@lists.php.net
Content-Type: multipart/alternative; boundary="000000000000c10a9805e91f54c7"
Subject: Re: [PHP-DEV] RFC [Discussion]: Improve unserialize() error handling
From: nicolas.grekas+php@gmail.com (Nicolas Grekas)

--000000000000c10a9805e91f54c7
Content-Type: text/plain; charset="UTF-8"

Hi Tim,

I'm a bit busy with conferences these days...

On 9/12/22 21:46, Nicolas Grekas wrote:>> unserialize() is a generic
> function that will also call arbitrary
> >> callbacks. You already have no guarantees whatsoever about the kind of
> >> exception that is thrown from it. I am unable to think of a situation
> >> where the input data is well-defined enough to reliably throw a specific
> >> type of exception, but not well-defined enough to successfully
> unserialize.
> >>
> >> So on the contrary wrapping any Exceptions with a well-defined Exception
> >> allows you to rely on a specific and stable Exception to catch in the
> >> first place.
> >>
> >> As you specifically mention catch(Throwable), I'd like to point out that
> >> this will continue to work, because UnserializationFailedException
> >> implements Throwable.
> >>
> >
> > You might have misunderstood my point so let me give two examples:
> >
> > Example 1.
> > set_error_handler(fn ($t, $m) => throw new ErrorException($m));
> > try {
> >     $ser = serialize($value);
> > catch (ErrorException $e) {
> >    // deal with $e
> > finally {
> >    restore_error_handler();
> > }
> >
> > Example 2.
> > try {
> >     $ser = @serialize($value);
> > catch (Exception $e) {
> >    // deal with $e but don't catch Error on purpose, to let them bubble
> down
> > }
> >
> > Changing serialize to wrap any throwables into an
> > UnserializationFailedException would break both examples. That's what I
> > mean when I say this breaks BC. As any BC-break, this might cause big
> > troubles for the community and this should be avoided. The release
> process
> > policy I mentioned above is a wise document.
> >
>
> First I'd like to note that these examples use 'serialize()' (which is
> not touched by my RFC, because serialize rarely fails). I consider this
> a typo and will answer as if you used 'unserialize()'. If that was not a
> typo, then this might explain the confusion.
>

It was a typo! I meant unserialize() yes.


> Are those two examples based on real-world use cases or did you craft
> them specifically to point out how the proposal would introduce a
> behavioral change?
>

They were inspired by what I found in the Symfony codebase.

I just conducted a quick lookup at this code base: most call to
unserialize() are not checked at all, but I hghlighted the checked ones in
this gist:
https://gist.github.com/nicolas-grekas/5dd3169f94ed3b4576152605330824fe



> I'm asking, because the examples fail to explain the "why". Why is the
> code written like this? I fail to see how catching "Exception", but not
> catching "Error" during unserialization is ever going to be useful.
>

It's useful because these two roots mean very different things.

That's why Error has been introduced in the first place: instances of Error
are not just exceptions, they're really programming mistakes. As such the
best practice to me is to *not* catch them in domain layers, and let them
bubble down to a generic Error catch layer instead.



> Likewise I don't see how catching ErrorException specifically is useful
> when '__unserialize()' might throw arbitrary Throwables.


It's useful when all you need is eg to care about the exceptions thrown by
a custom error handler, and want to ignore the other ones (because other
exceptions aren't part of the current domain layer.)


Specifically
> see the list in this email: https://externals.io/message/118311. Code
> outside of the function that actually calls unserialize() is not going
> to be prepared to usefully handle unserialization failure. This
> effectively means that the Exception will bubble up to the global
> Exception handler (or the Exception handler middleware), resulting in an
> aborted request.
>

Yep, this is fine in many cases.



> Quoting myself, because the examples didn't answer the question: "I am
> unable to think of a situation where the input data is well-defined
> enough to reliably throw a specific type of exception, but not
> well-defined enough to successfully unserialize".
>
> And don't get me wrong: I'm not attempting to say that it is appropriate
> to break logic that I personally consider wrong without justification. I
> believe the benefits of having the unified Exception outweigh the
> drawbacks of introducing a behavioral change in code that is already
> subtly broken depending on the exact input value passed to unserialize().
>

The thoughts you describe might make sense as a rule of thumb principle
when you write your code. It might even be upgraded to a best practice.
Generally turning an Error to an Exception is not by my book, but I'm not
arguing about this. My point is : /!\ BC break ahead, do not cross /!\

If you look at the gist I linked before, wrapping all throwables would
break most if not all the cases I listed. That's bad numbers, and I don't
think this is specific in any way to the Symfony codebase.



> This is similar to the attribute syntax breaking hash comments that
> start with a square bracket without any space in the middle. Or the '@@'
> attribute syntax that was also proposed. It would have broken code
> containing duplicated error suppression operators.
>
> Similarly to the attribute breakage, it's also reasonably easy to find
> possibly affected logic by grepping for 'unserialize('.
>

Sorry to disagree, that's a very different sort of break. One that changes
the behavior of perfectly fine apps.


>>> About 2., unserialize() accepts a second argument to give it options.
> Did
> >>> you consider adding a 'throw_on_error' option to opt-in into the
> behavior
> >>> you're looking for? I think that would be a nice replacement for the
> >>> current logics based on set_error_handler(). If we want to make PHP 9
> >> throw
> >>> by default, we could then decide to deprecate *not* passing this
> option.
> >>
> >> I did not consider this when writing the RFC. I now considered it, and I
> >> do not believe adding a flag to control this a good thing.
> >>
> >> 1. "No one" is going to set that flag, because it requires additional
> >> effort. I strongly believe that the easiest solution should also the
> >> correct solution for the majority of use cases. The flag fails this
> >> "test", because the correct solution should be "don't fail silently".
> >>
> >
> > Unless we deprecate calling the method without the argument as I
> suggested.
> > I agree, it might not be ideal to ask the community to rewrite every call
> > to serialize($v) to serialize($v, ['throw_on_error' => true/false]) but
> > that's still way better than a BC break.
> > Maybe there's another way that doesn't break BC. I'm looking forward to
> one.
>
> See (4) and above regarding my opinion on the BC break.
>
> > 2. If you actually want to set that option in e.g. a library, then you
> >> break compatibility with older PHP versions for no real gain. If you go
> >> all the way and remember to add that extra flag, then you can also write
> >> an unserialize wrapper that does the set_error_handler dance I've shown
> >> in the introduction of the RFC. Similar effort to adding the flag, but
> >> better compatibility.
> >>
> >
> > I think my proposal provides BC with older versions of php: passing extra
> > (unused) options is allowed and polyfilling
> UnserializationFailedException
> > is trivial, so one could set_error_handler(fn (...) => throw new
> > UnserializationFailedException(...)) and be both BC and FC.
> >
> > About writing a wrapper, that's what we all do right now, each with our
> own
> > wrapping logic. You're looking for unifying those logics and I support
> that
> > goal.
>
> I come from a userland developer background, but without the well-known
> frameworks in the back. I must admit that I am not currently using a
> wrapper around unserialize(). Partly because the application I work on
> uses a global throwing error handler (see (3)), partly because the
> non-E_SOMETHING error cases were not obvious to me before researching
> this RFC which came to life because of my work on PHP 8.2's ext/random.
>
> So to clarify: What I'm looking for is making unserialize() safe and
> well-defined to use, so that wrapping logic is no longer required.
>
> > But this leads to another idea : what about adding a new function, let's
> > name it unserialize2() for now, and deprecate unserialize()? That'd
> provide
> > the upgrade path we need.
> >
>
> This sounds like another instance of mysql_real_escape_string() to me.
> Also the default choice and the one documented all around the Internet
> is still going to be 'unserialize()', because it's the more obvious one.
>
> >> 3. It does literally nothing for users that use a throwing error
> >> handler, which to my understanding includes the vast majority of
> >> framework code.
> >>
> >
> > I'm not sure what you mean here. The flag would allow removing those
> error
> > handlers, which is what you want to achieve in the end I believe (well,
> > except when BC with older versions is desired of course, see my previous
> > comment.)
> >
>
> I likely wasn't clear here: I am talking about the global throwing error
> handler that is added by frameworks during the bootstrapping of the
> request.
>
> I've tested with Symfony (controller attached): When I just call
> `unserialize('asdf')` within a controller, then an ErrorException comes
> out of it, because of the global error handler. If I don't catch that
> one then Symfony's Exception handler will show me a pretty error page
> with a ghost.
>
> The same is true for Laravel the last time I tested and I suspect this
> also is true for other frameworks.
>
> For all users that use such a global error handler setting the option or
> not setting the option will result in 100% identical behavior.
>
> >> 4. Even for users that do not use a throwing error handler, omitting the
> >> option literally does nothing, because unserialize() might already throw
> >> depending on what the unserialize handlers of unserialized objects do.
> >>
> >
> > Sure. I don't see how that's a problem. Throwing is what almost any lines
> > of PHP code can do. We're used to dealing with that and I don't see why
> > serialize() needs special care. I might have missed the point you want to
> > make here though.
> >
>
> The difference is that for the majority of functions the possible
> failure cases are known and can be ruled out statically. As an example I
> can be sure that count() will not throw if I pass it an array and I
> don't change the $mode. This is not true for unserialize() which
> internally might call arbitrary __unserialize() callbacks.
>

True, that's expected to me given what unserialize does. I don't see the
need to wrap sorry (even ignoring the BC break.) Turning Error into
Exception would be especially bad to me, eg a ParseError should remain a
ParseError so that it can follow the usual path for reporting parse errors.



> >>> Lastly I'd like to add a 3. to your proposal, because there is one more
> >>> thing that makes unserialization annoying: the
> unserialize_callback_func
> >>> f the results are copies of php-src that match the .phpt tests.
> >
> > PHP creates a __PHP_Incomplete_Class object, but I regularly use this
> > callback to throw instead when I don't want those strange incomplete
> > objects (note that I also have seen use cases where generating a
> > __PHP_Incomplete_Class *is* the desired behavior.)
> >
>
> I see. So you are not so much interested in the callback specifically,
> but a "throw_for_unknown_classes" flag would also solve your use case.
> Personally I think such having such a flag would be clearer with regard
> to the intent.
>

I checked at the code and we use varying domain exceptions + messages so a
flag wouldn't be enough.
There's also a case where we call trigger_error() to connect missing
classes to the regular error handler.


> I proposed that because in many situations unserializing a payload to
> > something that contains __PHP_Incomplete_Class objects is not desired and
> > should be leveled up to an error instead. That's the relation with error
> > handling and thus with the RFC.
> >
>
> That said, I believe adding a flag to *generate* additional errors (be
> it via a callback or via my proposal) is orthogonal to *handling* errors
> and thus is out of scope for *this* RFC. Your suggestion likely has
> plenty of bikeshedding potential on its own.
>
> I think that being able to cleanly make unknown classes an error would
> be a useful addition, though. I recommend that you create a separate RFC
> for that once mine went through the vote, so that your's can build on
> the results of mine.
>

Thanks, I hoped I could slip in this idea but I'll keep it on my todo list
for php-src :)


Nicolas

--000000000000c10a9805e91f54c7--