Hi
I've now written up an RFC as a follow-up for the "What type of
Exception to use for unserialize()
failure?" thread [1]:
RFC: Improve unserialize()
error handling
https://wiki.php.net/rfc/improve_unserialize_error_handling
Proof of concept implementation is in:
https://github.com/php/php-src/pull/9425
Discussion period for that RFC is officially opened up.
The primary point of discussion in the previous mailing list thread and
in the PR comments is whether unserialize()
should continue to emit
E_WARNING
or whether that should consistently be changed to an
Exception. As of now I plan to explicitly vote on this and the RFC
contains some opinions on that matter.
Best regards
Tim Düsterhus
Hi
I've now written up an RFC as a follow-up for the "What type of
Exception to use forunserialize()
failure?" thread [1]:
RFC: Improve
unserialize()
error handling
https://wiki.php.net/rfc/improve_unserialize_error_handlingProof of concept implementation is in:
https://github.com/php/php-src/pull/9425
Discussion period for that RFC is officially opened up.
The primary point of discussion in the previous mailing list thread and
in the PR comments is whetherunserialize()
should continue to emit
E_WARNING
or whether that should consistently be changed to an
Exception. As of now I plan to explicitly vote on this and the RFC
contains some opinions on that matter.Best regards
Tim Düsterhus
Well-explained and well-argued. The only thing I'd add is that we should consider bumping the E_NOTICE
to an E_WARNING, and slating it to increase to an exception in 9.0. This feels like a smaller BC concern than most, but people are extra sensitive these days about those edge cases so it's probably good to be cautious.
--Larry Garfield
Hi
RFC: Improve
unserialize()
error handling
https://wiki.php.net/rfc/improve_unserialize_error_handlingWell-explained and well-argued. The only thing I'd add is that we should consider bumping the
E_NOTICE
to an E_WARNING, and slating it to increase to an exception in 9.0. This feels like a smaller BC concern than most, but people are extra sensitive these days about those edge cases so it's probably good to be cautious.
Can you please clarify whether you mean:
- Change the existing
E_WARNING
option to "E_WARNING+Exception in 9.0". - Add a new "E_WARNING+Exception in 9.0" option the vote, such that the
vote will be "E_WARNING" vs "E_WARNING+Exception in 9.0" vs "Exception"
Best regards
Tim Düsterhus
Hi
RFC: Improve
unserialize()
error handling
https://wiki.php.net/rfc/improve_unserialize_error_handlingWell-explained and well-argued. The only thing I'd add is that we should consider bumping the
E_NOTICE
to an E_WARNING, and slating it to increase to an exception in 9.0. This feels like a smaller BC concern than most, but people are extra sensitive these days about those edge cases so it's probably good to be cautious.Can you please clarify whether you mean:
- Change the existing
E_WARNING
option to "E_WARNING+Exception in 9.0".- Add a new "E_WARNING+Exception in 9.0" option the vote, such that the
vote will be "E_WARNING" vs "E_WARNING+Exception in 9.0" vs "Exception"Best regards
Tim Düsterhus
Either I guess? Honestly we should decide that in advance on the list. :-) E_WARNING+Exception in 9 is what I'd probably favor, with "Exception now" as a second choice.
--Larry Garfield
Hi
Either I guess? Honestly we should decide that in advance on the list. :-) E_WARNING+Exception in 9 is what I'd probably favor, with "Exception now" as a second choice.
I'm a new-ish contributor here in internals, so I don't know how things
were done in the past for similar situations/issues.
I'm not sure, though if it makes sense to already decide on something
for PHP 9. If it's not baked into code shortly after the vote finishes,
then people might forget that "there's something that still needs to be
done". For a deprecation one can at least go through all the
deprecations once PHP 9 opens, as a deprecation effectively is defined
to be a removal in the next major. For a warning this is less obvious.
Personally my first choice would be "Straight to Exception", so I might
not be the best person to decide on that :-)
Best regards
Tim Düsterhus
Hi
Either I guess? Honestly we should decide that in advance on the list. :-) E_WARNING+Exception in 9 is what I'd probably favor, with "Exception now" as a second choice.
I'm a new-ish contributor here in internals, so I don't know how things
were done in the past for similar situations/issues.I'm not sure, though if it makes sense to already decide on something
for PHP 9. If it's not baked into code shortly after the vote finishes,
then people might forget that "there's something that still needs to be
done". For a deprecation one can at least go through all the
deprecations once PHP 9 opens, as a deprecation effectively is defined
to be a removal in the next major. For a warning this is less obvious.Personally my first choice would be "Straight to Exception", so I might
not be the best person to decide on that :-)Best regards
Tim Düsterhus
We've done this kind of two-step thing before; a lot of PHP 8 changes were voted on well in advance of 8.0's release. A "Warning now, Exception in 9" vote would not be unprecedented.
--Larry Garfield
Hi
Either I guess? Honestly we should decide that in advance on the list. :-) E_WARNING+Exception in 9 is what I'd probably favor, with "Exception now" as a second choice.
We've done this kind of two-step thing before; a lot of PHP 8 changes were voted on well in advance of 8.0's release. A "Warning now, Exception in 9" vote would not be unprecedented.
Thank you, I thought about what to do here and I've adjusted the options
in the "increase to what" vote to make this a 3-way vote:
Do you believe that my reasoning with regard to the interpretation of
the vote's results is sound? A ranked choice vote should not necessary
here, because the three options follow a natural order with regard to
severity/possible breakage.
Best regards
Tim Düsterhus
Hi
Either I guess? Honestly we should decide that in advance on the list. :-) E_WARNING+Exception in 9 is what I'd probably favor, with "Exception now" as a second choice.
We've done this kind of two-step thing before; a lot of PHP 8 changes were voted on well in advance of 8.0's release. A "Warning now, Exception in 9" vote would not be unprecedented.
Thank you, I thought about what to do here and I've adjusted the options
in the "increase to what" vote to make this a 3-way vote:Do you believe that my reasoning with regard to the interpretation of
the vote's results is sound? A ranked choice vote should not necessary
here, because the three options follow a natural order with regard to
severity/possible breakage.Best regards
Tim Düsterhus
Predicting people's second-place choice is risky business. This assumption seems logical on its face, but I'm sure there are people that will buck your expectations.
The reasoning is that unless “E_WARNING in 8.x without future decision” receives more than 50%, more than 50% prefer an Exception no later than 9.0. Unless “UnserializationFailedException in 8.x” receives more than 50%, more than 50% prefer no Exception in 8.x.
If you want to go that route, I'd go all the way to an RCV vote and be done with it. Or else just make an executive decision as the RFC author and let the chips fall where they may.
Anyone else want to weigh in here on the timeline?
--Larry Garfield
Thank you, I thought about what to do here and I've adjusted the options
in the "increase to what" vote to make this a 3-way vote:Do you believe that my reasoning with regard to the interpretation of
the vote's results is sound? A ranked choice vote should not necessary
here, because the three options follow a natural order with regard to
severity/possible breakage.Predicting people's second-place choice is risky business. This assumption seems logical on its face, but I'm sure there are people that will buck your expectations.
The reasoning is that unless “E_WARNING in 8.x without future decision” receives more than 50%, more than 50% prefer an Exception no later than 9.0. Unless “UnserializationFailedException in 8.x” receives more than 50%, more than 50% prefer no Exception in 8.x.
If you want to go that route, I'd go all the way to an RCV vote and be done with it. Or else just make an executive decision as the RFC author and let the chips fall where they may.
I'm generally not too happy with secondary votes. Sometimes you only
support the primary vote for certain secondary options; to "be sure"
that another secondary option won't "win", you'd need to vote "no" on
the primary choice.
I'd prefer a single vote with pre-selected details. I don't have any
particular preference in this case, though.
--
Christoph M. Becker
Hi
Predicting people's second-place choice is risky business. This assumption seems logical on its face, but I'm sure there are people that will buck your expectations.
The reasoning is that unless “E_WARNING in 8.x without future decision” receives more than 50%, more than 50% prefer an Exception no later than 9.0. Unless “UnserializationFailedException in 8.x” receives more than 50%, more than 50% prefer no Exception in 8.x.
If you want to go that route, I'd go all the way to an RCV vote and be done with it. Or else just make an executive decision as the RFC author and let the chips fall where they may.
I'm generally not too happy with secondary votes. Sometimes you only
support the primary vote for certain secondary options; to "be sure"
that another secondary option won't "win", you'd need to vote "no" on
the primary choice.I'd prefer a single vote with pre-selected details. I don't have any
particular preference in this case, though.
Would the increase of E_NOTICE
to E_WARNING
for the two cases that
currently emit E_NOTICE
be something that even requires a vote or is
this something that can simply be decided by merging a PR? [1] In that
case the second and third vote could be simplified to "Convert E_* to
Exception" with the regular 2/3 majority required.
[1] I would hope that unifying E_NOTICE/E_WARNING to E_WARNING
everywhere is uncontroversial.
Best regards
Tim Düsterhus
Hi
Would the increase of
E_NOTICE
toE_WARNING
for the two cases that
currently emitE_NOTICE
be something that even requires a vote or is
this something that can simply be decided by merging a PR? [1] In that
case the second and third vote could be simplified to "Convert E_* to
Exception" with the regular 2/3 majority required.[1] I would hope that unifying E_NOTICE/E_WARNING to
E_WARNING
everywhere is uncontroversial.
I've now created a standalone PR for this:
https://github.com/php/php-src/pull/9629
While creating that one, I also came across this:
https://github.com/php/php-src/pull/9630
Best regards
Tim Düsterhus
Hi
If you want to go that route, I'd go all the way to an RCV vote and be done with it. Or else just make an executive decision as the RFC author and let the chips fall where they may.
I'm generally not too happy with secondary votes. Sometimes you only
support the primary vote for certain secondary options; to "be sure"
that another secondary option won't "win", you'd need to vote "no" on
the primary choice.I'd prefer a single vote with pre-selected details. I don't have any
particular preference in this case, though.Would the increase of
E_NOTICE
toE_WARNING
for the two cases that
currently emitE_NOTICE
be something that even requires a vote or is
this something that can simply be decided by merging a PR? [1] In that
case the second and third vote could be simplified to "Convert E_* to
Exception" with the regular 2/3 majority required.
As I've been told that the E_NOTICE
to E_WARNING
also requires a vote,
I've thought about this whole matter a little more, because I agree with
Christoph that secondary votes are "not great".
In the end I've decided to make this all primary votes that more or less
build on each other, each requiring a 2/3 majority. The vote to change
E_NOTICE/E_WARNING to Exception in 8.3 was dropped to keep thing simpler
and to not have conflicting votes. Instead there is only "Unify to
E_WARNING
in 8.x" and "Upgrade to Exception in 9.0" (which may happen
independently of whether the E_NOTICE
will remain in 8.x).
That said: Discussion has come to a halt and I don't plan any more
changes to the proposal. As such I plan to open voting on Friday with
the voting widgets as they are currently within the RFC:
https://wiki.php.net/rfc/improve_unserialize_error_handling#proposed_voting_choices
Best regards
Tim Düsterhus
Hi
Either I guess? Honestly we should decide that in advance on the list.
:-) E_WARNING+Exception in 9 is what I'd probably favor, with "Exception
now" as a second choice.I'm a new-ish contributor here in internals, so I don't know how things
were done in the past for similar situations/issues.
Welcome, Tim :-)
Do give it more thought regarding the callbacks what Nikolas said, sleep on
it.
Not saying it matters, but also double check how error handling MAY differ
when in a Class::_unserialize() context. Maybe it doesn't diff, I haven't
checked in a while, but food for thought.
In any case, good cleanup initiative.
I'm not sure, though if it makes sense to already decide on something
for PHP 9. If it's not baked into code shortly after the vote finishes,
then people might forget that "there's something that still needs to be
done". For a deprecation one can at least go through all the
deprecations once PHP 9 opens, as a deprecation effectively is defined
to be a removal in the next major. For a warning this is less obvious.Personally my first choice would be "Straight to Exception", so I might
not be the best person to decide on that :-)Best regards
Tim Düsterhus--
To unsubscribe, visit: https://www.php.net/unsub.php
Hi
Do give it more thought regarding the callbacks what Nikolas said, sleep on
it.
Are you referring to 'unserialize_callback_func' or to the unserialize
handlers (i.e. __wakeup, __unserialize, Serializable::unserialize)?
If the former, I am currently waiting for an explanation of the use
cases from someone who uses it, to be able to make an educated decision.
Not saying it matters, but also double check how error handling MAY differ
when in a Class::_unserialize() context. Maybe it doesn't diff, I haven't
checked in a while, but food for thought.
I'm afraid, this is a little vague. What possible difference (and
difference to what) do you have in mind?
In any case, good cleanup initiative.
Best regards
Tim Düsterhus
Hi Paul,
Welcome, Tim :-)
Do give it more thought regarding the callbacks what Nikolas
said, sleep on it.
Although welcoming people is nice, you might want to check that they
haven't already had two RFCs passed, and become the maintainer of
ext/random.
Otherwise "welcoming them" could give the appearance of being quite
condescending.
Actually in general, people should go out of their way to avoid
appearing to speak for an Open Source project, when they aren't
contributing regularly. It could be annoying to people who are
regularly contributing week-in, week-out.
cheers
Dan
Ack
Hi
I've now written up an RFC as a follow-up for the "What type of
Exception to use forunserialize()
failure?" thread [1]:
RFC: Improve
unserialize()
error handling
https://wiki.php.net/rfc/improve_unserialize_error_handlingProof of concept implementation is in:
https://github.com/php/php-src/pull/9425
Discussion period for that RFC is officially opened up.
The primary point of discussion in the previous mailing list thread and
in the PR comments is whetherunserialize()
should continue to emit
E_WARNING
or whether that should consistently be changed to an
Exception. As of now I plan to explicitly vote on this and the RFC
contains some opinions on that matter.Best regards
Tim Düsterhus[1] https://externals.io/message/118311
--
To unsubscribe, visit: https://www.php.net/unsub.php
Thank you Tim for the thorough investigation.
I didn't know how bad the situation was in regards to unserialization.
So I'm now tending in favour of promoting the notice/warnings to exceptions
as it is currently extremely hard to handle the behaviour correctly.
Best regards,
George P. Banyard
Le lundi 5 septembre 2022, 19:20:00 CEST Tim Düsterhus a écrit :
RFC: Improve
unserialize()
error handling
https://wiki.php.net/rfc/improve_unserialize_error_handling
Is the new UnserializationFailedException class extending any other Exception
class ? This is not explained in the RFC.
Côme
Hi
Le lundi 5 septembre 2022, 19:20:00 CEST Tim Düsterhus a écrit :
RFC: Improve
unserialize()
error handling
https://wiki.php.net/rfc/improve_unserialize_error_handlingIs the new UnserializationFailedException class extending any other Exception
class ? This is not explained in the RFC.
Yes, it necessarily extends another exception class, because \Throwable
may only be implemented by \Exception and \Error.
\UnserializationFailedException is a direct child of \Exception:
-
Making it part of the \Error hierarchy was argued against in this
comment: https://github.com/php/php-src/pull/9185#issuecomment-1199580418 -
Using a different parent class does not bring any benefit, because
the intended use is to specifically
catch(\UnserializationFailedException) and not to catch it together with
unrelated stuff.
I've also added a code block to the RFC that shows the full (and
trivial) implementation of the \UnserializationFailedException.
Best regards
Tim Düsterhus
Hi Tim,
Thanks for the RFC.
I've now written up an RFC as a follow-up for the "What type of
Exception to use for
unserialize()
failure?" thread [1]:
RFC: Improve
unserialize()
error handling
https://wiki.php.net/rfc/improve_unserialize_error_handlingProof of concept implementation is in:
https://github.com/php/php-src/pull/9425
Discussion period for that RFC is officially opened up.
The primary point of discussion in the previous mailing list thread and
in the PR comments is whetherunserialize()
should continue to emit
E_WARNING
or whether that should consistently be changed to an
Exception. As of now I plan to explicitly vote on this and the RFC
contains some opinions on that matter.
From what I understand, there are two things in the RFC:
- a proposal to wrap any throwables thrown during unserialization
inside an UnserializationFailedException - a discussion to figure out a way to turn these notices+warnings into
exceptions.
About 1., I read that you think "this is not considered an issue", but to
me, changing the kind of exceptions that a piece of code might throw is a
non negligible BC break. There needs to be serious justification for it. I
tried looking for one, but I'm not convinced there is one: replacing the
existing catch(Throwable) by catch(UnserializationFailedException) won't
provide much added value to userland, if any. And "reliably include more
than just the call unserialize()
in the try" is not worth the BC break IMHO.
About 2., unserialize()
accepts a second argument to give it options. Did
you consider adding a 'throw_on_error' option to opt-in into the behavior
you're looking for? I think that would be a nice replacement for the
current logics based on set_error_handler()
. If we want to make PHP 9 throw
by default, we could then decide to deprecate not passing this option.
Lastly I'd like to add a 3. to your proposal, because there is one more
thing that makes unserialization annoying: the unserialize_callback_func
https://www.php.net/manual/var.configuration.php#ini.unserialize-callback-func
ini setting. It would be great to be able to pass a 'callback_func' option
to unserialize to provide it, instead of calling ini_set()
as we have to
quite often right now.
Would that make sense to you?
Kind regards,
Nicolas
Hi
From what I understand, there are two things in the RFC:
1. a proposal to wrap any throwables thrown during unserialization inside an UnserializationFailedException
Correct.
2. a discussion to figure out a way to turn these notices+warnings into exceptions.
Partly correct:
The RFC primarily proposes unifying the existing non-Exception errors
(which are currently split between E_WARNING
and E_NOTICE
with no
apparent pattern). To what exactly (E_WARNING or some Exception) they
are unified is up for discussion/vote.
About 1., I read that you think "this is not considered an issue", but to
me, changing the kind of exceptions that a piece of code might throw is a
non negligible BC break. There needs to be serious justification for it. I
tried looking for one, but I'm not convinced there is one: replacing the
existing catch(Throwable) by catch(UnserializationFailedException) won't
provide much added value to userland, if any. And "reliably include more
than just the callunserialize()
in the try" is not worth the BC break IMHO.
unserialize()
is a generic function that will also call arbitrary
callbacks. You already have no guarantees whatsoever about the kind of
exception that is thrown from it. I am unable to think of a situation
where the input data is well-defined enough to reliably throw a specific
type of exception, but not well-defined enough to successfully unserialize.
So on the contrary wrapping any Exceptions with a well-defined Exception
allows you to rely on a specific and stable Exception to catch in the
first place.
As you specifically mention catch(Throwable), I'd like to point out that
this will continue to work, because UnserializationFailedException
implements Throwable.
About 2.,
unserialize()
accepts a second argument to give it options. Did
you consider adding a 'throw_on_error' option to opt-in into the behavior
you're looking for? I think that would be a nice replacement for the
current logics based onset_error_handler()
. If we want to make PHP 9 throw
by default, we could then decide to deprecate not passing this option.
I did not consider this when writing the RFC. I now considered it, and I
do not believe adding a flag to control this a good thing.
-
"No one" is going to set that flag, because it requires additional
effort. I strongly believe that the easiest solution should also the
correct solution for the majority of use cases. The flag fails this
"test", because the correct solution should be "don't fail silently". -
If you actually want to set that option in e.g. a library, then you
break compatibility with older PHP versions for no real gain. If you go
all the way and remember to add that extra flag, then you can also write
an unserialize wrapper that does the set_error_handler dance I've shown
in the introduction of the RFC. Similar effort to adding the flag, but
better compatibility. -
It does literally nothing for users that use a throwing error
handler, which to my understanding includes the vast majority of
framework code. -
Even for users that do not use a throwing error handler, omitting the
option literally does nothing, becauseunserialize()
might already throw
depending on what the unserialize handlers of unserialized objects do.
Lastly I'd like to add a 3. to your proposal, because there is one more
thing that makes unserialization annoying: the unserialize_callback_func
https://www.php.net/manual/var.configuration.php#ini.unserialize-callback-func
ini setting. It would be great to be able to pass a 'callback_func' option
to unserialize to provide it, instead of callingini_set()
as we have to
quite often right now.Would that make sense to you?
TIL about that ini setting. Can you clarify where this callback comes in
helpful? What can it do for you what your autoloader can't do? I've
attempted to search GitHub to find out about the use cases, but almost
all of the results are copies of php-src that match the .phpt tests.
However as of now I do not believe that it is appropriate to include in
my RFC, because it is only indirectly related to error handling. I'd
like to keep the RFC focused. This makes it easier for readers to
understand the proposal, allowing voters to make an educated vote and
serving as longer-term documentation if the vote passes.
Best regards
Tim Düsterhus
From what I understand, there are two things in the RFC:
1. a proposal to wrap any throwables thrown during unserialization inside an UnserializationFailedException
Correct.
2. a discussion to figure out a way to turn these notices+warnings
into
exceptions.
Partly correct:
The RFC primarily proposes unifying the existing non-Exception errors
(which are currently split betweenE_WARNING
andE_NOTICE
with no
apparent pattern). To what exactly (E_WARNING or some Exception) they
are unified is up for discussion/vote.
Thanks for clarifying. Our accepted release process policy
https://wiki.php.net/rfc/releaseprocess says "Backward compatibility must
be respected with the same major releases". I think it's critical to stick
to this policy as much as possible and also to plan for the smoothest
upgrade path when preparing the next major (deprecations FTW.)
That's what I have in mind when I review the RFC and where I don't see the
match yet. I know we're only at the beginning of the discussion and we're
still looking for what would work best.
About 1., I read that you think "this is not considered an issue", but to
me, changing the kind of exceptions that a piece of code might throw is a
non negligible BC break. There needs to be serious justification for it.
I
tried looking for one, but I'm not convinced there is one: replacing the
existing catch(Throwable) by catch(UnserializationFailedException) won't
provide much added value to userland, if any. And "reliably include more
than just the callunserialize()
in the try" is not worth the BC break
IMHO.
unserialize()
is a generic function that will also call arbitrary
callbacks. You already have no guarantees whatsoever about the kind of
exception that is thrown from it. I am unable to think of a situation
where the input data is well-defined enough to reliably throw a specific
type of exception, but not well-defined enough to successfully unserialize.So on the contrary wrapping any Exceptions with a well-defined Exception
allows you to rely on a specific and stable Exception to catch in the
first place.As you specifically mention catch(Throwable), I'd like to point out that
this will continue to work, because UnserializationFailedException
implements Throwable.
You might have misunderstood my point so let me give two examples:
Example 1.
set_error_handler(fn ($t, $m) => throw new ErrorException($m));
try {
$ser = serialize($value);
catch (ErrorException $e) {
// deal with $e
finally {
restore_error_handler()
;
}
Example 2.
try {
$ser = @serialize($value);
catch (Exception $e) {
// deal with $e but don't catch Error on purpose, to let them bubble down
}
Changing serialize to wrap any throwables into an
UnserializationFailedException would break both examples. That's what I
mean when I say this breaks BC. As any BC-break, this might cause big
troubles for the community and this should be avoided. The release process
policy I mentioned above is a wise document.
About 2.,
unserialize()
accepts a second argument to give it options. Did
you consider adding a 'throw_on_error' option to opt-in into the behavior
you're looking for? I think that would be a nice replacement for the
current logics based onset_error_handler()
. If we want to make PHP 9
throw
by default, we could then decide to deprecate not passing this option.I did not consider this when writing the RFC. I now considered it, and I
do not believe adding a flag to control this a good thing.
- "No one" is going to set that flag, because it requires additional
effort. I strongly believe that the easiest solution should also the
correct solution for the majority of use cases. The flag fails this
"test", because the correct solution should be "don't fail silently".
Unless we deprecate calling the method without the argument as I suggested.
I agree, it might not be ideal to ask the community to rewrite every call
to serialize($v) to serialize($v, ['throw_on_error' => true/false]) but
that's still way better than a BC break.
Maybe there's another way that doesn't break BC. I'm looking forward to one.
- If you actually want to set that option in e.g. a library, then you
break compatibility with older PHP versions for no real gain. If you go
all the way and remember to add that extra flag, then you can also write
an unserialize wrapper that does the set_error_handler dance I've shown
in the introduction of the RFC. Similar effort to adding the flag, but
better compatibility.
I think my proposal provides BC with older versions of php: passing extra
(unused) options is allowed and polyfilling UnserializationFailedException
is trivial, so one could set_error_handler(fn (...) => throw new
UnserializationFailedException(...)) and be both BC and FC.
About writing a wrapper, that's what we all do right now, each with our own
wrapping logic. You're looking for unifying those logics and I support that
goal.
But this leads to another idea : what about adding a new function, let's
name it unserialize2() for now, and deprecate unserialize()
? That'd provide
the upgrade path we need.
- It does literally nothing for users that use a throwing error
handler, which to my understanding includes the vast majority of
framework code.
I'm not sure what you mean here. The flag would allow removing those error
handlers, which is what you want to achieve in the end I believe (well,
except when BC with older versions is desired of course, see my previous
comment.)
- Even for users that do not use a throwing error handler, omitting the
option literally does nothing, becauseunserialize()
might already throw
depending on what the unserialize handlers of unserialized objects do.
Sure. I don't see how that's a problem. Throwing is what almost any lines
of PHP code can do. We're used to dealing with that and I don't see why
serialize()
needs special care. I might have missed the point you want to
make here though.
Lastly I'd like to add a 3. to your proposal, because there is one more
thing that makes unserialization annoying: the unserialize_callback_func
<
https://www.php.net/manual/var.configuration.php#ini.unserialize-callback-funcini setting. It would be great to be able to pass a 'callback_func'
option
to unserialize to provide it, instead of callingini_set()
as we have to
quite often right now.Would that make sense to you?
TIL about that ini setting. Can you clarify where this callback comes in
helpful? What can it do for you what your autoloader can't do? I've
attempted to search GitHub to find out about the use cases, but almost
all of the results are copies of php-src that match the .phpt tests.
Sure: that callback is called when PHP tries to create an object of class
Foo but the autoloader fails to load said class. When there is no handler,
PHP creates a __PHP_Incomplete_Class object, but I regularly use this
callback to throw instead when I don't want those strange incomplete
objects (note that I also have seen use cases where generating a
__PHP_Incomplete_Class is the desired behavior.)
However as of now I do not believe that it is appropriate to include in
my RFC, because it is only indirectly related to error handling. I'd
like to keep the RFC focused. This makes it easier for readers to
understand the proposal, allowing voters to make an educated vote and
serving as longer-term documentation if the vote passes.
I proposed that because in many situations unserializing a payload to
something that contains __PHP_Incomplete_Class objects is not desired and
should be leveled up to an error instead. That's the relation with error
handling and thus with the RFC.
Nicolas
Hi
On 9/12/22 21:46, Nicolas Grekas wrote:>> unserialize()
is a generic
function that will also call arbitrary
callbacks. You already have no guarantees whatsoever about the kind of
exception that is thrown from it. I am unable to think of a situation
where the input data is well-defined enough to reliably throw a specific
type of exception, but not well-defined enough to successfully unserialize.So on the contrary wrapping any Exceptions with a well-defined Exception
allows you to rely on a specific and stable Exception to catch in the
first place.As you specifically mention catch(Throwable), I'd like to point out that
this will continue to work, because UnserializationFailedException
implements Throwable.You might have misunderstood my point so let me give two examples:
Example 1.
set_error_handler(fn ($t, $m) => throw new ErrorException($m));
try {
$ser = serialize($value);
catch (ErrorException $e) {
// deal with $e
finally {
restore_error_handler()
;
}Example 2.
try {
$ser = @serialize($value);
catch (Exception $e) {
// deal with $e but don't catch Error on purpose, to let them bubble down
}Changing serialize to wrap any throwables into an
UnserializationFailedException would break both examples. That's what I
mean when I say this breaks BC. As any BC-break, this might cause big
troubles for the community and this should be avoided. The release process
policy I mentioned above is a wise document.
First I'd like to note that these examples use 'serialize()' (which is
not touched by my RFC, because serialize rarely fails). I consider this
a typo and will answer as if you used 'unserialize()'. If that was not a
typo, then this might explain the confusion.
Are those two examples based on real-world use cases or did you craft
them specifically to point out how the proposal would introduce a
behavioral change?
I'm asking, because the examples fail to explain the "why". Why is the
code written like this? I fail to see how catching "Exception", but not
catching "Error" during unserialization is ever going to be useful.
Likewise I don't see how catching ErrorException specifically is useful
when '__unserialize()' might throw arbitrary Throwables. Specifically
see the list in this email: https://externals.io/message/118311. Code
outside of the function that actually calls unserialize()
is not going
to be prepared to usefully handle unserialization failure. This
effectively means that the Exception will bubble up to the global
Exception handler (or the Exception handler middleware), resulting in an
aborted request.
Quoting myself, because the examples didn't answer the question: "I am
unable to think of a situation where the input data is well-defined
enough to reliably throw a specific type of exception, but not
well-defined enough to successfully unserialize".
And don't get me wrong: I'm not attempting to say that it is appropriate
to break logic that I personally consider wrong without justification. I
believe the benefits of having the unified Exception outweigh the
drawbacks of introducing a behavioral change in code that is already
subtly broken depending on the exact input value passed to unserialize()
.
This is similar to the attribute syntax breaking hash comments that
start with a square bracket without any space in the middle. Or the '@@'
attribute syntax that was also proposed. It would have broken code
containing duplicated error suppression operators.
Similarly to the attribute breakage, it's also reasonably easy to find
possibly affected logic by grepping for 'unserialize('.
About 2.,
unserialize()
accepts a second argument to give it options. Did
you consider adding a 'throw_on_error' option to opt-in into the behavior
you're looking for? I think that would be a nice replacement for the
current logics based onset_error_handler()
. If we want to make PHP 9
throw
by default, we could then decide to deprecate not passing this option.I did not consider this when writing the RFC. I now considered it, and I
do not believe adding a flag to control this a good thing.
- "No one" is going to set that flag, because it requires additional
effort. I strongly believe that the easiest solution should also the
correct solution for the majority of use cases. The flag fails this
"test", because the correct solution should be "don't fail silently".Unless we deprecate calling the method without the argument as I suggested.
I agree, it might not be ideal to ask the community to rewrite every call
to serialize($v) to serialize($v, ['throw_on_error' => true/false]) but
that's still way better than a BC break.
Maybe there's another way that doesn't break BC. I'm looking forward to one.
See (4) and above regarding my opinion on the BC break.
- If you actually want to set that option in e.g. a library, then you
break compatibility with older PHP versions for no real gain. If you go
all the way and remember to add that extra flag, then you can also write
an unserialize wrapper that does the set_error_handler dance I've shown
in the introduction of the RFC. Similar effort to adding the flag, but
better compatibility.I think my proposal provides BC with older versions of php: passing extra
(unused) options is allowed and polyfilling UnserializationFailedException
is trivial, so one could set_error_handler(fn (...) => throw new
UnserializationFailedException(...)) and be both BC and FC.About writing a wrapper, that's what we all do right now, each with our own
wrapping logic. You're looking for unifying those logics and I support that
goal.
I come from a userland developer background, but without the well-known
frameworks in the back. I must admit that I am not currently using a
wrapper around unserialize()
. Partly because the application I work on
uses a global throwing error handler (see (3)), partly because the
non-E_SOMETHING error cases were not obvious to me before researching
this RFC which came to life because of my work on PHP 8.2's ext/random.
So to clarify: What I'm looking for is making unserialize()
safe and
well-defined to use, so that wrapping logic is no longer required.
But this leads to another idea : what about adding a new function, let's
name it unserialize2() for now, and deprecateunserialize()
? That'd provide
the upgrade path we need.
This sounds like another instance of mysql_real_escape_string() to me.
Also the default choice and the one documented all around the Internet
is still going to be 'unserialize()', because it's the more obvious one.
- It does literally nothing for users that use a throwing error
handler, which to my understanding includes the vast majority of
framework code.I'm not sure what you mean here. The flag would allow removing those error
handlers, which is what you want to achieve in the end I believe (well,
except when BC with older versions is desired of course, see my previous
comment.)
I likely wasn't clear here: I am talking about the global throwing error
handler that is added by frameworks during the bootstrapping of the request.
I've tested with Symfony (controller attached): When I just call
unserialize('asdf')
within a controller, then an ErrorException comes
out of it, because of the global error handler. If I don't catch that
one then Symfony's Exception handler will show me a pretty error page
with a ghost.
The same is true for Laravel the last time I tested and I suspect this
also is true for other frameworks.
For all users that use such a global error handler setting the option or
not setting the option will result in 100% identical behavior.
- Even for users that do not use a throwing error handler, omitting the
option literally does nothing, becauseunserialize()
might already throw
depending on what the unserialize handlers of unserialized objects do.Sure. I don't see how that's a problem. Throwing is what almost any lines
of PHP code can do. We're used to dealing with that and I don't see why
serialize()
needs special care. I might have missed the point you want to
make here though.
The difference is that for the majority of functions the possible
failure cases are known and can be ruled out statically. As an example I
can be sure that count()
will not throw if I pass it an array and I
don't change the $mode. This is not true for unserialize()
which
internally might call arbitrary __unserialize() callbacks.
Lastly I'd like to add a 3. to your proposal, because there is one more
thing that makes unserialization annoying: the unserialize_callback_func
f the results are copies of php-src that match the .phpt tests.PHP creates a __PHP_Incomplete_Class object, but I regularly use this
callback to throw instead when I don't want those strange incomplete
objects (note that I also have seen use cases where generating a
__PHP_Incomplete_Class is the desired behavior.)
I see. So you are not so much interested in the callback specifically,
but a "throw_for_unknown_classes" flag would also solve your use case.
Personally I think such having such a flag would be clearer with regard
to the intent.
I proposed that because in many situations unserializing a payload to
something that contains __PHP_Incomplete_Class objects is not desired and
should be leveled up to an error instead. That's the relation with error
handling and thus with the RFC.
That said, I believe adding a flag to generate additional errors (be
it via a callback or via my proposal) is orthogonal to handling errors
and thus is out of scope for this RFC. Your suggestion likely has
plenty of bikeshedding potential on its own.
I think that being able to cleanly make unknown classes an error would
be a useful addition, though. I recommend that you create a separate RFC
for that once mine went through the vote, so that your's can build on
the results of mine.
Best regards
Tim Düsterhus
Hi Nicolas
[long response from a week ago]
Did you find the time to read through my response from September, 13th, yet?
Best regards
Tim Düsterhus
Hi Tim,
I'm a bit busy with conferences these days...
On 9/12/22 21:46, Nicolas Grekas wrote:>> unserialize()
is a generic
function that will also call arbitrary
callbacks. You already have no guarantees whatsoever about the kind of
exception that is thrown from it. I am unable to think of a situation
where the input data is well-defined enough to reliably throw a specific
type of exception, but not well-defined enough to successfully
unserialize.So on the contrary wrapping any Exceptions with a well-defined Exception
allows you to rely on a specific and stable Exception to catch in the
first place.As you specifically mention catch(Throwable), I'd like to point out that
this will continue to work, because UnserializationFailedException
implements Throwable.You might have misunderstood my point so let me give two examples:
Example 1.
set_error_handler(fn ($t, $m) => throw new ErrorException($m));
try {
$ser = serialize($value);
catch (ErrorException $e) {
// deal with $e
finally {
restore_error_handler()
;
}Example 2.
try {
$ser = @serialize($value);
catch (Exception $e) {
// deal with $e but don't catch Error on purpose, to let them bubble
down
}Changing serialize to wrap any throwables into an
UnserializationFailedException would break both examples. That's what I
mean when I say this breaks BC. As any BC-break, this might cause big
troubles for the community and this should be avoided. The release
process
policy I mentioned above is a wise document.First I'd like to note that these examples use 'serialize()' (which is
not touched by my RFC, because serialize rarely fails). I consider this
a typo and will answer as if you used 'unserialize()'. If that was not a
typo, then this might explain the confusion.
It was a typo! I meant unserialize()
yes.
Are those two examples based on real-world use cases or did you craft
them specifically to point out how the proposal would introduce a
behavioral change?
They were inspired by what I found in the Symfony codebase.
I just conducted a quick lookup at this code base: most call to
unserialize()
are not checked at all, but I hghlighted the checked ones in
this gist:
https://gist.github.com/nicolas-grekas/5dd3169f94ed3b4576152605330824fe
I'm asking, because the examples fail to explain the "why". Why is the
code written like this? I fail to see how catching "Exception", but not
catching "Error" during unserialization is ever going to be useful.
It's useful because these two roots mean very different things.
That's why Error has been introduced in the first place: instances of Error
are not just exceptions, they're really programming mistakes. As such the
best practice to me is to not catch them in domain layers, and let them
bubble down to a generic Error catch layer instead.
Likewise I don't see how catching ErrorException specifically is useful
when '__unserialize()' might throw arbitrary Throwables.
It's useful when all you need is eg to care about the exceptions thrown by
a custom error handler, and want to ignore the other ones (because other
exceptions aren't part of the current domain layer.)
Specifically
see the list in this email: https://externals.io/message/118311. Code
outside of the function that actually callsunserialize()
is not going
to be prepared to usefully handle unserialization failure. This
effectively means that the Exception will bubble up to the global
Exception handler (or the Exception handler middleware), resulting in an
aborted request.
Yep, this is fine in many cases.
Quoting myself, because the examples didn't answer the question: "I am
unable to think of a situation where the input data is well-defined
enough to reliably throw a specific type of exception, but not
well-defined enough to successfully unserialize".And don't get me wrong: I'm not attempting to say that it is appropriate
to break logic that I personally consider wrong without justification. I
believe the benefits of having the unified Exception outweigh the
drawbacks of introducing a behavioral change in code that is already
subtly broken depending on the exact input value passed tounserialize()
.
The thoughts you describe might make sense as a rule of thumb principle
when you write your code. It might even be upgraded to a best practice.
Generally turning an Error to an Exception is not by my book, but I'm not
arguing about this. My point is : /!\ BC break ahead, do not cross /!\
If you look at the gist I linked before, wrapping all throwables would
break most if not all the cases I listed. That's bad numbers, and I don't
think this is specific in any way to the Symfony codebase.
This is similar to the attribute syntax breaking hash comments that
start with a square bracket without any space in the middle. Or the '@@'
attribute syntax that was also proposed. It would have broken code
containing duplicated error suppression operators.Similarly to the attribute breakage, it's also reasonably easy to find
possibly affected logic by grepping for 'unserialize('.
Sorry to disagree, that's a very different sort of break. One that changes
the behavior of perfectly fine apps.
About 2.,
unserialize()
accepts a second argument to give it options.
Didyou consider adding a 'throw_on_error' option to opt-in into the
behavior
you're looking for? I think that would be a nice replacement for the
current logics based onset_error_handler()
. If we want to make PHP 9
throw
by default, we could then decide to deprecate not passing this
option.I did not consider this when writing the RFC. I now considered it, and I
do not believe adding a flag to control this a good thing.
- "No one" is going to set that flag, because it requires additional
effort. I strongly believe that the easiest solution should also the
correct solution for the majority of use cases. The flag fails this
"test", because the correct solution should be "don't fail silently".Unless we deprecate calling the method without the argument as I
suggested.
I agree, it might not be ideal to ask the community to rewrite every call
to serialize($v) to serialize($v, ['throw_on_error' => true/false]) but
that's still way better than a BC break.
Maybe there's another way that doesn't break BC. I'm looking forward to
one.See (4) and above regarding my opinion on the BC break.
- If you actually want to set that option in e.g. a library, then you
break compatibility with older PHP versions for no real gain. If you go
all the way and remember to add that extra flag, then you can also write
an unserialize wrapper that does the set_error_handler dance I've shown
in the introduction of the RFC. Similar effort to adding the flag, but
better compatibility.I think my proposal provides BC with older versions of php: passing extra
(unused) options is allowed and polyfilling
UnserializationFailedException
is trivial, so one could set_error_handler(fn (...) => throw new
UnserializationFailedException(...)) and be both BC and FC.About writing a wrapper, that's what we all do right now, each with our
own
wrapping logic. You're looking for unifying those logics and I support
that
goal.I come from a userland developer background, but without the well-known
frameworks in the back. I must admit that I am not currently using a
wrapper aroundunserialize()
. Partly because the application I work on
uses a global throwing error handler (see (3)), partly because the
non-E_SOMETHING error cases were not obvious to me before researching
this RFC which came to life because of my work on PHP 8.2's ext/random.So to clarify: What I'm looking for is making
unserialize()
safe and
well-defined to use, so that wrapping logic is no longer required.But this leads to another idea : what about adding a new function, let's
name it unserialize2() for now, and deprecateunserialize()
? That'd
provide
the upgrade path we need.This sounds like another instance of mysql_real_escape_string() to me.
Also the default choice and the one documented all around the Internet
is still going to be 'unserialize()', because it's the more obvious one.
- It does literally nothing for users that use a throwing error
handler, which to my understanding includes the vast majority of
framework code.I'm not sure what you mean here. The flag would allow removing those
error
handlers, which is what you want to achieve in the end I believe (well,
except when BC with older versions is desired of course, see my previous
comment.)I likely wasn't clear here: I am talking about the global throwing error
handler that is added by frameworks during the bootstrapping of the
request.I've tested with Symfony (controller attached): When I just call
unserialize('asdf')
within a controller, then an ErrorException comes
out of it, because of the global error handler. If I don't catch that
one then Symfony's Exception handler will show me a pretty error page
with a ghost.The same is true for Laravel the last time I tested and I suspect this
also is true for other frameworks.For all users that use such a global error handler setting the option or
not setting the option will result in 100% identical behavior.
- Even for users that do not use a throwing error handler, omitting the
option literally does nothing, becauseunserialize()
might already throw
depending on what the unserialize handlers of unserialized objects do.Sure. I don't see how that's a problem. Throwing is what almost any lines
of PHP code can do. We're used to dealing with that and I don't see why
serialize()
needs special care. I might have missed the point you want to
make here though.The difference is that for the majority of functions the possible
failure cases are known and can be ruled out statically. As an example I
can be sure thatcount()
will not throw if I pass it an array and I
don't change the $mode. This is not true forunserialize()
which
internally might call arbitrary __unserialize() callbacks.
True, that's expected to me given what unserialize does. I don't see the
need to wrap sorry (even ignoring the BC break.) Turning Error into
Exception would be especially bad to me, eg a ParseError should remain a
ParseError so that it can follow the usual path for reporting parse errors.
Lastly I'd like to add a 3. to your proposal, because there is one more
thing that makes unserialization annoying: the
unserialize_callback_func
f the results are copies of php-src that match the .phpt tests.PHP creates a __PHP_Incomplete_Class object, but I regularly use this
callback to throw instead when I don't want those strange incomplete
objects (note that I also have seen use cases where generating a
__PHP_Incomplete_Class is the desired behavior.)I see. So you are not so much interested in the callback specifically,
but a "throw_for_unknown_classes" flag would also solve your use case.
Personally I think such having such a flag would be clearer with regard
to the intent.
I checked at the code and we use varying domain exceptions + messages so a
flag wouldn't be enough.
There's also a case where we call trigger_error()
to connect missing
classes to the regular error handler.
I proposed that because in many situations unserializing a payload to
something that contains __PHP_Incomplete_Class objects is not desired and
should be leveled up to an error instead. That's the relation with error
handling and thus with the RFC.That said, I believe adding a flag to generate additional errors (be
it via a callback or via my proposal) is orthogonal to handling errors
and thus is out of scope for this RFC. Your suggestion likely has
plenty of bikeshedding potential on its own.I think that being able to cleanly make unknown classes an error would
be a useful addition, though. I recommend that you create a separate RFC
for that once mine went through the vote, so that your's can build on
the results of mine.
Thanks, I hoped I could slip in this idea but I'll keep it on my todo list
for php-src :)
Nicolas
Hi Nicolas
I'm a bit busy with conferences these days...
Understood. Enjoy!
Are those two examples based on real-world use cases or did you craft
them specifically to point out how the proposal would introduce a
behavioral change?They were inspired by what I found in the Symfony codebase.
I just conducted a quick lookup at this code base: most call to
unserialize()
are not checked at all, but I hghlighted the checked ones in
"unserialize()" is generally not checked matches my experience with the
code I work with. However I would attribute at least some of that to the
fact that the failure cases are not entirely clear - something I'd like
to fix with this RFC.
this gist:
https://gist.github.com/nicolas-grekas/5dd3169f94ed3b4576152605330824fe
Thank you for that. I've looked through the gist and also cloned
symfony/symfony to take a look at the broader context.
Symfony/Bridge/PhpUnit/DeprecationErrorHandler/Deprecation.php:
My understanding is that this attempts to unserialize the input message
which might or might not be serialized data and fail gracefully.
I'd like to point out that the logic is unsafe if the message happens to
be syntactically correct serialized string that contains broken object
data, because the object will throw. However in this case an
attacker-specified value likely cannot reach that location.
-
It will continue to work as is for "wrap Throwables", because it does
not have a catch() block. -
It will break if E_FOO is upgraded to Exception. The fix would be
trivial: Add an empty catch(UnserializationFailedException).
src/Symfony/Component/Cache/Adapter/ArrayAdapter.php:
My understanding is that this attempts to gracefully handle broken cache
entries and force a rebuild if the payload is invalid.
This will fail if the serialized payload contains a malformed
DateTimeImmutable, because DateTimeImmutable will throw an Error,
instead of Exception: https://3v4l.org/eB5ZE. The 'catch' should likely
be adjusted to catch '\Throwable'.
-
Behavior will change for "wrap Throwables", because Errors will now be
caught. In this case I'd say that this is intended behavior. -
It will continue to work as is for E_FOO to Exception, because the
catch block will catch the Exception and set the value to 'false', just
likeunserialize()
returns for syntax errors.
src/Symfony/Component/Cache/Marshaller/DefaultMarshaller.php:
-
Behavior will technically change for "wrap Throwables", because the
catch(Error) will no longer catch anything. However as the catch's
purpose is to wrap Error into an Exception and
MarshallerInterface::unmarshall() is documented to '@throws Exception',
the new behavior is equally correct. -
Behavior will technically change for E_FOO to Exception, but the
externally observable behavior will still be correct.
src/Symfony/Component/Security/Http/Firewall/ContextListener.php:
Frankly I have no idea what all that logic is supposed to do just from
reading the code. It's already slightly broken if a third-partyException
uses the code 0x37313BC for its own purpose.
Checking the git commit history reveals this:
https://github.com/symfony/symfony/pull/25669
I'd say that this code is already broken, similarly to ArrayAdapter. All
this complicated logic with checking the error code can likely be
simplified to catch(Throwable).
-
Behavior will change for "wrap Throwables", because the catch block
will no longer catch. However I'd say that the code will be much clearer
going forward with catch(UnserializationFailedException). In the PR
discussion you point out "robustness" and the unified Exception type
will provide robustness. -
Behavior will change for E_FOO to Exception, because the type of
Exception will change and thus the catch block no longer catches. Same
as "wrap Throwables".
src/Symfony/Component/VarDumper/Server/DumpServer.php:
-
Behavior will not change for "wrap Throwables", because this will not
throw due to allowed_classes. -
Behavior will change for E_FOO to Exception, because the '@' no longer
works. A catch block needs to be added.
src/Symfony/Component/VarExporter/Internal/Registry.php:
The first unserialize:
- It will continue to work as is for "wrap Throwables", because it does
not have a catch() block. - It will likely also continue to work as is for E_FOO to Exception,
because no explicit error handler is set.
However I see that 'getClassReflector' throws specific types of
Exception, so that might no longer work. To fix this:
catch (UnserializationFailedException $e) { throw $e->getPrevious(); }
would need to be added to restore the original behavior.
The second unserialize is already broken for DateTimeImmutable, similar
to ArrayAdapter: https://3v4l.org/8ZmOS
-
Behavior will change for "wrap Throwables", because Errors will now be
caught. In this case I'd say that this is intended behavior. -
It will continue to work as is for E_FOO to Exception, because the
code already throws an Exception when false is returned and the same
type of Exception is thrown in the catch().
=========================================================================
I'm asking, because the examples fail to explain the "why". Why is the
code written like this? I fail to see how catching "Exception", but not
catching "Error" during unserialization is ever going to be useful.It's useful because these two roots mean very different things.
I understand the distinction between Error and Exception in general.
However I've specifically asked about the distinction in the context of
unserialization. An Exception during unserialization is not more or less
a programmer error compared to an Error. You generally put in some
opaque string into unserialize and you hopefully get out some useful
value. If the string is broken you either get an E_FOO or some random
Throwable you can't control. You can't magically fix the input string
and you can't really prevent the errors happening either.
As pointed out in the list above: DateTimeImmutable will already throw
an Error, whereas other classes will throw an Exception. That does not
mean that an error during unserialization of DateTimeImmutable is worse,
because the choice is pretty arbitrary.
The documentation of unserialize()
(https://www.php.net/manual/en/function.unserialize.php) also indicates
that:
Objects may throw Throwables in their unserialization handlers.
Further indicating that you are not guaranteed to only get Exception or
only get Error out of it.
Likewise I don't see how catching ErrorException specifically is useful
when '__unserialize()' might throw arbitrary Throwables.It's useful when all you need is eg to care about the exceptions thrown by
a custom error handler, and want to ignore the other ones (because other
exceptions aren't part of the current domain layer.)
Basically see response to previous quote.
Quoting myself, because the examples didn't answer the question: "I am
unable to think of a situation where the input data is well-defined
enough to reliably throw a specific type of exception, but not
well-defined enough to successfully unserialize".And don't get me wrong: I'm not attempting to say that it is appropriate
to break logic that I personally consider wrong without justification. I
believe the benefits of having the unified Exception outweigh the
drawbacks of introducing a behavioral change in code that is already
subtly broken depending on the exact input value passed tounserialize()
.The thoughts you describe might make sense as a rule of thumb principle
when you write your code. It might even be upgraded to a best practice.
Generally turning an Error to an Exception is not by my book, but I'm not
arguing about this. My point is : /!\ BC break ahead, do not cross /!\If you look at the gist I linked before, wrapping all throwables would
break most if not all the cases I listed. That's bad numbers, and I don't
think this is specific in any way to the Symfony codebase.
Based on my analysis and understanding of the linked examples, the only
one that might break (I don't fully understand it), is the one in
src/Symfony/Component/VarExporter/Internal/Registry.php. The others will
either no change or change to be more correct.
This is similar to the attribute syntax breaking hash comments that
start with a square bracket without any space in the middle. Or the '@@'
attribute syntax that was also proposed. It would have broken code
containing duplicated error suppression operators.Similarly to the attribute breakage, it's also reasonably easy to find
possibly affected logic by grepping for 'unserialize('.Sorry to disagree, that's a very different sort of break. One that changes
the behavior of perfectly fine apps.
Disagreeing is fine. Without disagreement we would not need the RFC
process. However I disagree on the "perfectly fine" part. I believe that
some of the Symfony examples are already subtly broken (the
DateTimeImmutable part).
However I also have another RFC as an example of a behavioral change
that is much harder to detect and check statically than my proposed
unserialize changes where it suffices to check the catch block and '@'
symbol.
"RFC: Static variables in inherited methods" -
https://wiki.php.net/rfc/static_variable_inheritance
That RFC actually changed the behavior of the application I maintain.
You voted in favor of the RFC, but did not participate in the
discussion. The fix was simple and I agree that the new behavior is
better, so no hard feelings. I'd be curious though how the breakage from
that RFC is more acceptable to you than the anticipated breakage from
this RFC? Honest question to understand your PoV better.
The difference is that for the majority of functions the possible
failure cases are known and can be ruled out statically. As an example I
can be sure thatcount()
will not throw if I pass it an array and I
don't change the $mode. This is not true forunserialize()
which
internally might call arbitrary __unserialize() callbacks.True, that's expected to me given what unserialize does. I don't see the
need to wrap sorry (even ignoring the BC break.) Turning Error into
Exception would be especially bad to me, eg a ParseError should remain a
ParseError so that it can follow the usual path for reporting parse errors.
See above how Error vs Exception IMO is meaningless for unserialize()
.
Thanks, I hoped I could slip in this idea but I'll keep it on my todo list
for php-src :)
I've had a quick look at the implementation. Making the necessary
changes to the C code shouldn't be too much effort. The hard part is
writing the RFC and agreeing on a behavior :-)
Best regards
Tim Düsterhus
Hi
My understanding is that this attempts to gracefully handle broken cache
entries and force a rebuild if the payload is invalid.This will fail if the serialized payload contains a malformed
DateTimeImmutable, because DateTimeImmutable will throw an Error,
instead of Exception: https://3v4l.org/eB5ZE. The 'catch' should likely
be adjusted to catch '\Throwable'.
To add onto this: "An Error could be thrown" might also happen with a
PHP 8.2 readonly class that later removes a property. As the property no
longer exists on the class it will be considered a dynamic property
which is disallowed on readonly classes, thus throwing an 'Error', not
'Exception' during unserialization:
Best regards
Tim Düsterhus
Hi Tim,
I'm not quoting the full text of the discussion because the php ML server
refused the message - too long it said.
this gist:
https://gist.github.com/nicolas-grekas/5dd3169f94ed3b4576152605330824fe
[...] (see previous messages on the thread for this missing part)
I'm asking, because the examples fail to explain the "why". Why is the
code written like this? I fail to see how catching "Exception", but not
catching "Error" during unserialization is ever going to be useful.It's useful because these two roots mean very different things.
I understand the distinction between Error and Exception in general.
However I've specifically asked about the distinction in the context of
unserialization. An Exception during unserialization is not more or less
a programmer error compared to an Error. You generally put in some
opaque string into unserialize and you hopefully get out some useful
value. If the string is broken you either get an E_FOO or some random
Throwable you can't control. You can't magically fix the input string
and you can't really prevent the errors happening either.As pointed out in the list above: DateTimeImmutable will already throw
an Error, whereas other classes will throw an Exception. That does not
mean that an error during unserialization of DateTimeImmutable is worse,
because the choice is pretty arbitrary.The documentation of
unserialize()
(https://www.php.net/manual/en/function.unserialize.php) also indicates
that:Objects may throw Throwables in their unserialization handlers.
Further indicating that you are not guaranteed to only get Exception or
only get Error out of it.Likewise I don't see how catching ErrorException specifically is useful
when '__unserialize()' might throw arbitrary Throwables.It's useful when all you need is eg to care about the exceptions thrown
by
a custom error handler, and want to ignore the other ones (because other
exceptions aren't part of the current domain layer.)Basically see response to previous quote.
Quoting myself, because the examples didn't answer the question: "I am
unable to think of a situation where the input data is well-defined
enough to reliably throw a specific type of exception, but not
well-defined enough to successfully unserialize".And don't get me wrong: I'm not attempting to say that it is appropriate
to break logic that I personally consider wrong without justification. I
believe the benefits of having the unified Exception outweigh the
drawbacks of introducing a behavioral change in code that is already
subtly broken depending on the exact input value passed to
unserialize()
.The thoughts you describe might make sense as a rule of thumb principle
when you write your code. It might even be upgraded to a best practice.
Generally turning an Error to an Exception is not by my book, but I'm not
arguing about this. My point is : /!\ BC break ahead, do not cross /!\If you look at the gist I linked before, wrapping all throwables would
break most if not all the cases I listed. That's bad numbers, and I don't
think this is specific in any way to the Symfony codebase.Based on my analysis and understanding of the linked examples, the only
one that might break (I don't fully understand it), is the one in
src/Symfony/Component/VarExporter/Internal/Registry.php. The others will
either no change or change to be more correct.This is similar to the attribute syntax breaking hash comments that
start with a square bracket without any space in the middle. Or the '@@'
attribute syntax that was also proposed. It would have broken code
containing duplicated error suppression operators.Similarly to the attribute breakage, it's also reasonably easy to find
possibly affected logic by grepping for 'unserialize('.Sorry to disagree, that's a very different sort of break. One that
changes
the behavior of perfectly fine apps.Disagreeing is fine. Without disagreement we would not need the RFC
process. However I disagree on the "perfectly fine" part. I believe that
some of the Symfony examples are already subtly broken (the
DateTimeImmutable part).
As you saw, the BC break is real. That's what I wanted to highlight and
what worries me for the community at large. This is not some niche BC break.
There is also an argument that is missing in this conversation and that
Yasuo made (in another thread by mistake I guess?):
In general, unserializing external serialised data is the same as "Read and
execute remote PHP code"
(Executing arbitrary code via memory corruption is trivial task with
unserialize()
)Therefore, developers must validate serialized data integrity at least
with HMAC always, so that serialized data is generated by trustworthy
systems
(your servers usually. Since HMAC does not have forward security, HKDF
with expiration info is recommended.)If these restrictions are for security, then these restrictions won't
protect anything. If these are for misuse protections,
then these would be "nice to have".
In all valid use cases of unserialize()
, we do trust the backend where we
get serialized payloads from. This means constructed broken payloads are
not something we need to deal with (the DateTimeImmutable example you
gave).
The thing we need to guard against is much narrower. That makes the problem
this PR aims to solve much smaller. Truncated payloads are a thing because
of race conditions or unreliable network services. FC/BC of serialized
payloads is another thing that can only be dealt with by implementing
__unserialize() or the likes. And that's mostly it. Other errors (e.g.
parse error) are not caused by unserialize itself, but by regular user code.
However I also have another RFC as an example of a behavioral change
that is much harder to detect and check statically than my proposed
unserialize changes where it suffices to check the catch block and '@'
symbol."RFC: Static variables in inherited methods" -
https://wiki.php.net/rfc/static_variable_inheritanceThat RFC actually changed the behavior of the application I maintain.
You voted in favor of the RFC, but did not participate in the
discussion. The fix was simple and I agree that the new behavior is
better, so no hard feelings. I'd be curious though how the breakage from
that RFC is more acceptable to you than the anticipated breakage from
this RFC? Honest question to understand your PoV better.
Rereading the RFC, Nikita described a niche and obscure behavior of the
engine. There are no numbers to support that understanding (how "niche"
this was) and that's missing in the RFC. I remember other RFCs which did
have numbers to quantify the impact. I guess it was hard to get that
number. And that's also why I made this gist: so that the impact of the BC
break could be evaluated somehow. Many spots in Symfony likely means many
more spots in the PHP community. I'm not going to be able to conduct a
larger analysis of this impact, but that can still give an intuition of it
to readers. Mine is: it's likely going to be impactful, not in a good way.
The difference is that for the majority of functions the possible
failure cases are known and can be ruled out statically. As an example I
can be sure thatcount()
will not throw if I pass it an array and I
don't change the $mode. This is not true forunserialize()
which
internally might call arbitrary __unserialize() callbacks.True, that's expected to me given what unserialize does. I don't see the
need to wrap sorry (even ignoring the BC break.) Turning Error into
Exception would be especially bad to me, eg a ParseError should remain a
ParseError so that it can follow the usual path for reporting parse
errors.See above how Error vs Exception IMO is meaningless for
unserialize()
.Thanks, I hoped I could slip in this idea but I'll keep it on my todo
list
for php-src :)I've had a quick look at the implementation. Making the necessary
changes to the C code shouldn't be too much effort. The hard part is
writing the RFC and agreeing on a behavior :-)
ACK :)
Cheers,
Nicolas
Hi
I'm not quoting the full text of the discussion because the php ML server
refused the message - too long it said.
Yeah, that's fine. I also aggressively prune irrelevant parts from
whatever I quote; saves traffic and storage, and also makes it easier to
find the answers. Additional context is available by looking up the
quoted message.
Based on my analysis and understanding of the linked examples, the only
one that might break (I don't fully understand it), is the one in
src/Symfony/Component/VarExporter/Internal/Registry.php. The others will
either no change or change to be more correct.[…]
As you saw, the BC break is real. That's what I wanted to highlight and
what worries me for the community at large. This is not some niche BC break.
That's not at all what I'd say I've seen. My argument (see quote) was
that while it technically breaks BC under a strict definition, it does
not change behavior in a negative way. I don't find this strict
definition of BC break particularly useful, because that effectively
results in https://xkcd.com/1172/ ("Workflow"), preventing all progress.
There is also an argument that is missing in this conversation and that
Yasuo made (in another thread by mistake I guess?):
Yes, I have seen that one. For reference of the other readers, the email
is here: https://externals.io/message/118554#118684.
In general, unserializing external serialised data is the same as "Read and
execute remote PHP code"
(Executing arbitrary code via memory corruption is trivial task with
unserialize()
)
I am aware of that and acknowledged the security issue in the third list
point of this section:
In all valid use cases of
unserialize()
, we do trust the backend where we
get serialized payloads from. This means constructed broken payloads are
not something we need to deal with (the DateTimeImmutable example you
gave).
Okay, that's fair with regard to DTI. The 'Error' case still exists for
readonly classes that lose properties, so ...
The thing we need to guard against is much narrower. That makes the problem
this PR aims to solve much smaller. Truncated payloads are a thing because
of race conditions or unreliable network services. FC/BC of serialized
payloads is another thing that can only be dealt with by implementing
__unserialize() or the likes. And that's mostly it. Other errors (e.g.
... while it is true that this FC/BC of payloads needs to be dealt with
by implementing __unserialize(), you can't rely on every class doing
that (correctly). That's why you currently need to catch 'Throwable',
not just 'Exception' and that's why the RFC proposes to wrap the Throwables.
Truncated payloads should not happen even with unreliable network
services, because of integrity checks embedded into the protocol (e.g.
TLS). In any case, truncated payloads will currently emit E_NOTICE, so
that relates to the second vote (E_WARNING to Exception) more than the
first vote (wrap Exception), whereas I understand your concerns are
mostly with the "wrap Exception" part.
parse error) are not caused by unserialize itself, but by regular user code.
Sorry, I don't understand what "parse error" or "other errors" you are
referring to.
"RFC: Static variables in inherited methods" -
https://wiki.php.net/rfc/static_variable_inheritanceThat RFC actually changed the behavior of the application I maintain.
You voted in favor of the RFC, but did not participate in the
discussion. The fix was simple and I agree that the new behavior is
better, so no hard feelings. I'd be curious though how the breakage from
that RFC is more acceptable to you than the anticipated breakage from
this RFC? Honest question to understand your PoV better.Rereading the RFC, Nikita described a niche and obscure behavior of the
engine. There are no numbers to support that understanding (how "niche"
this was) and that's missing in the RFC. I remember other RFCs which did
have numbers to quantify the impact. I guess it was hard to get that
Understood, thank you for looking into this. I believe raw numbers are
not the (most) important metric. IMO subtlety of the break is more
important. A subtle change that affects a small minority is worse than a
less subtle change that affects more users.
number. And that's also why I made this gist: so that the impact of the BC
break could be evaluated somehow. Many spots in Symfony likely means many
more spots in the PHP community. I'm not going to be able to conduct a
I didn't and don't expect you to perform a larger-scale analysis, that's
hard to do, because one has to look at the context of the
'unserialize()' call to determine what exactly the intended behavior is.
That's what I attempted to do with your Symfony examples (I am not a
Symfony user and might have misunderstood some, though) and based on
that I disagree with "many spots".
For the application I maintain at work, I've also looked through the
unserialize()
calls and concluded the the vast majority of them are
buggy as they already are and I am planning to fix those independent of
the results of this RFC.
larger analysis of this impact, but that can still give an intuition of it
to readers. Mine is: it's likely going to be impactful, not in a good way.
Best regards
Tim Düsterhus
Hi Nicolas,
On Thu, 8 Sept 2022 at 18:06, Nicolas Grekas
nicolas.grekas+php@gmail.com wrote:
There needs to be serious justification for it.
I think this is covered by the RFC.
The current behaviour where people have to setup a custom error
handler, and then restore the previous error handler, is pretty
'ungood'.
The proposed replacement behaviour, where people can actually find out
the problem that happened is better.
About 2.,
unserialize()
accepts a second argument to give it options. Did
you consider adding a 'throw_on_error' option to opt-in into the behavior
you're looking for?
A new option might be a good idea, but unserialize should have
sensible defaults, and the current behaviour is really not that
sensible.
So, imo, an opt-out would be a better choice. It would allow anyone
who cares enough to keep backwards compatibility when they upgrade to
the next version of PHP by adding that to the options array, but
everyone who is unaware of the option gets the more sensible behaviour
by default.
cheers
Dan
Ack