[VOTE] Improve unserialize() error handling

3 years ago by Nicolas Grekas — view source — reply

unread

Hi Tim,

as announced on Wednesday [1] I've now opened the vote for:

"Improve unserialize() error handling" [2]

The RFC contains three votes, each of which requires a 2/3 majority. Two
of the votes are for 8.x (8.3), one for 9.0.

Voting will run 2 weeks until:

2022-10-28 at 14:00 UTC

Please find the below resources for your reference:

RFC: https://wiki.php.net/rfc/improve_unserialize_error_handling
PoC implementation:

https://github.com/php/php-src/pull/9425

https://github.com/php/php-src/pull/9629

Discussion Thread: https://externals.io/message/118566
Related Thread: https://externals.io/message/118311

[1] https://externals.io/message/118566#118807
[2] https://wiki.php.net/rfc/improve_unserialize_error_handling

Not sure why I didn't think about it before but I just ran the test suite
of Symfony after applying the patch proposed in the RFC to change the way
exceptions are handled by unserialize.

This change breaks the test suite of 5 separate components. I created this
gist to list all the failures:
https://gist.github.com/nicolas-grekas/3da652a51669baa40c99bd20e4a1b4dd

Maybe I wasn't convincing enough during the discussion period, but that
doesn't change the fact that the proposed behavior in the RFC is a very
clear BC break that will affect userland significantly.

I'm therefore voting NO on the proposal.

Cheers,
Nicolas

3 years ago by tim@bastelstu.be — view source — reply

unread

Hi

Not sure why I didn't think about it before but I just ran the test suite
of Symfony after applying the patch proposed in the RFC to change the way
exceptions are handled by unserialize.

This change breaks the test suite of 5 separate components. I created this
gist to list all the failures:
https://gist.github.com/nicolas-grekas/3da652a51669baa40c99bd20e4a1b4dd

Maybe I wasn't convincing enough during the discussion period, but that
doesn't change the fact that the proposed behavior in the RFC is a very
clear BC break that will affect userland significantly.

I'm therefore voting NO on the proposal.

I'm not surprised by the “no” on the first vote based on the previous
discussion. I am surprised however that you also disagree with raising
the E_NOTICE to E_WARNING for consistency.

I don't plan on repeating all the previous discussion points with you,
but as you mention the Symfony tests specifically: Please find the patch
attached. Do you believe the expectations in this new test would a
reasonable expectation by a Symfony user?

Best regards
Tim Düsterhus

3 years ago by Nicolas Grekas — view source — reply

unread

Not sure why I didn't think about it before but I just ran the test suite
of Symfony after applying the patch proposed in the RFC to change the way
exceptions are handled by unserialize.

This change breaks the test suite of 5 separate components. I created
this
gist to list all the failures:
https://gist.github.com/nicolas-grekas/3da652a51669baa40c99bd20e4a1b4dd

Maybe I wasn't convincing enough during the discussion period, but that
doesn't change the fact that the proposed behavior in the RFC is a very
clear BC break that will affect userland significantly.

I'm therefore voting NO on the proposal.

I'm not surprised by the “no” on the first vote based on the previous
discussion. I am surprised however that you also disagree with raising
the E_NOTICE to E_WARNING for consistency.

I don't plan on repeating all the previous discussion points with you,
but as you mention the Symfony tests specifically: Please find the patch
attached. Do you believe the expectations in this new test would a
reasonable expectation by a Symfony user?

Since the beginning my point is not that the RFC doesn't have merits. It's
that the proposed approach breaks BC in a way that will affect the
community significantly. We have policies that say we should avoid BC
breaks and they should apply here.

To anyone wondering about the reality of the BC break if you're not
convinced there is one because the original code is broken anyway (which is
your main argument Tim IIUC): please have a look at the failures I reported
above and wonder about the changes the RFC would impose. I cannot think
about one that would not imply some sort of "if the version of PHP is >=
8.3 then A, else B". This is the fact that highlights the BC break.

Nicolas

3 years ago by Marco Pivetta — view source — reply

unread

Tbh, this affects a new minor and a new major, and only in an unhappy path
scenario, where the current PHP API is bad.

PHP 9 will introduce a hard crash: good.
It's a BC break: yes, native de-serialization is one of the most unsafe
parts of the language, and it requires hardening.

The fact that Symfony declared that it already supports PHP 8.3 and PHP 9.0
is your problem, and mostly your misunderstanding of SemVer

On Sat, 15 Oct 2022, 11:06 Nicolas Grekas, nicolas.grekas+php@gmail.com
wrote:

Not sure why I didn't think about it before but I just ran the test
suite
of Symfony after applying the patch proposed in the RFC to change the
way
exceptions are handled by unserialize.

This change breaks the test suite of 5 separate components. I created
this
gist to list all the failures:

https://gist.github.com/nicolas-grekas/3da652a51669baa40c99bd20e4a1b4dd

Maybe I wasn't convincing enough during the discussion period, but that
doesn't change the fact that the proposed behavior in the RFC is a very
clear BC break that will affect userland significantly.

I'm therefore voting NO on the proposal.

I'm not surprised by the “no” on the first vote based on the previous
discussion. I am surprised however that you also disagree with raising
the E_NOTICE to E_WARNING for consistency.

I don't plan on repeating all the previous discussion points with you,
but as you mention the Symfony tests specifically: Please find the patch
attached. Do you believe the expectations in this new test would a
reasonable expectation by a Symfony user?

Since the beginning my point is not that the RFC doesn't have merits. It's
that the proposed approach breaks BC in a way that will affect the
community significantly. We have policies that say we should avoid BC
breaks and they should apply here.

To anyone wondering about the reality of the BC break if you're not
convinced there is one because the original code is broken anyway (which is
your main argument Tim IIUC): please have a look at the failures I reported
above and wonder about the changes the RFC would impose. I cannot think
about one that would not imply some sort of "if the version of PHP is >=
8.3 then A, else B". This is the fact that highlights the BC break.

Nicolas

3 years ago by Christoph M. Becker — view source — reply

unread

I'm therefore voting NO on the proposal.

I'm not surprised by the “no” on the first vote based on the previous
discussion. I am surprised however that you also disagree with raising
the E_NOTICE to E_WARNING for consistency.

Since the beginning my point is not that the RFC doesn't have merits. It's
that the proposed approach breaks BC in a way that will affect the
community significantly. We have policies that say we should avoid BC
breaks and they should apply here.

Well, I consider the current behavior very unfortunate; sometimes
E_NOTICE, sometimes E_WARNING, sometimes both, and sometimes a
UnexpectedValueException – and this looks more like historic randomness
than having any deeper reasoning behind it. I also understand that we
need to be concerned about BC breaks.

So for me the question is whether we want to stick with the current
behavior forever, or if there is an acceptable way forward. I
certainly hope for the latter. Would it really be a serious issue to
promote E_NOTICE to E_WARNING in PHP 8.3? I don't think so, but I may
be wrong.

And would it really be a serious issue to promote E_WARNING to an
exception in PHP 9? Maybe, but after all that would be a new major
version, so some BC breaks are to be expected anyway.

--
Christoph M. Becker

3 years ago by juan carlos morales — view source — reply

unread

As far as I read... Everyone is right. All agree is BC break change, but
this is being considered to be part of PHP 9.0, so if what We get is a
better PHP then .... We must move forward, and such steps should be done
in a major version change.

3 years ago by juan carlos morales — view source — reply

unread

El sáb., 15 de octubre de 2022 11:02, juan carlos morales <
dev.juan.morales@gmail.com> escribió:

As far as I read... Everyone is right. All agree is BC break change, but
this is being considered to be part of PHP 9.0, so if what We get is a
better PHP then .... We must move forward, and such steps should be done
in a major version change.

I just checked that the RFC says in the PROPOSED PHP VERSIONS section....
next PHP 8.X

Maybe that should be adjusted for clarification and documentation reasons

3 years ago by tim@bastelstu.be — view source — reply

unread

Hi

Resending the message without attachments, because the mailing list
rejected the original due to exceeding 30000 bytes:

ezmlm-reject: fatal: Sorry, I don't accept messages larger than 30000 bytes (#5.2.3)

You can find the attachments at:
https://gist.github.com/TimWolla/376dd162f7684daef38f76a07254871c

Original message below:

I'm not surprised by the “no” on the first vote based on the previous
discussion. I am surprised however that you also disagree with raising
the E_NOTICE to E_WARNING for consistency.

I don't plan on repeating all the previous discussion points with you,
but as you mention the Symfony tests specifically: Please find the patch
attached. Do you believe the expectations in this new test would a
reasonable expectation by a Symfony user?

Since the beginning my point is not that the RFC doesn't have merits. It's
that the proposed approach breaks BC in a way that will affect the
community significantly. We have policies that say we should avoid BC
breaks and they should apply here.

To anyone wondering about the reality of the BC break if you're not
convinced there is one because the original code is broken anyway (which is
your main argument Tim IIUC): please have a look at the failures I reported

Yes, you understood this correctly: I consider the existing
implementation to be broken. Unfortunately you did not answer my
question whether the attached patch would be a reasonable expectation by
a Symfony user.

Let's presume it is: If I use the 'PhpSerializer' then I expect to
receive exactly the 'MessageDecodingFailedException' whenever the
message fails to decode. When decoding an erroneous DateTime (or an
erroneous SplDoublyLinkedList or whatever) a different Exception will be
thrown and not caught, thus the contract by the 'PhpSerializer' is violated.

Let's fix this using test driven development. We first add the failing
test cases. This is step1.diff (patches are in
https://gist.github.com/TimWolla/376dd162f7684daef38f76a07254871c). When
running the tests we get the following output:

There were 2 failures:

Symfony\Component\Messenger\Tests\Transport\Serialization\PhpSerializerTest::testDecodingFailsWithBadDateTimeData
Failed asserting that exception of type "Error" matches expected exception "Symfony\Component\Messenger\Exception\MessageDecodingFailedException". Message was: "Invalid serialization data for DateTime object" at
/app/src/Symfony/Component/Messenger/Transport/Serialization/PhpSerializer.php:95
/app/src/Symfony/Component/Messenger/Transport/Serialization/PhpSerializer.php:54
/app/src/Symfony/Component/Messenger/Tests/Transport/Serialization/PhpSerializerTest.php:99
.

Symfony\Component\Messenger\Tests\Transport\Serialization\PhpSerializerTest::testDecodingFailsWithBadDoublyLinkedListData
Failed asserting that exception of type "UnexpectedValueException" matches expected exception "Symfony\Component\Messenger\Exception\MessageDecodingFailedException". Message was: "Incomplete or ill-typed serialization data" at
/app/src/Symfony/Component/Messenger/Transport/Serialization/PhpSerializer.php:95
/app/src/Symfony/Component/Messenger/Transport/Serialization/PhpSerializer.php:54
/app/src/Symfony/Component/Messenger/Tests/Transport/Serialization/PhpSerializerTest.php:110
.

The tests fails (as expected) and we need to fix the code. To fix this
we add a new 'catch(Throwable)' to the existing 'try'. Within the catch
we throw a new 'MessageDecodingFailedException' that wraps the Throwable
we caught using the '$previous' parameter. You can find the change in
step2.diff. Let me re-run the tests:

There was 1 failure:

Symfony\Component\Messenger\Tests\Transport\Serialization\PhpSerializerTest::testDecodingFailsWithBadClass
Failed asserting that exception message 'Could not decode message using PHP serialization: O:13:"ReceivedSt0mp":0:{}.' matches '/class "ReceivedSt0mp" not found/'.

Okay, now the Exception message changed. Personally I do not consider
this a BC break: I believe Exception messages are meant for human
consumption, not for programs. Otherwise fixing a typo in the message
would be a BC break. If the code wants to learn about the cause, it
should either use the '$code' or different types of Exception should be
thrown to clarify the cause by entering a different catch() block.

There are two options here: We could fix the expectation (that's what I
would do, see step3a.diff), or we could use the message of the original
Exception as the message of the wrapper Exception (step3b.diff), or we
could rethrow the original exception, if it already is of the correct
class (step3c.diff).

In all three variants of Step 3 we now fixed the bug for DateTime and
SplDoublyLinkedList. PhpSerializer will now consistently throw
MessageDecodingFailedException in all cases, just the message might have
changed (in variant 'a').

above and wonder about the changes the RFC would impose. I cannot think
about one that would not imply some sort of "if the version of PHP is >=
8.3 then A, else B". This is the fact that highlights the BC break.

Now that we fixed the implementation of PhpSerializer for PHP 8.2, let's
have a look at what changes with my RFC. Let's take step3a.diff as the
basis. Let me first write a userland implementation of the new
unserialize, as this makes it easier to demonstrate the behavior to
folks without knowledge about the internals. I'll put the implementation
into PhpSerializer.php to keep things simple.

You can find the implementation in rfc.diff, but I'll repeat it here:

class UnserializationFailedException extends \Exception {}

function unserialize_php83(string $data, array $options = []): mixed
{
try {
return \unserialize($data, $options);
} catch (\Throwable $e) {
throw new UnserializationFailedException(
'An Exception was thrown during unserialization',
0,
$e
);
}
}

Now, let us use the custom implementation, instead of unserialize()
directly. To do so we replace unserialize() with unserialize_php83().
You can find the result in rfc-use.diff. What happens if I run the tests?

PHPUnit 9.5.25 #StandWithUkraine

Testing Symfony\Component\Messenger\Tests\Transport\Serialization\PhpSerializerTest
......... 9 / 9 (100%)

Time: 00:00.030, Memory: 8.00 MB

OK (9 tests, 14 assertions)

All green! We did not have to change anything after fixing the
implementation in step3a.diff for PHP 8.2 and earlier!

Okay, let's assume the actual Exception message must not change, e.g.
using the solution in step3b.diff and step3c.diff. Again there are
multiple solutions:

The implementation of the RFC could be adjusted, such that the
wrapper exception implicitly uses the original Exception message or
original code. So the native implementation could do something similar
to step3b. I would be willing to do this, if you believe that would
reduce the BC impact.
Alternatively a little change to the code is required, let me
demonstrate how that would look. The solution is based on 'step3c.diff':

We just need to add these three lines to the catch block:

        if ($e instanceof UnserializationFailedException && $e->getPrevious()) {
            $e = $e->getPrevious();
        }

This will unwrap the original Exception that is stored in the
'$previous' parameter. You can find the full patch in
'rfc-use-alternative.diff'.

Of course this is fully compatible with PHP 8.2 and earlier: The
instanceof will simply not match, because
'UnserializationFailedException' does not exist.

I believe with this example I have sufficiently demonstrated how fixing
the code for the existing PHP versions will cause the test failures to
go away automatically. If you consider the Exception message to be
relevant for backwards compatibility, then just three simple lines need
to be added - or alternatively the RFC implementation which currently
just is a proof of concept (!) could be adjusted. At no point is it
necessary to check for the PHP version.

I believe the changes to the other classes with failing tests are
equally simple based on my demonstration.

Feel free to use my attached patches to make the necessary changes in
Symfony. There is no need to credit me in any way. Alternatively I would
be happy to send a pull request. Just let me know which of the 3
alternatives of step 3 you prefer and I'll send a PR.

Best regards
Tim Düsterhus

3 years ago by Nicolas Grekas — view source — reply

unread

Hi Tim,

Thanks for taking the time to have a closer look. Unfortunately, you picked
the one failing test where there could be a discussion about whether the
original code needs improvement or not.

The other failing tests involve "unserialize_callback_func" to gracefully
detect class-not-found, and they are all plain broken by the RFC.

I created this patch/PR to show the changes that would be required on
Symfony to work around the BC break:
https://github.com/symfony/symfony/pull/47882

Note to readers: in this whole discussion, Symfony is just an example of
affected code. In the end, Symfony will adapt to the RFC if it passes. The
point being made is that PHP scripts should not have to be patched to run
on newer minor versions of PHP. That's what "keeping BC" means.

Resending the message without attachments, because the mailing list

rejected the original due to exceeding 30000 bytes:

ezmlm-reject: fatal: Sorry, I don't accept messages larger than 30000
bytes (#5.2.3)

You can find the attachments at:
https://gist.github.com/TimWolla/376dd162f7684daef38f76a07254871c

Original message below:

I'm not surprised by the “no” on the first vote based on the previous
discussion. I am surprised however that you also disagree with raising
the E_NOTICE to E_WARNING for consistency.

I don't plan on repeating all the previous discussion points with you,
but as you mention the Symfony tests specifically: Please find the patch
attached. Do you believe the expectations in this new test would a
reasonable expectation by a Symfony user?

Since the beginning my point is not that the RFC doesn't have merits.
It's
that the proposed approach breaks BC in a way that will affect the
community significantly. We have policies that say we should avoid BC
breaks and they should apply here.

To anyone wondering about the reality of the BC break if you're not
convinced there is one because the original code is broken anyway (which
is
your main argument Tim IIUC): please have a look at the failures I
reported

Yes, you understood this correctly: I consider the existing
implementation to be broken. Unfortunately you did not answer my
question whether the attached patch would be a reasonable expectation by
a Symfony user.

Let's presume it is: If I use the 'PhpSerializer' then I expect to
receive exactly the 'MessageDecodingFailedException' whenever the
message fails to decode. When decoding an erroneous DateTime (or an
erroneous SplDoublyLinkedList or whatever) a different Exception will be
thrown and not caught, thus the contract by the 'PhpSerializer' is
violated.

Let's fix this using test driven development. We first add the failing
test cases. This is step1.diff (patches are in
https://gist.github.com/TimWolla/376dd162f7684daef38f76a07254871c). When
running the tests we get the following output:

I didn't answer the question because we ruled out such payloads in the
discussion thread: constructed payloads like the ones involving DateTime do
not exist in practice. Or if they do, it means the source of the payload is
not trusted, and then we are in much bigger trouble (security issues for
which the RFC can't do anything.)

Breaking BC in the name of such invalid payloads doesn't make an argument
in favor of the RFC.

Failed asserting that exception message 'Could not decode message using
PHP serialization: O:13:"ReceivedSt0mp":0:{}.' matches '/class
"ReceivedSt0mp" not found/'.

Okay, now the Exception message changed. Personally I do not consider
this a BC break: I believe Exception messages are meant for human
consumption, not for programs. Otherwise fixing a typo in the message
would be a BC break. If the code wants to learn about the cause, it
should either use the '$code' or different types of Exception should be
thrown to clarify the cause by entering a different catch() block.

Yes, the specific error message should be part of the BC promise. This
allows building test suites that can assert the message in a stable way.
This is also why we don't change the output of var_dump/print_r/var_export:
they're written now, in the same of BC, for the best of stability. (I've
barely read any PHP code where exception's code is used in any useful BTW -
that can't be a solution.)

More importantly, the type of thrown exceptions should undoubtedly be part
of the BC promise.
Wrapping exceptions inside UnserializationFailedException breaks existing
catch/instanceof checks (see my PR above.)

All green! We did not have to change anything after fixing the
implementation in step3a.diff for PHP 8.2 and earlier!

We did change the exception hierarchy, both in terms of types and in terms
of stack of "previous". In this specific case, we can discuss the merits of
wrapping all throwables or not. But in doubt the other failing cases I
reported clearly highlight the BC break.

Nicolas

3 years ago by Peter Bowyer — view source — reply

unread

On Mon, 17 Oct 2022 at 09:57, Nicolas Grekas nicolas.grekas+php@gmail.com
wrote:

I created this patch/PR to show the changes that would be required on
Symfony to work around the BC break:
https://github.com/symfony/symfony/pull/47882

Note to readers: in this whole discussion, Symfony is just an example of
affected code. In the end, Symfony will adapt to the RFC if it passes. The
point being made is that PHP scripts should not have to be patched to run
on newer minor versions of PHP. That's what "keeping BC" means.

I didn't think there was much reason for the negative fuss being made
around this PR, after all catching \Throwable or another exception doesn't
make much difference. Seeing the workarounds here however it is more
involved than I expected.

I have changed my vote for the first point to "No". I’d like to see this
implemented in future (9.0?) but now agree that 8.3 isn’t the right place.

Peter

3 years ago by tim@bastelstu.be — view source — reply

unread

Hi

Thanks for taking the time to have a closer look. Unfortunately, you picked
the one failing test where there could be a discussion about whether the
original code needs improvement or not.

I disagree. The other locations are equally broken with current versions
of PHP.

The other failing tests involve "unserialize_callback_func" to gracefully
detect class-not-found, and they are all plain broken by the RFC.

I created this patch/PR to show the changes that would be required on
Symfony to work around the BC break:
https://github.com/symfony/symfony/pull/47882

Note to readers: in this whole discussion, Symfony is just an example of
affected code. In the end, Symfony will adapt to the RFC if it passes. The
point being made is that PHP scripts should not have to be patched to run
on newer minor versions of PHP. That's what "keeping BC" means.

Scripts will not need to be changed, unless they are already relying on
observed behavior instead of the documented behavior.

Let's fix this using test driven development. We first add the failing
test cases. This is step1.diff (patches are in
https://gist.github.com/TimWolla/376dd162f7684daef38f76a07254871c). When
running the tests we get the following output:

I didn't answer the question because we ruled out such payloads in the
discussion thread: constructed payloads like the ones involving DateTime do
not exist in practice. Or if they do, it means the source of the payload is
not trusted, and then we are in much bigger trouble (security issues for
which the RFC can't do anything.)

Okay, just to make sure I understand this correctly, because I don't
want to draw incorrect conclusions or make false statements due to my
lack of knowledge about Symfony's ecosystem:

When I use Symfony's 'PhpSerializer' and perform the following call:

    $serializer->decode([
        'body' => '{"message": "bar"}',
    ]);

Then I can expect the input to be gracefully rejected with a
MessageDecodingFailedException. This behavior is tested in the
'testDecodingFailsWithBadFormat' test.

But under no circumstances must I perform the following method call:

    $serializer->decode([
        'body' => 'O:8:"DateTime":0:{}',
    ]);

Because this payload is invalid and thus must not exist in practice.

Did I understand this correctly? If I did, is this limitation documented
somewhere?

Breaking BC in the name of such invalid payloads doesn't make an argument
in favor of the RFC.

I demonstrated how to adjust 'PhpSerializer' to improve the behavior for
what I consider a reasonable use case without breaking any existing
tests in step3b.diff and step3c.diff. At the same time the fixed version
will automatically handle the change this RFC proposes. I even offered
to adjust the RFC implementation to preserve the original Exception
message and code.

Failed asserting that exception message 'Could not decode message using
PHP serialization: O:13:"ReceivedSt0mp":0:{}.' matches '/class
"ReceivedSt0mp" not found/'.

Okay, now the Exception message changed. Personally I do not consider
this a BC break: I believe Exception messages are meant for human
consumption, not for programs. Otherwise fixing a typo in the message
would be a BC break. If the code wants to learn about the cause, it
should either use the '$code' or different types of Exception should be
thrown to clarify the cause by entering a different catch() block.

Yes, the specific error message should be part of the BC promise. This
allows building test suites that can assert the message in a stable way.

I'm not talking about test suites here. I believe makes sense to verify
the error message to ensure a specific error message is emitted to the
human observer in the error log.

I was talking about code that does something like this, which I consider
to be inherently unsafe:

try { … }
catch (SomeException $e) {
if ($e->getMessage() === 'Foobar') doSomething();
else doSomethingElse();
}

As a library author I want to be able to provide the best possible
Exception message to ease debugging for the user. This is not possible
if I am locked into a bad choice forever.

This is also why we don't change the output of var_dump/print_r/var_export:
they're written now, in the same of BC, for the best of stability. (I've
barely read any PHP code where exception's code is used in any useful BTW -
that can't be a solution.)

PDO is an example where the $code is useful, because it exposes the
standardized SQL error code. Similarly a HTTP client could expose the
status code within $code.

For other libraries simply using a different Exception class within the
same hierarchy is often sufficient to differentiate between different
errors, because programmatically I often don't care why exactly an
operation failed.

More importantly, the type of thrown exceptions should undoubtedly be part
of the BC promise.

I agree.

Wrapping exceptions inside UnserializationFailedException breaks existing
catch/instanceof checks (see my PR above.)

unserialize() only promises you that a Throwable might be thrown:

https://www.php.net/manual/en/function.unserialize.php#refsect1-function.unserialize-errors

Symfony is relying on undocumented behavior.

All green! We did not have to change anything after fixing the
implementation in step3a.diff for PHP 8.2 and earlier!

We did change the exception hierarchy, both in terms of types and in terms
of stack of "previous". In this specific case, we can discuss the merits of

If not even the $previous value may change, then we are deep in
https://xkcd.com/1172/ territory. The whole purpose of $previous is
wrapping exceptions that come up as an implementation detail and then
exposing a well-defined Exception at the border of your library /
component and at the same time exposing the underlying cause to the
developer to simplify debugging.

wrapping all throwables or not. But in doubt the other failing cases I
reported clearly highlight the BC break.

As I've mentioned at the beginning of this email, the other locations
are also broken with the current PHP versions.

Let me work through ResourceCheckerConfigCache as another example. As
with the 'PhpSerializer' example, I'm working on the assumption that the
input to 'unserialize' might be broken. If it would be impossible for it
to be broken, then we do not need error handling in the first place and
the RFC would not change anything, because it only concerns itself with
error handling.

So let's add a new test case to 'ResourceCheckerConfigCacheTest'.
Specifically I am implementing a new resource that implements
'SelfCheckingResourceInterface':

final class MyResource implements \Symfony\Component\Config\Resource\SelfCheckingResourceInterface
{
public function __construct(
private \DateTimeImmutable $dti
) {}
public function isFresh(int $timestamp): bool { return false; }

public function __toString(): string { return ''; }
}

This resource stores a DTI internally, but the specific object does not
really matter. Anything that implements __unserialize() would work.

Now let me add a test that uses this resource. Unfortunately my meta
data file got corrupted on disk, because some unrelated process stomped
over it and replaced all '-' by an 'x'.

public function testCacheIsNotFreshWhenUnserializeFails2()
{
    $checker = $this->createMock(ResourceCheckerInterface::class);
    $cache = new ResourceCheckerConfigCache($this->cacheFile, [$checker]);
    $cache->write('foo', [new MyResource(new \DateTimeImmutable('now'))]);

    $metaFile = "{$this->cacheFile}.meta";
    file_put_contents($metaFile, str_replace('-', 'x', file_get_contents($metaFile)));

    $this->assertFalse($cache->isFresh());
}

Similarly to the test case without the '2' suffix, I expect the cache to
be ignored, because it can simply be recreated. Let's test this:

Symfony\Component\Config\Tests\ResourceCheckerConfigCacheTest::testCacheIsNotFreshWhenUnserializeFails2
Error: Invalid serialization data for DateTimeImmutable object

/app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:159
/app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:77
/app/src/Symfony/Component/Config/Tests/ResourceCheckerConfigCacheTest.php:152

Oh no. The test failed. To fix this we just need remove everything from
the body of the catch in
ResourceCheckerConfigCache::safelyUnserialize(). No matter why the
unserialization fails, the cache will be ignored if it contains invalid
data. It goes without saying that this will correctly handle the
UnserializationFailedException that the RFC proposes.

Full patch attached.

For DefaultMarshaller I consider the test to be wrong / to be overly
specific. The 'MarshallerInterface' defines that:

@throws \Exception Whenever unserialization fails

UnserializationFailedException is a subclass of Exception, thus the
external contract is not violated.

For ContextListenerTest we can just add 'O:8:"DateTime":0:{}' to
'provideInvalidToken' and the same reasoning as with PhpSerializer and
ResourceCheckerConfigCache applies. I will not work through the steps
again to keep this email succinct.

On VarExporterTest I can't make a clear statement, because frankly I
don't understand what the code is supposed to do.

However:

a) 'Registry' constructs payloads by hand, something that you admitted
yourself does not exist in practice: "constructed payloads like the ones
involving DateTime do not exist in practice".

b) Registry is marked @internal, as such it does not affect any
downstream users of Symfony.

c) What Registry does is definitely nothing your average application
does, as such the caveat "this is not just about Symfony" does not apply.

Best regards
Tim Düsterhus

2 years ago by Larry Garfield — view source — reply

unread

Okay, now the Exception message changed. Personally I do not consider
this a BC break: I believe Exception messages are meant for human
consumption, not for programs. Otherwise fixing a typo in the message
would be a BC break. If the code wants to learn about the cause, it
should either use the '$code' or different types of Exception should be
thrown to clarify the cause by entering a different catch() block.

Yes, the specific error message should be part of the BC promise. This
allows building test suites that can assert the message in a stable way.

I'm not talking about test suites here. I believe makes sense to verify
the error message to ensure a specific error message is emitted to the
human observer in the error log.

I was talking about code that does something like this, which I consider
to be inherently unsafe:

try { … }
catch (SomeException $e) {
if ($e->getMessage() === 'Foobar') doSomething();
else doSomethingElse();
}

As a library author I want to be able to provide the best possible
Exception message to ease debugging for the user. This is not possible
if I am locked into a bad choice forever.

Just to be clear, such code is sometimes necessary. If the exception doesn't include sufficient information as dedicated properties, parsing out the string becomes the only option. I've had to do this myself.

In 100% of cases, without exception (no pun intended), that's because the code that throws the exception is bad and wrong and should be fixed. But such code absolutely exists in the wild, including in php-src. I recently needed to sscanf() and then explode the message of \ArgumentCountError as that was the only way I could find to get the class/method names out of it. I died inside a little.

So yes, such code is inherently unsafe, but is sadly not as uncommon as it should be.

All that said, I agree that we have not and should not treat error message strings as part of the API guarantee. If anything, maybe that will help incentivize people to stop writing bad (unparsable) exceptions.

Which I think gets at the crux of the issue here, and what is often the issue in BC discussions.

PHP does not, and has never, guaranteed full BC for all code. It has tried very hard to maintain BC for "well-behaved code." Well-behaved code can generally update to a new version, even a new major, very easily with minimal work. Any deprecations are alerted at least a year in advance, sometimes several, and usually have easy-to-automate transitions that tools like Rector can implement. (The discussion about deprecations being "too noisy" in some tooling is another matter that has been hashed out before, so I won't belabor it here.) Code that is following common best-practices rarely has an issue.

The problem is that "well-behaved" is not well-defined. Is relying on an error message string well-behaved? I'd argue no, but as noted above sometimes there's no better option. Others may disagree. Is relying on the exact exception class thrown well-behaved? Well... sometimes. Is relying on an error condition being swallowed rather than turning into an exception well-behaved? I could argue either way here, honestly. Is code relying on undefined variables silently becoming null well-behaved? Absolutely not, and hasn't been for a decade, but it wasn't until PHP 8.0 that the language called people out for it by default, so many projects had a lot of catch up to do. (Eg, I did that catch up for TYPO3.)

I don't know that we can make a clear definition of "well-behaved" that everyone could agree on. It would be nice, but it's a very squishy topic. But it may be prudent, when not tied down to this specific RFC, to have a broader discussion about better codifying (meaning, writing down) what we consider a BC break and what we don't, what some heuristics are for "well-behaved," and so on. That could help avoid, or at least shorten, a lot of discussions in the future.

--Larry Garfield

2 years ago by Mike Schinkel — view source — reply

unread

Larry,

Okay, now the Exception message changed. Personally I do not consider
this a BC break: I believe Exception messages are meant for human
consumption, not for programs. Otherwise fixing a typo in the message
would be a BC break. If the code wants to learn about the cause, it
should either use the '$code' or different types of Exception should be
thrown to clarify the cause by entering a different catch() block.

Yes, the specific error message should be part of the BC promise. This
allows building test suites that can assert the message in a stable way.

I'm not talking about test suites here. I believe makes sense to verify
the error message to ensure a specific error message is emitted to the
human observer in the error log.

I was talking about code that does something like this, which I consider
to be inherently unsafe:

try { … }
catch (SomeException $e) {
if ($e->getMessage() === 'Foobar') doSomething();
else doSomethingElse();
}

As a library author I want to be able to provide the best possible
Exception message to ease debugging for the user. This is not possible
if I am locked into a bad choice forever.

Just to be clear, such code is sometimes necessary. If the exception doesn't include sufficient information as dedicated properties, parsing out the string becomes the only option. I've had to do this myself.

In 100% of cases, without exception (no pun intended), that's because the code that throws the exception is bad and wrong and should be fixed. But such code absolutely exists in the wild, including in php-src. I recently needed to sscanf() and then explode the message of \ArgumentCountError as that was the only way I could find to get the class/method names out of it. I died inside a little.

So yes, such code is inherently unsafe, but is sadly not as uncommon as it should be.

All that said, I agree that we have not and should not treat error message strings as part of the API guarantee. If anything, maybe that will help incentivize people to stop writing bad (unparsable) exceptions.

I am curious what you would envision a better, "parsable" exception from sscanf() would look like?

I ask not because I disagree with you here but because maybe the question of whether errors messages should be part of BC is focusing on the wrong question?

-Mike

P.S. Also, looking at https://www.php.net/manual/en/function.sscanf.php https://www.php.net/manual/en/function.sscanf.php, it is interesting that the exceptions it triggers are not documented here. (I am not picking on sscanf() — AFAIK few if any functions document their exceptions.) Shouldn't more thought (and documentation) be given to this information which obviously is part of its API/usage?

2 years ago by Nicolas Grekas — view source — reply

unread

The other failing tests involve "unserialize_callback_func" to gracefully
detect class-not-found, and they are all plain broken by the RFC.

I created this patch/PR to show the changes that would be required on
Symfony to work around the BC break:
https://github.com/symfony/symfony/pull/47882

Note to readers: in this whole discussion, Symfony is just an example of
affected code. In the end, Symfony will adapt to the RFC if it passes.
The
point being made is that PHP scripts should not have to be patched to run
on newer minor versions of PHP. That's what "keeping BC" means.

Scripts will not need to be changed, unless they are already relying on
observed behavior instead of the documented behavior.

When I use Symfony's 'PhpSerializer' and perform the following call:

    $serializer->decode([
        'body' => '{"message": "bar"}',
    ]);
Then I can expect the input to be gracefully rejected with a
MessageDecodingFailedException. This behavior is tested in the
'testDecodingFailsWithBadFormat' test.

But under no circumstances must I perform the following method call:
    $serializer->decode([
        'body' => 'O:8:"DateTime":0:{}',
    ]);
Because this payload is invalid and thus must not exist in practice.

That's the current behavior, yes. It allows graceful errors when one
misconfigures their workers and sends jsons to a worker that expects
php-ser. Under no circumstances do we need to guard against adversary
payloads. We could handle the situation you describe of course, but that'd
be just dead code in practice. ¯_(ツ)_/¯

Breaking BC in the name of such invalid payloads doesn't make an argument
in favor of the RFC.

I demonstrated how to adjust 'PhpSerializer' to improve the behavior for
what I consider a reasonable use case without breaking any existing
tests in step3b.diff and step3c.diff. At the same time the fixed version
will automatically handle the change this RFC proposes. I even offered
to adjust the RFC implementation to preserve the original Exception
message and code.

Failed asserting that exception message 'Could not decode message using
PHP serialization: O:13:"ReceivedSt0mp":0:{}.' matches '/class
"ReceivedSt0mp" not found/'.

Okay, now the Exception message changed. Personally I do not consider
this a BC break: I believe Exception messages are meant for human
consumption, not for programs. Otherwise fixing a typo in the message
would be a BC break. If the code wants to learn about the cause, it
should either use the '$code' or different types of Exception should be
thrown to clarify the cause by entering a different catch() block.

Yes, the specific error message should be part of the BC promise. This
allows building test suites that can assert the message in a stable way.

I'm not talking about test suites here. I believe makes sense to verify
the error message to ensure a specific error message is emitted to the
human observer in the error log.

Breaking test suites is the BC issue I wanted to highlight.

I was talking about code that does something like this, which I consider
to be inherently unsafe:

try { … }
catch (SomeException $e) {
if ($e->getMessage() === 'Foobar') doSomething();
else doSomethingElse();
}

As a library author I want to be able to provide the best possible
Exception message to ease debugging for the user. This is not possible
if I am locked into a bad choice forever.

This is also why we don't change the output of
var_dump/print_r/var_export:
they're written now, in the same of BC, for the best of stability. (I've
barely read any PHP code where exception's code is used in any useful
BTW -
that can't be a solution.)

PDO is an example where the $code is useful, because it exposes the
standardized SQL error code. Similarly a HTTP client could expose the
status code within $code.

For other libraries simply using a different Exception class within the
same hierarchy is often sufficient to differentiate between different
errors, because programmatically I often don't care why exactly an
operation failed.

More importantly, the type of thrown exceptions should undoubtedly be
part
of the BC promise.

I agree.

Wrapping exceptions inside UnserializationFailedException breaks existing
catch/instanceof checks (see my PR above.)

unserialize() only promises you that a Throwable might be thrown:

https://www.php.net/manual/en/function.unserialize.php#refsect1-function.unserialize-errors

Breaking BC at will unless it's properly documented looks like a terrible
policy. It'd make developing in PHP really risky. Especially here:
expecting that an exception would bubble down is just what everybody would
expect from unserialize(), like everything else in the engine. It always
did, so ppl built on that. Nothing fancy here.

Symfony is relying on undocumented behavior.

All green! We did not have to change anything after fixing the
implementation in step3a.diff for PHP 8.2 and earlier!

We did change the exception hierarchy, both in terms of types and in
terms
of stack of "previous". In this specific case, we can discuss the merits
of

If not even the $previous value may change, then we are deep in
https://xkcd.com/1172/ territory. The whole purpose of $previous is
wrapping exceptions that come up as an implementation detail and then
exposing a well-defined Exception at the border of your library /
component and at the same time exposing the underlying cause to the
developer to simplify debugging.

wrapping all throwables or not. But in doubt the other failing cases I
reported clearly highlight the BC break.

As I've mentioned at the beginning of this email, the other locations
are also broken with the current PHP versions.

Let me work through ResourceCheckerConfigCache as another example. As
with the 'PhpSerializer' example, I'm working on the assumption that the
input to 'unserialize' might be broken. If it would be impossible for it
to be broken, then we do not need error handling in the first place and
the RFC would not change anything, because it only concerns itself with
error handling.

So let's add a new test case to 'ResourceCheckerConfigCacheTest'.
Specifically I am implementing a new resource that implements
'SelfCheckingResourceInterface':
final class MyResource implements
\Symfony\Component\Config\Resource\SelfCheckingResourceInterface
{
public function __construct(
private \DateTimeImmutable $dti
) {}
public function isFresh(int $timestamp): bool { return false; }

public function __toString(): string { return ''; }
}
This resource stores a DTI internally, but the specific object does not
really matter. Anything that implements __unserialize() would work.

Now let me add a test that uses this resource. Unfortunately my meta
data file got corrupted on disk, because some unrelated process stomped
over it and replaced all '-' by an 'x'.
public function testCacheIsNotFreshWhenUnserializeFails2()
{
    $checker = $this->createMock(ResourceCheckerInterface::class);
    $cache = new ResourceCheckerConfigCache($this->cacheFile,
[$checker]);
    $cache->write('foo', [new MyResource(new
\DateTimeImmutable('now'))]);
    $metaFile = "{$this->cacheFile}.meta";
    file_put_contents($metaFile, str_replace('-', 'x',
file_get_contents($metaFile)));
    $this->assertFalse($cache->isFresh());
}
Similarly to the test case without the '2' suffix, I expect the cache to
be ignored, because it can simply be recreated. Let's test this:

Symfony\Component\Config\Tests\ResourceCheckerConfigCacheTest::testCacheIsNotFreshWhenUnserializeFails2

Error: Invalid serialization data for DateTimeImmutable object

/app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:159
/app/src/Symfony/Component/Config/ResourceCheckerConfigCache.php:77

/app/src/Symfony/Component/Config/Tests/ResourceCheckerConfigCacheTest.php:152

Oh no. The test failed. To fix this we just need remove everything from
the body of the catch in
ResourceCheckerConfigCache::safelyUnserialize(). No matter why the
unserialization fails, the cache will be ignored if it contains invalid
data. It goes without saying that this will correctly handle the
UnserializationFailedException that the RFC proposes.

Full patch attached.

For DefaultMarshaller I consider the test to be wrong / to be overly
specific. The 'MarshallerInterface' defines that:

@throws \Exception Whenever unserialization fails

UnserializationFailedException is a subclass of Exception, thus the
external contract is not violated.

For ContextListenerTest we can just add 'O:8:"DateTime":0:{}' to
'provideInvalidToken' and the same reasoning as with PhpSerializer and
ResourceCheckerConfigCache applies. I will not work through the steps
again to keep this email succinct.

On VarExporterTest I can't make a clear statement, because frankly I
don't understand what the code is supposed to do.

However:

a) 'Registry' constructs payloads by hand, something that you admitted
yourself does not exist in practice: "constructed payloads like the ones
involving DateTime do not exist in practice".

It's quite common practice to create objects without constructors by using
constructed payloads. That's what the code you read does. While
ReflectionClass::newInstanceWithoutConstructor() works for userland
classes, many internal classes do require this trick.

b) Registry is marked @internal, as such it does not affect any

downstream users of Symfony.

c) What Registry does is definitely nothing your average application
does, as such the caveat "this is not just about Symfony" does not apply.

All the above cases are variations of the same topic:

When unserializing a payload, said payload might reference a class that was
removed since the payload was created.
For ResourceChecker, this happens all the time during dev, when one
refactors their code. For Cache, VarExporter and Security, this happens for
similar reasons - renamed classes, etc, but also in prod.

By default, PHP creates weird __PHP_Incomplete_Class objects in places
where not found classes are referenced. This creates invalid object graphs
that we want to reject early on (everybody should most of the time I
guess.) The only way to do so reliably is to use the mechanism I mentioned
in the discussion thread: ini_set(unserialize_callback_func), documented
here:
https://www.php.net/manual/en/var.configuration.php#ini.unserialize-callback-func

and then throw an exception from this callback.
Here is the typical code:

The RFC forces rewriting such logics to accomodate for the new wrapper.
That's the BC break that is highlighted by the PR I linked.

Nicolas

3 years ago by Jordan LeDoux — view source — reply

unread

Sorry for double send Nicolas, I hit reply instead of reply-all on my first
message. :)

On Mon, Oct 17, 2022 at 1:57 AM Nicolas Grekas nicolas.grekas+php@gmail.com
wrote:

Yes, the specific error message should be part of the BC promise. This
allows building test suites that can assert the message in a stable way.
This is also why we don't change the output of var_dump/print_r/var_export:
they're written now, in the same of BC, for the best of stability. (I've
barely read any PHP code where exception's code is used in any useful BTW -
that can't be a solution.)

While I definitely sympathize with the "BC break" for tests, this doesn't
actually break code unless you are switching behaviors in catch based on
specific exception messages, which does not seem like a workflow that PHP
needs to guarantee, as that is not the purpose of exception messages.

Moreover, the actual messages change in php-src all the time in PR's,
sometimes in PR's not even attached to RFCs. This has not, to my knowledge,
ever been previously considered a BC promise, and it certainly hasn't ever
been treated that way. If php-src took the position you are saying here,
then any error message in PHP would need to remain constant until major
versions, and we also probably could change between NOTICE, WARNING, or
anything like that, as it would present similar issues. This would make it
a BC break to provide deprecations ahead of the major versions, unless we
did an entire version ahead, which I do not think is worth the benefit of
providing that level of BC guarantee personally.

As I said, I definitely sympathize with your tests example, because I have
had libraries I work on with similar tests that break and/or are fragile
due to the message changing. But the situations in which I was doing this
in tests I realized were all because I was throwing the same exception
for different problems and trying to ensure which path caused the throw
within the test. In that case, I refactored the code to provide subclasses
of that exception instead so they could be differentiated, which was the
much more maintainable way to handle it.

Basically, this change may break the Symfony tests here, and could
definitely break other tests as well, but the tests it breaks are incorrect
tests in my opinion, and don't actually guarantee the correctness that the
green result implies. It is unlikely to break code (not tests) in ways
that it wasn't already broken before. (Tim has explained this, RE:
unserialization of various possible inputs). I still have yet to see an
example (even a contrived one) in which this RFC would introduce a failing
path to code that wasn't there before instead of promote a hidden
existing failing path into something that the developer can now respond to
intelligently.

As far as I can tell from the examples provided so far, this RFC reduces
the failing paths of 8.2 -> 8.3 code by promoting and exposing those paths
to existing tests in a way that actually matches the documentation for
unserialize. I might be still misunderstanding some nuance here.

Jordan

2 years ago by Tim Starling — view source — reply

unread

I'm going to vote against exception wrapping.

If I'm reading the implementation correctly, the original exception is
thrown away, there's no way to get the original message and backtrace.
That will make debugging more difficult.

I reviewed about 150 callers of unserialize() in the MediaWiki core
and extensions. There were no instances of the kind of error guarding
shown in the RFC. Occasionally, we have:

$result = @unserialize( $data );
if ( !$result ) {
throw new Exception( ... );
}

The assumption is that if there is a failure, false will be returned,
as is documented in the manual. I don't understand the assertion in
the RFC that set_error_handler() is required to detect unserialization
failure.

Normally, exceptions in MediaWiki are allowed to propagate back to the
global exception handler. There are a couple of categories of
exceptions which should never be caught as a matter of policy:

Database errors. These are permanent errors. Attempting to persist
with executing the request after a database error will mostly likely
lead to another error.
Request timeout exceptions. We're using our Excimer extension to
implement request timeouts by throwing exceptions from a timer
callback. This has been an interesting can of worms which has brought
some nice features with a number of unexpected consequences. It means
that it's always incorrect to catch and discard an arbitrary Throwable.

Probably nothing in our ecosystem is doing database access in an
__unserialize() magic method. But if someone tried it, I would want
the resulting errors to be properly logged with correct backtraces,
not silently discarded.

I would vote in favour of such a feature being added as a non-default
option, by analogy with JSON_THROW_ON_ERROR.

--
Tim Starling
Wikimedia Foundation

2 years ago by Derick Rethans — view source — reply

unread

If I'm reading the implementation correctly, the original exception
is thrown away, there's no way to get the
original message and backtrace.
That will make debugging more difficult.

I believe that is not true. It should be wrapped and accessible through ->previous on the exception.

cheers
Derick

2 years ago by tim@bastelstu.be — view source — reply

unread

Hi

If I'm reading the implementation correctly, the original exception
is thrown away, there's no way to get the
original message and backtrace.
That will make debugging more difficult.

I believe that is not true. It should be wrapped and accessible through ->previous on the exception.

Correct, the original Exception/Error will be available using
$e->getPrevious(). This is also explained in the RFC in the "Add wrapper
Exception 'UnserializationFailedException'" section:

The original \Throwable will be accessible through the $previous property of \UnserializationFailedException, allowing the developer to learn about the actual cause of the unserialization failure by inspecting ->getPrevious().

This is not apparent from taking a cursory look at the diff of the
native implementation in the pull request, because setting $previous
will happen implicitly whenever native code throws an Exception, while
an Exception is already "active".

Best regards
Tim Düsterhus

2 years ago by tim@bastelstu.be — view source — reply

unread

Hi

If I'm reading the implementation correctly, the original exception is
thrown away, there's no way to get the original message and backtrace.
That will make debugging more difficult.

See Derick's reply (and my reply to Derick's reply).

I reviewed about 150 callers of unserialize() in the MediaWiki core
and extensions. There were no instances of the kind of error guarding
shown in the RFC. Occasionally, we have:

$result = @unserialize( $data );
if ( !$result ) {
throw new Exception( ... );
}

The assumption is that if there is a failure, false will be returned,
as is documented in the manual. I don't understand the assertion in
the RFC that set_error_handler() is required to detect unserialization
failure.

false is a valid value that might've been serialized. The manual
specifically notes:

false is returned both in the case of an error and if unserializing the serialized false value. It is possible to catch this special case by comparing data with serialize(false) or by catching the issued E_NOTICE.

Note that your example code also errors out if the $result is an empty
array, an empty string or another falsy values. This might be an
acceptable result in your specific case, but is not the correct behavior
in the general case.

The example in the RFC was written to be correct for the general case,
without imposing any constraints in the input data.

Normally, exceptions in MediaWiki are allowed to propagate back to the
global exception handler. There are a couple of categories of
exceptions which should never be caught as a matter of policy:

Database errors. These are permanent errors. Attempting to persist
with executing the request after a database error will mostly likely
lead to another error.

Request timeout exceptions. We're using our Excimer extension to
implement request timeouts by throwing exceptions from a timer
callback. This has been an interesting can of worms which has brought
some nice features with a number of unexpected consequences. It means
that it's always incorrect to catch and discard an arbitrary Throwable.

Please note that this is already broken independently of this RFC. It is
entirely reasonable for a library to catch all Exceptions that happen
internally and wrap them in a well-defined "library exception" at the
library boundary.

As a specific example, the "flysystem" file system abstraction library
is doing this:

https://github.com/thephpleague/flysystem/blob/f5e93fad64cff5f45a1f79246dae15b31099b105/src/MountManager.php#L35-L37

Probably nothing in our ecosystem is doing database access in an
__unserialize() magic method. But if someone tried it, I would want
the resulting errors to be properly logged with correct backtraces,
not silently discarded.

The internal behavior of unserialization is unspecified. It only
guarantees you that you get back your original structure and that all
the unserialization callbacks will have been called if unserialization
fails. For the error case there are no guarantees, except "a Throwable
might be thrown".

As an example PHP 7.0.10 changed the behavior for unserialization
failures to no longer call __destruct() on incompletely initialized
objects to fix a security issue:

https://3v4l.org/82UJj

I believe this is the corresponding bug:
https://bugs.php.net/bug.php?id=72663

In PHP 7.0.15/7.1.1 apparently there was another change:

https://3v4l.org/tnh2R

As such you can't rely on the database access Exception to actually be
emitted if unserialization of another object also fails.

Best regards
Tim Düsterhus

2 years ago by Tim Starling — view source — reply

unread

false is a valid value that might've been serialized. The manual
specifically notes:

false is returned both in the case of an error and if
unserializing the serialized false value. It is possible to catch
this special case by comparing data with serialize(false) or by
catching the issued E_NOTICE.

Yes, that is fair enough.

Note that your example code also errors out if the $result is an
empty array, an empty string or another falsy values. This might be
an acceptable result in your specific case, but is not the correct
behavior in the general case.

Right, there was no instance in the 150 callers I reviewed yesterday
in which unserialize() was explicitly checked for errors while
unserializing a falsy value. Most often, there is an array wrapper
around the actual value, or the possibility is otherwise excluded.

The example in the RFC was written to be correct for the general
case, without imposing any constraints in the input data.

As I said, if throwing was optional, like in json_decode(), I would be
all for it. I'm just doing a cost/benefit analysis. The common case
should be easy, while the general case should be possible.

As a matter of style, I think PHP's false returns are fine. I don't
think we need to follow Python into making every error an exception.

-- Tim Starling

2 years ago by tim@bastelstu.be — view source — reply

unread

Hi

The example in the RFC was written to be correct for the general
case, without imposing any constraints in the input data.

As I said, if throwing was optional, like in json_decode(), I would be
all for it. I'm just doing a cost/benefit analysis. The common case
should be easy, while the general case should be possible.

As a matter of style, I think PHP's false returns are fine. I don't
think we need to follow Python into making every error an exception.

Let me clarify why unserialize is not really comparable to json_decode:

The unserialize handlers (__unserialize, __wakeup,
Serializable::unserialize()) effectively allow arbitrary (both native
and userland) code to run during unserialize(). This arbitrary code
might or might not throw an Exception.

Thus you already need to be prepared to deal with Exceptions being
thrown in the general case: Whenever you unserialize objects you didn't
write yourself. You don't know if a library introduces a change that
causes any existing long-lived serialized object to become incompatible.

Here's an example I've already sent to the list during the discussion
phase of this RFC: https://3v4l.org/Xsmfu

The readonly class 'Foo' had a private (!) '$a' property in the initial
version. This property was removed in a future version, thus breaking
all serialized versions of it, unless the author makes sure to add an
unserialize handler that handles this gracefully. As a developer you
generally wouldn't consider a change to a private property to be a
breaking API change, making this an unexpected gotcha.

Thus not setting "UNSERIALIZE_THROW_ON_ERROR" (or whatever you may
want to call it) does not guarantee that unserialize() won't throw.
You need to deal with both Notice/Warning and Exceptions. On the other
hand if the Notice/Warning becomes an Exception in a future version the
error handling is simplified: You only need to deal with Exceptions
(specifically a single type of Exception with this RFC).

Best regards
Tim Düsterhus

2 years ago by Dan Ackroyd — view source — reply

unread

Hi Tim,

Hi

as announced on Wednesday [1] I've now opened the vote for:

"Improve unserialize() error handling" [2]

I'm going to have to vote no for changing to throwing an exception.
The BC break is too big imo, and too hard for people to be aware of.

IMO There needs to be a clear path of deprecating the current
behaviour that people are alerted to (rather than having to read
upgrade notes), so that they know they need to change their code from:

function getUserPreferences($data) {
$user_preferences = @unserialize($data);
if ($user_preferences === false) {
$user_preferences = get_default_preferences();
}
return $user_preferences;
}

to something like:

function getUserPreferences($data) {
try {
return unserialize($data);
}
catch (UnserializationFailedException $ufe) {
return get_default_preferences();
}
}

but I'm not having great ideas of how that could be done 'nicely'.

Similar for the exception hierarchy change (example is below).
Changing the type of exception thrown would break valid (if not super
great) code in a way that would be hard to detect, and I'm not sure we
have a good way of performing this type of change.

IMO these isn't just a BC break that needs to wait for a major version
release; it's something that needs to have a clear upgrade path no
matter which version they are proposed for.

btw I think there are underlying interlinked problems with PHP that
causes proposed changes like these to be more difficult than anyone
would want:

The PHP core library is 'ungood' in quite a few aspects, and many
people would like to improve it, but some of those changes are hard to
do in a BC way.
Because of the way PHP works, it's not possible for one
library/module to use different versions of the same function. This is
different from JavaScript where with their (ridiculously convoluted*)
tools, each module is able to specify exactly which versions of which
libraries it wants to use.
Although PECL was good for the time it was created in (where it was
uncommon for individuals to have places to share files online) it
really isn't a tool fit for purpose any more. It's not really fit for
purpose any more, as even when there are (allegedly) great libraries
like ds (https://www.php.net/manual/en/book.ds.php) relatively few
people using PHP use them.

I think trying to improve the situation, so that changes to individual
functions avoid even requiring a discussion in internals in the first
place, is a worthy goal. But it's one that would require many multiple
years of effort, so is far too difficult to be done by individual
people volunteering their time.

Addressing problems that are far too large for individual contributors
to fix, is possibly a discussion to be taken up by the PHP Foundation.
If anyone who works for a for-profit company has read this far and
their company isn't already sponsoring something like 1% of their
engineers salaries to the foundation, then I encourage you to do so at
https://opencollective.com/phpfoundation .

For the "Increase the severity of emitted E_NOTICE to E_WARNING"
part... I offer no opinion on the change. I personally think any
project that doesn't convert any unsilenced warning to an exception is
just asking for trouble.

cheers
Dan
Ack

I wouldn't want to see a 'standard' library for PHP split across
80,000 repositories, but I think that the one (1) standard library
that PHP has is too few. Which possibly gives somes bounds to the
desirable number of libraries for a programming language.

// Example for __unserialize exception

class UserNotFoundException extends \Exception {}

function check_user_exists($user_id)
{
// check user account hasn't been deleted, otherwise throw:
throw new UserNotFoundException("exceptions for flow control are great.\n");
}

class User
{
private $id = 5;

function __unserialize(array $data) {
    check_user_exists($data["\x00User\x00id"]);
}

}

$user = new User();
$data = serialize($user);

echo "$data \n";

try {
$result = unserialize($data);
}
catch (UserNotFoundException $unfe) {
// This exception would no longer be caught if the exception type
was changed.
echo "redirect user to login page\n";
exit(0);
}

echo "User is valid.\n";

2 years ago by tim@bastelstu.be — view source — reply

unread

Hi

function getUserPreferences($data) {
$user_preferences = @unserialize($data);
if ($user_preferences === false) {
$user_preferences = get_default_preferences();
}
return $user_preferences;
}

[…]

I agree that this implementation 'getUserPreferences()' is somewhat
reasonable code if the sole use case for the error handling is dealing
with an empty string or NULL default value in your database (the latter
will break with
https://wiki.php.net/rfc/deprecate_null_to_scalar_internal_arg, though).
That's why I've only left the "change this in 9.0" option when starting
the vote, instead of also providing the "change this in 8.x" option.

FWIW: The same pattern exists in the application I maintain, but I
already started on transforming this to remove the '@' and add a
'catch(Throwable)'.

but I'm not having great ideas of how that could be done 'nicely'.

Yes, this specific example suffers from the '@' already being in place
and thus suppressing a deprecation.

Similar for the exception hierarchy change (example is below).
Changing the type of exception thrown would break valid (if not super
great) code in a way that would be hard to detect, and I'm not sure we
have a good way of performing this type of change.

While the behavior would in fact change, it would not introduce a
"security issue" if this what you were hinting at. This would just
result in an uncaught Exception which should be very visible in your
error tracking service.

I'd also like to note that the behavior is not really well-defined for
current versions either. Consider that the user object internally stores
an array with other User objects ($followedUsers). Now when a followee
(is that a legal word?) is deleted, this would result in a spurious logout.

btw I think there are underlying interlinked problems with PHP that
causes proposed changes like these to be more difficult than anyone
would want:

I agree with all three points. The API surface of what can be considered
a default installation of PHP is huge and subsystems can't be versioned
independently, thus holding subsystems in need to a modernization
hostage to the version number. I'm in a very similar situation for the
application I maintain, it also has large API surface with a unified
version number.

Best regards
Tim Düsterhus

2 years ago by Dan Ackroyd — view source — reply

unread

While the behavior would in fact change, it would not introduce a
"security issue" if this what you were hinting at.

No, just that it would break in production, and then have to be fixed
after a live site was already affecting end-users.

This would just result in an uncaught Exception which should
be very visible in your error tracking service.

My impression is that most web-servers running PHP don't have those.

For people who run most sites, the first they would know about it is
when end-users started complaining.

cheers
Dan
Ack

2 years ago by tim@bastelstu.be — view source — reply

unread

Hi

Voting will run 2 weeks until:

2022-10-28 at 14:00 UTC

As a heads up: Voting closing tomorrow, in about 30.5 hours.

Best regards
Tim Düsterhus

2 years ago by tim@bastelstu.be — view source — reply

unread

Hi

Voting will run 2 weeks until:

2022-10-28 at 14:00 UTC

Voting just closed. The RFC was partly accepted:

a) "Add the \UnserializationFailedException and wrap any Throwables in
PHP 8.x?"

This proposal was declined with 20 (Yes) to 12 (No) Votes (62.5%).

b) "Increase the severity of emitted E_NOTICE to E_WARNING in PHP 8.x?"

This proposal was accepted with 33 (Yes) to 2 (No) Votes (94.3%).

c) "Throw \UnserializationFailedException instead of emitting
E_NOTICE/E_WARNING in PHP 9.0?"

This proposal depends on declined proposal (a) and thus results will be
disregarded. It would have been accepted with 23 (Yes) to 8 (No) Votes
(74.2%).

The implementation for the accepted proposal (b) is already completed
and waiting for review at: https://github.com/php/php-src/pull/9629.

Best regards
Tim Düsterhus

2 years ago by Jeffrey Dafoe — view source — reply

unread

a) "Add the \UnserializationFailedException and wrap any Throwables in PHP
8.x?"

This proposal was declined with 20 (Yes) to 12 (No) Votes (62.5%).

b) "Increase the severity of emitted E_NOTICE to E_WARNING in PHP 8.x?"

This proposal was accepted with 33 (Yes) to 2 (No) Votes (94.3%).

c) "Throw \UnserializationFailedException instead of emitting
E_NOTICE/E_WARNING in PHP 9.0?"

This proposal depends on declined proposal (a) and thus results will be
disregarded. It would have been accepted with 23 (Yes) to 8 (No) Votes
(74.2%).

Informally, doesn't this mean that voters wanted a and c in PHP9? Does this vote mean it's dead for 9, or just that it needs a vote at some point in the future with the choices grouped different?

-Jeff

2 years ago by Christoph M. Becker — view source — reply

unread

a) "Add the \UnserializationFailedException and wrap any Throwables in PHP
8.x?"

This proposal was declined with 20 (Yes) to 12 (No) Votes (62.5%).

b) "Increase the severity of emitted E_NOTICE to E_WARNING in PHP 8.x?"

This proposal was accepted with 33 (Yes) to 2 (No) Votes (94.3%).

c) "Throw \UnserializationFailedException instead of emitting
E_NOTICE/E_WARNING in PHP 9.0?"

This proposal depends on declined proposal (a) and thus results will be
disregarded. It would have been accepted with 23 (Yes) to 8 (No) Votes
(74.2%).

Informally, doesn't this mean that voters wanted a and c in PHP9? Does this vote mean it's dead for 9, or just that it needs a vote at some point in the future with the choices grouped different?

Right; at least that is what I'd like to have.

--
Christoph M. Becker

2 years ago by tim@bastelstu.be — view source — reply

unread

Hi

Informally, doesn't this mean that voters wanted a and c in PHP9?

Not necessarily. It might be that a voter does not like the Exception
wrapping at all (thus declining the proposal), but if it happens to be
accepted then also increasing everything to Exception would be be an
obvious follow-up to them (thus declining the other proposal).

Also I've taken care to explicitly spell out the "terms of vote" before
starting the voting to make sure voters are able to make an informed
decision and making assumptions violates that informed decision.

Does this vote mean it's dead for 9, or just that it needs a vote at some point in the future with the choices grouped different?

I believe
https://wiki.php.net/rfc/voting#resurrecting_rejected_proposals answers
your question here.

My understanding is that it may be reproposed after April 28, 2023, or
earlier when substantial changes are made. Whether the change of target
versions or voting dependencies constitute such a substantial change, I
don't feel qualified to answer.

In any case I do not plan to repropose the RFC myself, for two reasons
that both relate to the fact that I've written the RFC and developed the
PoC patch in my free time (a co-worker proof-read the initial version of
the RFC itself on company time, though):

a) The discussion phase and voting phase were both pretty exhausting to
me mentally. One of the reasons is that the folks following the RFC on
Twitter were not pointed to the related discussion thread, thus were
missing out on important information and thus asked questions that were
already answered without them knowing about it and needed to be answered
again.

b) As a "personal time" contributor I personally find it demotivating if
I need to wait until the next major (possibly multiple years) before I
am able to use and rely on what I proposed myself.

I've written the RFC such that I am happy with what I proposed and such
that I feel that I spent my free time well.

Even if the proposal was partly rejected, it ultimately still benefitted
me, as I've learned something: I've learned that how I handled
unserialize() failures in code I've written is unsafe and broken. I will
improve my own code based on that and I strongly recommend everyone else
who followed along the discussion to also do that:

The only safe error handling is (1) using a throwing error handler, and
then (2) using catch(Throwable) without relying on a specific type of
Exception.

If someone wants to pick this up, they are free re-use (parts of) the
words I've written for the RFC. The 'Increase the error reporting
severity in the unserialize() parser' section will likely need to be
rewritten, as the proposal (b) "unify Notice to Warning" passed. A short
private heads-up is appreciated, but not required. If assistance with
the C implementation is required, I might be able to help out as well.

Best regards
Tim Düsterhus