[RFC] New custom object serialization mechanism

6 years ago by Nikita Popov — view source — reply

unread

Hi internals,

I'd like to propose a new custom object serialization mechanism intended to
replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in https://externals.io/message/98834,
this just brings it into RFC form. The latest motivation for this is
https://bugs.php.net/bug.php?id=77302, a compatibility issue in 7.3
affecting Symfony, caused by Serializable. We can't fix Serializable, but
we can at least make sure that a working alternative exists.

Regards,
Nikita

6 years ago by Marco Pivetta — view source — reply

unread

Not really fussed about having another implicit interface for serialization
(via magic methods).

Wouldn't a new interface make this clear, explicit, and make the
deprecation path easier (together with the migration)?

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/

Hi internals,

I'd like to propose a new custom object serialization mechanism intended to
replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834,
this just brings it into RFC form. The latest motivation for this is
https://bugs.php.net/bug.php?id=77302, a compatibility issue in 7.3
affecting Symfony, caused by Serializable. We can't fix Serializable, but
we can at least make sure that a working alternative exists.

Regards,
Nikita

6 years ago by Sebastian Bergmann — view source — reply

unread

Am 24.01.2019 um 15:09 schrieb Marco Pivetta:

Not really fussed about having another implicit interface for serialization
(via magic methods).

I second that emotion.

6 years ago by Nicolas Grekas — view source — reply

unread

Le jeu. 24 janv. 2019 à 15:18, Sebastian Bergmann sebastian@php.net a
écrit :

Am 24.01.2019 um 15:09 schrieb Marco Pivetta:

Not really fussed about having another implicit interface for
serialization
(via magic methods).

I second that emotion.

The more I think about it, the more I'm convinced we should not add an
interface for that.
An interface defines a semantics - something that an API tpoa domain and
that ppl can type-hint for to get a specific implementation of that
API.Here, both aspects are not desired: we don't want ppl to type-hint for
e.g. Serializable - and too bad it exists because I've already seen ppl
think: "hey, I'll type-hint or extend it to express I want a serializable
thing". BUT that's not the contract of Serializable or any variant of it
because: 1. any PHP object is serializable 2. Serializable it well
allowed to throw an exception to exactly forbid an object from being
serialized.

From this reasoning, I conclude that we really want magic methods here
because what we need is a behavior. We want to hook into the engine to
benefit from some special features it provides. That's what all magic
methods are about and hooking into serialize()/unserialize() is not
different.

The parallel with the Symfony Serializer is interesting but it stop here:
we don't need any help from core to build it. We should not seek for that
goal IMHO as it blur the lines of what we're needing: a core primitive to
build more useful things.

Magic methods would just provide that, without polluting the semantics
space.

My 2cts,
Nicolas

6 years ago by Marco Pivetta — view source — reply

unread

On Sun, Jan 27, 2019 at 5:37 PM Nicolas Grekas nicolas.grekas+php@gmail.com
wrote:

Le jeu. 24 janv. 2019 à 15:18, Sebastian Bergmann sebastian@php.net a
écrit :

Am 24.01.2019 um 15:09 schrieb Marco Pivetta:

Not really fussed about having another implicit interface for
serialization
(via magic methods).

I second that emotion.

The more I think about it, the more I'm convinced we should not add an
interface for that.
An interface defines a semantics - something that an API tpoa domain and
that ppl can type-hint for to get a specific implementation of that
API.

An interface declares an API surface for an object: in this case, the
consumer would be serialize() or unserialize(), and that's a perfectly
valid use-case scenario.

Here, both aspects are not desired: we don't want ppl to type-hint for
e.g. Serializable - and too bad it exists because I've already seen ppl
think: "hey, I'll type-hint or extend it to express I want a serializable
thing".

That's actually a very correct thing to do: by declaring that something is
Serializable, you are expressing your intent to anybody inspecting the
structure of the object.

BUT that's not the contract of Serializable or any variant of it
because: 1. any PHP object is serializable 2. Serializable it well
allowed to throw an exception to exactly forbid an object from being
serialized.

That's a completely different problem, which is that PHP has no way to
declare APIs as functionally pure, or exception-less. That's something to
be explored, in my opinion, but the lack of it does not warrant dismissing
interfaces altogether (your current argument).

Serializability is something to be declared: currently, PHP is very much
ill-designed on this particular scope, but that doesn't mean that we should
make it even worse.

From this reasoning, I conclude that we really want magic methods here

because what we need is a behavior. We want to hook into the engine to
benefit from some special features it provides. That's what all magic
methods are about and hooking into serialize()/unserialize() is not
different.

Yet more magic methods just add an impressive amount of complexity to the
language (I'm already thinking of the permutations of edge cases that I'm
personally going to have to write).
It's yet another way to define something that would work correctly if an
interface (existing mechanism defined by the language) was used instead.

Magic methods would just provide that, without polluting the semantics

space.

The newly introduced magic methods pollute the entire language
semantics: now you have another edge case in the language, instead of an
interface that works in combination with the serialize() and
unserialize() functions.

I'll also add that the problem being solved is much smaller than the issues
introduced by the proposed additions.

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/

6 years ago by Nicolas Grekas — view source — reply

unread

(I should have reviewed myself in the first message - sentences edited in
this reply)

Le lun. 28 janv. 2019 à 08:58, Marco Pivetta ocramius@gmail.com a écrit :

On Sun, Jan 27, 2019 at 5:37 PM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Le jeu. 24 janv. 2019 à 15:18, Sebastian Bergmann sebastian@php.net a
écrit :

Am 24.01.2019 um 15:09 schrieb Marco Pivetta:

Not really fussed about having another implicit interface for
serialization
(via magic methods).

I second that emotion.

The more I think about it, the more I'm convinced we should not add an
interface for that.
An interface defines a semantics - an API to a domain - and
that ppl can type-hint for to get a specific implementation of that
API.

An interface declares an API surface for an object: in this case, the
consumer would be serialize() or unserialize(), and that's a perfectly
valid use-case scenario.

Implemeting serialize/unserialize methods is meant to hook into the
same-name functions.
That's a very technical and concrete thing - a behavior - not anything
abstract as an interface is.

Here, both aspects are not desired: we don't want ppl to type-hint for

e.g. Serializable - and too bad it exists because I've already seen ppl
think: "hey, I'll type-hint or extend it to express I want a serializable
thing".

That's actually a very correct thing to do: by declaring that something is
Serializable, you are expressing your intent to anybody inspecting the
structure of the object.

"intent" is the issue here: these is no such abstract thing here. The PHP
engine is a very concrete object that provides behaviors to userland. And
we need a proper hook to configure the behavior of the
serialize/unserialize functions. There is no such things as
interface/intent/abstraction here, just a hook and that's perfect because
the engine should define as little semantics as possible: that's the job of
userland!

BUT that's not the contract of Serializable or any variant of it

because: 1. any PHP object is serializable 2. Serializable it well
allowed to throw an exception to exactly forbid an object from being
serialized.

That's a completely different problem, which is that PHP has no way to
declare APIs as functionally pure, or exception-less. That's something to
be explored, in my opinion, but the lack of it does not warrant dismissing
interfaces altogether (your current argument).

You completly ignored 1.: any object is serializable. Another hint there
isn't any abstraction sitting here: if Serializable was one, it would be
like a small wall in wide field.

The only way we could make "Serializable2" an abstraction is by fordbidding
any object that does not impementit to be serializable. That's
unrealistic - and uneeded to me.

Serializability is something to be declared: currently, PHP is very much
ill-designed on this particular scope, but that doesn't mean that we should
make it even worse.

From this reasoning, I conclude that we really want magic methods here

because what we need is a behavior. We want to hook into the engine to
benefit from some special features it provides. That's what all magic
methods are about and hooking into serialize()/unserialize() is not
different.

Yet more magic methods just add an impressive amount of complexity to the
language (I'm already thinking of the permutations of edge cases that I'm
personally going to have to write).

Please do, I'd be happy to better understand your pov.

It's yet another way to define something that would work correctly if an

interface (existing mechanism defined by the language) was used instead.

It would not abstract anything, thus would be just broken syntactic
illusional sugar.

Magic methods would just provide that, without polluting the semantics

space.

The newly introduced magic methods pollute the entire language
semantics: now you have another edge case in the language, instead of an
interface that works in combination with the serialize() and
unserialize() functions.

"Language semantics" is another thing. I'm talking about "domain
semantics". See reasoning above :)

I'll also add that the problem being solved is much smaller than the issues

introduced by the

proposed additions.

I promise the contrary: my personal experience is that Serializable does
real harm (please work on https://github.com/symfony/symfony/issues/29951
and related issues to get the feeling).
Magic methods would solve all this crap.

Cheers,
Nicolas

6 years ago by Claude Pache — view source — reply

unread

Le 28 janv. 2019 à 08:58, Marco Pivetta ocramius@gmail.com a écrit :

Here, both aspects are not desired: we don't want ppl to type-hint for
e.g. Serializable - and too bad it exists because I've already seen ppl
think: "hey, I'll type-hint or extend it to express I want a serializable
thing".

That's actually a very correct thing to do: by declaring that something is
Serializable, you are expressing your intent to anybody inspecting the
structure of the object.

This interface (as well as JsonSerializable) is incorrectly named. If I believe the manual, it is an ”interface for customized serializing”; therefore it should have been called CustomizedSerialization or something like that.

If we use a new interface, at the very least, let its name match its function (or its function match its name).

—Claude

6 years ago by bjorn.x.larsson@telia.com — view source — reply

unread

Den 2019-01-24 kl. 15:09, skrev Marco Pivetta:

Not really fussed about having another implicit interface for serialization
(via magic methods).

Wouldn't a new interface make this clear, explicit, and make the
deprecation path easier (together with the migration)?

How important is the following statement from the RFC regarding magic
methods vs Interfaces?

"Using an interface instead requires either raising the version
requirement to PHP 7.4, or dealing with the definition of a stub
interface in a compatible manner."

Sounds like it would be easier to implement this feature in
code/libraries targeting 7.x versions.

r//Björn Larsson

6 years ago by Nicolas Grekas — view source — reply

unread

Hi Nikita,

https://wiki.php.net/rfc/custom_object_serialization

In the RFC, you mention that "Executing arbitrary code in the middle of
unserialization is dangerous and has led to numerous unserialize()
vulnerabilities in the past. For this reason __wakeup() calls are now
delayed until the end of unserialization."

How about destructors?
Some vulnerabilities come from destructors doing things with unserialized
state.
Would it be possible/a good idea to not call any destructors unless the
"wakeup" stage has been successful? Any exceptions thrown during
__wakeup/__unserialize would mean the unserialized data structure should be
destroyed without calling any destructors?
WDYT?

Nicolas

6 years ago by Nikita Popov — view source — reply

unread

On Wed, Jan 30, 2019 at 10:20 AM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Hi Nikita,

https://wiki.php.net/rfc/custom_object_serialization

In the RFC, you mention that "Executing arbitrary code in the middle of
unserialization is dangerous and has led to numerous unserialize()
vulnerabilities in the past. For this reason __wakeup() calls are now
delayed until the end of unserialization."

How about destructors?
Some vulnerabilities come from destructors doing things with unserialized
state.
Would it be possible/a good idea to not call any destructors unless the
"wakeup" stage has been successful? Any exceptions thrown during
__wakeup/__unserialize would mean the unserialized data structure should be
destroyed without calling any destructors?
WDYT?

This is already how it works. If a class has __wakeup and unserialization
fails (or call of __wakeup fails), then we will not call the destructor.
(The same would be true for __unserialize under this proposal.)

Nikita

6 years ago by Nicolas Grekas — view source — reply

unread

Hi Nikita,

I'd like to propose a new custom object serialization mechanism intended to

replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC form.
The latest motivation for this is https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by Serializable. We
can't fix Serializable, but we can at least make sure that a working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm spending a
great deal of time removing Serializable from the Symfony code base. It's
not pretty nor fun... but we have no choice since as ppl move to PHP 7.3,
they can now spot when the current mechanism is breaking their serialized
payloads (typically from user objects put in the session storage).

About interface vs magic methods, technicaly, we can work with an interface
provided by PHP core, so that if that's a blocker for voters to approve the
RFC, let's do it - the current situation is really aweful :). FYI, I found
myself implementing some getState()/setState() methods of my own, decoupled
from the underlying mechanisms used by PHP. That allows me to still use
Serializable for BC reasons when needed, or move to __sleep when possible,
then move to the new 7.4 style when ready, without requiring any changes
from the consumer POV. That's a good illustration of what I meant in my
previous email: domain-specific interfaces in everyone's code is a better
design as it allows better decoupling. It's also a better design to express
that "instances of this interface of mine MUST be serializable". IMHO.

Nicolas

6 years ago by Nikita Popov — view source — reply

unread

On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas nicolas.grekas+php@gmail.com
wrote:

Hi Nikita,

I'd like to propose a new custom object serialization mechanism intended

to replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC form.
The latest motivation for this is https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by Serializable. We
can't fix Serializable, but we can at least make sure that a working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm spending
a great deal of time removing Serializable from the Symfony code base. It's
not pretty nor fun... but we have no choice since as ppl move to PHP 7.3,
they can now spot when the current mechanism is breaking their serialized
payloads (typically from user objects put in the session storage).

About interface vs magic methods, technicaly, we can work with an
interface provided by PHP core, so that if that's a blocker for voters to
approve the RFC, let's do it - the current situation is really aweful :).
FYI, I found myself implementing some getState()/setState() methods of my
own, decoupled from the underlying mechanisms used by PHP. That allows me
to still use Serializable for BC reasons when needed, or move to __sleep
when possible, then move to the new 7.4 style when ready, without requiring
any changes from the consumer POV. That's a good illustration of what I
meant in my previous email: domain-specific interfaces in everyone's code
is a better design as it allows better decoupling. It's also a better
design to express that "instances of this interface of mine MUST be
serializable". IMHO.

Nicolas

I think for me the main open question here is whether we want to just
handle the serialization issue or try to come up with something more
general. If we limit this to just serialization, then the design should
stay as-is -- for all the reasons you have already outlined, using an
interface for this is inappropriate.

For a more general mechanism, I think we would need something along the
lines of (ignoring naming):

interface GetState {
public function getState(string $purpose): array;
}
interface SetState {
public function setState(array $data, string $purpose): void;
}

We need separate interfaces for get/set to properly cater to cases like
JSON, where only get is meaningful (right now). We need the $purpose
argument to allow different state representations for different purposes,
e.g. JSON will often need more preparation than PHP serialization. The
$purpose argument is a string, to allow extension for custom purposes.

Seeing that, this is really not something I would feel comfortable with
introducing in core PHP. Our track record for introducing reusable
general-purpose interfaces is not exactly stellar (say, did you hear about
SplObserver and SplSubject?) and I wouldn't want to do this without seeing
the concept fleshed out extensively in userland first.

Nikita

6 years ago by Nicolas Grekas — view source — reply

unread

Le mar. 19 févr. 2019 à 11:31, Nikita Popov nikita.ppv@gmail.com a écrit :

On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Hi Nikita,

I'd like to propose a new custom object serialization mechanism intended

to replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC form.
The latest motivation for this is https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by Serializable. We
can't fix Serializable, but we can at least make sure that a working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm spending
a great deal of time removing Serializable from the Symfony code base. It's
not pretty nor fun... but we have no choice since as ppl move to PHP 7.3,
they can now spot when the current mechanism is breaking their serialized
payloads (typically from user objects put in the session storage).

About interface vs magic methods, technicaly, we can work with an
interface provided by PHP core, so that if that's a blocker for voters to
approve the RFC, let's do it - the current situation is really aweful :).
FYI, I found myself implementing some getState()/setState() methods of my
own, decoupled from the underlying mechanisms used by PHP. That allows me
to still use Serializable for BC reasons when needed, or move to __sleep
when possible, then move to the new 7.4 style when ready, without requiring
any changes from the consumer POV. That's a good illustration of what I
meant in my previous email: domain-specific interfaces in everyone's code
is a better design as it allows better decoupling. It's also a better
design to express that "instances of this interface of mine MUST be
serializable". IMHO.

Nicolas

I think for me the main open question here is whether we want to just
handle the serialization issue or try to come up with something more
general. If we limit this to just serialization, then the design should
stay as-is -- for all the reasons you have already outlined, using an
interface for this is inappropriate.

For a more general mechanism, I think we would need something along the
lines of (ignoring naming):

interface GetState {
public function getState(string $purpose): array;
}
interface SetState {
public function setState(array $data, string $purpose): void;
}

We need separate interfaces for get/set to properly cater to cases like
JSON, where only get is meaningful (right now). We need the $purpose
argument to allow different state representations for different purposes,
e.g. JSON will often need more preparation than PHP serialization. The
$purpose argument is a string, to allow extension for custom purposes.

Seeing that, this is really not something I would feel comfortable with
introducing in core PHP. Our track record for introducing reusable
general-purpose interfaces is not exactly stellar (say, did you hear about
SplObserver and SplSubject?) and I wouldn't want to do this without seeing
the concept fleshed out extensively in userland first.

Nikita

I think the per-purpose representation logic doesn't belong to each classes.
Symfony Serializer does it completly from the outside, and I think that's a
much better design.
I'm on the side this should not be in PHP code indeed.

Nicolas

6 years ago by Nikita Popov — view source — reply

unread

On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Le mar. 19 févr. 2019 à 11:31, Nikita Popov nikita.ppv@gmail.com a
écrit :

On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Hi Nikita,

I'd like to propose a new custom object serialization mechanism intended

to replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC form.
The latest motivation for this is https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by Serializable. We
can't fix Serializable, but we can at least make sure that a working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm
spending a great deal of time removing Serializable from the Symfony code
base. It's not pretty nor fun... but we have no choice since as ppl move to
PHP 7.3, they can now spot when the current mechanism is breaking their
serialized payloads (typically from user objects put in the session
storage).

About interface vs magic methods, technicaly, we can work with an
interface provided by PHP core, so that if that's a blocker for voters to
approve the RFC, let's do it - the current situation is really aweful :).
FYI, I found myself implementing some getState()/setState() methods of my
own, decoupled from the underlying mechanisms used by PHP. That allows me
to still use Serializable for BC reasons when needed, or move to __sleep
when possible, then move to the new 7.4 style when ready, without requiring
any changes from the consumer POV. That's a good illustration of what I
meant in my previous email: domain-specific interfaces in everyone's code
is a better design as it allows better decoupling. It's also a better
design to express that "instances of this interface of mine MUST be
serializable". IMHO.

Nicolas

I think for me the main open question here is whether we want to just
handle the serialization issue or try to come up with something more
general. If we limit this to just serialization, then the design should
stay as-is -- for all the reasons you have already outlined, using an
interface for this is inappropriate.

For a more general mechanism, I think we would need something along the
lines of (ignoring naming):

interface GetState {
public function getState(string $purpose): array;
}
interface SetState {
public function setState(array $data, string $purpose): void;
}

We need separate interfaces for get/set to properly cater to cases like
JSON, where only get is meaningful (right now). We need the $purpose
argument to allow different state representations for different purposes,
e.g. JSON will often need more preparation than PHP serialization. The
$purpose argument is a string, to allow extension for custom purposes.

Seeing that, this is really not something I would feel comfortable with
introducing in core PHP. Our track record for introducing reusable
general-purpose interfaces is not exactly stellar (say, did you hear about
SplObserver and SplSubject?) and I wouldn't want to do this without seeing
the concept fleshed out extensively in userland first.

Nikita

I think the per-purpose representation logic doesn't belong to each
classes.
Symfony Serializer does it completly from the outside, and I think that's
a much better design.
I'm on the side this should not be in PHP code indeed.

As there hasn't been further input here, I'd like to go forward with the
RFC as-is and plan to start voting by Friday.

Nikita

6 years ago by Marc — view source — reply

unread

(sorry for top posting)

Hi Nikita,

as for the question about magic methods vs. interface.

I have the feeling that it could be done with both in a clean way to
make both parties happy.

Having __serialize/ __unserialize as magic methods - allowing to
throw exceptions also to explicitly disallow serialization.
Having some kind of Serializable interface (I know this is already
used) which expects both methods and also disallows throwing exceptions
to force mark an object serializable. It's not possible to define such
thing in the signature, but it can be documented as well as exceptions
here could be catched internally and re-thrown as UnexpectedExceptionError.

Another question about static unserialize:

Some people have expressed a desire to make |__unserialize()| a
static method which creates and returns the unserialized object (rather
than first constructing the object and then calling |__unserialize()| to
initialize it).

This would allow an even greater degree of control over the
serialization mechanism, for example it would allow to return an already
existing object from |__unserialize()|.

However, allowing this would once again require immediately calling
|__unserialize()| functions (interleaved with unserialization) to make
the object available for backreferences, which would reintroduce some of
the problems that |Serializable| suffers from. As such, this will not be
supported.

I would be very happy to also have a solution for this as I have a
library which would benefit from it:

(https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTrait.php#L46-L72)

Would it be possible to instead get the new instance as argument and
expect the instance to be returned but on the same time allow to return
a self instantiated instance?

Something like this:

public static function __unserialize(static $new, array http://www.php.net/array $data): static {
$new->prop_a = $data['prop_a'];
return $new;
}

// or ignoring the argument and instantiate a new one

public static function __unserialize(static $new, array http://www.php.net/array $data): static {
$new = (ReflectionClass(CLASS))->newInstanceWithoutConstructor();
$new->prop_a = $data['prop_a'];
return $new;
}

The second variant sounds like a wast of resources but if it makes an
edge-case possible it still better then just ignore it.

Thoughts?

Marc

On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Le mar. 19 févr. 2019 à 11:31, Nikita Popov nikita.ppv@gmail.com a
écrit :

On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Hi Nikita,

I'd like to propose a new custom object serialization mechanism intended

to replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC form.
The latest motivation for this is https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by Serializable. We
can't fix Serializable, but we can at least make sure that a working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm
spending a great deal of time removing Serializable from the Symfony code
base. It's not pretty nor fun... but we have no choice since as ppl move to
PHP 7.3, they can now spot when the current mechanism is breaking their
serialized payloads (typically from user objects put in the session
storage).

About interface vs magic methods, technicaly, we can work with an
interface provided by PHP core, so that if that's a blocker for voters to
approve the RFC, let's do it - the current situation is really aweful :).
FYI, I found myself implementing some getState()/setState() methods of my
own, decoupled from the underlying mechanisms used by PHP. That allows me
to still use Serializable for BC reasons when needed, or move to __sleep
when possible, then move to the new 7.4 style when ready, without requiring
any changes from the consumer POV. That's a good illustration of what I
meant in my previous email: domain-specific interfaces in everyone's code
is a better design as it allows better decoupling. It's also a better
design to express that "instances of this interface of mine MUST be
serializable". IMHO.

Nicolas

I think for me the main open question here is whether we want to just
handle the serialization issue or try to come up with something more
general. If we limit this to just serialization, then the design should
stay as-is -- for all the reasons you have already outlined, using an
interface for this is inappropriate.

For a more general mechanism, I think we would need something along the
lines of (ignoring naming):

interface GetState {
public function getState(string $purpose): array;
}
interface SetState {
public function setState(array $data, string $purpose): void;
}

We need separate interfaces for get/set to properly cater to cases like
JSON, where only get is meaningful (right now). We need the $purpose
argument to allow different state representations for different purposes,
e.g. JSON will often need more preparation than PHP serialization. The
$purpose argument is a string, to allow extension for custom purposes.

Seeing that, this is really not something I would feel comfortable with
introducing in core PHP. Our track record for introducing reusable
general-purpose interfaces is not exactly stellar (say, did you hear about
SplObserver and SplSubject?) and I wouldn't want to do this without seeing
the concept fleshed out extensively in userland first.

Nikita

I think the per-purpose representation logic doesn't belong to each
classes.
Symfony Serializer does it completly from the outside, and I think that's
a much better design.
I'm on the side this should not be in PHP code indeed.

As there hasn't been further input here, I'd like to go forward with the
RFC as-is and plan to start voting by Friday.

Nikita

6 years ago by Nikita Popov — view source — reply

unread

(sorry for top posting)

Hi Nikita,

as for the question about magic methods vs. interface.

I have the feeling that it could be done with both in a clean way to
make both parties happy.

Having __serialize/ __unserialize as magic methods - allowing to
throw exceptions also to explicitly disallow serialization.

Having some kind of Serializable interface (I know this is already
used) which expects both methods and also disallows throwing exceptions
to force mark an object serializable. It's not possible to define such
thing in the signature, but it can be documented as well as exceptions
here could be catched internally and re-thrown as UnexpectedExceptionError.

Another question about static unserialize:

Some people have expressed a desire to make |__unserialize()| a
static method which creates and returns the unserialized object (rather
than first constructing the object and then calling |__unserialize()| to
initialize it).

This would allow an even greater degree of control over the
serialization mechanism, for example it would allow to return an already
existing object from |__unserialize()|.

However, allowing this would once again require immediately calling
|__unserialize()| functions (interleaved with unserialization) to make
the object available for backreferences, which would reintroduce some of
the problems that |Serializable| suffers from. As such, this will not be
supported.

I would be very happy to also have a solution for this as I have a
library which would benefit from it:

(
https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTrait.php#L46-L72
)

Would it be possible to instead get the new instance as argument and
expect the instance to be returned but on the same time allow to return
a self instantiated instance?

Something like this:

public static function __unserialize(static $new, array <
http://www.php.net/array>; $data): static {
$new->prop_a = $data['prop_a'];
return $new;
}

// or ignoring the argument and instantiate a new one

public static function __unserialize(static $new, array <
http://www.php.net/array>; $data): static {
$new = (ReflectionClass(CLASS))->newInstanceWithoutConstructor();
$new->prop_a = $data['prop_a'];
return $new;
}

The second variant sounds like a wast of resources but if it makes an
edge-case possible it still better then just ignore it.

I would love to have __unserialize() return a new object, I just don't
think it's technically possible. Let me explain in more detail what the
problem here is:

Serialization has a notion of object identity implemented through use of
backreferences. In serialize([$obj, $obj]) the first use of the object is
serialized as usual, while the second is a backreference to the first. To
resolve that backreference, we can either

a) construct objects immediately during unserialization, so we can just
directly use an existing object when we encounter a backreference, or
b) try to keep track of all the places where backreferences are used, so we
can fill them out later.

If __unserialize() is itself responsible for constructing the object, then
a) means that __unserialize() needs to be called in the middle of
unserialization. This is problematic, because the backreference mechanism
uses direct pointers into structures, which can easily become invalid if
the structures are modified inside the __unserialize() implementation. This
is why __wakeup() calls are delayed until the end of serialization. Variant
b) also runs into the same issue. Keeping pointers to all the places that
use backreferences works fine until the first __unserialize() call, which
might modify structures and invalidate those pointers.

Beyond these issues related to implementation details, you have the more
general issue of circular references. If you have an object that has a
reference to itself, how do you unserialize that? You need the already
instantiated object to pass it to __unserialize(), but only __unserialize()
is going to give you the object! You could do something like pass the data
with the circular reference nulled out and then patch it after
__unserialize() runs, but as you don't really know what happens to the data
passed to __unserialize(), you wouldn't know what exactly needs to be
changed.

As such, I think the only way we could have __unserialize() return a new
object is if there was also a way to opt-out from the preservation of
object identity.

Regards,
Nikita

On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Le mar. 19 févr. 2019 à 11:31, Nikita Popov nikita.ppv@gmail.com a
écrit :

On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Hi Nikita,

I'd like to propose a new custom object serialization mechanism
intended

to replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC
form.
The latest motivation for this is
https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by
Serializable. We
can't fix Serializable, but we can at least make sure that a working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm
spending a great deal of time removing Serializable from the Symfony
code
base. It's not pretty nor fun... but we have no choice since as ppl
move to
PHP 7.3, they can now spot when the current mechanism is breaking
their
serialized payloads (typically from user objects put in the session
storage).

About interface vs magic methods, technicaly, we can work with an
interface provided by PHP core, so that if that's a blocker for
voters to
approve the RFC, let's do it - the current situation is really aweful
:).
FYI, I found myself implementing some getState()/setState() methods
of my
own, decoupled from the underlying mechanisms used by PHP. That
allows me
to still use Serializable for BC reasons when needed, or move to
__sleep
when possible, then move to the new 7.4 style when ready, without
requiring
any changes from the consumer POV. That's a good illustration of what
I
meant in my previous email: domain-specific interfaces in everyone's
code
is a better design as it allows better decoupling. It's also a better
design to express that "instances of this interface of mine MUST be
serializable". IMHO.

Nicolas

I think for me the main open question here is whether we want to just
handle the serialization issue or try to come up with something more
general. If we limit this to just serialization, then the design should
stay as-is -- for all the reasons you have already outlined, using an
interface for this is inappropriate.

For a more general mechanism, I think we would need something along the
lines of (ignoring naming):

interface GetState {
public function getState(string $purpose): array;
}
interface SetState {
public function setState(array $data, string $purpose): void;
}

We need separate interfaces for get/set to properly cater to cases like
JSON, where only get is meaningful (right now). We need the $purpose
argument to allow different state representations for different
purposes,
e.g. JSON will often need more preparation than PHP serialization. The
$purpose argument is a string, to allow extension for custom purposes.

Seeing that, this is really not something I would feel comfortable with
introducing in core PHP. Our track record for introducing reusable
general-purpose interfaces is not exactly stellar (say, did you hear
about
SplObserver and SplSubject?) and I wouldn't want to do this without
seeing
the concept fleshed out extensively in userland first.

Nikita

I think the per-purpose representation logic doesn't belong to each
classes.
Symfony Serializer does it completly from the outside, and I think
that's
a much better design.
I'm on the side this should not be in PHP code indeed.

As there hasn't been further input here, I'd like to go forward with the
RFC as-is and plan to start voting by Friday.

Nikita

6 years ago by Marc — view source — reply

unread

(sorry for top posting)

Hi Nikita,

as for the question about magic methods vs. interface.

I have the feeling that it could be done with both in a clean way to
make both parties happy.

Having __serialize/ __unserialize as magic methods - allowing to
throw exceptions also to explicitly disallow serialization.

Having some kind of Serializable interface (I know this is already
used) which expects both methods and also disallows throwing exceptions
to force mark an object serializable. It's not possible to define such
thing in the signature, but it can be documented as well as exceptions
here could be catched internally and re-thrown as UnexpectedExceptionError.

Any thoughts about this?

Another question about static unserialize:

Some people have expressed a desire to make |__unserialize()| a
static method which creates and returns the unserialized object (rather
than first constructing the object and then calling |__unserialize()| to
initialize it).

This would allow an even greater degree of control over the
serialization mechanism, for example it would allow to return an already
existing object from |__unserialize()|.

However, allowing this would once again require immediately calling
|__unserialize()| functions (interleaved with unserialization) to make
the object available for backreferences, which would reintroduce some of
the problems that |Serializable| suffers from. As such, this will not be
supported.

I would be very happy to also have a solution for this as I have a
library which would benefit from it:

(
https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTrait.php#L46-L72
)

Would it be possible to instead get the new instance as argument and
expect the instance to be returned but on the same time allow to return
a self instantiated instance?

Something like this:

public static function __unserialize(static $new, array <
http://www.php.net/array>; $data): static {
$new->prop_a = $data['prop_a'];
return $new;
}

// or ignoring the argument and instantiate a new one

public static function __unserialize(static $new, array <
http://www.php.net/array>; $data): static {
$new = (ReflectionClass(CLASS))->newInstanceWithoutConstructor();
$new->prop_a = $data['prop_a'];
return $new;
}

The second variant sounds like a wast of resources but if it makes an
edge-case possible it still better then just ignore it.

I would love to have __unserialize() return a new object, I just don't
think it's technically possible. Let me explain in more detail what the
problem here is:

Serialization has a notion of object identity implemented through use of
backreferences. In serialize([$obj, $obj]) the first use of the object is
serialized as usual, while the second is a backreference to the first. To
resolve that backreference, we can either

a) construct objects immediately during unserialization, so we can just
directly use an existing object when we encounter a backreference, or
b) try to keep track of all the places where backreferences are used, so we
can fill them out later.

If __unserialize() is itself responsible for constructing the object, then
a) means that __unserialize() needs to be called in the middle of
unserialization. This is problematic, because the backreference mechanism
uses direct pointers into structures, which can easily become invalid if
the structures are modified inside the __unserialize() implementation. This
is why __wakeup() calls are delayed until the end of serialization. Variant
b) also runs into the same issue. Keeping pointers to all the places that
use backreferences works fine until the first __unserialize() call, which
might modify structures and invalidate those pointers.

Beyond these issues related to implementation details, you have the more
general issue of circular references. If you have an object that has a
reference to itself, how do you unserialize that? You need the already
instantiated object to pass it to __unserialize(), but only __unserialize()
is going to give you the object! You could do something like pass the data
with the circular reference nulled out and then patch it after
__unserialize() runs, but as you don't really know what happens to the data
passed to __unserialize(), you wouldn't know what exactly needs to be
changed.

As such, I think the only way we could have __unserialize() return a new
object is if there was also a way to opt-out from the preservation of
object identity.

Regards,
Nikita

Thank you for the very detailed explanation!

Do I understand it correctly that it's about self references in the
serialized object or is that self-reference an internal implementation
detail of the serializing logic.

In case about self references in the serialized object and without any
knowledge about the internal implementation I can just assume and based
in that I still thing it should be possible.

-> see https://3v4l.org/AbQ2s

If my assumptions are just stupid please ignore

Thanks,

Marc

On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Le mar. 19 févr. 2019 à 11:31, Nikita Popov nikita.ppv@gmail.com a
écrit :

On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Hi Nikita,

I'd like to propose a new custom object serialization mechanism
intended

to replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC
form.
The latest motivation for this is
https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by
Serializable. We
can't fix Serializable, but we can at least make sure that a working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm
spending a great deal of time removing Serializable from the Symfony
code
base. It's not pretty nor fun... but we have no choice since as ppl
move to
PHP 7.3, they can now spot when the current mechanism is breaking
their
serialized payloads (typically from user objects put in the session
storage).

About interface vs magic methods, technicaly, we can work with an
interface provided by PHP core, so that if that's a blocker for
voters to
approve the RFC, let's do it - the current situation is really aweful
:).
FYI, I found myself implementing some getState()/setState() methods
of my
own, decoupled from the underlying mechanisms used by PHP. That
allows me
to still use Serializable for BC reasons when needed, or move to
__sleep
when possible, then move to the new 7.4 style when ready, without
requiring
any changes from the consumer POV. That's a good illustration of what
I
meant in my previous email: domain-specific interfaces in everyone's
code
is a better design as it allows better decoupling. It's also a better
design to express that "instances of this interface of mine MUST be
serializable". IMHO.

Nicolas

I think for me the main open question here is whether we want to just
handle the serialization issue or try to come up with something more
general. If we limit this to just serialization, then the design should
stay as-is -- for all the reasons you have already outlined, using an
interface for this is inappropriate.

For a more general mechanism, I think we would need something along the
lines of (ignoring naming):

interface GetState {
public function getState(string $purpose): array;
}
interface SetState {
public function setState(array $data, string $purpose): void;
}

We need separate interfaces for get/set to properly cater to cases like
JSON, where only get is meaningful (right now). We need the $purpose
argument to allow different state representations for different
purposes,
e.g. JSON will often need more preparation than PHP serialization. The
$purpose argument is a string, to allow extension for custom purposes.

Seeing that, this is really not something I would feel comfortable with
introducing in core PHP. Our track record for introducing reusable
general-purpose interfaces is not exactly stellar (say, did you hear
about
SplObserver and SplSubject?) and I wouldn't want to do this without
seeing
the concept fleshed out extensively in userland first.

Nikita

I think the per-purpose representation logic doesn't belong to each
classes.
Symfony Serializer does it completly from the outside, and I think
that's
a much better design.
I'm on the side this should not be in PHP code indeed.

As there hasn't been further input here, I'd like to go forward with the
RFC as-is and plan to start voting by Friday.

Nikita

6 years ago by Nikita Popov — view source — reply

unread

(sorry for top posting)

Hi Nikita,

as for the question about magic methods vs. interface.

I have the feeling that it could be done with both in a clean way to
make both parties happy.

Having __serialize/ __unserialize as magic methods - allowing to
throw exceptions also to explicitly disallow serialization.

Having some kind of Serializable interface (I know this is already
used) which expects both methods and also disallows throwing exceptions
to force mark an object serializable. It's not possible to define such
thing in the signature, but it can be documented as well as exceptions
here could be catched internally and re-thrown as
UnexpectedExceptionError.

Any thoughts about this?

Another question about static unserialize:

Some people have expressed a desire to make |__unserialize()| a
static method which creates and returns the unserialized object (rather
than first constructing the object and then calling |__unserialize()| to
initialize it).

This would allow an even greater degree of control over the
serialization mechanism, for example it would allow to return an already
existing object from |__unserialize()|.

However, allowing this would once again require immediately calling
|__unserialize()| functions (interleaved with unserialization) to make
the object available for backreferences, which would reintroduce some of
the problems that |Serializable| suffers from. As such, this will not be
supported.

I would be very happy to also have a solution for this as I have a
library which would benefit from it:

(

https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTrait.php#L46-L72

)

Would it be possible to instead get the new instance as argument and
expect the instance to be returned but on the same time allow to return
a self instantiated instance?

Something like this:

public static function __unserialize(static $new, array <
http://www.php.net/array>; $data): static {
$new->prop_a = $data['prop_a'];
return $new;
}

// or ignoring the argument and instantiate a new one

public static function __unserialize(static $new, array <
http://www.php.net/array>; $data): static {
$new =
(ReflectionClass(CLASS))->newInstanceWithoutConstructor();
$new->prop_a = $data['prop_a'];
return $new;
}

The second variant sounds like a wast of resources but if it makes an
edge-case possible it still better then just ignore it.

I would love to have __unserialize() return a new object, I just don't
think it's technically possible. Let me explain in more detail what the
problem here is:

Serialization has a notion of object identity implemented through use of
backreferences. In serialize([$obj, $obj]) the first use of the object is
serialized as usual, while the second is a backreference to the first. To
resolve that backreference, we can either

a) construct objects immediately during unserialization, so we can just
directly use an existing object when we encounter a backreference, or
b) try to keep track of all the places where backreferences are used, so
we
can fill them out later.

If __unserialize() is itself responsible for constructing the object,
then
a) means that __unserialize() needs to be called in the middle of
unserialization. This is problematic, because the backreference mechanism
uses direct pointers into structures, which can easily become invalid if
the structures are modified inside the __unserialize() implementation.
This
is why __wakeup() calls are delayed until the end of serialization.
Variant
b) also runs into the same issue. Keeping pointers to all the places that
use backreferences works fine until the first __unserialize() call, which
might modify structures and invalidate those pointers.

Beyond these issues related to implementation details, you have the more
general issue of circular references. If you have an object that has a
reference to itself, how do you unserialize that? You need the already
instantiated object to pass it to __unserialize(), but only
__unserialize()
is going to give you the object! You could do something like pass the
data
with the circular reference nulled out and then patch it after
__unserialize() runs, but as you don't really know what happens to the
data
passed to __unserialize(), you wouldn't know what exactly needs to be
changed.

As such, I think the only way we could have __unserialize() return a new
object is if there was also a way to opt-out from the preservation of
object identity.

Regards,
Nikita

Thank you for the very detailed explanation!

Do I understand it correctly that it's about self references in the
serialized object or is that self-reference an internal implementation
detail of the serializing logic.

In case about self references in the serialized object and without any
knowledge about the internal implementation I can just assume and based
in that I still thing it should be possible.

-> see https://3v4l.org/AbQ2s

If my assumptions are just stupid please ignore

If I understand right, your suggestion here is basically to make
__unserialize() responsible for dealing with self-references by itself: It
it wants to return a new object, then it needs to also take care of
replacing any potential self-references.

This somewhat works, but only if the references are really self-references.
You can also simply have multiple references to the same object without
there being an actual cycle, in which case __unserialize() will not have
access to those references.

An additional issue is that this basically requires the object to be fully
aware of it's contents: You can't have an unspecified $data property that
holds arbitrary values, because there might be a self-reference somewhere
in there, but you wouldn't know whether to find it or how to update it.

Nikita

On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Le mar. 19 févr. 2019 à 11:31, Nikita Popov nikita.ppv@gmail.com a
écrit :

On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

Hi Nikita,

I'd like to propose a new custom object serialization mechanism
intended

to replace the broken Serializable interface:

https://wiki.php.net/rfc/custom_object_serialization

This was already previously discussed in
https://externals.io/message/98834, this just brings it into RFC
form.
The latest motivation for this is
https://bugs.php.net/bug.php?id=77302,
a compatibility issue in 7.3 affecting Symfony, caused by
Serializable. We
can't fix Serializable, but we can at least make sure that a
working
alternative exists.

Is there anything we can to do to help the RFC move forward? I'm
spending a great deal of time removing Serializable from the Symfony
code
base. It's not pretty nor fun... but we have no choice since as ppl
move to
PHP 7.3, they can now spot when the current mechanism is breaking
their
serialized payloads (typically from user objects put in the session
storage).

About interface vs magic methods, technicaly, we can work with an
interface provided by PHP core, so that if that's a blocker for
voters to
approve the RFC, let's do it - the current situation is really
aweful
:).
FYI, I found myself implementing some getState()/setState() methods
of my
own, decoupled from the underlying mechanisms used by PHP. That
allows me
to still use Serializable for BC reasons when needed, or move to
__sleep
when possible, then move to the new 7.4 style when ready, without
requiring
any changes from the consumer POV. That's a good illustration of
what
I
meant in my previous email: domain-specific interfaces in everyone's
code
is a better design as it allows better decoupling. It's also a
better
design to express that "instances of this interface of mine MUST be
serializable". IMHO.

Nicolas

I think for me the main open question here is whether we want to just
handle the serialization issue or try to come up with something more
general. If we limit this to just serialization, then the design
should
stay as-is -- for all the reasons you have already outlined, using an
interface for this is inappropriate.

For a more general mechanism, I think we would need something along
the
lines of (ignoring naming):

interface GetState {
public function getState(string $purpose): array;
}
interface SetState {
public function setState(array $data, string $purpose): void;
}

We need separate interfaces for get/set to properly cater to cases
like
JSON, where only get is meaningful (right now). We need the $purpose
argument to allow different state representations for different
purposes,
e.g. JSON will often need more preparation than PHP serialization.
The
$purpose argument is a string, to allow extension for custom
purposes.

Seeing that, this is really not something I would feel comfortable
with
introducing in core PHP. Our track record for introducing reusable
general-purpose interfaces is not exactly stellar (say, did you hear
about
SplObserver and SplSubject?) and I wouldn't want to do this without
seeing
the concept fleshed out extensively in userland first.

Nikita

I think the per-purpose representation logic doesn't belong to each
classes.
Symfony Serializer does it completly from the outside, and I think
that's
a much better design.
I'm on the side this should not be in PHP code indeed.

As there hasn't been further input here, I'd like to go forward with
the
RFC as-is and plan to start voting by Friday.

Nikita