Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:104583 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 88332 invoked from network); 4 Mar 2019 23:30:40 -0000 Received: from unknown (HELO wp160.webpack.hosteurope.de) (80.237.132.167) by pb1.pair.com with SMTP; 4 Mar 2019 23:30:40 -0000 Received: from [2a02:8109:9d40:1d44:19ae:717a:50a6:57a6]; authenticated by wp160.webpack.hosteurope.de running ExIM with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) id 1h0u2n-0000PK-0i; Mon, 04 Mar 2019 21:18:37 +0100 To: internals@lists.php.net References: <4c3c09cb-7698-db46-0f4b-47ac6bdb5450@mabe.berlin> Message-ID: <0fcf050b-3d72-86f3-a870-6c6402ef841b@mabe.berlin> Date: Mon, 4 Mar 2019 21:18:36 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-bounce-key: webpack.hosteurope.de;dev@mabe.berlin;1551730717;06de36d8; X-HE-SMSGID: 1h0u2n-0000PK-0i Subject: Re: [PHP-DEV] Re: [RFC] New custom object serialization mechanism From: dev@mabe.berlin (Marc) On 04.03.19 10:08, Nikita Popov wrote: > On Mon, Mar 4, 2019 at 6:31 AM Marc wrote: > >> (sorry for top posting) >> >> Hi Nikita, >> >> as for the question about magic methods vs. interface. >> >> I have the feeling that it could be done with both in a clean way to >> make both parties happy. >> >> >> 1. Having __serialize/ __unserialize as magic methods - allowing to >> throw exceptions also to explicitly disallow serialization. >> >> 2. Having some kind of Serializable interface (I know this is already >> used) which expects both methods and also disallows throwing exceptions >> to force mark an object serializable. It's not possible to define such >> thing in the signature, but it can be documented as well as exceptions >> here could be catched internally and re-thrown as UnexpectedExceptionError. Any thoughts about this? >> >> Another question about static unserialize: >> >> > Some people have expressed a desire to make |__unserialize()| a >> static method which creates and returns the unserialized object (rather >> than first constructing the object and then calling |__unserialize()| to >> initialize it). >> >> > This would allow an even greater degree of control over the >> serialization mechanism, for example it would allow to return an already >> existing object from |__unserialize()|. >> >> > However, allowing this would once again require immediately calling >> |__unserialize()| functions (interleaved with unserialization) to make >> the object available for backreferences, which would reintroduce some of >> the problems that |Serializable| suffers from. As such, this will not be >> supported. >> >> I would be very happy to also have a solution for this as I have a >> library which would benefit from it: >> >> ( >> https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTrait.php#L46-L72 >> ) >> >> Would it be possible to instead get the new instance as argument and >> expect the instance to be returned but on the same time allow to return >> a self instantiated instance? >> >> Something like this: >> >> public static function __unserialize(static $new, array < >> http://www.php.net/array> $data): static { >> $new->prop_a = $data['prop_a']; >> return $new; >> } >> >> // or ignoring the argument and instantiate a new one >> >> public static function __unserialize(static $new, array < >> http://www.php.net/array> $data): static { >> $new = (ReflectionClass(__CLASS__))->newInstanceWithoutConstructor(); >> $new->prop_a = $data['prop_a']; >> return $new; >> } >> >> The second variant sounds like a wast of resources but if it makes an >> edge-case possible it still better then just ignore it. >> > I would love to have __unserialize() return a new object, I just don't > think it's technically possible. Let me explain in more detail what the > problem here is: > > Serialization has a notion of object identity implemented through use of > backreferences. In serialize([$obj, $obj]) the first use of the object is > serialized as usual, while the second is a backreference to the first. To > resolve that backreference, we can either > > a) construct objects immediately during unserialization, so we can just > directly use an existing object when we encounter a backreference, or > b) try to keep track of all the places where backreferences are used, so we > can fill them out later. > > If __unserialize() is itself responsible for constructing the object, then > a) means that __unserialize() needs to be called in the middle of > unserialization. This is problematic, because the backreference mechanism > uses direct pointers into structures, which can easily become invalid if > the structures are modified inside the __unserialize() implementation. This > is why __wakeup() calls are delayed until the end of serialization. Variant > b) also runs into the same issue. Keeping pointers to all the places that > use backreferences works fine until the first __unserialize() call, which > might modify structures and invalidate those pointers. > > Beyond these issues related to implementation details, you have the more > general issue of circular references. If you have an object that has a > reference to itself, how do you unserialize that? You need the already > instantiated object to pass it to __unserialize(), but only __unserialize() > is going to give you the object! You could do something like pass the data > with the circular reference nulled out and then patch it after > __unserialize() runs, but as you don't really know what happens to the data > passed to __unserialize(), you wouldn't know what exactly needs to be > changed. > > As such, I think the only way we could have __unserialize() return a new > object is if there was also a way to opt-out from the preservation of > object identity. > > Regards, > Nikita Thank you for the very detailed explanation! Do I understand it correctly that it's about self references in the serialized object or is that self-reference an internal implementation detail of the serializing logic. In case about self references in the serialized object and without any knowledge about the internal implementation I can just assume and based in that I still thing it should be possible. -> see https://3v4l.org/AbQ2s If my assumptions are just stupid please ignore Thanks, Marc > > On 26.02.19 15:17, Nikita Popov wrote: >>> On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas < >>> nicolas.grekas+php@gmail.com> wrote: >>> >>>> Le mar. 19 févr. 2019 à 11:31, Nikita Popov a >>>> écrit : >>>> >>>>> On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas < >>>>> nicolas.grekas+php@gmail.com> wrote: >>>>> >>>>>> Hi Nikita, >>>>>> >>>>>> I'd like to propose a new custom object serialization mechanism >> intended >>>>>>> to replace the broken Serializable interface: >>>>>>> >>>>>>> https://wiki.php.net/rfc/custom_object_serialization >>>>>>> >>>>>>> This was already previously discussed in >>>>>>> https://externals.io/message/98834, this just brings it into RFC >> form. >>>>>>> The latest motivation for this is >> https://bugs.php.net/bug.php?id=77302, >>>>>>> a compatibility issue in 7.3 affecting Symfony, caused by >> Serializable. We >>>>>>> can't fix Serializable, but we can at least make sure that a working >>>>>>> alternative exists. >>>>>>> >>>>>> Is there anything we can to do to help the RFC move forward? I'm >>>>>> spending a great deal of time removing Serializable from the Symfony >> code >>>>>> base. It's not pretty nor fun... but we have no choice since as ppl >> move to >>>>>> PHP 7.3, they can now spot when the current mechanism is breaking >> their >>>>>> serialized payloads (typically from user objects put in the session >>>>>> storage). >>>>>> >>>>>> About interface vs magic methods, technicaly, we can work with an >>>>>> interface provided by PHP core, so that if that's a blocker for >> voters to >>>>>> approve the RFC, let's do it - the current situation is really aweful >> :). >>>>>> FYI, I found myself implementing some getState()/setState() methods >> of my >>>>>> own, decoupled from the underlying mechanisms used by PHP. That >> allows me >>>>>> to still use Serializable for BC reasons when needed, or move to >> __sleep >>>>>> when possible, then move to the new 7.4 style when ready, without >> requiring >>>>>> any changes from the consumer POV. That's a good illustration of what >> I >>>>>> meant in my previous email: domain-specific interfaces in everyone's >> code >>>>>> is a better design as it allows better decoupling. It's also a better >>>>>> design to express that "instances of this interface of mine MUST be >>>>>> serializable". IMHO. >>>>>> >>>>>> Nicolas >>>>>> >>>>> I think for me the main open question here is whether we want to just >>>>> handle the serialization issue or try to come up with something more >>>>> general. If we limit this to just serialization, then the design should >>>>> stay as-is -- for all the reasons you have already outlined, using an >>>>> interface for this is inappropriate. >>>>> >>>>> For a more general mechanism, I think we would need something along the >>>>> lines of (ignoring naming): >>>>> >>>>> interface GetState { >>>>> public function getState(string $purpose): array; >>>>> } >>>>> interface SetState { >>>>> public function setState(array $data, string $purpose): void; >>>>> } >>>>> >>>>> We need separate interfaces for get/set to properly cater to cases like >>>>> JSON, where only get is meaningful (right now). We need the $purpose >>>>> argument to allow different state representations for different >> purposes, >>>>> e.g. JSON will often need more preparation than PHP serialization. The >>>>> $purpose argument is a string, to allow extension for custom purposes. >>>>> >>>>> Seeing that, this is really not something I would feel comfortable with >>>>> introducing in core PHP. Our track record for introducing reusable >>>>> general-purpose interfaces is not exactly stellar (say, did you hear >> about >>>>> SplObserver and SplSubject?) and I wouldn't want to do this without >> seeing >>>>> the concept fleshed out extensively in userland first. >>>>> >>>>> Nikita >>>>> >>>> I think the per-purpose representation logic doesn't belong to each >>>> classes. >>>> Symfony Serializer does it completly from the outside, and I think >> that's >>>> a much better design. >>>> I'm on the side this should not be in PHP code indeed. >>>> >>> As there hasn't been further input here, I'd like to go forward with the >>> RFC as-is and plan to start voting by Friday. >>> >>> Nikita >>>