Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:104575 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 26125 invoked from network); 4 Mar 2019 12:20:38 -0000 Received: from unknown (HELO mail-it1-f196.google.com) (209.85.166.196) by pb1.pair.com with SMTP; 4 Mar 2019 12:20:38 -0000 Received: by mail-it1-f196.google.com with SMTP id f186so4497167ita.0 for ; Mon, 04 Mar 2019 01:08:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=h+q19cYDcmrox5RHAERXQDM92N4Rq9dwGC2/8+UTJN0=; b=d7vEZK0xnP89L1H1fwClT7GyfPLTm3fpKo+DHhG3Z/PBCiFuVjUT2GpWg8Vh9vsJMS 2watSNgg+z3amWYs0hOF2Mq1Ns+z0wo8Z+avZbaQPxxfgj0aIi9WX4Ht8xD2pYDSJZFU lIVY9bkiq7LE3uBeoSwDcLZPmyrbjpo1ebAqga+iMI05aJrdFJ/MvPvBN1LRCmmd8yg/ LcfSFeDIT8i8m3PjpzGdj0tTLJsST2NzkPoOBWLmXvUtR6VSm5+3jA6kC+URqV8goI0e Eo7wfsRDpOs6qh6nHX+5sY0CyIuCvn74Y7NGCSPnPo66GPa9/XMvaFSK3tUFu9ExIbEv jGnQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=h+q19cYDcmrox5RHAERXQDM92N4Rq9dwGC2/8+UTJN0=; b=m6nbi8aAVU3j8d5srt+Ygxjb0wSYpZy1pDl9iLsUgFOz9M62aHAIKFrHgt3jdtCMPt ZNHF6B9lyjP+2J54luklXXHB7ovKFDZLqbfrFsLvp4C3CSrcSgDj6L4jC43oqvRXhPaN mLWIpHKQYjpI70qqVEMRhL1kYbxkkfjtjMKvPq+/VSkscd/u3E/w8wSKT8jNXP3Y7CSo 4X5ITR2UON6uGHEKiBLNnCFnUIxv2S+wayGOv/tQSUUgbL6IlJX951oKLs4cscdE7yNw 3MdbnSP4TH/G3rugqTxBfBXWdkgE47/iFuGK28ytNayw27MhKJgKlDC2q/9GL3YSV1ln 7UOw== X-Gm-Message-State: AHQUAuZD5hIF4H+UKEjOZGHKu2eDd+hKsaFuEHhcZPbNWEjTu7nm38eH YCBFZBtH6RMmd3qdSWxSgOaJ+Qt2RewF9p8N8lZBGQ== X-Google-Smtp-Source: APXvYqzZWs2iOyBrUQ948yrtELcKqlHpoEegELUDQC5eijqIp9CXH5riRbErc56omRIE3xvMIv3st0feaodZ/fKAQKM= X-Received: by 2002:a24:7542:: with SMTP id y63mr10103315itc.70.1551690508656; Mon, 04 Mar 2019 01:08:28 -0800 (PST) MIME-Version: 1.0 References: <4c3c09cb-7698-db46-0f4b-47ac6bdb5450@mabe.berlin> In-Reply-To: <4c3c09cb-7698-db46-0f4b-47ac6bdb5450@mabe.berlin> Date: Mon, 4 Mar 2019 10:08:10 +0100 Message-ID: To: Marc Cc: PHP internals Content-Type: multipart/alternative; boundary="00000000000091e5c9058341198c" Subject: Re: [PHP-DEV] Re: [RFC] New custom object serialization mechanism From: nikita.ppv@gmail.com (Nikita Popov) --00000000000091e5c9058341198c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Mar 4, 2019 at 6:31 AM Marc wrote: > (sorry for top posting) > > Hi Nikita, > > as for the question about magic methods vs. interface. > > I have the feeling that it could be done with both in a clean way to > make both parties happy. > > > 1. Having __serialize/ __unserialize as magic methods - allowing to > throw exceptions also to explicitly disallow serialization. > > 2. Having some kind of Serializable interface (I know this is already > used) which expects both methods and also disallows throwing exceptions > to force mark an object serializable. It's not possible to define such > thing in the signature, but it can be documented as well as exceptions > here could be catched internally and re-thrown as UnexpectedExceptionErro= r. > > > Another question about static unserialize: > > > Some people have expressed a desire to make |__unserialize()| a > static method which creates and returns the unserialized object (rather > than first constructing the object and then calling |__unserialize()| to > initialize it). > > > This would allow an even greater degree of control over the > serialization mechanism, for example it would allow to return an already > existing object from |__unserialize()|. > > > However, allowing this would once again require immediately calling > |__unserialize()| functions (interleaved with unserialization) to make > the object available for backreferences, which would reintroduce some of > the problems that |Serializable| suffers from. As such, this will not be > supported. > > I would be very happy to also have a solution for this as I have a > library which would benefit from it: > > ( > https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTra= it.php#L46-L72 > ) > > Would it be possible to instead get the new instance as argument and > expect the instance to be returned but on the same time allow to return > a self instantiated instance? > > Something like this: > > public static function __unserialize(static $new, array < > http://www.php.net/array> $data): static { > $new->prop_a =3D $data['prop_a']; > return $new; > } > > // or ignoring the argument and instantiate a new one > > public static function __unserialize(static $new, array < > http://www.php.net/array> $data): static { > $new =3D (ReflectionClass(__CLASS__))->newInstanceWithoutConstructor= (); > $new->prop_a =3D $data['prop_a']; > return $new; > } > > The second variant sounds like a wast of resources but if it makes an > edge-case possible it still better then just ignore it. > I would love to have __unserialize() return a new object, I just don't think it's technically possible. Let me explain in more detail what the problem here is: Serialization has a notion of object identity implemented through use of backreferences. In serialize([$obj, $obj]) the first use of the object is serialized as usual, while the second is a backreference to the first. To resolve that backreference, we can either a) construct objects immediately during unserialization, so we can just directly use an existing object when we encounter a backreference, or b) try to keep track of all the places where backreferences are used, so we can fill them out later. If __unserialize() is itself responsible for constructing the object, then a) means that __unserialize() needs to be called in the middle of unserialization. This is problematic, because the backreference mechanism uses direct pointers into structures, which can easily become invalid if the structures are modified inside the __unserialize() implementation. This is why __wakeup() calls are delayed until the end of serialization. Variant b) also runs into the same issue. Keeping pointers to all the places that use backreferences works fine until the first __unserialize() call, which might modify structures and invalidate those pointers. Beyond these issues related to implementation details, you have the more general issue of circular references. If you have an object that has a reference to itself, how do you unserialize that? You need the already instantiated object to pass it to __unserialize(), but only __unserialize() is going to give you the object! You could do something like pass the data with the circular reference nulled out and then patch it after __unserialize() runs, but as you don't really know what happens to the data passed to __unserialize(), you wouldn't know what exactly needs to be changed. As such, I think the only way we could have __unserialize() return a new object is if there was also a way to opt-out from the preservation of object identity. Regards, Nikita On 26.02.19 15:17, Nikita Popov wrote: > > On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas < > > nicolas.grekas+php@gmail.com> wrote: > > > >> > >> Le mar. 19 f=C3=A9vr. 2019 =C3=A0 11:31, Nikita Popov a > >> =C3=A9crit : > >> > >>> On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas < > >>> nicolas.grekas+php@gmail.com> wrote: > >>> > >>>> Hi Nikita, > >>>> > >>>> I'd like to propose a new custom object serialization mechanism > intended > >>>>> to replace the broken Serializable interface: > >>>>> > >>>>> https://wiki.php.net/rfc/custom_object_serialization > >>>>> > >>>>> This was already previously discussed in > >>>>> https://externals.io/message/98834, this just brings it into RFC > form. > >>>>> The latest motivation for this is > https://bugs.php.net/bug.php?id=3D77302, > >>>>> a compatibility issue in 7.3 affecting Symfony, caused by > Serializable. We > >>>>> can't fix Serializable, but we can at least make sure that a workin= g > >>>>> alternative exists. > >>>>> > >>>> Is there anything we can to do to help the RFC move forward? I'm > >>>> spending a great deal of time removing Serializable from the Symfony > code > >>>> base. It's not pretty nor fun... but we have no choice since as ppl > move to > >>>> PHP 7.3, they can now spot when the current mechanism is breaking > their > >>>> serialized payloads (typically from user objects put in the session > >>>> storage). > >>>> > >>>> About interface vs magic methods, technicaly, we can work with an > >>>> interface provided by PHP core, so that if that's a blocker for > voters to > >>>> approve the RFC, let's do it - the current situation is really awefu= l > :). > >>>> FYI, I found myself implementing some getState()/setState() methods > of my > >>>> own, decoupled from the underlying mechanisms used by PHP. That > allows me > >>>> to still use Serializable for BC reasons when needed, or move to > __sleep > >>>> when possible, then move to the new 7.4 style when ready, without > requiring > >>>> any changes from the consumer POV. That's a good illustration of wha= t > I > >>>> meant in my previous email: domain-specific interfaces in everyone's > code > >>>> is a better design as it allows better decoupling. It's also a bette= r > >>>> design to express that "instances of this interface of mine MUST be > >>>> serializable". IMHO. > >>>> > >>>> Nicolas > >>>> > >>> I think for me the main open question here is whether we want to just > >>> handle the serialization issue or try to come up with something more > >>> general. If we limit this to just serialization, then the design shou= ld > >>> stay as-is -- for all the reasons you have already outlined, using an > >>> interface for this is inappropriate. > >>> > >>> For a more general mechanism, I think we would need something along t= he > >>> lines of (ignoring naming): > >>> > >>> interface GetState { > >>> public function getState(string $purpose): array; > >>> } > >>> interface SetState { > >>> public function setState(array $data, string $purpose): void; > >>> } > >>> > >>> We need separate interfaces for get/set to properly cater to cases li= ke > >>> JSON, where only get is meaningful (right now). We need the $purpose > >>> argument to allow different state representations for different > purposes, > >>> e.g. JSON will often need more preparation than PHP serialization. Th= e > >>> $purpose argument is a string, to allow extension for custom purposes= . > >>> > >>> Seeing that, this is really not something I would feel comfortable wi= th > >>> introducing in core PHP. Our track record for introducing reusable > >>> general-purpose interfaces is not exactly stellar (say, did you hear > about > >>> SplObserver and SplSubject?) and I wouldn't want to do this without > seeing > >>> the concept fleshed out extensively in userland first. > >>> > >>> Nikita > >>> > >> I think the per-purpose representation logic doesn't belong to each > >> classes. > >> Symfony Serializer does it completly from the outside, and I think > that's > >> a much better design. > >> I'm on the side this should not be in PHP code indeed. > >> > > As there hasn't been further input here, I'd like to go forward with th= e > > RFC as-is and plan to start voting by Friday. > > > > Nikita > > > --00000000000091e5c9058341198c--