Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:104601 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 46181 invoked from network); 6 Mar 2019 12:40:42 -0000 Received: from unknown (HELO mail-it1-f196.google.com) (209.85.166.196) by pb1.pair.com with SMTP; 6 Mar 2019 12:40:42 -0000 Received: by mail-it1-f196.google.com with SMTP id w18so8471860itj.4 for ; Wed, 06 Mar 2019 01:29:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=wixCyK4AnVhVKAiDauOps86cbbXdDWuN/UBdR6e53Rc=; b=OOsbcIs4YZ6hCzxSwW7sEQsE3tWuO0NommQzIaYjSUIXOrjYuqa7/sEBM7AlO6nmRr oC2ltWmaPyBxp9+TE4EE69HfItwDeW4TqwKcpfibJcBZrmhJIUWv8FnxutYSCPf910Um PZZ8HObFRF0rXdOTjQwi8Qs3Wi5Lmb6wJ9KXEyxq00fcYCWsD10y03bCcCLJbBJG6nYI 5ChJUIolYsXODwgdCHM+mVWEhWb7L3yW5RijyUBawJ5rMZWCOmjm+HlzCX+4zW+xkAdw r3YHDx5bXdnpw95oPc+9V2eYDxsqtnarMy3T2Nfbl9YfrVyItUXQAQEDachXmGeBxlOO WfcQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=wixCyK4AnVhVKAiDauOps86cbbXdDWuN/UBdR6e53Rc=; b=n1GLm+vWdq3rSnPHJbdwwDdlXApebrQAAZqjEH6X3VpUFCuirmHJlEX3OO9x4sjAca VUi5VV+QwVXjUmwyEYIRDwI7hp0z5MgDtlLKfp2i2QE9pHFNiO9+DCf7OwW77ezU/PDq 2Th+DEV+gfZeEH6Ivbp6ZHhAR42wMLRpagdo9ppLhlp8C6Cz/Cb74kOBL8yKLbiGWesA WMY4Y3xauKcUGH/scgeK0p4ht1mQIK0QKRLMYevpdYXu+v5ohhhXVyen/SeR7AhKxISh mQJXC2iQvjHff18/286rf/0uXNa1VgyjDn09MzjQTn1kxUdZGUpk+o3iRAUf/Y9mqN84 pOKw== X-Gm-Message-State: APjAAAX6tGKCNeeZ2b6tDybC1BnnMYW8DqsdOxXyT9NScTyQrsR8nsm0 zXxakek330ZmupvFgJsXBUE3o6cyyWVuObViThsmuQ== X-Google-Smtp-Source: APXvYqxLL3vtEs/dBFqo99G6rmlUluYUZdZhYIutubm9CRpbMw0ZR4nWSxpLIPTxLfTgSt6LrRSsBN5VqMjxb7o6OHM= X-Received: by 2002:a05:6638:285:: with SMTP id c5mr3502669jaq.36.1551864543298; Wed, 06 Mar 2019 01:29:03 -0800 (PST) MIME-Version: 1.0 References: <4c3c09cb-7698-db46-0f4b-47ac6bdb5450@mabe.berlin> <0fcf050b-3d72-86f3-a870-6c6402ef841b@mabe.berlin> In-Reply-To: <0fcf050b-3d72-86f3-a870-6c6402ef841b@mabe.berlin> Date: Wed, 6 Mar 2019 10:28:44 +0100 Message-ID: To: Marc Cc: PHP internals Content-Type: multipart/alternative; boundary="000000000000d7cbee0583699e96" Subject: Re: [PHP-DEV] Re: [RFC] New custom object serialization mechanism From: nikita.ppv@gmail.com (Nikita Popov) --000000000000d7cbee0583699e96 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Mar 4, 2019 at 9:18 PM Marc wrote: > > On 04.03.19 10:08, Nikita Popov wrote: > > On Mon, Mar 4, 2019 at 6:31 AM Marc wrote: > > > >> (sorry for top posting) > >> > >> Hi Nikita, > >> > >> as for the question about magic methods vs. interface. > >> > >> I have the feeling that it could be done with both in a clean way to > >> make both parties happy. > >> > >> > >> 1. Having __serialize/ __unserialize as magic methods - allowing to > >> throw exceptions also to explicitly disallow serialization. > >> > >> 2. Having some kind of Serializable interface (I know this is already > >> used) which expects both methods and also disallows throwing exception= s > >> to force mark an object serializable. It's not possible to define such > >> thing in the signature, but it can be documented as well as exceptions > >> here could be catched internally and re-thrown as > UnexpectedExceptionError. > > Any thoughts about this? > > > >> > >> Another question about static unserialize: > >> > >> > Some people have expressed a desire to make |__unserialize()| a > >> static method which creates and returns the unserialized object (rathe= r > >> than first constructing the object and then calling |__unserialize()| = to > >> initialize it). > >> > >> > This would allow an even greater degree of control over the > >> serialization mechanism, for example it would allow to return an alrea= dy > >> existing object from |__unserialize()|. > >> > >> > However, allowing this would once again require immediately callin= g > >> |__unserialize()| functions (interleaved with unserialization) to make > >> the object available for backreferences, which would reintroduce some = of > >> the problems that |Serializable| suffers from. As such, this will not = be > >> supported. > >> > >> I would be very happy to also have a solution for this as I have a > >> library which would benefit from it: > >> > >> ( > >> > https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTra= it.php#L46-L72 > >> ) > >> > >> Would it be possible to instead get the new instance as argument and > >> expect the instance to be returned but on the same time allow to retur= n > >> a self instantiated instance? > >> > >> Something like this: > >> > >> public static function __unserialize(static $new, array < > >> http://www.php.net/array> $data): static { > >> $new->prop_a =3D $data['prop_a']; > >> return $new; > >> } > >> > >> // or ignoring the argument and instantiate a new one > >> > >> public static function __unserialize(static $new, array < > >> http://www.php.net/array> $data): static { > >> $new =3D > (ReflectionClass(__CLASS__))->newInstanceWithoutConstructor(); > >> $new->prop_a =3D $data['prop_a']; > >> return $new; > >> } > >> > >> The second variant sounds like a wast of resources but if it makes an > >> edge-case possible it still better then just ignore it. > >> > > I would love to have __unserialize() return a new object, I just don't > > think it's technically possible. Let me explain in more detail what the > > problem here is: > > > > Serialization has a notion of object identity implemented through use o= f > > backreferences. In serialize([$obj, $obj]) the first use of the object = is > > serialized as usual, while the second is a backreference to the first. = To > > resolve that backreference, we can either > > > > a) construct objects immediately during unserialization, so we can just > > directly use an existing object when we encounter a backreference, or > > b) try to keep track of all the places where backreferences are used, s= o > we > > can fill them out later. > > > > If __unserialize() is itself responsible for constructing the object, > then > > a) means that __unserialize() needs to be called in the middle of > > unserialization. This is problematic, because the backreference mechani= sm > > uses direct pointers into structures, which can easily become invalid i= f > > the structures are modified inside the __unserialize() implementation. > This > > is why __wakeup() calls are delayed until the end of serialization. > Variant > > b) also runs into the same issue. Keeping pointers to all the places th= at > > use backreferences works fine until the first __unserialize() call, whi= ch > > might modify structures and invalidate those pointers. > > > > Beyond these issues related to implementation details, you have the mor= e > > general issue of circular references. If you have an object that has a > > reference to itself, how do you unserialize that? You need the already > > instantiated object to pass it to __unserialize(), but only > __unserialize() > > is going to give you the object! You could do something like pass the > data > > with the circular reference nulled out and then patch it after > > __unserialize() runs, but as you don't really know what happens to the > data > > passed to __unserialize(), you wouldn't know what exactly needs to be > > changed. > > > > As such, I think the only way we could have __unserialize() return a ne= w > > object is if there was also a way to opt-out from the preservation of > > object identity. > > > > Regards, > > Nikita > > > Thank you for the very detailed explanation! > > Do I understand it correctly that it's about self references in the > serialized object or is that self-reference an internal implementation > detail of the serializing logic. > > > In case about self references in the serialized object and without any > knowledge about the internal implementation I can just assume and based > in that I still thing it should be possible. > > -> see https://3v4l.org/AbQ2s > > If my assumptions are just stupid please ignore > If I understand right, your suggestion here is basically to make __unserialize() responsible for dealing with self-references by itself: It it wants to return a new object, then it needs to also take care of replacing any potential self-references. This somewhat works, but only if the references are really self-references. You can also simply have multiple references to the same object without there being an actual cycle, in which case __unserialize() will not have access to those references. An additional issue is that this basically requires the object to be fully aware of it's contents: You can't have an unspecified $data property that holds arbitrary values, because there might be a self-reference *somewhere* in there, but you wouldn't know whether to find it or how to update it. Nikita > > > > On 26.02.19 15:17, Nikita Popov wrote: > >>> On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas < > >>> nicolas.grekas+php@gmail.com> wrote: > >>> > >>>> Le mar. 19 f=C3=A9vr. 2019 =C3=A0 11:31, Nikita Popov a > >>>> =C3=A9crit : > >>>> > >>>>> On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas < > >>>>> nicolas.grekas+php@gmail.com> wrote: > >>>>> > >>>>>> Hi Nikita, > >>>>>> > >>>>>> I'd like to propose a new custom object serialization mechanism > >> intended > >>>>>>> to replace the broken Serializable interface: > >>>>>>> > >>>>>>> https://wiki.php.net/rfc/custom_object_serialization > >>>>>>> > >>>>>>> This was already previously discussed in > >>>>>>> https://externals.io/message/98834, this just brings it into RFC > >> form. > >>>>>>> The latest motivation for this is > >> https://bugs.php.net/bug.php?id=3D77302, > >>>>>>> a compatibility issue in 7.3 affecting Symfony, caused by > >> Serializable. We > >>>>>>> can't fix Serializable, but we can at least make sure that a > working > >>>>>>> alternative exists. > >>>>>>> > >>>>>> Is there anything we can to do to help the RFC move forward? I'm > >>>>>> spending a great deal of time removing Serializable from the Symfo= ny > >> code > >>>>>> base. It's not pretty nor fun... but we have no choice since as pp= l > >> move to > >>>>>> PHP 7.3, they can now spot when the current mechanism is breaking > >> their > >>>>>> serialized payloads (typically from user objects put in the sessio= n > >>>>>> storage). > >>>>>> > >>>>>> About interface vs magic methods, technicaly, we can work with an > >>>>>> interface provided by PHP core, so that if that's a blocker for > >> voters to > >>>>>> approve the RFC, let's do it - the current situation is really > aweful > >> :). > >>>>>> FYI, I found myself implementing some getState()/setState() method= s > >> of my > >>>>>> own, decoupled from the underlying mechanisms used by PHP. That > >> allows me > >>>>>> to still use Serializable for BC reasons when needed, or move to > >> __sleep > >>>>>> when possible, then move to the new 7.4 style when ready, without > >> requiring > >>>>>> any changes from the consumer POV. That's a good illustration of > what > >> I > >>>>>> meant in my previous email: domain-specific interfaces in everyone= 's > >> code > >>>>>> is a better design as it allows better decoupling. It's also a > better > >>>>>> design to express that "instances of this interface of mine MUST b= e > >>>>>> serializable". IMHO. > >>>>>> > >>>>>> Nicolas > >>>>>> > >>>>> I think for me the main open question here is whether we want to ju= st > >>>>> handle the serialization issue or try to come up with something mor= e > >>>>> general. If we limit this to just serialization, then the design > should > >>>>> stay as-is -- for all the reasons you have already outlined, using = an > >>>>> interface for this is inappropriate. > >>>>> > >>>>> For a more general mechanism, I think we would need something along > the > >>>>> lines of (ignoring naming): > >>>>> > >>>>> interface GetState { > >>>>> public function getState(string $purpose): array; > >>>>> } > >>>>> interface SetState { > >>>>> public function setState(array $data, string $purpose): void; > >>>>> } > >>>>> > >>>>> We need separate interfaces for get/set to properly cater to cases > like > >>>>> JSON, where only get is meaningful (right now). We need the $purpos= e > >>>>> argument to allow different state representations for different > >> purposes, > >>>>> e.g. JSON will often need more preparation than PHP serialization. > The > >>>>> $purpose argument is a string, to allow extension for custom > purposes. > >>>>> > >>>>> Seeing that, this is really not something I would feel comfortable > with > >>>>> introducing in core PHP. Our track record for introducing reusable > >>>>> general-purpose interfaces is not exactly stellar (say, did you hea= r > >> about > >>>>> SplObserver and SplSubject?) and I wouldn't want to do this without > >> seeing > >>>>> the concept fleshed out extensively in userland first. > >>>>> > >>>>> Nikita > >>>>> > >>>> I think the per-purpose representation logic doesn't belong to each > >>>> classes. > >>>> Symfony Serializer does it completly from the outside, and I think > >> that's > >>>> a much better design. > >>>> I'm on the side this should not be in PHP code indeed. > >>>> > >>> As there hasn't been further input here, I'd like to go forward with > the > >>> RFC as-is and plan to start voting by Friday. > >>> > >>> Nikita > >>> > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: http://www.php.net/unsub.php > > --000000000000d7cbee0583699e96--