Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:102213 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 90504 invoked from network); 10 Jun 2018 10:03:14 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 10 Jun 2018 10:03:14 -0000 Authentication-Results: pb1.pair.com smtp.mail=nicolas.grekas@gmail.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=nicolas.grekas@gmail.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.218.68 as permitted sender) X-PHP-List-Original-Sender: nicolas.grekas@gmail.com X-Host-Fingerprint: 209.85.218.68 mail-oi0-f68.google.com Received: from [209.85.218.68] ([209.85.218.68:40290] helo=mail-oi0-f68.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id DB/83-62758-F57FC1B5 for ; Sun, 10 Jun 2018 06:03:12 -0400 Received: by mail-oi0-f68.google.com with SMTP id f79-v6so15469558oib.7 for ; Sun, 10 Jun 2018 03:03:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=8SGmkYdoUc62NsD1LgiCZ0dKfS0ssgNxoAzu7PF93mA=; b=AYNEdLmoZiBaxG6w1m79P0nGzbaLYVptKcIWqW9XTdxZL7CRKJZYfHQDMlwZoTO6Lu eWstb/zUHGvM8x56O+zQLb562W+gTd6qyCind9PBRdI5YYehTNBNMM7VUwqx/twLWncD U4vfuNzfR9YMtnoEWLii0/4dI+ImuA2oPQYFwCgJjg60UheB401ZnEi356Kyb9xlm7FD 31KlM+/sSsFXvBOm0GjTVxI+K7uKG9kCAwRM6VWsUFHv7Mp3T3lbnkG1f4REfvRNbTxw 1ZBSeA7lVg5tjNC+WTgZ/dvVqULKNcuSYq9owIRTKLZeyKhrWdeiDuCnvLIT8Vdyksp6 jzPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=8SGmkYdoUc62NsD1LgiCZ0dKfS0ssgNxoAzu7PF93mA=; b=ehctaYB3egMEphVnuVanzPUWErjs55kfAjiqeF8AzEnRtJ1QrsDJPjkke4PXjoadJ8 oWQlK0o9nTAguxGgx/BoMp8/quVuA7hCu1311azSuiqsRVLf0d0I3atVgbCAVmmLOKV2 ZSERUGU5xjvFjephgsdagiOvmFEk0ucTygsPGnPp3lmKWQamquRDte1opKAEnhk6ntZH A7K+OJXuR+qsEAzCIYm+aSpN9O+wbkK+qE8G3pBLVbJ66phuCIkWXzvOSh5i0Ba6k7ns ptoKHiRicLAfRkl4/ESWTZwVAf0ETaheueQ7pulbmBUVahqwu8I/STdqaZL29Do87rbX KD9Q== X-Gm-Message-State: APt69E2rH3PXDrWLe8nNMNgY08Lm28toTsyr9/GUEPYEQMsDXlCJvMN2 Se+TmYr7uLQ7KxwY48uFkPYLwJ5AK2sZVP4f86Y= X-Google-Smtp-Source: ADUXVKJTCZoG5ng0ArVYVtizGwFRoI8B4u/Ra4PHAl56m4rahyq4RZiZvYWhBOiR9b9xqmV0qfbMiiQa98ZYMTNPqGw= X-Received: by 2002:aca:5406:: with SMTP id i6-v6mr6820597oib.115.1528624989088; Sun, 10 Jun 2018 03:03:09 -0700 (PDT) MIME-Version: 1.0 Sender: nicolas.grekas@gmail.com Received: by 2002:a4a:8b28:0:0:0:0:0 with HTTP; Sun, 10 Jun 2018 03:02:48 -0700 (PDT) In-Reply-To: References: Date: Sun, 10 Jun 2018 12:02:48 +0200 X-Google-Sender-Auth: DoUxdNzLDZQ3PAMwm90WfsjfRts Message-ID: To: Nikita Popov Cc: PHP Internals List Content-Type: multipart/alternative; boundary="000000000000783032056e46bd82" Subject: Re: [PHP-DEV] A replacement for the Serializable interface From: nicolas.grekas+php@gmail.com (Nicolas Grekas) --000000000000783032056e46bd82 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Please allow me to up vote for this proposal. I'm working on some serialization logic these days, and `Serializable` is totally broken: it breaks internal references in serialized data structures, and breaks custom serializers (e.g. igbinary=C3=A0 from inspect= ing nested structures. The proposal here would just fix all these issues. +1000 from me for what it matters :) 2017-04-21 15:50 GMT+02:00 Nikita Popov : > On Fri, Apr 21, 2017 at 2:47 PM, Micha=C5=82 Brzuchalski < > michal.brzuchalski@gmail.com> wrote: > > > I know my voice is doesn't mean anything but IMHO interface with magic > > methods could bring more inconsistency. > > > > I know PHP is consistently inconsistent but I would prefer if it is > > posdible to fix an issue with present method naming. > > > > Cheers > > > Magic methods have a distinct backwards compatibility advantage. They all= ow > you to add __serialize/__unserialize to an existing class that currently > uses Serializable. Older PHP versions will then use the Serializable > implementation, while never versions will use __serialize/__unserialize. = An > interface makes this a lot more complicated, because you either have to > bump your PHP version requirement (unlikely), or you have to provide a sh= im > interface for older PHP versions. This shim interface would then be part = of > any library currently implementing Serializable, which seems sub-optimal = to > me. That's why I think magic methods are better for this case (though I > don't strongly care). > > Also, to answer an OTR question: We cannot simply reuse the Serializable > interface by allowing an array return value from > Serializable::unserialize(). The array return value is only a means to an > end: the important part is that the new serialization mechanism does not > share serialization state -- using arrays instead of strings is just a > convenient way to achieve this. However, Serializable::unserialize() > currently shares the state and we cannot change this without breaking BC = -- > so we cannot reuse this interface. > > Nikita > > > > > 21.04.2017 11:39 "Nikita Popov" napisa=C5=82(a): > > > >> Hi internals, > >> > >> As you are surely aware, serialization in PHP is a big mess. Said mess > is > >> caused by some fundamental issues in the serialization format, and > >> exacerbated by the existence of the Serializable interface. Fixing the > >> serialization format is likely not possible at this point, but we can > >> replace Serializable with a better alternative and I'd like to start a > >> discussion on that. > >> > >> The problem is essentially that Serializable::serialize() is expected = to > >> return a string, which is generally obtained by recursively calling > >> serialize() in the Serializable::serialize() implementation. This > >> serialize() call shares state information with the outer serialize(), = to > >> ensure that two references to the same object (or the same reference) > will > >> continue referring to a single object/reference after serialization. > >> > >> This causes two big issues: > >> > >> First, the implementation is highly order-dependent. If > >> Serializable::serialize() contains multiple calls to serialize(), then > >> calls to unserialize() have to be repeated **in the same order** in > >> Serializable::unserialize(), otherwise unserialization may fail or be > >> corrupted. In particular this means that using parent::serialize() and > >> parent::unserialize() is unsafe. (See also > >> https://bugs.php.net/bug.php?id=3D66052 and linked bugs.) > >> > >> Second, the existence of Serializable introduces security issues that = we > >> cannot fix. Allowing the execution of PHP code during unserialization = is > >> unsafe, and even innocuous looking code is easily exploited. We have > >> recently mitigated __wakeup() based attacks by delaying __wakeup() cal= ls > >> until the end of the unserialization. We cannot do the same for > >> Serializable::unserialize() calls, as their design strictly requires t= he > >> unserialization context to still be active during the call. Similarly, > >> Serializable prevents an up-front validation pass of the serialized > >> string, > >> as the format used for Serializable objects is user-defined. > >> > >> The delayed __wakeup() mitigation mentioned in the previous point also > >> interacts badly with Serializable, because we have to delay __wakeup() > >> calls to the end of the unserialization, which in particular also > implies > >> that Serializable::unserialize() sees objects prior to wakeup. (See al= so > >> https://bugs.php.net/bug.php?id=3D74436.) > >> > >> In the end, everything comes down to the fact that Serializable requir= es > >> nested serialization calls with context sharing. > >> > >> The alternative mechanism (__sleep + __wakeup) does not have these > issues > >> (anymore), but it is not sufficiently flexible for general use: Notabl= y, > >> __sleep() allows you to limit which properties are serialized, but the > >> properties still have to actually exist on the object. > >> > >> I'd like to propose the addition of a new mechanism which essentially > >> works > >> the same way as Serializable, but uses arrays instead of strings and > does > >> not share context. I'm not sure about the naming (RealSerializable, > >> anyone?), so I'll just go with magic methods __serialize() and > >> __unserialize() for now: > >> > >> public function __serialize() : array; > >> public function __unserialize(array $data) : void; > >> > >> From a userland perspective the implementation should be the same as f= or > >> Serializable methods, but with interior serialize()/unserialize() call= s > >> stripped out. Right now Serializable implementations already usually > work > >> by doing something like "return serialize([ ... ])", this would change > it > >> to just "return [ ... ]" and move the serialize()/unserialize() call > into > >> the engine, where we can perform it safely and robustly. > >> > >> The new methods should reuse the "O" serialization format, rather than > >> introducing a new one. This allows a measure of interoperability with > >> previous PHP versions, which can still decode serialized strings from > >> newer > >> versions using __wakeup(). > >> > >> If an object has both __wakeup() and __unserialize(), then > __unserialize() > >> should be called. If an object implements both > Serializable::unserialize() > >> and __unserialize(), then we should invoke one or the other based on > >> whether "C" or "O" serialization is used. > >> > >> Thoughts? > >> > >> Nikita > >> > > > --000000000000783032056e46bd82--