Newsgroups: php.internals
Date: Fri, 21 Apr 2017 11:38:47 +0200
To: PHP internals
Subject: A replacement for the Serializable interface
From: nikita.ppv@gmail.com (Nikita Popov)

Hi internals,

As you are surely aware, serialization in PHP is a big mess. Said mess is caused by some fundamental issues in the serialization format and is exacerbated by the existence of the Serializable interface. Fixing the serialization format is likely not possible at this point, but we can replace Serializable with a better alternative, and I'd like to start a discussion on that.

The problem is essentially that Serializable::serialize() is expected to return a string, which is generally obtained by recursively calling serialize() inside the Serializable::serialize() implementation. This nested serialize() call shares state with the outer serialize() to ensure that two references to the same object (or the same reference) still refer to a single object/reference after unserialization.

This causes two big issues:

First, the implementation is highly order-dependent. If Serializable::serialize() contains multiple calls to serialize(), then the corresponding calls to unserialize() have to be made **in the same order** in Serializable::unserialize(); otherwise unserialization may fail or produce corrupted data.
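For illustration, a minimal sketch of the shared state described above (Holder is a hypothetical class, not part of any API). The nested serialize() inside Holder::serialize() writes an object that the outer call has already seen as a back-reference, and the nested unserialize() can only resolve that back-reference because it runs inside the outer unserialization context:

```php
<?php
// Hypothetical class used only to show context sharing in Serializable.
class Holder implements Serializable
{
    public $item;

    public function __construct($item = null)
    {
        $this->item = $item;
    }

    public function serialize()
    {
        // Nested call: shares state with the outer serialize(), so an
        // object already written by the outer call becomes a back-reference.
        return serialize($this->item);
    }

    public function unserialize($data)
    {
        // Nested call: resolves that back-reference against the state of
        // the outer unserialize() that is still in progress.
        $this->item = unserialize($data);
    }
}

$shared = new stdClass();
$copy = unserialize(serialize([$shared, new Holder($shared)]));

// Both entries refer to the same stdClass instance again.
var_dump($copy[0] === $copy[1]->item);
```

With more than one nested call, each unserialize() in Holder::unserialize() would have to consume the shared back-reference numbering in exactly the order the serialize() calls produced it.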
In particular, this means that calling parent::serialize() and parent::unserialize() is unsafe: the usual pattern, serialize([..., parent::serialize()]), runs the parent's nested serialize() before the child's own, while on the way back the child has to unserialize its own data before it can call parent::unserialize(). (See also https://bugs.php.net/bug.php?id=66052 and linked bugs.)

Second, the existence of Serializable introduces security issues that we cannot fix. Allowing the execution of PHP code during unserialization is unsafe, and even innocuous-looking code is easily exploited. We have recently mitigated __wakeup()-based attacks by delaying __wakeup() calls until the end of unserialization. We cannot do the same for Serializable::unserialize() calls, as their design strictly requires the unserialization context to still be active during the call. Similarly, Serializable prevents an up-front validation pass over the serialized string, as the format used for Serializable objects is user-defined.

The delayed __wakeup() mitigation mentioned in the previous point also interacts badly with Serializable: because __wakeup() calls are delayed to the end of unserialization, Serializable::unserialize() sees objects before their __wakeup() has run. (See also https://bugs.php.net/bug.php?id=74436.)

In the end, everything comes down to the fact that Serializable requires nested serialization calls with context sharing.

The alternative mechanism (__sleep() + __wakeup()) does not have these issues (anymore), but it is not flexible enough for general use. Notably, __sleep() allows you to limit which properties are serialized, but those properties still have to actually exist on the object, so you cannot serialize a representation of the state that differs from the object's property layout.

I'd like to propose adding a new mechanism that works essentially the same way as Serializable, but uses arrays instead of strings and does not share context.
I'm not sure about the naming (RealSerializable, anyone?), so I'll just go with the magic methods __serialize() and __unserialize() for now:

    public function __serialize() : array;
    public function __unserialize(array $data) : void;

From a userland perspective, the implementation should be the same as for the Serializable methods, but with the interior serialize()/unserialize() calls stripped out. Right now, Serializable implementations usually work by doing something like "return serialize([ ... ])"; this would change that to just "return [ ... ]" and move the serialize()/unserialize() call into the engine, where we can perform it safely and robustly.

The new methods should reuse the "O" serialization format rather than introducing a new one. This allows a measure of interoperability with previous PHP versions, which can still decode serialized strings produced by newer versions and will call __wakeup() on the result.

If an object has both __wakeup() and __unserialize(), then __unserialize() should be called. If an object implements both Serializable::unserialize() and __unserialize(), then we should invoke one or the other based on whether the "C" or "O" serialization format is used.

Thoughts?
Nikita
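A minimal sketch of a class written against the proposed methods, assuming the signatures above and the proposed rule that __unserialize() takes precedence over __wakeup(); Connection, its properties and the 'version' key are hypothetical. The 'version' entry has no matching property, which __sleep() cannot express, and neither method makes a nested serialize()/unserialize() call:

```php
<?php
// Sketch of the proposed __serialize()/__unserialize() methods.
// Connection, its properties and the 'version' key are hypothetical.
class Connection
{
    private $dsn;
    private $handle;              // runtime-only state, rebuilt on unserialize
    public $wakeupCalled = false;

    public function __construct(string $dsn)
    {
        $this->dsn = $dsn;
        $this->handle = 'connected:' . $dsn;
    }

    public function __serialize(): array
    {
        // Return plain data; the engine serializes this array itself,
        // so there is no nested serialize() call and no shared context.
        // 'version' has no matching property, which __sleep() cannot express.
        return ['dsn' => $this->dsn, 'version' => 1];
    }

    public function __unserialize(array $data): void
    {
        // Receives the already-unserialized array; no nested unserialize().
        $this->dsn = $data['dsn'];
        $this->handle = 'connected:' . $data['dsn'];
    }

    public function __wakeup(): void
    {
        // Under the proposed rule this is skipped because __unserialize() exists.
        $this->wakeupCalled = true;
    }

    public function getHandle(): string
    {
        return $this->handle;
    }
}

$serialized = serialize(new Connection('mysql:host=localhost'));
$restored = unserialize($serialized);

echo $serialized, "\n";           // written in the existing "O" format
echo $restored->getHandle(), "\n";
```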