Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:89122 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 31672 invoked from network); 9 Nov 2015 15:27:28 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 9 Nov 2015 15:27:28 -0000 Authentication-Results: pb1.pair.com smtp.mail=steven.hilder@sevenpercent.solutions; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=steven.hilder@sevenpercent.solutions; sender-id=pass Received-SPF: pass (pb1.pair.com: domain sevenpercent.solutions designates 46.101.25.58 as permitted sender) X-PHP-List-Original-Sender: steven.hilder@sevenpercent.solutions X-Host-Fingerprint: 46.101.25.58 mail.sevenpercent.solutions Received: from [46.101.25.58] ([46.101.25.58:43771] helo=mail.sevenpercent.solutions) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 00/F1-13667-D5BB0465 for ; Mon, 09 Nov 2015 10:27:26 -0500 Received: from charlie (bzq-79-177-4-95.red.bezeqint.net [79.177.4.95]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: steven.hilder) by mail.sevenpercent.solutions (Postfix) with ESMTPSA id BC5313FC40; Mon, 9 Nov 2015 15:27:20 +0000 (UTC) Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes To: "PHP Internals" Cc: "Niklas Keller" , Leigh , "Rowan Collins" , "Joe Watkins" , "Dmitry Stogov" References: <563B6ED1.1030601@gmail.com> Date: Mon, 09 Nov 2015 17:27:14 +0200 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Organization: SevenPercent Solutions Ltd. Message-ID: In-Reply-To: User-Agent: Opera Mail/1.0 (MacIntel) Subject: Re: [PHP-DEV] Null bytes in anonymous class names From: steven.hilder@sevenpercent.solutions ("Steven Hilder") On Thu, 05 Nov 2015 15:14:47, Leigh wrote: > On 5 November 2015 at 14:59, Rowan Collins > wrote: >> PHP uses null bytes quite a lot to produce deliberately illegal >> identifiers. For instance the old eval-like create_function() [e.g. >> https://3v4l.org/hqHjh] and the serialization of private members [e.g. >> https://3v4l.org/R6Y6k] >> >> In this case, I guess the "@" in "class@anonymous" makes the name >> illegal >> anyway, but I'm not sold on the null byte being more unacceptable here >> than anywhere else. >> >> Regards, >> >> -- >> Rowan Collins >> [IMSoP] > > That doesn't mean it's a good approach (*cough* namespaces *cough*), and > these bits of "magic" are supposed to be hidden away from users. I'm > guessing in this particular instance, the point of the null is to make > string operations cut off after "anonymous", however string operations > that respect the zval string length aren't going to do this. > > e.g. var_dump() the class name is put through sprintf and it cuts off at > the null, but get_class or ReflectionClass::getName() just returns the > original string, and exposes the implementation details. Internal names for anonymous classes need to be unique. The current method of ensuring uniqueness ('\0' + filename + lexer pos) was written by Dmitry[1], following from Joe's original implementation[2]. As I understand it, this was supposed to be entirely hidden from userland; hence the null byte. So, I prepared a patch for `get_class()` and `ReflectionClass::getName()`, which in my view should behave the same way as `var_dump()` etc., but I've now realised that ignoring the unique suffix from the class name will break functionality that is otherwise desirable: * Aliasing an anonymous class... (see bug70106[3]) $instance = new class {}; $class_name = get_class($instance); class_alias($class_name, 'MyAlias'); * Creating a new anonymous class instance dynamically... $instance = new class {}; $class_name = get_class($instance); $new_instance = new $class_name; ...although the `get_class()` used here isn't necessary, and you could write `= new $instance;` without ever knowing $instance's class. Our options seem to be: (A) Leave everything as it is. This puts us in a situation where anonymous class names are treated inconsistently in userland; a situation that I think could lead to considerable confusion about the "real" names of these classes and how they should be used. (B) Patch `get_class()` and `ReflectionClass::getName()` (any others?) to strip the suffix (the '\0' and everything after it) from anonymous class names prior to display. This prevents userland code from distinguishing between multiple anonymous classes; breaking functionality as described above. To mitigate this, we could allow `class_alias()` to accept either a class name OR an instance in the first argument. The `new` operator already supports an instance in place of a class name. (C) Assuming that userland SHOULD have access to the internal names of anonymous classes, remove the null byte so that the full names are displayed everywhere. The drawback of this option is that class names won't be as predictable. It occurs to me that a different naming scheme could help here - wouldn't a monotonically increasing integer suffix be enough? The current scheme seems like an awful waste of memory, too. Option (B) seems like the best to me, because I can't really see a need to know the unique suffix -- provided any use-cases are catered for by an alternative (modifying `class_alias()` should be enough). Thoughts? Steve [1] http://git.php.net/?p=php-src.git;a=commitdiff;h=2a1b0b1#patch1 [2] http://git.php.net/?p=php-src.git;a=commitdiff;h=49608e0#patch12 [3] https://bugs.php.net/bug.php?id=70106