Newsgroups: php.internals
Date: Sun, 8 Jul 2018 14:59:55 +0200
To: Nicolas Grekas
Cc: PHP internals
Subject: Re: [PHP-DEV] Introspection for references
From: nikita.ppv@gmail.com (Nikita Popov)

On Sun, Jul 8, 2018 at 10:42 AM, Nicolas Grekas
<nicolas.grekas+php@gmail.com> wrote:

> Hi Nikita,
>
>> Before talking about solutions, can the people who need this first outline
>> what functionality is needed and what it is needed for (and maybe what
>> workarounds you currently use). E.g. do you only need to know whether
>> something is a reference, or do you need to know whether two somethings
>> are part of the same reference, etc. There are probably multiple use
>> cases for this with different needs.
>
> We're using reference introspection to do both: we need to know when a
> zval is a reference, and we also need to track each of them separately.
>
> The use case is being able to introspect any arbitrary PHP data structure,
> with one main application: providing an enhanced "dump()" function.
>
> See e.g.
> this screenshot for what we get using the dump() function
> provided by Symfony VarDumper component:
> https://symfony.com/doc/current/_images/07-hard-ref.png
>
> In PHP5 days, Julien Pauli wrote a PHP extension to do zval introspection.
> Here is the code + README (see test case 001.phpt for example with
> references.):
> https://github.com/symfony/symfony/tree/3.4/src/Symfony/Component/Debug/Resources/ext
>
> With PHP7, using pure PHP introspection is easier to maintain and still
> very fast so we deprecated the extension.
> Here is the code doing reference introspection:
> https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php#L83
>
> it might not be easy to follow, but the basic blocks are:
>
> $array2 = $array1;
> $array2[$key] = $unique_cookie;
> if ($array1[$key] === $unique_cookie) => we found a reference
> then we also maintain a registry of $unique_cookie so that we know if we
> already saw that reference or not (the check is done before the above "if"
> of course.)
>

Thanks for the explanation. I think that the VarCloner use case needs two
bits of functionality:

1. Detecting whether a variable is a reference, so you can handle this
specially.
2. An efficient way of determining whether a variable is part of a
reference that has already been seen (and which one).

The second requirement is stronger than just the ability to detect whether
two variables are part of the same reference. Given just a
same_ref($v1, $v2) function, one would have to check against a list of all
previously seen references one at a time, rather than only performing a
hashtable lookup.

Currently this functionality is implemented as:

1. Copying the array, assigning a cookie to the copy and seeing if the
original array is modified. With an extra catch for TypeErrors, this is
compatible with typed properties.
2. Replacing the reference with a Stub object, which can be looked up by
object id.
At the end the Stub objects are replaced with their values again. This is
fundamentally incompatible with typed properties, as the type will likely
not permit the Stub class.

Here are my thoughts on possible APIs for this use case.

Construction of reference-reflection objects
-----

An issue already discussed in the other threads is that in PHP we need to
specify whether a parameter is accepted by reference, by value or by
preferred-reference. We don't have the possibility of accepting either a
value or a reference, whatever we get. This leaves us with a few options:

1. Introducing a VM-level primitive that is not subject to this limitation.
The typed properties thread suggested a reflect_variable() language
construct. I'm not too fond of this option because reference reflection
seems like an awfully specific thing to introduce a new language construct
for.

2. A ReflectionReference::fromVariable(&$var) constructor. Contrary to what
was said in the other thread, this does not cause issues with the
copy-on-write mechanism, since as of PHP 7 references and non-references
can share values (including immutablized values in SHM). However, this
approach does have two issues:
a) It is impossible to distinguish whether $var was a singleton reference
or a value beforehand. Both will show up as rc=2 references inside
ReflectionReference::fromVariable(). (This may also be an advantage,
because from a language-design perspective, we treat singleton references
as non-references.)
b) In case the original $var was a plain value rather than a reference, it
will now be a reference, so this has a side effect.

3. A ReflectionReference::fromArrayElem(array $array, string|int $key)
constructor, as suggested by Nicolas. This avoids the reference/value
problem and solves the specific VarCloner case efficiently and directly.
On the other hand, introspection of references inside non-arrays requires
some workarounds (e.g., casting objects to arrays).

4. A combination of these. For example we could have...
... ReflectionReference::fromArrayElem(array $array, string|int $key) for
array items.
... ReflectionReference::fromObjectProp(object $object, string $key) for
object properties.
... ReflectionReference::fromVariable(&$var) for any other special cases.
This would allow covering the common and interesting cases with
specialized methods, and leave a less efficient fallback for the general
case. This is probably the option I'd favor.

Determining whether something is a reference
-----

I think the best way to handle this (and the reason why I used named
constructors above) is to return null if the value is not a reference.
Not being a reference should be the most common case, and it would be best
to avoid the overhead of constructing an unnecessary object in this case.

One important question in this context would be whether we consider
singleton references as references or not. If we do, then the
ReflectionReference::fromVariable() constructor will always return a
non-null value, as the variable will be turned into a singleton reference
if it was not already a reference. If we consider them as references,
we'll also want an API method to distinguish them, e.g. a specialized
isSingleton() or, more generally, getNumUsers() == 1.

The alternative would be to always construct a ReflectionReference object
which may or may not be a reference and has an isReference() method. I
don't see any advantages to that approach though.

Reference equality
-----

A couple of approaches:

1. Have an isEqual(ReflectionReference $other): bool method, which
determines whether two references are the same. The disadvantage is that
this only allows pair-wise comparisons, so it does not fully solve the
VarCloner use-case.

2. Make the ReflectionReference constructors uniquing. That is, if a
ReflectionReference for a certain reference already exists, then the
constructor will return the same object. This means that references can be
compared by identity $ref1 === $ref2. It also means that they can be used
in hashtables via spl_object_id().
(Caveat: It's important to keep the ReflectionReference object alive for
the duration in which spl_object_id() is used, as usual.)

3. Some variation on 2 via a separate API. That is, don't unique
ReflectionReferences themselves, but provide a separate getId() API. The
returned ID would only be meaningful as long as at least one
ReflectionReference object for the reference is live, otherwise it may be
reused.

Actual API
-----

If we go with null return value on non-reference and uniquing, then most of
the functionality is already provided by the constructor. The only useful
API method I can think of is something like getNumUsers().

So, my overall suggestion would be the following API:

class ReflectionReference {
    // Constructors return null if not a reference, object is uniqued
    static function fromArrayElem(array $array, string|int $key): ?ReflectionReference;
    static function fromObjectProp(object $object, string $key): ?ReflectionReference;
    static function fromVariable(&$var): ReflectionReference;

    // Basically the reference count. Would subtract 1 for the
    // fromVariable() constructor, to make the values consistent.
    function getNumUsers(): int;
}

Thoughts?

Nikita
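
For illustration, here is a minimal sketch of the cookie-based detection
that this thread describes as the current pure-PHP workaround (copy the
array, write a unique cookie into the copy, check whether the original
changed). The helper name findReferenceKeys() is hypothetical and is not
taken from Symfony or from PHP itself. The sketch leaves out the registry
of already-seen references and the TypeError handling needed for typed
properties, and it is not expected to report singleton references.

```php
<?php
// Hypothetical helper for illustration only (not a Symfony or PHP API).
// Returns the keys of $array whose slots are references shared with
// something outside the array.
function findReferenceKeys(array $array): array
{
    $cookie = new stdClass();   // fresh object, so it cannot already be in $array
    $refKeys = [];

    foreach ($array as $key => $value) {
        $copy = $array;         // copy-on-write copy; reference slots stay shared
        $copy[$key] = $cookie;  // write into the copy only

        if ($array[$key] === $cookie) {
            // The write showed up in the original, so this slot is a reference.
            $refKeys[] = $key;
            $copy[$key] = $value;   // write the old value back through the reference
        }
    }

    return $refKeys;
}

// Usage: 'b' is bound by reference to $shared, the other slots are plain values.
$shared = 1;
$data = ['a' => 1, 'b' => &$shared, 'c' => 'x'];
var_dump(findReferenceKeys($data));
```

With a same_ref()-style primitive or the uniqued objects proposed above, the
temporary write (and the restore step) would no longer be needed.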