Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:102659 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 71290 invoked from network); 9 Jul 2018 09:31:16 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 9 Jul 2018 09:31:16 -0000 Authentication-Results: pb1.pair.com header.from=nikita.ppv@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=nikita.ppv@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.214.47 as permitted sender) X-PHP-List-Original-Sender: nikita.ppv@gmail.com X-Host-Fingerprint: 209.85.214.47 mail-it0-f47.google.com Received: from [209.85.214.47] ([209.85.214.47:52275] helo=mail-it0-f47.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 11/22-49043-F5B234B5 for ; Mon, 09 Jul 2018 05:31:14 -0400 Received: by mail-it0-f47.google.com with SMTP id p4-v6so24449726itf.2 for ; Mon, 09 Jul 2018 02:31:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=9p9jzD1N+JvDaaZYV/iCho2rbcGnxoN3RMhYFo5iSdg=; b=bVHoKo6jrl1D5knhwBJKkTDzS2hLLOyTqDz5QYEXmuf47ubhF8G5juE7T7TMMS1FEx 0lsrHGbs6BAlfUNPh3n12C2eRyzaJIheaoqTq/fWDptX9Fm9df0blV58cIOr9tdVaSf2 VX64V/qi5kRzCwrOLi7aDT/vRH3cdrG1H/8WFkFfBDkmPix8nt6kohpGENMCv4otGvQn QiBGNZ0N01+ULmr9nkMpGQ9QG/97rxWLINKHZSaHj76X7jBtuBydUHfQeDTvKAG2e+Re Y3HfxW8IGNQMwXuIeIWfxAISlWPQ+z/nFUkvJ3aAL+TDrMmKi3SAahPyvAp012XO2lcX JcIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=9p9jzD1N+JvDaaZYV/iCho2rbcGnxoN3RMhYFo5iSdg=; b=O4BgwYrWOWzVlgeQBfAgf7Pgace5bu9RtwR0EndOLLXoCyuUgXKVGZjGOvR1ST4FfZ Ck4OaT04FdHPihUJ3DwThtN/B5xGH3A7mnCyLT9PFCrZ0s9BpcBwXAUPHBog7jwHm34B U5hJCJmIP0edjgNlgSoj7n1XjBm/oHvFRN9BqIOjdhe/EkDsiRWPygu3S52nOQYwLKe0 Hi5tnjFMDR5O9yduBLJWJmCsTUdEkCBkqtNpVHvMej5ZSsFwfufhpgI896mzdNZH5n9L aeubMNiaTHOH+fAKlIGYWtXKj69rUcDJJHx+Dfril+E8QFwRAeCTqp7K59wXmwMPt6o9 B+XQ== X-Gm-Message-State: APt69E1GcXH88UVSK5kEbjz6VOxQiADoKPUmSwe6Ht3S/FIayi5840Le Ryhl4LG/fWEpyNgxOSXM3rybWRb3DuxxqgGbCtM= X-Google-Smtp-Source: AAOMgpeWEQ4YW+s5p5XUNhYpkd/BJxXLZ3PzsSJPiBpncCGs7EraVZs6uoMWUmwQF2AqBJd50sPzt9yLyzRgiBDF0A8= X-Received: by 2002:a02:59c9:: with SMTP id v70-v6mr15428463jad.94.1531128668511; Mon, 09 Jul 2018 02:31:08 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a6b:148a:0:0:0:0:0 with HTTP; Mon, 9 Jul 2018 02:31:07 -0700 (PDT) In-Reply-To: References: Date: Mon, 9 Jul 2018 11:31:07 +0200 Message-ID: To: Nicolas Grekas Cc: PHP internals Content-Type: multipart/alternative; boundary="00000000000064608a05708dac3f" Subject: Re: [PHP-DEV] Introspection for references From: nikita.ppv@gmail.com (Nikita Popov) --00000000000064608a05708dac3f Content-Type: text/plain; charset="UTF-8" On Sun, Jul 8, 2018 at 8:36 PM, Nicolas Grekas wrote: > > Before talking about solutions, can the people who need this first outline >>>> what functionality is needed and what it is needed for (and maybe what >>>> workarounds you currently use). E.g. do you only need to know whether >>>> something is a reference, or do you need to know whether two somethings >>>> are >>>> part of the same reference, etc. There are probably multiple use cases >>>> for >>>> this with different needs. >>>> >>> >>> We're using reference introspection to do both: we need to know when a >>> zval is a reference, and we also need to track each of them separately. >>> >>> The use case is being able to intropect any arbitrary PHP datastructure, >>> with one main application: providing an enhanced "dump()" function. >>> >>> See e.g. this screenshot for what we get using the dump() function >>> provided by Symfony VarDumper component: >>> https://symfony.com/doc/current/_images/07-hard-ref.png >>> >>> In PHP5 days, Julien Pauli wrote a PHP extension to do zval >>> introspection. >>> Here is the code + README (see test case 001.phpt for example with >>> references.): >>> https://github.com/symfony/symfony/tree/3.4/src/Symfony/Comp >>> onent/Debug/Resources/ext >>> >>> With PHP7, using pure PHP introspection is easier to maintain and still >>> very fast so we deprecated the extension. >>> Here is the code doing reference introspection: >>> https://github.com/symfony/symfony/blob/master/src/Symfony/C >>> omponent/VarDumper/Cloner/VarCloner.php#L83 >>> >>> it might not be easy to follow, but the basic blocks are: >>> >>> $array2 = $array1; >>> $array2[$key] = $unique_cookie; >>> if ($array1[$key] === $unique_cookie) => we found a reference >>> then we also maintain a registry of $unique_cookie so that we know if we >>> already saw that reference or not (the check is done before the above "if" >>> or course.) >>> >> >> Thanks for the explanation. I think that the VarCloner use case needs two >> bits of functionality: >> >> 1. Detecting whether a variable is a reference, so you can handle this >> specially. >> 2. An efficient way of determining whether a variable is part of a >> reference that has already been seen (and which one). >> >> The second requirement is stronger than just the ability to detect >> whether two variables are part of the same reference. Given just a >> same_ref($v1, $v2) function, one would have to check against a list of all >> previously seen references one at a time, rather than only performing a >> hashtable lookup. >> >> Currently this functionality is implemented as: >> >> 1. Copying the array, assigning a cookie to the copy and seeing if the >> original array is modified. With an extra catch for TypeErrors, this is >> compatible with typed properties. >> 2. Replacing the reference with a Stub object, which can be looked up by >> object id. At the end the Stub objects are replaced with their values >> again. This is fundamentally incompatible with typed properties, as the >> type will likely not permit the Stub class. >> >> Here are my thoughts on possible APIs for this use case. >> >> Construction of reference-reflection objects >> ----- >> >> An issue already discussed in the other threads is that in PHP we need to >> specify whether a parameter is accepted by reference, by value or by >> preferred-reference. We don't have the possibility of accepting either a >> value or reference, whatever we get. This leaves us with a few options: >> >> 1. Introducing a VM-level primitive that is not subject to this >> limitation. The typed properties thread suggested a reflect_variable() >> language construct. I'm not too fond of this option because reference >> reflection seems like an awfully specific thing to introduce a new language >> construct for. >> >> 2. A ReflectionReference::fromVariable(&$var) constructor. Contrary to >> what was said in the other thread, this does not cause issues with the >> copy-on-write mechanism. Since PHP 7 references and non-references can >> share values (including immutablized values in SHM). However, this approach >> does have two issues: >> a) It is impossible to distinguish whether $var was a singleton reference >> or a value beforehand. Both will show up as rc=2 references inside >> ReflectionReference::fromVariable(). (This may also be an advantage, >> because from a language-design perspective, we treat singleton references >> as non-references.) >> b) In case the original $var was a variable, it will now be a reference, >> so this has a side-effect. >> >> 3. A ReflectionReference::fromArrayElem(array $array, string|int $key) >> constructor, as suggested by Nicolas. This avoids the reference/value >> problem and solves the specific VarCloner case efficiently and directly. On >> the other hand, introspection of references inside non-arrays requires some >> workarounds (e.g, casting objects to arrays). >> >> 4. A combination of these. For example we could have... >> ... ReflectionReference::fromArrayElem(array $array, string|int $key) >> for array items. >> ... ReflectionReference::fromObjectProp(object $object, string $key) for >> object properties. >> ... ReflectionReference::fromVariable(&$var) for any other special cases. >> This would allow to cover the common and interesting cases with >> specialized methods, and leave a less efficient fallback for the general >> case. This is probably the option I'd favor. >> > > Either 3. or 4. would be good to me. > IMHO 3. is enough, because : > - ReflectionReference::fromObjectProp() => there need to be a way to deal > with visibility, e.g. accessing private ones. Maybe get the > ReflectionReference from a ReflectionProperty instead? Or just the > array-cast is enought? > Good point, visibility handling is a problem here. In that case I agree that it's best to just go through the array cast here. > - ReflectionReference::fromVariable() => honestly, I don't see any use of > local scope introspection. And if there is one, getting it as an array > first is always possible, so you might prefer less complexity here (i.e not > support this constructor) > Yeah, I don't see the use case either. Especially as this constructor has the likely unwanted side-effect of turning the argument into a reference, it's best not to include it. If it turns out that these constructors are useful after all, it's always possible to add them later. Determining whether something is a reference >> ----- >> >> I think the best way to handle this (and the reason why I used named >> constructors above) is to return null if the value is not a reference. This >> should be the most common case and it would be best to avoid the overhead >> of constructing an unnecessary object in this case. >> >> One important question in this context would be whether we consider >> singleton references as references or not. If we do, then the >> ReflectionReference::fromVariable() constructor will always return a >> non-null value, as the variables will be turned into a singleton reference >> if it was a reference. If we consider them as references, we'll also want >> an API method to distinguish them. E.g. a specialized isSingleton() or more >> generally getNumUsers() == 1. >> >> The alternative would be to always construct a ReflectionReference object >> which may or may not be a reference and has an isReference() method. I >> don't see any advantages to that approach though. >> > > If we're seeking for a benefit, it would be by turning this to a > ReflectionZval insteaf of ReflectionReference. There would be isRef, but > also getType, etc. But honeslty I'm with you on this, only refs make sense. > I'm wondering if there is any legitimate use for something like this. The only thing that comes to mind is comparing arrays by identity. E.g. https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php#L153 could check directly check whether an array is the $GLOBALS array. It would also allow direct recursion detection on arrays (ref https://stackoverflow.com/questions/9105816/is-there-a-way-to-detect-circular-arrays-in-pure-php). Doing the same on just references is nearly the same, but doesn't handle the $GLOBALS case correctly. > Reference equality >> ----- >> >> A couple of approaches: >> >> 1. Have an isEqual(ReflectionReference $other): bool method, which >> determined whether two references are the same. The disadvantage is that >> this only allows pair-wise comparisons, so it does not fully solve the >> VarCloner use-case. >> >> 2. Make ReflectionReference constructor uniquing. That is, if a >> ReflectionReference for a certain reference already exists, then the >> constructor will return the same object. This means that references can be >> compared by identity $ref1 === $ref2. It also means that they can be used >> in hashtables via spl_object_id(). (Caveat: It's important to keep the >> ReflectionReference object alive for the during in which spl_object_id() is >> used, as usual.) >> >> 3. Some variation on 2 via a separate API. That is don't unique >> ReflectionReferences themselves, but provide a separate getId() API. The >> returned ID would only be meaningful as long as at least one >> ReflectionReference object for the reference is live, otherwise it may be >> reused. >> > > 2. would work, as I could use that with an SplObjectStorage (which would > satisfy your condition of keeping the object around). > > There is a 4th possibility: have a "ReflectionReference::getHash" method, > that would return a unique string/id per reference (what > symfony_zval_debug() does.) > Is this the same as my 3rd suggestion with a different name, or is there a distinction here in that it removes the requirement for the ReflectionReference object to be live? The reason I included that was to make sure we have a (non-leaking) way of assigning ids to references. Otherwise the only way to do is to use the address of the zend_reference as the hash/id. I don't like exposing memory addresses directly to userland (as this is generally a security issue), but we could return the address hashed with a per-process key. This would make the function somewhat slower though. Nikita --00000000000064608a05708dac3f--