Hi internals,
I'd like to propose the addition of a ReflectionReference class, as
described in the following RFC:
https://wiki.php.net/rfc/reference_reflection
This topic was previously discussed at https://externals.io/message/102638.
The TL;DR is that some libraries need a way to detect references and
determine whether two references are the same. They previously used an ugly
hack to achieve this, but this hack will no longer (reliably) work due to
the introduction of typed properties, so we need to do something about this
for the 7.4 release.
Regards,
Nikita
A few issues with the RFC so far:
-
ReflectionReference
seems to be designed around arrays only: maybe
ReflectionArrayKeyReference
or such? - Can the
getId()
return type be restricted to eitherint
orstring
?
Why is it a union type right now? Technical limitation? -
ReflectionReference#__construct()
should beprivate
, since it is
unusable anyway -
fromArrayElem
=>fromArrayElement
- is this open for inheritance? If so, what scenarios would fit inheriting
fromReflectionReference
? - what happens when
ReflectionReference::fromArrayElement()
is given
invalid data, such as non-existing keys? I'd expect it to throw. - instead of
$ref1->getId() === $ref2->getId()
,$ref1->matches($ref2)
or such. - what are possible scenarios for getting the identifier as a primitive,
and then storing it somewhere (like an array of reference identifiers)? - There seems to be a lot of design around
ReflectionReference#getId()
to avoid leaking internal pointer information: can it be completely
dropped, if we have$ref1->matches($ref2)
instead?
Marco Pivetta
Hi internals,
I'd like to propose the addition of a ReflectionReference class, as
described in the following RFC:
https://wiki.php.net/rfc/reference_reflectionThis topic was previously discussed at https://externals.io/message/102638
.
The TL;DR is that some libraries need a way to detect references and
determine whether two references are the same. They previously used an ugly
hack to achieve this, but this hack will no longer (reliably) work due to
the introduction of typed properties, so we need to do something about this
for the 7.4 release.Regards,
Nikita
Thank you Nikita, I will definitely need that RFC!
Answering to Marco:
-
ReflectionReference
seems to be designed around arrays only: maybe
ReflectionArrayKeyReference
or such?
I agree with the wording in the RFC: the array is the primitive data
structure in PHP that holds references.
ALL other data structures can be turned to arrays. That includes objects
with (array)
and scopes with get_defined_vars()
.
Adding eg support for objects directly would open several questions, like:
what about magic methods?, etc.
So to me, we really need the simplest yet most powerful primite, and here
it is, anything else can be built on top.
- Can the
getId()
return type be restricted to eitherint
orstring
?
A string would be enough I suppose, but I'll let Nikita anwer :)
- what are possible scenarios for getting the identifier as a primitive,
Indexing. Eg VarCloner needs an index of already discovered references to
do its job.
- There seems to be a lot of design around
ReflectionReference#getId()
to avoid leaking internal pointer information: can it be completely
dropped, if we have$ref1->matches($ref2)
instead?
Nope, see previous answer. As such, I think matches()
would be added
complexity for core with no extra benefit/power for users.
In short, I really like the RFC as is :)
(I removed your other comments Marco because I don't have anything special
to say on them)
Cheers,
Nicolas
A few issues with the RFC so far:
ReflectionReference
seems to be designed around arrays only: maybe
ReflectionArrayKeyReference
or such?
ReflectionReference works with arbitrary references. Arrays only come in as
a detail of construction. Constructors such as fromLocalVariable() or
fromObjectProperty() could be added in the future under this design.
- Can the
getId()
return type be restricted to eitherint
or
string
? Why is it a union type right now? Technical limitation?
To allow changes in the future. The real return value here is an int, but
the size increase due to hashing makes it a string. If we find a way to
avoid that (or decided that we don't care about leaking addresses), we
could change this to a (faster) int-based API.
I'd be okay with renaming this to getHash() and restricting the return
value to string. We could add getId() returning int in the future, same as
we did for objects.
ReflectionReference#__construct()
should beprivate
, since it is
unusable anyway
Done.
fromArrayElem
=>fromArrayElement
Done.
- is this open for inheritance? If so, what scenarios would fit
inheriting fromReflectionReference
?
Marked as final.
- what happens when
ReflectionReference::fromArrayElement()
is given
invalid data, such as non-existing keys? I'd expect it to throw.
It will throw, yes. Added to the RFC:
If $array is not an array or $key is not an integer or string, a
TypeError is thrown. If $array[$key] does not exist, a ReflectionException
is thrown.
- instead of
$ref1->getId() === $ref2->getId()
,$ref1->matches($ref2)
or such.
- what are possible scenarios for getting the identifier as a primitive,
and then storing it somewhere (like an array of reference identifiers)?
- There seems to be a lot of design around
ReflectionReference#getId()
to avoid leaking internal pointer information: can it be completely
dropped, if we have$ref1->matches($ref2)
instead?
As Nicolas already wrote, the ID / Hash functionality is needed to allow
hashtable lookups. Having just matches() would require iterating over the
whole set of known references and comparison each and every one. With the
getId() API, it's just a single HT lookup.
We could add $ref1->equals($ref2) as an additional convenience method,
but it would not replace the getId() API.
Thanks for the feedback!
Nikita
- instead of
$ref1->getId() === $ref2->getId()
,$ref1->matches($ref2)
or such.
- what are possible scenarios for getting the identifier as a primitive,
and then storing it somewhere (like an array of reference identifiers)?
- There seems to be a lot of design around
ReflectionReference#getId()
to avoid leaking internal pointer information: can it be completely
dropped, if we have$ref1->matches($ref2)
instead?As Nicolas already wrote, the ID / Hash functionality is needed to allow
hashtable lookups. Having just matches() would require iterating over the
whole set of known references and comparison each and every one. With the
getId() API, it's just a single HT lookup.We could add $ref1->equals($ref2) as an additional convenience method,
but it would not replace the getId() API.
I'd say that if the indexing is a fairly common scenario (seems like it,
coming from Nicolas, who is the most direct API consumer here), the
->matches()
isn't needed at all.
Marco Pivetta
About using "int" as return type for getId():
- Can the
getId()
return type be restricted to eitherint
or
string
? Why is it a union type right now? Technical limitation?To allow changes in the future. The real return value here is an int, but
the size increase due to hashing makes it a string. If we find a way to
avoid that (or decided that we don't care about leaking addresses), we
could change this to a (faster) int-based API.
spl_object_hash()
uses a XOR to hide internal addresses - so they're
already "leaking".
we also added spl_object_id() which proved much better in term of memory
usage - not only CPU.
Mixing both arguments/approaches, what about returning a XORed int? Are we
OK with that? It'd be a welcome storage/CPU optimization from my pov.
On Tue, Jan 15, 2019 at 1:09 PM Nicolas Grekas nicolas.grekas+php@gmail.com
wrote:
About using "int" as return type for getId():
- Can the
getId()
return type be restricted to eitherint
or
string
? Why is it a union type right now? Technical limitation?To allow changes in the future. The real return value here is an int, but
the size increase due to hashing makes it a string. If we find a way to
avoid that (or decided that we don't care about leaking addresses), we
could change this to a (faster) int-based API.
spl_object_hash()
uses a XOR to hide internal addresses - so they're
already "leaking".
we also added spl_object_id() which proved much better in term of memory
usage - not only CPU.Mixing both arguments/approaches, what about returning a XORed int? Are we
OK with that? It'd be a welcome storage/CPU optimization from my pov.
A XORed int is not much better than returning the address directly -- it
will scramble the base address, but for example not the offsets between the
references (if the XOR key is the same).
If people think that this is good enough for us, then of course it's better
to go with the int for performance reasons. I've picked the sha1 here to be
conservative. How performance critical is this API?
Nikita
On Tue, Jan 15, 2019 at 1:09 PM Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:About using "int" as return type for getId():
- Can the
getId()
return type be restricted to eitherint
or
string
? Why is it a union type right now? Technical limitation?To allow changes in the future. The real return value here is an int, but
the size increase due to hashing makes it a string. If we find a way to
avoid that (or decided that we don't care about leaking addresses), we
could change this to a (faster) int-based API.
spl_object_hash()
uses a XOR to hide internal addresses - so they're
already "leaking".
we also added spl_object_id() which proved much better in term of memory
usage - not only CPU.Mixing both arguments/approaches, what about returning a XORed int? Are
we OK with that? It'd be a welcome storage/CPU optimization from my pov.A XORed int is not much better than returning the address directly -- it
will scramble the base address, but for example not the offsets between the
references (if the XOR key is the same).If people think that this is good enough for us, then of course it's
better to go with the int for performance reasons. I've picked the sha1
here to be conservative. How performance critical is this API?Nikita
I'd like to move forward with the RFC as is -- I'm uncomfortable with
switching this to a simple XOR for security reasons. I think we should
start off with the current implementation, with the option of switching to
an integer return value at a later time.
Nikita
I'd like to move forward with the RFC as is -- I'm uncomfortable with
switching this to a simple XOR for security reasons. I think we should
start off with the current implementation, with the option of switching to
an integer return value at a later time.
Works for me, thank you!
Nicolas
Hi!
I'd like to propose the addition of a ReflectionReference class, as
described in the following RFC:
https://wiki.php.net/rfc/reference_reflection
Do I understand correctly that the main use case here is to know if two
variables (treating this term expansively) point, by reference, to the
same zval/location? If so, I'd probably prefer a function that gives
answer to exactly this question.
Currently proposed API sounds a bit weird to me - why it's only
constructed from array element? It looks like a strange limitation.
Additionally, I'm not sure I understand whether there's a case for
"knowing whether array element is a reference to something" without it
being the use case above (i.e. comparing two things). I have no idea
what VarCloner does so it would be useful to have some clarity there.
The TL;DR is that some libraries need a way to detect references and
determine whether two references are the same. They previously used an ugly
Is this the same case or two different cases? Also, why do they need to
do it - what they are doing with this information afterwards?
--
Stas Malyshev
smalyshev@gmail.com
On Thu, Jan 17, 2019 at 6:48 AM Stanislav Malyshev smalyshev@gmail.com
wrote:
Hi!
I'd like to propose the addition of a ReflectionReference class, as
described in the following RFC:
https://wiki.php.net/rfc/reference_reflectionDo I understand correctly that the main use case here is to know if two
variables (treating this term expansively) point, by reference, to the
same zval/location? If so, I'd probably prefer a function that gives
answer to exactly this question.
Nearly. The main use case is to compare an arbitrary number of references
efficiently, rather than only two. As an example, consider something like
this:
$refs = [];
foreach ($array as $key => $value) {
$ref = ReflectionReference::fromArrayElement($array, $key);
if (null !== $ref) {
$refs[$ref->getId()][] = $key;
}
}
This will partition the keys into groups that all point to the same
reference. For $array = ['foo' => &$ref1, 'bar' => &$ref1, 'baz' => &$ref2,
'abc' => &$ref2] this will result in $refs = [$ref1Id => ['foo', 'bar'],
$ref2Id => ['baz', 'abc']].
This is not exactly what we want to do, but it illustrates the general type
of code that this API allows you to write.
Currently proposed API sounds a bit weird to me - why it's only
constructed from array element? It looks like a strange limitation.
This is explained in the RFC:
Construction of ReflectionReference instances faces the following
problem: A PHP function can either accept an argument by value or by
reference. This needs to be declared in the function signature, making it
impossible to distinguish whether the passed value was a reference
originally or not. To determine whether or not something is a reference,
access to the parent structure is necessary (which might be an array,
object property table or symbol table).
There is simply no other way to do this PHP short of introducing a new
language construct specifically for this purpose, which I think is overkill.
It's the same issue as debug_zval_dump()
has (which strips references) and
why XDebug has a xdebug_debug_zval() function which works on a
pseudo-variable-name syntax instead. If you want to inspect whether
something is a reference, you need access to the parent structure. The most
general parent structure in PHP is an array, because this covers all of
arrays (trivially), objects (via array cast) and even variables (via
get_defined_vars()
or $GLOBALS, depending on the kind of variable you're
interested in.)
Additionally, I'm not sure I understand whether there's a case for
"knowing whether array element is a reference to something" without it
being the use case above (i.e. comparing two things). I have no idea
what VarCloner does so it would be useful to have some clarity there.The TL;DR is that some libraries need a way to detect references and
determine whether two references are the same. They previously used an
uglyIs this the same case or two different cases? Also, why do they need to
do it - what they are doing with this information afterwards?
Nicolas can explain that better than me, but from what I understand
VarCloner converts a PHP value (of arbitrary complexity) into an object
representation that's more amenable to further processing. E.g. instead of
having PHP references, there will be a common object that's used everywhere
a certain PHP reference was used. So the first time we see a reference we
want to create a stub object for it, and then if we see a reference again
(stored in a different place) we want to replace it with the existing stub
object.
Nikita