Exposing object handles to userland

9 years ago by Anatol Belski

Hi,

-----Original Message-----
From: julienpauli@gmail.com [mailto:julienpauli@gmail.com] On Behalf Of
Julien Pauli
Sent: Friday, July 31, 2015 4:24 PM
To: PHP Internals internals@lists.php.net
Subject: [PHP-DEV] Exposing object handles to userland

Hi people.

I've been pinged many times to add a new spl_object_id() function to PHP, that
would return the internal object handle of an object.

Today, spl_object_hash() partially allows that, but adds a lot of randomness to the
result, which is not very convenient to use later (why does it even add randomness?).

There have been threads on this subject.
For example, at http://marc.info/?l=php-internals&m=141814350920452&w=2

Being able to get the object handle in PHP userland would help many
tools, mainly debug-oriented ones.
I know PHPUnit, Symfony and many big projects today make use of
spl_object_hash() to identify objects.

I also know people that print_r($an_object) and parse the output just to extract
the object handle from there... Crazy isn't it ?
Why couldn't we help those people by simply adding a new function that does
the job ?

Regarding the ID keyword, that reminds me of Python's id(): https://docs.python.org/2/library/functions.html#id . While not everything is an object in PHP, it could be interesting at least for strings, for example, since we have interned strings. Maybe there is a wider range of useful information to provide to userland, not only related to objects.

Regards

Anatol

9 years ago by Nicolas Grekas

I also know people that print_r($an_object) and parse the output just to
extract the object handle from there... Crazy isn't it ?

I plead guilty for doing this, but PHP leaves me no better choice for now ;)

The attached patch removes the XOR hashing for the object handle (it's
useless, the "secret" is trivially guessed after parsing the output of
var_dump).

It would be awesome if this patch could be applied for php 7.0!

Cheers,
Nicolas
http://marc.info/?l=php-internals&m=141811755908008&w=2

9 years ago by Anatol Belski

Hi Nicolas,

Have you checked the impact of changing the existing function? Like

https://github.com/sebastianbergmann/phpunit/search?utf8=%E2%9C%93&q=spl_object_hash

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

https://github.com/horde/horde/blob/master/imp/lib/Factory/MimeViewer.php

https://github.com/horde/horde/blob/master/framework/Support/lib/Horde/Support/Randomid.php

https://github.com/WordPress/WordPress/blob/master/wp-includes/plugin.php

Obviously there will be more. Seems some stuff even depends on the exact output. So it would be most likely a BC break.

Now, I'd see this more as a feature request, as Julien was mentioning another function at the start. But IMHO, opinions from people who specialize in userland test/debug tools would give better insight, so CC'ing them.

Regards

Anatol

From: nicolas.grekas@gmail.com [mailto:nicolas.grekas@gmail.com] On Behalf Of Nicolas Grekas
Sent: Friday, July 31, 2015 4:53 PM
To: Julien Pauli jpauli@php.net
Cc: PHP Internals internals@lists.php.net
Subject: Re: [PHP-DEV] Exposing object handles to userland

I also know people that print_r($an_object) and parse the output just to
extract the object handle from there... Crazy isn't it ?

I plead guilty for doing this, but PHP leaves me no better choice for now ;)

The attached patch removes the XOR hashing for the object handle (it's useless, the "secret" is trivially guessed after parsing the output of var_dump).

It would be awesome if this patch could be applied for php 7.0!

Cheers,

Nicolas

9 years ago by Nicolas Grekas

Have you checked the impact of changing the existing function?

Yes I did, and this breaks absolutely nothing: the spl_object_hash output
has exactly the same format (otherwise it's a bug).

https://github.com/sebastianbergmann/phpunit/search?utf8=%E2%9C%93&q=spl_object_hash

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

That's my code and the very place where I'd use this feature first :)

9 years ago by Anatol Belski

-----Original Message-----
From: nicolas.grekas@gmail.com [mailto:nicolas.grekas@gmail.com] On Behalf
Of Nicolas Grekas
Sent: Friday, July 31, 2015 6:11 PM
To: Anatol Belski anatol.php@belski.net
Cc: Julien Pauli jpauli@php.net; PHP Internals internals@lists.php.net;
Sebastian Bergmann sebastian@php.net; Ivan Enderlin@Hoa
ivan.enderlin@hoa-project.net; contact@jubianchi.fr
Subject: Re: [PHP-DEV] Exposing object handles to userland

Have you checked the impact of changing the existing function?

Yes I did, and this breaks absolutely nothing: the spl_object_hash output has
exactly the same format (otherwise it's a bug).

https://github.com/sebastianbergmann/phpunit/search?utf8=%E2%9C%93&q=spl_object_hash

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

That's my code and the very place where I'd use this feature first :)

Yeah, saw your name in the commits :)

Anthony's argument about exposing the mem layout is crucial, though.

Regards

anatol

9 years ago by Nicolas Grekas

Anthony's argument about exposing the mem layout is crucial, though.

Yes it is!

The patch I attached un-xors only the part for the object's handle.
The memory pointer is kept xored.

9 years ago by Anthony Ferrara

Nicolas,

On Fri, Jul 31, 2015 at 2:24 PM, Nicolas Grekas
nicolas.grekas+php@gmail.com wrote:

Anthony's argument about exposing the mem layout is crucial, though.

Yes it is!

The patch I attached un-xors only the part for the object's handle.
The memory pointer is kept xored.

Just checked the patch, perfect :-)

You have a +1 from me, for what it's worth.

Anthony

9 years ago by Stanislav Malyshev

Hi!

Some suspicious use of spl_object_hash() out there...

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

Not sure what this one does... but calculations with spl_object_hash()
look very suspicious.

https://github.com/horde/horde/blob/master/imp/lib/Factory/MimeViewer.php

This one might be doing it right, but not sure as basing caching
instances on hash of a (mutable) object may produce weird results.

https://github.com/horde/horde/blob/master/framework/Support/lib/Horde/Support/Randomid.php

Oh wow, what's going on there? That's obviously not a proper use of
spl_object_hash().

https://github.com/WordPress/WordPress/blob/master/wp-includes/plugin.php

I guess this one is wrong too, as it mentions storage, and storing
object ID is pointless. Maybe I am misunderstanding what "storage" means
there.

--
Stas Malyshev
smalyshev@gmail.com

9 years ago by Bob Weinand

As I describe below, I agree with Nikita on spl_object_id().

Am 02.08.2015 um 08:52 schrieb Stanislav Malyshev smalyshev@gmail.com:

Hi!

Some suspicious use of spl_object_hash() out there...

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

Not sure what this one does... but calculations with spl_object_hash()
look very suspicious.

Actually, it's doing the right thing… calculating the value the object id is xor'ed with (as we know that consecutively defined objects have consecutive ids).
It's relying on the implementation of spl_object_hash() and will even continue to work when we remove that part of randomness as that value it's xor'ed with is then nothing else than 0.

That's why we should expose spl_object_id()… so that such hacks are unnecessary.
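
The point that the XOR "secret" can be worked out from a known handle can be illustrated with a small simulation. This is only a sketch: the mask, the closure name $maskHandle and the handle values are made up for illustration, and the code does not reproduce what spl_object_hash() does internally.

```php
<?php
// Hypothetical simulation of XOR masking; NOT PHP's actual implementation.
$mask = random_int(1, 0x7fffffff);   // stands in for the per-process "secret"

// What a hash function might expose for a given handle (hypothetical helper).
$maskHandle = function (int $handle) use ($mask): int {
    return $handle ^ $mask;
};

// An observer who knows one real handle (say "#5" as printed by var_dump)
// and sees the matching masked value can recover the mask with one XOR.
$knownHandle   = 5;
$recoveredMask = $maskHandle($knownHandle) ^ $knownHandle;

// With the mask known, any other masked value maps back to its handle.
$otherHandle = $maskHandle(42) ^ $recoveredMask;

var_dump($recoveredMask === $mask); // bool(true)
var_dump($otherHandle);             // int(42)
```

Because XOR is its own inverse, a single known (handle, masked value) pair is enough, which is why the masking adds no real protection for the handle part.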

https://github.com/horde/horde/blob/master/imp/lib/Factory/MimeViewer.php

This one might be doing it right, but not sure as basing caching
instances on hash of a (mutable) object may produce weird results.

It works and I'm using it that way too… that's currently what spl_object_hash is good for. The only issue is when it overflows after 2^32 objects. Though it isn't an issue on 64 bit systems…

https://github.com/horde/horde/blob/master/framework/Support/lib/Horde/Support/Randomid.php

Oh wow, what's going on there? That's obviously not a proper use of
spl_object_hash().

Yup, that one is misuse as source of entropy… while it doesn't really provide much more entropy as it's internally anyway xor'ed with mt_rand().

https://github.com/WordPress/WordPress/blob/master/wp-includes/plugin.php

I guess this one is wrong too, as it mentions storage, and storing
object ID is pointless. Maybe I am misunderstanding what "storage" means
there.

The storage is only used in case spl_object_hash() does not exist for trying to get an unique identifier for that object (Wordpress has 5.2 compatibility, yeah…).

The object id / hash is basically just an identifier for an object in PHP, just like we sometimes put pointers in e.g. a binary tree to easily map two pointers together in C.

--
Stas Malyshev
smalyshev@gmail.com

Bob

9 years ago by Rowan Collins

Some suspicious use of spl_object_hash() out there...

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

Not sure what this one does... but calculations with spl_object_hash()
look very suspicious.

Actually, it's doing the right thing… calculating the value the object id is xor'ed with (as we know that consecutively defined objects have consecutive ids).
It's relying on the implementation of spl_object_hash() and will even continue to work when we remove that part of randomness as that value it's xor'ed with is then nothing else than 0.

The right thing for what purpose? Why does it need that ID, rather than
the value that spl_object_hash() gave in the first place? Just to be
prettier to the user?

Regards,

--
Rowan Collins
[IMSoP]

9 years ago by Nicolas Grekas

2015-08-02 20:03 GMT+02:00 Rowan Collins rowan.collins@gmail.com:

Some suspicious use of spl_object_hash() out there...

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

Not sure what this one does... but calculations with spl_object_hash()
look very suspicious.

Actually, it's doing the right thing… calculating the value the object id
is xor'ed with (as we know that consecutively defined objects have
consecutive ids).
It's relying on the implementation of spl_object_hash() and will even
continue to work when we remove that part of randomness as that value it's
xor'ed with is then nothing else than 0.

The right thing for what purpose? Why does it need that ID, rather than
the value that spl_object_hash() gave in the first place? Just to be
prettier to the user?

For the purpose of providing an id that humans can read and compare, to
easily spot if two objects are identical or not. Try comparing two
spl_object_hashes and you'll understand the need...

The id as displayed by e.g. var_dump is not an implementation detail that
leaks through it.
It's really an important feature of the output.

I learned that while implementing VarDumper (see link above), by user
requests.

Regards,
Nicolas

9 years ago by Rowan Collins

2015-08-02 20:03 GMT+02:00 Rowan Collins rowan.collins@gmail.com:

Some suspicious use of spl_object_hash() out there...

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

Not sure what this one does... but calculations with spl_object_hash()
look very suspicious.

Actually, it's doing the right thing… calculating the value the object id
is xor'ed with (as we know that consecutively defined objects have
consecutive ids).
It's relying on the implementation of spl_object_hash() and will even
continue to work when we remove that part of randomness as that value it's
xor'ed with is then nothing else than 0.

The right thing for what purpose? Why does it need that ID, rather than
the value that spl_object_hash() gave in the first place? Just to be
prettier to the user?

For the purpose of providing an id that humans can read and compare,

So, yes, just to make it more human friendly. In which case, the actual value doesn't matter, and rather than reverse-engineering the hash, you could just make up your own ID:

function my_object_id($obj) {
    static $hash_to_id = [];
    $hash = spl_object_hash($obj);
    if (!array_key_exists($hash, $hash_to_id)) {
        $hash_to_id[$hash] = count($hash_to_id);
    }
    return $hash_to_id[$hash];
}

(Untested, but pretty much all you need.)

Regards,

Rowan Collins
[IMSoP]

9 years ago by Nicolas Grekas

So, yes, just to make it more human friendly. In which case, the actual
value doesn't matter, and rather than reverse-engineering the hash, you
could just make up your own ID:

function my_object_id($obj) {
    static $hash_to_id = [];
    $hash = spl_object_hash($obj);
    if (!array_key_exists($hash, $hash_to_id)) {
        $hash_to_id[$hash] = count($hash_to_id);
    }
    return $hash_to_id[$hash];
}

Yep, that would work, but with the following drawbacks:

- it would create a memory leak/overhead
- more importantly for users, these ids couldn't be compared to var_dumps'.

9 years ago by Rowan Collins

So, yes, just to make it more human friendly. In which case, the actual
value doesn't matter, and rather than reverse-engineering the hash, you
could just make up your own ID:

function my_object_id($obj) {
    static $hash_to_id = [];
    $hash = spl_object_hash($obj);
    if (!array_key_exists($hash, $hash_to_id)) {
        $hash_to_id[$hash] = count($hash_to_id);
    }
    return $hash_to_id[$hash];
}

Yep, that would work, but with the following drawbacks:

it would create a memory leak/overhead

If you're dumping so many objects that the array would grow to noticeable size, you're likely to have bigger problems actually using the output. There's probably more data in an optimised autoloader classmap than most people will generate with a function like this.

more importantly for users, these ids couldn't be compared to
var_dumps'.

Why do you need to compare the output of two dump functions? Just add a note to the documentation that these are not the same IDs.

Regards,

Rowan Collins
[IMSoP]

9 years ago by Nicolas Grekas

more importantly for users, these ids couldn't be compared to
var_dumps'.

Why do you need to compare the output of two dump functions? Just add a
note to the documentation that these are not the same IDs.

That's not a strong feature, but still, it makes the tool a little bit more
comfortable to use.

Oh, and another benefit of the handle is that it gives a rough estimation
of the number of objects created and their instantiation order. When
debugging, any hints count.

For VarDumper, I don't need spl_object_id(). As we spotted, the code is
already able to get the number it needs.

Having spl_object_id() is still a feature that is not exposed to userland,
but that is useful as I tried to demonstrate.

9 years ago by Rowan Collins

Oh, and another benefit of the handle is that it gives a rough estimation
of the number of objects created and their instantiation order. When
debugging, any hints count.

That is exactly the detail that people don't want to expose, or promise to deliver in future versions. It's the same debate that is often had regarding automatic database IDs - you shouldn't rely on them having any particular meaning. People still do, of course...

Regards,

Rowan Collins
[IMSoP]

9 years ago by Anatol Belski

Hi Nicolas,

-----Original Message-----
From: nicolas.grekas@gmail.com [mailto:nicolas.grekas@gmail.com] On Behalf
Of Nicolas Grekas
Sent: Monday, August 3, 2015 1:29 PM
To: Rowan Collins rowan.collins@gmail.com
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Exposing object handles to userland

more importantly for users, these ids couldn't be compared to
var_dumps'.

Why do you need to compare the output of two dump functions? Just add
a note to the documentation that these are not the same IDs.

That's not a strong feature, but still, it makes the tool a little bit more
comfortable to use.

Oh, and another benefit of the handle is that it gives a rough estimation of the
number of objects created and their instantiation order. When debugging, any
hints count.

For VarDumper, I don't need spl_object_id(). As we spotted, the code is already
able to get the number it needs.

Having spl_object_id() is still a feature that is not exposed to userland, but that is
useful as I tried to demonstrate.

I'd be on the side of creating a new function, too, if it's really needed, and not touching the old one. As mentioned before, maybe some other useful cases could be addressed, like immutable strings.

But with the object id - still, how are the issues mentioned going to be handled?

- The handle gets reused within the same script, even in the same request. There's no way to tell that an object was garbage collected and a completely different one now has the same id.
- What happens if the internal implementation changes? It will probably be hard to emulate the old behavior, but some apps will already depend on it.
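
The first issue, handle reuse, can be observed with a few lines. This is a minimal sketch; whether the two hashes actually match is an engine implementation detail, so the script only prints the comparison and promises no particular result.

```php
<?php
$first     = new stdClass();
$firstHash = spl_object_hash($first);
unset($first);                  // last reference gone; the object is destroyed

$second     = new stdClass();   // a completely unrelated object
$secondHash = spl_object_hash($second);

// Often true, because the freed handle slot may be handed out again,
// but this is not guaranteed and may differ between PHP versions.
var_dump($firstHash === $secondHash);
```

If the comparison prints true, any data that was stored under $firstHash would now silently appear to belong to $second.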

Maybe just incrementing a static variable would be simple enough :) But IMHO these points need to be addressed prior to the implementation. Maybe it's not even reliable enough to rely on the internal implementation.

Regards

Anatol

9 years ago by Anthony Ferrara

Anatol,

Hi Nicolas,

-----Original Message-----
From: nicolas.grekas@gmail.com [mailto:nicolas.grekas@gmail.com] On Behalf
Of Nicolas Grekas
Sent: Monday, August 3, 2015 1:29 PM
To: Rowan Collins rowan.collins@gmail.com
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Exposing object handles to userland

more importantly for users, these ids couldn't be compared to
var_dumps'.

Why do you need to compare the output of two dump functions? Just add
a note to the documentation that these are not the same IDs.

That's not a strong feature, but still, it makes the tool a little bit more
comfortable to use.

Oh, and another benefit of the handle is that it gives a rough estimation of the
number of objects created and their instantiation order. When debugging, any
hints count.

For VarDumper, I don't need spl_object_id(). As we spotted, the code is already
able to get the number it needs.

Having spl_object_id() is still a feature that is not exposed to userland, but that is
useful as I tried to demonstrate.

I'd be on the side of creating a new function, too, if it's really needed, and not touching the old one. As mentioned before, maybe some other useful cases could be addressed, like immutable strings.

But with the object id - still, how are the issues mentioned going to be handled?

handle gets reused within the same script, even in same request. There's no way to realize an object was garbage collected and completely another one has the same id.

Well, one way that's handled is by storing a reference to the object.
That way you know it's not collected.

Basically, if you need to store something associated with an object's
id, also store a reference to the object.

This is how it's being done today when people use spl_object_hash()
and associated objects (like SplObjectStorage).
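
A minimal sketch of that pattern follows. The class name ObjectData and its methods are hypothetical, made up for illustration; the only real API used is spl_object_hash().

```php
<?php
// Hypothetical helper class, for illustration only.
final class ObjectData
{
    /** @var array hash => [object, data] */
    private $entries = [];

    public function set($object, $data): void
    {
        // Storing $object next to its data keeps it alive, so its hash
        // cannot be recycled for another object while the entry exists.
        $this->entries[spl_object_hash($object)] = [$object, $data];
    }

    public function get($object)
    {
        $hash = spl_object_hash($object);
        return isset($this->entries[$hash]) ? $this->entries[$hash][1] : null;
    }
}

$store = new ObjectData();
$a = new stdClass();
$store->set($a, 'meta for a');
unset($a);                    // the store still holds a reference to the object

$b = new stdClass();          // cannot share a hash with the object kept above
var_dump($store->get($b));    // NULL
```

The trade-off is that entries keep their objects alive until they are removed, which is the memory overhead raised earlier in the thread.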

what is done in the case the internal implementation was changed, it'll be probably hard to emulate but some apps will already depend on it.

As long as the number returned is unique, it should be pretty straight
forward. The contract isn't that a specific integer is returned, it's
that it's a unique integer. So if object allocation changes the order
that ids are issued, that shouldn't matter.

And in practice we've seen that with a few of the changes in 7. A few
internal tests may fail because they depend on the detail, but that's
because they are bad tests, not because the functionality changed.

Maybe just incrementing a static variable were simple enough :) But IMHO these points need to be addressed prior the implementation. Maybe it's even not reliable enough to rely on the internal implementation.

The problem here is that PHP is used by long-running processes as
well. Imagine a server that runs for weeks. It could over the course
of that time allocate trillions of objects. But since the handle is 32
bit (a signed int actually, which sounds wrong:
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_objects_API.h#43 ), you
can only have 2 billion objects allocated at one point in time.

Even moving to a 64 bit counter, that's better but is it really
necessary? You'd need to store the counter value for every single
object. Meaning that you add 8 bytes of memory to each object in the
system. Would it be a better experience? Perhaps. Is it necessary? Not
sure...

I support leaving spl_object_hash() as is, and adding a
spl_object_id() which is documented similarly and returns an integer
for each object.

My $0.02 at least

Anthony

9 years ago by Anatol Belski

Hi Anthony,

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, August 3, 2015 5:39 PM
To: Anatol Belski anatol.php@belski.net
Cc: Nicolas Grekas nicolas.grekas+php@gmail.com; Rowan Collins
rowan.collins@gmail.com; internals@lists.php.net
Subject: Re: [PHP-DEV] Exposing object handles to userland

handle gets reused within the same script, even in same request.
There's no way to realize an object was garbage collected and
completely another one has the same id.

Well, one way that's handled is by storing a reference to the object.
That way you know it's not collected.

Basically, if you need to store something associated with an object's id, also
store a reference to the object.

This is how it's being done today when people use spl_object_hash() and
associated objects (like SplObjectStorage).

Do you mean the id function would ref++? Or is it done in user space? IMHO the latter would be quite a subtle way. But SplObjectStorage should do it automatically, otherwise some entries would become invalid.

what is done in the case the internal implementation was changed, it'll be
probably hard to emulate but some apps will already depend on it.

As long as the number returned is unique, it should be pretty straight forward.
The contract isn't that a specific integer is returned, it's that it's a unique integer.
So if object allocation changes the order that ids are issued, that shouldn't
matter.

And in practice we've seen that with a few of the changes in 7. A few internal
tests may fail because they depend on the detail, but that's because they are bad
tests, not the functionality changed.

I might still not be getting it, but if an id can be reassigned over time, is it really unique? I mean a distinct object <=> id relation. Figuratively, unique would be like a revision SHA in git, across all the branches. Concretely, in PHP it doesn't need to hold across all requests, but within the same request? Maybe "unique" needs a scope definition. Coupled with disabling GC it would be unique, of course.

Maybe just incrementing a static variable were simple enough :) But IMHO
these points need to be addressed prior the implementation. Maybe it's even not
reliable enough to rely on the internal implementation.

The problem here is that PHP is used by long-running processes as well. Imagine
a server that runs for weeks. It could over the course of that time allocate
trillions of objects. But since the handle is 32 bit (a signed int actually, which
sounds wrong:
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_objects_API.h#43 ), you can
only have 2 billion objects allocated at one point in time.

Even moving to a 64 bit counter, that's better but is it really necessary? You'd
need to store the counter value for every single object. Meaning that you add 8
bytes of memory to each object in the system. Would it be a better experience?
Perhaps. Is it necessary? Not sure...

Yep, a global counter isn't a solution, just my unsuccessful joke. But the object handle seems to be unsigned http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_types.h#_zend_object , judging also by how buckets are accessed. It's also notable that the resource handle is signed, but never mind. Signed is probably more sensible when exporting directly on a 32-bit build.

I support leaving spl_object_hash() as is, and adding a
spl_object_id() which is documented similarly and returns an integer for each
object.

Yes, spl_object_id() would be a twin of spl_object_hash(), going by what is known today.

Regards

Anatol

9 years ago by Alexander Lisachenko

Hello, internals!

I like the idea of assigning a custom identifier to the object, but why
should this be only in the core? Java has a method 'hashCode' which can be used
to return a unique identifier for a specific object. So, my suggestion is to
add a 'Hashable' interface for that to PHP.

Then, each developer can use a default implementation which will return
spl_object_id/spl_object_hash/whatever or implement this interface to
control the unique identifier of current object:

class Test implements Hashable
{
    private static $objectCounter = 100500;
    private $identifier = null;

    public function __construct()
    {
        $this->identifier = self::$objectCounter++;
    }

    public function hashCode()
    {
        return __CLASS__ . ':' . $this->identifier;
    }
}

echo spl_object_hash(new Test); // Under this proposal, would output 'Test:100500'

This solution is more flexible and can give a freedom for implementations.
What do you think about this random thought?

Thanks.

2015-08-03 10:20 GMT+03:00 Nicolas Grekas nicolas.grekas+php@gmail.com:

2015-08-02 20:03 GMT+02:00 Rowan Collins rowan.collins@gmail.com:

Some suspicious use of spl_object_hash() out there...

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php

Not sure what this one does... but calculations with spl_object_hash()
look very suspicious.

Actually, it's doing the right thing… calculating the value the object id
is xor'ed with (as we know that consecutively defined objects have
consecutive ids).
It's relying on the implementation of spl_object_hash() and will even
continue to work when we remove that part of randomness as that value it's
xor'ed with is then nothing else than 0.

The right thing for what purpose? Why does it need that ID, rather than
the value that spl_object_hash() gave in the first place? Just to be
prettier to the user?

For the purpose of providing an id that humans can read and compare, to
easily spot if two objects are identical or not. Try comparing two
spl_object_hashes and you'll understand the need...

The id as displayed by e.g. var_dump is not an implementation detail that
leaks through it.
It's really an important feature of the output.

I learned that while implementing VarDumper (see link above), by user
requests.

Regards,
Nicolas

9 years ago by Etienne Kneuss

On Mon, Aug 3, 2015 at 2:26 PM Alexander Lisachenko lisachenko.it@gmail.com
wrote:

Hello, internals!

I like the idea to assign a custom identifier to the object, but why this
should be only in the core? Java has a method 'hashCode' which can be used
to return an unique identifier for specific object.

Java's hashCode is not unique, and it is not meant to be: two live objects can
have the same hash.
Having a hashCode method that returns 0 for all objects is a valid (albeit
stupid) implementation.

As for PHP, It seems like what people actually want is a unique identifier
for objects (for the lifetime of the request), but they compromise and use
the handle instead as it is available without userland counter.

Best,

9 years ago by Alexander Lisachenko

2015-08-03 16:10 GMT+03:00 Etienne Kneuss colder@php.net:

As for PHP, It seems like what people actually want is a unique identifier
for objects (for the lifetime of the request), but they compromise and use
the handle instead as it is available without userland counter.

Probably there is no actual need for that (to have a unique identifier for
each object). The reasons are as follows: a unique identifier is not reliable
(it depends on the object creation sequence and can be reused later) and not
predictable (it's not possible to say that one object is the same as
another one that was present in a previous request).

The only case I can see is storing multiple objects in one list for
several reasons (e.g. mediators, etc). But it isn't a task for the core to
generate a unique identifier for each object, it's a task for the developer to
implement this. Otherwise it's still unclear how to compare different
objects (for example, when comparing unserialized objects with an
existing one, which can have a different internal object ID).

An implementation of the Hashable interface can give an answer to that: two
objects can be considered equal if they have the same hash code, for example,
sha1(json_encode(get_object_vars($this)));
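
A minimal sketch of what such a state-based hash would mean in practice. The Hashable interface is the hypothetical one proposed in this thread (it is not part of PHP), and the Point class is made up for illustration.

```php
<?php
// Hypothetical interface from the proposal above; not part of PHP.
interface Hashable
{
    public function hashCode(): string;
}

final class Point implements Hashable
{
    private $x;
    private $y;

    public function __construct(int $x, int $y)
    {
        $this->x = $x;
        $this->y = $y;
    }

    public function hashCode(): string
    {
        // Hash of the object's state, as suggested above.
        return sha1(json_encode(get_object_vars($this)));
    }
}

$p1 = new Point(1, 2);
$p2 = new Point(1, 2);

var_dump($p1->hashCode() === $p2->hashCode()); // bool(true): same state
var_dump($p1 === $p2);                         // bool(false): different objects
```

This makes the distinction visible: a state-based hash expresses value equality, while the handle discussed in the rest of the thread expresses identity, so the two answer different questions.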

9 years ago by Nikita Popov

On Fri, Jul 31, 2015 at 4:53 PM, Nicolas Grekas <
nicolas.grekas+php@gmail.com> wrote:

I also know people that print_r($an_object) and parse the output just to
extract the object handle from there... Crazy isn't it ?

I plead guilty for doing this, but PHP leaves me no better choice for now ;)

The attached patch removes the XOR hashing for the object handle (it's
useless, the "secret" is trivially guessed after parsing the output of
var_dump).

It would be awesome if this patch could be applied for php 7.0!

Cheers,
Nicolas
http://marc.info/?l=php-internals&m=141811755908008&w=2

--

I'd prefer to add a separate function spl_object_id, which directly returns
the handle. This should supersede spl_object_hash in the long run.
spl_object_hash does a bunch of pointless things that serve no purpose
other than making the function slower and making the result more bulky.
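
For comparison, a function with this name later shipped in PHP (7.2 and up). The sketch below assumes such a version and shows the kind of use discussed here: an integer id that works directly as an array key.

```php
<?php
// Assumes PHP 7.2 or later, where spl_object_id() is available.
$a = new stdClass();
$b = new stdClass();

$idA = spl_object_id($a);
$idB = spl_object_id($b);

// Integer ids can be used as array keys, e.g. for a "seen" set
// while walking an object graph.
$seen = [$idA => true];

var_dump(is_int($idA));        // bool(true)
var_dump(isset($seen[$idB]));  // bool(false): both objects are alive, so ids differ
```

As with spl_object_hash(), the id is only unique among objects that are alive at the same time.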

Nikita

9 years ago by Stanislav Malyshev

Hi!

I'd prefer to add a separate function spl_object_id, which directly returns
the handle. This should supersede spl_object_hash in the long run.

Since in PHP 7 there's no longer a possibility of having different
object stores, as far as I can see, spl_object_id can identify an object
(it wasn't possible before 7).

However, committing to having specific object ID (e.g. as integer) as
part of public API is pretty big commitment which I personally think
creates unnecessary coupling between PHP internals and user code and is
a bad design decision which may bite us later.

spl_object_hash does a bunch of pointless things that serve no purpose
other than making the function slower and making the result more bulky.

Before PHP 7, it created proper object identification, as objects could
be stored in different stores. As PHP 7 removed that possibility, indeed
adding Z_OBJ_HT_P is no longer serving any purpose. Performance though
has very little to do with it - it's just two XORs and one printf.

--
Stas Malyshev
smalyshev@gmail.com

9 years ago by Nicolas Grekas

Hi Stas,

thanks for hooking into this thread

However, committing to having specific object ID (e.g. as integer) as
part of public API is pretty big commitment

This is already part of the public API: var_dump and debug_zval_dump do
expose object IDs to userland. It's just inefficient to get for more
advanced use cases.
Even if its internal implementation is totally different from PHP's, HHVM
adds an id to every object, just for this requirement: matching what PHP
exposes to userland.
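
The inefficient route can be sketched as follows: capture the var_dump output and pull the number after "#". The helper name handle_from_var_dump is hypothetical, and the regular expression assumes the stock var_dump text format; extensions that override var_dump may print something else, in which case the helper returns null.

```php
<?php
// Hypothetical helper, for illustration only.
function handle_from_var_dump($object): ?int
{
    ob_start();
    var_dump($object);
    $output = ob_get_clean();

    // Stock var_dump prints e.g. "object(stdClass)#3 (0) {".
    // Other formats are not recognized and yield null.
    return preg_match('/^object\(.*?\)#(\d+)/', $output, $m) ? (int) $m[1] : null;
}

$o = new stdClass();
var_dump(handle_from_var_dump($o));
```

Output buffering plus a regular expression per object is a lot of work to recover a single integer, which is the inefficiency being described.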

I'd be more than happy to have an spl_object_id() function btw!

Nicolas

9 years ago by Stanislav Malyshev

Hi!

This is already part of the public API: var_dump and debug_zval_dump do
expose object IDs to userland. It's just inefficient to get for more

Debug functions are not part of the public API. If you rely on anything
var_dump is producing for your code to work, you risk your code being
broken any time, without warning or any BC consideration.

Even if its internal implementation is totally different from PHP's,
HHVM adds an id to every object, just for this requirement: matching
what PHP exposes to userland.

That's HHVM's decision; however, the fact that they need to fake IDs is
not a good thing. Proper code should not depend on such details of
engine implementation.

I'd be more than happy to have an spl_object_id() function btw!

I wouldn't be very happy, because it is the wrong design to base userland
PHP code on an implementation detail of the engine. That's why
there was a hash and not a direct ID - so that the code would not depend
on an implementation detail, and anything that produces a different hash
for different objects would work.
The alternative (stuffing the ID into spl_object_hash) is worse, however.

Stas Malyshev
smalyshev@gmail.com

9 years ago by Lester Caine

I'd be more than happy to have an spl_object_id() function btw!
I wouldn't be very happy, because it is the wrong design to base userland
PHP code on an implementation detail of the engine. That's why
there was a hash and not a direct ID - so that the code would not depend
on an implementation detail, and anything that produces a different hash
for different objects would work.
The alternative (stuffing the ID into spl_object_hash) is worse, however.

What is actually being achieved here? A set of objects gets created to
cache material from another source. One needs a way of identifying the
set of buckets which are the raw pointers to each object? Once a bucket
is populated then its identity is part of the bucket. Identifying if a
record has already been loaded should happen before wasting the time to
download it again? Alternatively a duplicate may be required which
retains the same identity until one is happy to replace the original or
roll back. The 'identity' is something in the bucket, not something that
any handle can address? It's implementation, not a generic function.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

9 years ago by Stanislav Malyshev

Hi!

It would be awesome if this patch could be applied for php 7.0!

Please don't. This changes the whole meaning of spl_object_hash, and
serves no useful purpose. If you need the id so much, it should be a
separate function. I am still against exposing raw IDs, but if the
consensus is that we need them, then it should be done cleanly, not by
just taking an existing function and repurposing half of its output.
That's not the right design by any measure.

And rushing it into 7.0 does not give any time to properly discuss it.

--
Stas Malyshev
smalyshev@gmail.com

9 years ago by Anthony Ferrara

Julien,

Hi people.

I've been pinged many times to add a new spl_object_id() function to PHP,
that would return the internal object handle of an object.

Today, spl_object_hash() partially allows that, but adds a lot of randomness
to the result, which is not very cool to use later (why does it even add
randomness ?).

There have been topics about this subject.
For example, at http://marc.info/?l=php-internals&m=141814350920452&w=2

Being able to get the object handle back in PHP userland would ease many
tools, mainly debug-oriented tools.
I know PHPUnit, Symfony and many big projects today make use of
spl_object_hash() to identify objects.

I also know people that print_r($an_object) and parse the output just to
extract the object handle from there... Crazy isn't it ?
Why couldn't we help those people by simply adding a new function that does
the job ?

Thoughts ?

I'm not sure about adding randomness to the handle, but the hash also
includes the object handlers pointer. So without the randomness, we'd be
leaking information about the memory layout of the application.

And for the record, I am cool with simply exposing the handle.

Anthony

9 years ago by Stephen Coakley

Hi people.

I've been pinged many times to add a new spl_object_id() function to PHP,
that would return the internal object handle of an object.

Today, spl_object_hash() partially allows that, but adds a lot of randomness
to the result, which is not very cool to use later (why does it even add
randomness ?).

There have been topics about this subject.
For example, at http://marc.info/?l=php-internals&m=141814350920452&w=2

Being able to get the object handle back in PHP userland would ease many
tools, mainly debug-oriented tools.
I know PHPUnit, Symfony and many big projects today make use of
spl_object_hash() to identify objects.

I also know people that print_r($an_object) and parse the output just to
extract the object handle from there... Crazy isn't it ?
Why couldn't we help those people by simply adding a new function that does
the job ?

Thoughts ?

Julien.Pauli

I can think of several use cases where this might be useful, and not just
for debugging-related code. It could be used for indexing some sort of
complex object storage data structure (if you can't/won't use
SplObjectStorage). I can think of a few libraries that use
spl_object_hash() to do the same thing, but that doesn't work well in
conjunction with forking (probably due to the randomness factor).
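As a rough illustration of the two indexing approaches mentioned here, the sketch below stores per-object data first in SplObjectStorage and then in a plain array keyed by spl_object_hash(). The Job class and the status strings are made up for the example.

```php
<?php
class Job {}

$jobA = new Job;
$jobB = new Job;

// Option 1: SplObjectStorage keys entries by object identity and keeps the
// stored objects alive for as long as they are in the storage.
$storage = new SplObjectStorage();
$storage[$jobA] = 'queued';
$storage[$jobB] = 'running';

// Option 2: a plain array keyed by spl_object_hash(). The array holds only a
// string, so the caller must keep each object referenced somewhere else.
$status = array();
$status[spl_object_hash($jobA)] = 'queued';
$status[spl_object_hash($jobB)] = 'running';

echo $storage[$jobB], "\n";                 // running
echo $status[spl_object_hash($jobA)], "\n"; // queued
echo count($storage), "\n";                 // 2
```

Option 2 is the pattern that would get shorter keys from an integer ID, and it is also the one that relies on the caller to keep the objects alive.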

+1

--
Stephen Coakley

9 years ago by Stanislav Malyshev

Hi!

I've been pinged many times to add a new spl_object_id() function to PHP,
that would return the internal object handle of an object.

What would be the use of this?

Today, spl_object_hash() partially allows that, but adds a lot of randomness
to the result, which is not very cool to use later (why does it even add
randomness ?).

So that people would be less inclined to use these hashes as if they
mean something. And to not reveal internal memory details if these IDs
are leaked externally for some reason (which they shouldn't be but sigh...)

There have been topics about this subject.
For example, at http://marc.info/?l=php-internals&m=141814350920452&w=2

Being able to get the object handle back in PHP userland would ease many
tools, mainly debug-oriented tools.

I would say debug tools probably need extensions which can access
internal data structures. Exposing internal data creates bad
dependencies: then we can't fix bugs or refactor stuff because somebody
used an internal API in the wrong way and now depends on it (like the
object unserialization problem, where people started using unserialization
for purposes it was never intended for). Let's say in 8.x we remove handles
altogether and replace them with pointers or some other magic? What
would these tools do then? By creating this function, we put
the values of object IDs (not just their existence, but also their values!)
into the public API, which means we now take on BC obligations for them.
I don't think it is a good thing.

I also know people that print_r($an_object) and parse the output just to
extract the object handle from there... Crazy isn't it ?

Yes, they should not be using object handles for any practical purposes;
it's an implementation detail, and depending on it is a bad idea.

Why couldn't we help those people by simply adding a new function that does
the job ?

I'm not sure that is the job that should be done.

--
Stas Malyshev
smalyshev@gmail.com

9 years ago by Derick Rethans

Hi people.

I've been pinged many times to add a new spl_object_id() function to PHP,
that would return the internal object handle of an object.

Today, spl_object_hash() partially allows that, but adds a lot of randomness
to the result, which is not very cool to use later (why does it even add
randomness ?).

There have been topics about this subject.
For example, at http://marc.info/?l=php-internals&m=141814350920452&w=2

Being able to get the object handle back in PHP userland would ease many
tools, mainly debug-oriented tools.
I know PHPUnit, Symfony and many big projects today make use of
spl_object_hash() to identify objects.

I also know people that print_r($an_object) and parse the output just to
extract the object handle from there... Crazy isn't it ?
Why couldn't we help those people by simply adding a new function that does
the job ?

You realize that these object handles aren't particularly stable? The
same object ID can be reused:

derick@whisky:/tmp $ cat objid.php
<?php
class Foo {}
$a = new Foo;
var_dump($a);
$a = new Foo;
var_dump($a);
$a = new Foo;
var_dump($a);

derick@whisky:/tmp $ php objid.php
class Foo#1 (0) {
}
class Foo#2 (0) {
}
class Foo#1 (0) {
}

You can't deterministically reference an object by its class and
handle... so I also think this implementation detail should not be
shared through an API.

cheers,
Derick

9 years ago by Bob Weinand

On 02.08.2015 at 19:09, Derick Rethans derick@php.net wrote:

Hi people.

I've been pinged many times to add a new spl_object_id() function to PHP,
that would return the internal object handle of an object.

Today, spl_object_hash() partially allows that, but adds a lot of randomness
to the result, which is not very cool to use later (why does it even add
randomness ?).

There have been topics about this subject.
For example, at http://marc.info/?l=php-internals&m=141814350920452&w=2

Being able to get the object handle back in PHP userland would ease many
tools, mainly debug-oriented tools.
I know PHPUnit, Symfony and many big projects today make use of
spl_object_hash() to identify objects.

I also know people that print_r($an_object) and parse the output just to
extract the object handle from there... Crazy isn't it ?
Why couldn't we help those people by simply adding a new function that does
the job ?

You realize that these object handles aren't particularly stable? The
same object ID can be reused:

derick@whisky:/tmp $ cat objid.php
<?php
class Foo {}
$a = new Foo;
var_dump($a);
$a = new Foo;
var_dump($a);
$a = new Foo;
var_dump($a);

derick@whisky:/tmp $ php objid.php
class Foo#1 (0) {
}
class Foo#2 (0) {
}
class Foo#1 (0) {
}

You can't deterministically reference an object by its class and
handle... so I also think this implementation detail should not be
shared through an API.

cheers,
Derick

Ehm, you realize that object id is only reset because the old object is freed? As long as the target object is referenced, nothing will have the same object id.

Which is why this isn't an issue. Is there any point in having a unique id? It's only 32 bits on some systems. Once it overflows, it'd be reused anyway.
What's important is having an exact 1:1 mapping of id and [existing] object. Which is all we need.
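A small sketch of that property, using spl_object_hash() as a stand-in since it exists today: as long as every object stays referenced, each one gets its own hash. Whether a freed slot is reused right away is an engine detail, so the second half only reports what happens and makes no promise. The Foo class and the count of 1000 are arbitrary.

```php
<?php
class Foo {}

// Keep every object referenced so that none of them is freed.
$live   = array();
$hashes = array();
for ($i = 0; $i < 1000; $i++) {
    $live[$i] = new Foo;
    $hashes[spl_object_hash($live[$i])] = true;
}
echo count($hashes), "\n"; // 1000: one distinct hash per live object

// Once an object is freed its slot becomes available again, so a later
// object may (but need not) receive the same hash.
$old = spl_object_hash($live[0]);
unset($live[0]);
$fresh = new Foo;
echo ($old === spl_object_hash($fresh)) ? "reused\n" : "not reused\n";
```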

Bob

9 years ago by Stanislav Malyshev

Hi!

Ehm, you realize that object id is only reset because the old object is freed? As long as the target object is referenced, nothing will have the same object id.

Yes, but if you use it as some sort of ID - e.g. as in "did I already
create an object of class X with the parameters stated in class Y" and use
class Y's object ID as a key for that - you're in for a nasty surprise if
some other code deletes that object and creates a new object of class Y
with the same ID but completely different content. That's what one of
the code examples referred to in this thread was doing, if I understood
it correctly. Which only goes to emphasize my point - a lot of the
people that want object IDs are either using them wrong or plan to use
them wrong. And wrapping more rope around their necks is not exactly
what we should be doing.
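The scenario can be sketched as follows. Config, connection_for() and safe_connection_for() are hypothetical names invented for this illustration, and whether the fragile variant actually returns a stale entry depends on whether the engine happens to reuse the freed handle.

```php
<?php
class Config
{
    public $dsn;

    public function __construct($dsn)
    {
        $this->dsn = $dsn;
    }
}

// Fragile: the cache remembers only the hash string, not the object itself.
function connection_for($config, array &$cache)
{
    $key = spl_object_hash($config);
    if (!isset($cache[$key])) {
        $cache[$key] = 'connection to ' . $config->dsn;
    }
    return $cache[$key];
}

// Safer: the cache entry holds the object, so it stays alive and its
// identifier cannot be handed to another object in the meantime.
function safe_connection_for($config, array &$cache)
{
    $key = spl_object_hash($config);
    if (!isset($cache[$key])) {
        $cache[$key] = array($config, 'connection to ' . $config->dsn);
    }
    return $cache[$key][1];
}

$fragile = array();
$c = new Config('db1');
connection_for($c, $fragile);
unset($c);                 // the object is freed, its handle becomes available
$c = new Config('db2');
// May print "connection to db1" if the handle was reused.
echo connection_for($c, $fragile), "\n";

$safe = array();
$d = new Config('db1');
safe_connection_for($d, $safe);
unset($d);                 // still referenced from $safe, so not freed
$d = new Config('db2');
echo safe_connection_for($d, $safe), "\n"; // connection to db2
```

The safer variant is roughly what SplObjectStorage does: it keeps the object itself, not just an identifier for it.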

--
Stas Malyshev
smalyshev@gmail.com
