Hi,
I spoke with some devs yesterday about spl_object_hash performance and
alternatives to improve it.
It seems that the md5 hashing applied inside it is responsible for the cost.
After some tips from Lars, we came up with a patch (at the bottom of this email).
The newly proposed function is already being used in Doctrine 2.0 (to be
released in 2009), which takes advantage of the new features of 5.3.
During our tests, everything passed without conflicts or
unpredictable values.
The performance did not increase as much as I expected, but it
reached a good enough value for me. It's as fast as calculating
spl_object_hash once and then doing a method call, which is an interesting
result.
It seems that Marcus controls commit access to SPL, so I'm taking
the conversation async, since I cannot find him online on IRC.
So, can anyone review the patch, comment on it, and commit it if approved?
Thanks in advance,
Index: ext/spl/php_spl.c
===================================================================
RCS file: /repository/php-src/ext/spl/php_spl.c,v
retrieving revision 1.52.2.28.2.17.2.33
diff -u -r1.52.2.28.2.17.2.33 php_spl.c
--- ext/spl/php_spl.c	30 Nov 2008 00:23:06 -0000	1.52.2.28.2.17.2.33
+++ ext/spl/php_spl.c	16 Dec 2008 19:05:05 -0000
@@ -682,6 +682,23 @@
 }
 /* }}} */
 
+/* {{{ proto string spl_object_id(object obj)
+ Return id for the given object */
+PHP_FUNCTION(spl_object_id)
+{
+	zval *obj;
+	char *string;
+	int len;
+
+	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "o", &obj) == FAILURE) {
+		return;
+	}
+
+	len = spprintf(&string, 0, "%p%d", Z_OBJ_HT_P(obj), Z_OBJ_HANDLE_P(obj));
+	RETURN_STRINGL(string, len, 0);
+}
+/* }}} */
+
 PHPAPI void php_spl_object_hash(zval *obj, char *md5str TSRMLS_DC) /* {{{*/
 {
 	int len;
@@ -798,6 +815,10 @@
 ZEND_BEGIN_ARG_INFO_EX(arginfo_spl_object_hash, 0, 0, 1)
 	ZEND_ARG_INFO(0, obj)
 ZEND_END_ARG_INFO()
+
+ZEND_BEGIN_ARG_INFO_EX(arginfo_spl_object_id, 0, 0, 1)
+	ZEND_ARG_INFO(0, obj)
+ZEND_END_ARG_INFO()
 /* }}} */
 
 /* {{{ spl_functions
@@ -813,6 +834,7 @@
 	PHP_FE(class_parents, arginfo_class_parents)
 	PHP_FE(class_implements, arginfo_class_implements)
 	PHP_FE(spl_object_hash, arginfo_spl_object_hash)
+	PHP_FE(spl_object_id, arginfo_spl_object_id)
 #ifdef SPL_ITERATORS_H
 	PHP_FE(iterator_to_array, arginfo_iterator_to_array)
 	PHP_FE(iterator_count, arginfo_iterator)
--
Guilherme Blanco - Web Developer
CBC - Certified Bindows Consultant
Cell Phone: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
URL: http://blog.bisna.com
São Paulo - SP/Brazil
Hi Guilherme,
thanks for moving the discussion to the list.
On Wednesday, 17.12.2008, at 15:31 -0200, Guilherme Blanco wrote:
[...]
It seems that Marcus controls commit access to SPL, so I'm taking
the conversation async, since I cannot find him online on IRC.
So, can anyone review the patch, comment on it, and commit it if approved?
Just for clarification, it is not about access, but about maintenance.
So if Marcus gives the go-ahead, we can happily apply the patch and add a few
tests (something you could start preparing now).
cu, Lars
Hello,
Last time I checked with Marcus, there were concerns about disclosing
a valid pointer to the user.
I'd be happy to see a use case where this information is really needed
heavily. The only real use case for heavy usage seems to be implementing
sets of objects, but SplObjectStorage is here for that
precise use case...
Regards
--
Etienne Kneuss
http://www.colder.ch
Men never do evil so completely and cheerfully as
when they do it from a religious conviction.
-- Pascal
Hello Etienne,
Wednesday, December 17, 2008, 7:59:01 PM, you wrote:
Last time I checked with Marcus, there were concerns about disclosing
a valid pointer to the user.
I'd be happy to see a use case where this information is really needed
heavily. The only real use case for heavy usage seems to be implementing
sets of objects, but SplObjectStorage is here for that
precise use case...
Correct in all, Etienne. The patch might be a tiny bit faster, but it exposes
valid pointers, which is extremely bad and also allows other bad things.
That was the only reason I used md5 hashing. What I needed was something
that is really unique per object (object pointer or id plus pointer to
handler table). Since spl_object_hash() does not say how it creates the
hash, it should be fine to change the way it does it. Since the hashes are
of no use in a new session, we can even do that in any new version.
However, I must still insist on not exposing any valid information.
Last but not least: in your code you know the maximum length of the
expression, so you can allocate the string and snprintf into it. Even
faster is to do a hexdump into a preallocated string. For the size use:
char* hash = (char*)safe_emalloc(sizeof(void*), 2, 1);
Then do the hexdump of the two pointers.
This approach should make it a bit faster for you. Something that might
work is to create a random 128-bit hash key that is XORed onto the hash
created from the two pointers. This hash key can be allocated for each
session the first time the function is used. If you do that, I am more
than happy to accept it as a replacement for the current spl_object_hash().
marcus
Best regards,
Marcus
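A minimal C sketch of the approach Marcus outlines above: XOR two pointer-sized values (the object handle and the handler-table pointer) with a 128-bit per-request key, then hex-dump them into a preallocated 33-byte buffer with no printf or md5 call. It assumes both values fit in 64 bits; object_hash(), object_hash_mask and hexdump_u64() are hypothetical names, not SPL API, and the key is supplied by the caller rather than drawn from a real random source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit per-request key, split into two 64-bit halves.
 * A real implementation would fill it from a random source the first
 * time the function is called in a request; here the caller passes it. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} object_hash_mask;

static const char hex_digits[] = "0123456789abcdef";

/* Write value as exactly 16 lowercase hex digits, most significant first. */
static void hexdump_u64(uint64_t value, char *out)
{
    for (int i = 15; i >= 0; i--) {
        out[i] = hex_digits[value & 0xF];
        value >>= 4;
    }
}

/* Build a 32-digit id from the object handle and the handler-table
 * pointer, each XORed with one half of the key so neither raw value
 * reaches userland. out must hold at least 33 bytes. */
static void object_hash(uint32_t handle, const void *handlers,
                        const object_hash_mask *mask, char *out)
{
    assert(mask != NULL && out != NULL);
    hexdump_u64((uint64_t)handle ^ mask->hi, out);
    hexdump_u64((uint64_t)(uintptr_t)handlers ^ mask->lo, out + 16);
    out[32] = '\0';
}
```

Because XOR with a fixed key is reversible and the hex encoding is one-to-one, two different (handle, handlers) pairs cannot produce the same 32-character string.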
Ok,
We'll use this method inside Doctrine ORM version 2.0, scheduled to be
released on September 1st, 2009.
One main location where we are already using it is the hydration process:
the process of grabbing a DB tuple and converting it into an object graph.
Here is the usage.
Each object of the graph is a Value Object
(http://en.wikipedia.org/wiki/Value_object). So it does not have any
mappings other than the to-be-persisted ones. No internal method
implementation is needed. All Active Record-like actions are
controlled by the EntityManager.
Based on that, we have a ClassMetadata that is cached based on the class
name (currently based on spl_object_id, but that is too expensive in
resources and I'll change it). When we get the DB tuple, we need to
find the exact ClassMetadata of that item and apply the specific
DB/PHP type casting, for example. There's also property assignment,
which is possible thanks to the new Reflection API. We store the
ReflectionProperty of each field and assign it when we have its
definition.
Another location where we rely on spl_object_id is inside the UnitOfWork
(http://martinfowler.com/eaaCatalog/unitOfWork.html). We generate a
mapping of each Entity/Collection to be persisted/updated/deleted. We
define the order in which these are applied based first on the
generated OID (the spl_object_id return value) and then on Topological Sorting
(http://en.wikipedia.org/wiki/Topological_sorting). Finally, we start
the transaction and execute the statements.
The point is that we may be doing a huge hydration with lots
of related objects. We may be dealing with a web page that fetches
more than 5000 records with even more associations, all of that at
runtime. So I have to say performance is VERY important for
us.
Why will we not use SplObjectStorage?
Because it'll be used in different places that should share the same
OID. Including a couple of these components is not a viable idea, since
it leads to a more memory-expensive solution, which we're trying hard to
optimize, and it would also force us to add another get call
(through a method call), which would make the implementation even slower.
Here are two files in which we have been using spl_object_id (now changed
to spl_object_hash, since the idea is to update it with Marcus'
suggestions):
Object Driver for Hydration:
http://trac.doctrine-project.org/browser/trunk/lib/Doctrine/ORM/Internal/Hydration/ObjectDriver.php
UnitOfWork for Persistence:
http://trac.doctrine-project.org/browser/trunk/lib/Doctrine/ORM/UnitOfWork.php
Short version: Because we want a fast, easy way to associate
information (temporarily) with an object. Most of the time we use the
object id/hash as a key in an array. Basically, spl_object_hash is
fine, it would just be nice if it could be improved in speed.
It'll take me some time to dig into the PHP source to try to implement it.
I'm not a C developer, and it's been more than 4 years since I touched a
single line of C code. I can read the PHP source, but I'm not able to
write it.
I already spoke with Felipe, who will help me with questions about the
source, but I cannot guarantee I'll be able to do the job.
Regards,
--
Guilherme Blanco - Web Developer
CBC - Certified Bindows Consultant
Cell Phone: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
URL: http://blog.bisna.com
São Paulo - SP/Brazil
Hello,
We already had that discussion in private, but here is an on-list summary:
On Mon, Jan 19, 2009 at 5:39 PM, Guilherme Blanco
guilhermeblanco@gmail.com wrote:
Short version: Because we want a fast, easy way to associate
information (temporarily) with an object. Most of the time we use the
object id/hash as a key in an array. Basically, spl_object_hash is
fine, it would just be nice if it could be improved in speed.
All those use cases are related to an [object => data] map, which can be
solved with SplObjectStorage:
$storage = new SplObjectStorage;
$storage[$obj1] = $data;
...
var_dump($storage[$obj1]);
...
There were three concerns:
- Speed: the main ground for spl_object_id is speed.
=> SplObjectStorage is faster than an array with spl_object_hash (and can
be made even faster).
- Indirect modification: $storage[$obj1]['index'] = 2; does not work.
This is sadly a limitation of ArrayAccess.
=> It can be solved either by doing get+change+set, or by
using an ArrayObject instead of an array.
- Memory: since the object itself will be referenced in the storage,
you'll have to delete it from every map in order for the GC to do its
work.
=> This is actually a safety feature: an object stays unique only as long as
it exists:
$a = new StdClass;
$h1 = spl_object_hash($a);
unset($a);
$b = new StdClass;
$h2 = spl_object_hash($b);
var_dump($h1 === $h2); // bool(true)
Conclusion: If you clean your objects without properly taking care of
the metadata stored in the array indexed by object_id, you'll get
unexpected results anyway.
So far it looks like SplObjectStorage is fine with those use cases. If
somebody has a practical (with code) use case in which
SplObjectStorage can't be sanely used and where spl_object_id is the
only solution, please shoot.
Regards,
--
Etienne Kneuss
http://www.colder.ch
Men never do evil so completely and cheerfully as
when they do it from a religious conviction.
-- Pascal
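Etienne's last snippet prints bool(true) because the slot freed by unset($a) is handed to the next new object. The toy C model below is an assumption about that mechanism, not the Zend Engine's object store code (object_store, store_put() and store_release() are hypothetical names): it recycles freed handles from a LIFO free list, which is enough to show that any id derived from the handle is unique only among live objects.

```c
#include <assert.h>
#include <stddef.h>

#define STORE_CAPACITY 64

/* Toy object store: hands out integer handles and keeps freed handles on
 * a LIFO free list, so the next allocation reuses the most recently
 * freed slot. A simplified model, not the Zend Engine's own code. */
typedef struct {
    int in_use[STORE_CAPACITY];
    unsigned free_list[STORE_CAPACITY];
    size_t free_count;
    unsigned next_unused;
} object_store;

static void store_init(object_store *s)
{
    for (size_t i = 0; i < STORE_CAPACITY; i++) {
        s->in_use[i] = 0;
    }
    s->free_count = 0;
    s->next_unused = 0;
}

/* Like "new StdClass": returns the handle of the new object. */
static unsigned store_put(object_store *s)
{
    unsigned handle;
    if (s->free_count > 0) {
        handle = s->free_list[--s->free_count];  /* recycle a freed slot */
    } else {
        assert(s->next_unused < STORE_CAPACITY);
        handle = s->next_unused++;
    }
    s->in_use[handle] = 1;
    return handle;
}

/* Like the last reference disappearing after unset(): the slot is freed. */
static void store_release(object_store *s, unsigned handle)
{
    assert(handle < s->next_unused && s->in_use[handle]);
    s->in_use[handle] = 0;
    s->free_list[s->free_count++] = handle;
}
```

In this model an array keyed by a handle-derived id can hand stale metadata to a new object, whereas a storage that holds a reference to the key object keeps its slot from being reused.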
Hi,
I had a talk with Marcus, and he has agreed on this proposed solution:
- SPL generates a pseudo-random session id/mask (for the current request;
do not confuse with $_SESSION), which consists of 32 random
bytes/characters.
- The object id and the handler pointer are used to create a unique
identifier for the object, by padding each pointer to 16 bytes and joining
them together.
- The resulting byte sequence is XOR-ed with the session mask so the
object id/handler are not retrievable in userland, and the resulting
sequence/string is returned.
This is better than spl_object_hash for two reasons:
- no printf of numbers, no expensive md5 hashing, no hex conversion.
- no possible collisions
Drawbacks:
- might generate some non-printable chars after XOR (but this wouldn't
matter for operation). The idea above can be tweaked to solve this.
The "32-character string" format is to preserve compatibility with
spl_object_hash, so it can replace it, instead of introducing a new similar
function.
I've had no time to look further into this, but Marcus seems to like the
basic idea, so feel welcome...
Regards,
Stan Vassilev
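A minimal C sketch of the three steps Stan lists, assuming 64-bit values and a caller-supplied 32-byte mask; object_id(), pad16() and OBJECT_ID_LEN are hypothetical names, not SPL API. The output is 32 raw bytes, which is where the non-printable characters mentioned under Drawbacks come from.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OBJECT_ID_LEN 32  /* same length as a spl_object_hash() string */

/* Store value in a 16-byte slot in native byte order, zero-padding the rest. */
static void pad16(uint64_t value, unsigned char *slot)
{
    memset(slot, 0, 16);
    memcpy(slot, &value, sizeof value);
}

/* Join the padded handle and handler pointer, then XOR every byte with
 * the 32-byte per-request mask. The result is raw bytes, not hex. */
static void object_id(uint32_t handle, const void *handlers,
                      const unsigned char *mask, unsigned char *out)
{
    assert(mask != NULL && out != NULL);
    pad16(handle, out);
    pad16((uint64_t)(uintptr_t)handlers, out + 16);
    for (size_t i = 0; i < OBJECT_ID_LEN; i++) {
        out[i] ^= mask[i];
    }
}
```

Since XOR with a fixed mask is reversible, distinct (handle, handlers) pairs always map to distinct ids, which is the "no possible collisions" point. In this sketch, padding 8-byte values to 16 bytes means bytes 8-15 and 24-31 of every id are plain copies of the mask, so only half of the output varies between objects.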
Hello,
On Tue, Jan 20, 2009 at 11:47 AM, Stan Vassilev | FM
sv_forums@fmethod.com wrote:
Could you please provide an example, with code, in which this function
would be necessary? (i.e. where you can't use SplObjectStorage)
Thanks.
--
Etienne Kneuss
http://www.colder.ch
Men never do evil so completely and cheerfully as
when they do it from a religious conviction.
-- Pascal
Could you please provide an example, with code, in which this function
would be necessary? (i.e. where you can't use SplObjectStorage)
Why?
As far as I understand, the issue is that spl_object_hash() is slow
and people want a faster alternative.
Although the initial reason for the faster alternative was to
maintain some classmap for a "random" PHP project, that has nothing to
do with the fact that spl_object_hash() is slow, especially compared to the
proposed spl_object_id() function.
Now, your "workarounds" using SplObjectStorage may work for the
previously mentioned use cases, but how the frick is anyone supposed
to know about the wonders of SplObjectStorage?
The doxygen is very confusing and definitely does not replace real
documentation; plus, it doesn't mention the things you are referring
to (object => data map) or any of the other 5.3 features.
-Hannes
Hello,
On Tue, Jan 20, 2009 at 1:41 PM, Hannes Magnusson
hannes.magnusson@gmail.com wrote:
Could you please provide an example, with code, in which this function
would be necessary? (i.e. where you can't use SplObjectStorage)
Why?
As far as I understand, the issue is that spl_object_hash() is slow
and people want a faster alternative.
Although the initial reason for the faster alternative was to
maintain some classmap for a "random" PHP project, that has nothing to
do with the fact that spl_object_hash() is slow, especially compared to the
proposed spl_object_id() function.
People who need a faster alternative usually need it to build an
efficient object hash map. So far, every use case where a faster
spl_object_hash was needed has been exactly that. Since this is
already possible, the need for spl_object_id is questionable. If there
is a clear need for it, I'll be happy to implement it myself.
Now, your "workarounds" using SplObjectStorage may work for the
previously mentioned use cases, but how the frick is anyone supposed
to know about the wonders of SplObjectStorage?
The doxygen is very confusing and definitely does not replace real
documentation; plus, it doesn't mention the things you are referring
to (object => data map) or any of the other 5.3 features.
So, since documentation is currently bad, we should implement some
additional functions to do the same?
Sorry, but your SPL documentation vendetta is irrelevant to the
question at hand.
--
Etienne Kneuss
http://www.colder.ch
Men never do evil so completely and cheerfully as
when they do it from a religious conviction.
-- Pascal
Etienne,
We have all already agreed not to implement spl_object_id, as long as
spl_object_hash is optimized.
Regards,
--
Guilherme Blanco - Web Developer
CBC - Certified Bindows Consultant
Cell Phone: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
URL: http://blog.bisna.com
São Paulo - SP/Brazil
Hello,
On Tue, Jan 20, 2009 at 2:23 PM, Guilherme Blanco
guilhermeblanco@gmail.com wrote:
Etienne,
We have all already agreed not to implement spl_object_id, as long as
spl_object_hash is optimized.
Ok then, I'll provide a patch to improve spl_object_hash's
performance, which will also change its output.
Documentation will also need to be updated so that people are aware
that most of the time SplObjectStorage will be a better alternative.
Regards,
--
Etienne Kneuss
http://www.colder.ch
Men never do evil so completely and cheerfully as
when they do it from a religious conviction.
-- Pascal
Hello,
Here are the mentioned patches:
http://patches.colder.ch/php-src/spl_object_hash-5_3.patch?markup
http://patches.colder.ch/php-src/spl_object_hash-HEAD.patch?markup
If nobody objects, I'll commit them on Sunday.
Regards,
--
Etienne Kneuss
http://www.colder.ch
Men never do evil so completely and cheerfully as
when they do it from a religious conviction.
-- Pascal
Hi,
It seems SplObjectStorage will solve my issue.
We already spoke in private about how to handle things correctly too.
Although this solves my issue, the point that Stan highlighted is
valid. You may not find a good place to use it, but
spl_object_hash is there and it has drawbacks. The mere possibility of
collisions is enough to convince me the function should change, even
though I can't imagine a use for it right now.
If nobody can find a use for it, then we can drop spl_object_hash, right? If
you don't agree, then we need some change to make it faster and
collision-proof.
Regards,
Hello,
On Tue, Jan 20, 2009 at 11:47 AM, Stan Vassilev | FM
sv_forums@fmethod.com wrote:
Hi,
I had a talk with Marcus, and he has agreed on this proposed solution:
- SPL generates a pseudo-random session id/mask (for the current request,
do not confuse with $_SESSION), which consists of 32 random
bytes/characters.
- The object id and the handler pointer are used to create a unique
identifier for the object, by padding each to 16 bytes and joining
them together.
- The resulting byte sequence is XOR-ed with the session mask so the
object id/handler are not retrievable in userland, and the resulting
sequence/string is returned.

This is better than spl_object_hash for two reasons:
- no printf of numbers, no expensive md5 hashing, no hex conversion.
- no possible collisions
Drawbacks:
- might generate some non-printable chars after xor (but this wouldn't
matter for operation). The idea above can be tweaked to solve this.

The "32-character string" format is to preserve compatibility with
spl_object_hash, so it can replace it, instead of introducing a new similar
function.

I've had no time to look further into this, but Marcus seems to like the
basic idea, so feel welcome...

Could you please provide an example, with code, in which this function
would be necessary? (i.e. where you can't use SplObjectStorage)
Thanks.
Regards,
Stan Vassilev
Hello,
On Tue, Jan 20, 2009 at 1:45 PM, Guilherme Blanco
guilhermeblanco@gmail.com wrote:
Hi,
It seems SplObjectStorage will solve my issue.
We already spoke in private about how to handle things correctly, too.
Although this solves my issue, the point that Stan highlighted is
valid. You may not find a good place to use it, but
spl_object_hash is there and it has drawbacks. The mere possibility of
collisions is enough to convince me the function should change, even
though I can't imagine a usage of it now.
If no one can find a usage, then we can drop spl_object_hash, right?
This is not possible for BC reasons. Also, changing it internally so
it no longer returns an md5 hash might be doable but it will have some
BC effects as well.
If those effects are considered to be harmless, then sure, I'd be
happy to make that function faster by implementing what was discussed
before.
If you don't agree, then we need some change to make it faster and
collision-proof.
Regards,
--
Etienne Kneuss
http://www.colder.ch
Men never do evil so completely and cheerfully as
when they do it from a religious conviction.
-- Pascal
Shouldn't you be using RETURN_STRINGL with length and dup=0?
Andi
-----Original Message-----
From: Guilherme Blanco [mailto:guilhermeblanco@gmail.com]
Sent: Wednesday, December 17, 2008 9:31 AM
To: internals Mailing List
Subject: [PHP-DEV] New function proposal: spl_object_id

Hi,
I spoke with some devs yesterday about spl_object_hash performance and
alternatives to solve it.
It seems that the md5 applied inside it is responsible for that.
After some tips from Lars, we came up with a patch (at the bottom of this email).
The new proposed function is already being used in Doctrine 2.0 (to be
released in 2009), which takes advantage of the new features of 5.3.
During our tests, everything passed without conflicts or
unpredictable values.
The performance did not increase as much as I expected, but it
reached a good value for me now. It's as fast as calculating
spl_object_hash once and doing a method call, which is an interesting
value.
It seems that Marcus controls commit access to SPL, so I'm taking
the conversation async, since I cannot find him online on IRC.
So, can anyone review the patch, comment on it and commit it if approved?
Thanks in advance,
Index: ext/spl/php_spl.c
RCS file: /repository/php-src/ext/spl/php_spl.c,v
retrieving revision 1.52.2.28.2.17.2.33
diff -u -r1.52.2.28.2.17.2.33 php_spl.c
--- ext/spl/php_spl.c 30 Nov 2008 00:23:06 -0000 1.52.2.28.2.17.2.33
+++ ext/spl/php_spl.c 16 Dec 2008 19:05:05 -0000
@@ -682,6 +682,23 @@
 }
 /* }}} */
 
+/* {{{ proto string spl_object_id(object obj)
+   Return id for the given object */
+PHP_FUNCTION(spl_object_id)
+{
+	zval *obj;
+	char *string;
+	int len;
+
+	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "o", &obj) == FAILURE) {
+		return;
+	}
+
+	len = spprintf(&string, 0, "%p%d", Z_OBJ_HT_P(obj), Z_OBJ_HANDLE_P(obj));
+	RETURN_STRING(string, 1);
+}
+/* }}} */
+
 PHPAPI void php_spl_object_hash(zval *obj, char *md5str TSRMLS_DC) /* {{{*/
 {
 	int len;
@@ -798,6 +815,10 @@
 ZEND_BEGIN_ARG_INFO_EX(arginfo_spl_object_hash, 0, 0, 1)
 	ZEND_ARG_INFO(0, obj)
 ZEND_END_ARG_INFO()
+
+ZEND_BEGIN_ARG_INFO_EX(arginfo_spl_object_id, 0, 0, 1)
+	ZEND_ARG_INFO(0, obj)
+ZEND_END_ARG_INFO()
 /* }}} */
 
 /* {{{ spl_functions
@@ -813,6 +834,7 @@
 	PHP_FE(class_parents, arginfo_class_parents)
 	PHP_FE(class_implements, arginfo_class_implements)
 	PHP_FE(spl_object_hash, arginfo_spl_object_hash)
+	PHP_FE(spl_object_id, arginfo_spl_object_id)
 #ifdef SPL_ITERATORS_H
 	PHP_FE(iterator_to_array, arginfo_iterator_to_array)
 	PHP_FE(iterator_count, arginfo_iterator)
--
Guilherme Blanco - Web Developer
CBC - Certified Bindows Consultant
Cell Phone: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
URL: http://blog.bisna.com
São Paulo - SP/Brazil
Hi Andi,
On Wednesday, 17.12.2008, at 10:33 -0800, Andi Gutmans wrote:
Shouldn't you be using RETURN_STRINGL with length and dup=0?
Yes, the more recent implementation does exactly that :)
cu, Lars