Hi all,
I noticed awhile ago how most every use of zend[_u]_hash_init has nSize as
- Of course it isn't always known how many elements will be added to the
array (nTableSize), but there are places where an accurate value could be
used instead of getting the minimum default (8). For anyone who doesn't
know, HashTable sizes are a power of 2, and when there are more than that
many elements, zend_hash_do_resize realloc's more memory to double the table
size, and then zend_hash_rehash basically goes through and "reinserts" all
the elements again.
I ran some tests to see the speed improvement from specifying the actual
number of elements in hash_init, thus saving that extra work (_rehash
mostly). The "n+1" is with one more element (9, 17, 33...) that triggers
another rehash when the actual number wasn't specified (most extreme). (I
used add_next_index_zval, so the direct zend_hash* functions would give
slightly higher % I guess, being less overhead, right?) Results with
5.2.0-RC4:
Elements
n | n n+1
8 | 0% 14.2%
16 | 11.0% 20.9%
32 | 13.5% 22.1%
64 | 12.6% 21.3%
128 | 11.7% 18.5%
256 | 9.3% 16.4%
512 | 8.6% 17.0%
1024 | 7.9% 15.8%
2048 | 4.8% 33.3%
4096 | 7.8% 28.3%
8192 | 10.2% 58.4%
16384 | 24.1% 70.5%
32768 | 34.5% 80.4%
65536 | 34.8% 68.6%
I haven't looked thoroughly, but the only place I've seen that uses an
non-zero nSize in hash_init (besides some in the core engine) is the
unserialize()
function. :-/ It seems the most simple and obvious place that
could be changed is in Zend/zend_variables.c:_zval_copy_ctor_func? Just
replace
zend[_u]_hash_init(tmp_ht, 0, ...
with
zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...
Other than that, some of PHP's array functions and array-returning functions
(database fetch row functions, etc.) are ones I thought of that could be
optimized like this. Maybe create an array_init_size() macro to be used in
places where the number of elements can be easily determined? Thoughts?
I'd volunteer to look for places that could be updated and modify them. :-)
Thanks,
Matt
P.S. I guess the array stuff applies for objects also? But I don't know
much about their internals...
Aaahh nice catch ;)
From a quick lookup in PHP sources I also found these functions were the
same optimization can be applied:
zend_default_exception_new_ex() - trivial
convert_to_array() - not as easy as others
zend_object_std_init() - possibly
clone_wrapper_hash() - trivial (php-src/main/streams/streams.c)
compiler_globals_ctor()
...
Nuno
----- Original Message -----
Hi all,
I noticed awhile ago how most every use of zend[_u]_hash_init has nSize as
- Of course it isn't always known how many elements will be added to the
array (nTableSize), but there are places where an accurate value could be
used instead of getting the minimum default (8). For anyone who doesn't
know, HashTable sizes are a power of 2, and when there are more than that
many elements, zend_hash_do_resize realloc's more memory to double the
table
size, and then zend_hash_rehash basically goes through and "reinserts" all
the elements again.I ran some tests to see the speed improvement from specifying the actual
number of elements in hash_init, thus saving that extra work (_rehash
mostly). The "n+1" is with one more element (9, 17, 33...) that triggers
another rehash when the actual number wasn't specified (most extreme). (I
used add_next_index_zval, so the direct zend_hash* functions would give
slightly higher % I guess, being less overhead, right?) Results with
5.2.0-RC4:Elements
n | n n+1
8 | 0% 14.2%
16 | 11.0% 20.9%
32 | 13.5% 22.1%
64 | 12.6% 21.3%
128 | 11.7% 18.5%
256 | 9.3% 16.4%
512 | 8.6% 17.0%
1024 | 7.9% 15.8%
2048 | 4.8% 33.3%
4096 | 7.8% 28.3%
8192 | 10.2% 58.4%
16384 | 24.1% 70.5%
32768 | 34.5% 80.4%
65536 | 34.8% 68.6%I haven't looked thoroughly, but the only place I've seen that uses an
non-zero nSize in hash_init (besides some in the core engine) is the
unserialize()
function. :-/ It seems the most simple and obvious place
that
could be changed is in Zend/zend_variables.c:_zval_copy_ctor_func? Just
replacezend[_u]_hash_init(tmp_ht, 0, ...
with
zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...Other than that, some of PHP's array functions and array-returning
functions
(database fetch row functions, etc.) are ones I thought of that could be
optimized like this. Maybe create an array_init_size() macro to be used
in
places where the number of elements can be easily determined? Thoughts?
I'd volunteer to look for places that could be updated and modify them.
:-)Thanks,
MattP.S. I guess the array stuff applies for objects also? But I don't know
much about their internals...
Hi Nuno,
Late reply, but I'm glad the idea was able to be used. :-)
To the other hash people: While checking how Ilia made implode()
faster in
5.2, I think I found another optimization that can be done. In
zend_hash_copy/merge, it seems the zend_hash_quick_* functions can be used
instead for associative entries, thus saving the key from being hashed
again, right? I just checked, and it's definitely faster, how much so being
dependent on the key length of course...
Patches for this simple additional optimization (hopefully):
http://realplain.com/php/hash_quick_copy.diff
http://realplain.com/php/hash_quick_copy_5_2.diff
Thanks,
Matt
----- Original Message -----
From: "Nuno Lopes"
Sent: Wednesday, September 27, 2006
Aaahh nice catch ;)
From a quick lookup in PHP sources I also found these functions were the
same optimization can be applied:
zend_default_exception_new_ex() - trivial
convert_to_array() - not as easy as others
zend_object_std_init() - possibly
clone_wrapper_hash() - trivial (php-src/main/streams/streams.c)
compiler_globals_ctor()
...Nuno
----- Original Message -----
Hi all,
I noticed awhile ago how most every use of zend[_u]_hash_init has nSize
as
- Of course it isn't always known how many elements will be added to
the
array (nTableSize), but there are places where an accurate value could
be
used instead of getting the minimum default (8). For anyone who doesn't
know, HashTable sizes are a power of 2, and when there are more than
that
many elements, zend_hash_do_resize realloc's more memory to double the
table
size, and then zend_hash_rehash basically goes through and "reinserts"
all
the elements again.I ran some tests to see the speed improvement from specifying the
actual
number of elements in hash_init, thus saving that extra work (_rehash
mostly). The "n+1" is with one more element (9, 17, 33...) that
triggers
another rehash when the actual number wasn't specified (most extreme).
(I
used add_next_index_zval, so the direct zend_hash* functions would give
slightly higher % I guess, being less overhead, right?) Results with
5.2.0-RC4:Elements
n | n n+1
8 | 0% 14.2%
16 | 11.0% 20.9%
32 | 13.5% 22.1%
64 | 12.6% 21.3%
128 | 11.7% 18.5%
256 | 9.3% 16.4%
512 | 8.6% 17.0%
1024 | 7.9% 15.8%
2048 | 4.8% 33.3%
4096 | 7.8% 28.3%
8192 | 10.2% 58.4%
16384 | 24.1% 70.5%
32768 | 34.5% 80.4%
65536 | 34.8% 68.6%I haven't looked thoroughly, but the only place I've seen that uses an
non-zero nSize in hash_init (besides some in the core engine) is the
unserialize()
function. :-/ It seems the most simple and obvious place
that
could be changed is in Zend/zend_variables.c:_zval_copy_ctor_func? Just
replacezend[_u]_hash_init(tmp_ht, 0, ...
with
zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...Other than that, some of PHP's array functions and array-returning
functions
(database fetch row functions, etc.) are ones I thought of that could be
optimized like this. Maybe create an array_init_size() macro to be used
in
places where the number of elements can be easily determined? Thoughts?
I'd volunteer to look for places that could be updated and modify them.
:-)Thanks,
MattP.S. I guess the array stuff applies for objects also? But I don't
know
much about their internals...
Looks like a good optimization to me.
Hi Nuno,
Late reply, but I'm glad the idea was able to be used. :-)
To the other hash people: While checking how Ilia made
implode()
faster in
5.2, I think I found another optimization that can be done. In
zend_hash_copy/merge, it seems the zend_hash_quick_* functions can
be used
instead for associative entries, thus saving the key from being hashed
again, right? I just checked, and it's definitely faster, how much
so being
dependent on the key length of course...Patches for this simple additional optimization (hopefully):
http://realplain.com/php/hash_quick_copy.diff
http://realplain.com/php/hash_quick_copy_5_2.diffThanks,
Matt----- Original Message -----
From: "Nuno Lopes"
Sent: Wednesday, September 27, 2006Aaahh nice catch ;)
From a quick lookup in PHP sources I also found these functions
were the
same optimization can be applied:
zend_default_exception_new_ex() - trivial
convert_to_array() - not as easy as others
zend_object_std_init() - possibly
clone_wrapper_hash() - trivial (php-src/main/streams/streams.c)
compiler_globals_ctor()
...Nuno
----- Original Message -----
Hi all,
I noticed awhile ago how most every use of zend[_u]_hash_init has
nSize
as
- Of course it isn't always known how many elements will be
added to
the
array (nTableSize), but there are places where an accurate value
could
be
used instead of getting the minimum default (8). For anyone who
doesn't
know, HashTable sizes are a power of 2, and when there are more than
that
many elements, zend_hash_do_resize realloc's more memory to
double the
table
size, and then zend_hash_rehash basically goes through and
"reinserts"
all
the elements again.I ran some tests to see the speed improvement from specifying the
actual
number of elements in hash_init, thus saving that extra work
(_rehash
mostly). The "n+1" is with one more element (9, 17, 33...) that
triggers
another rehash when the actual number wasn't specified (most
extreme).
(I
used add_next_index_zval, so the direct zend_hash* functions
would give
slightly higher % I guess, being less overhead, right?) Results
with
5.2.0-RC4:Elements
n | n n+1
8 | 0% 14.2%
16 | 11.0% 20.9%
32 | 13.5% 22.1%
64 | 12.6% 21.3%
128 | 11.7% 18.5%
256 | 9.3% 16.4%
512 | 8.6% 17.0%
1024 | 7.9% 15.8%
2048 | 4.8% 33.3%
4096 | 7.8% 28.3%
8192 | 10.2% 58.4%
16384 | 24.1% 70.5%
32768 | 34.5% 80.4%
65536 | 34.8% 68.6%I haven't looked thoroughly, but the only place I've seen that
uses an
non-zero nSize in hash_init (besides some in the core engine) is the
unserialize()
function. :-/ It seems the most simple and obvious
place
that
could be changed is in Zend/
zend_variables.c:_zval_copy_ctor_func? Just
replacezend[_u]_hash_init(tmp_ht, 0, ...
with
zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...Other than that, some of PHP's array functions and array-returning
functions
(database fetch row functions, etc.) are ones I thought of that
could be
optimized like this. Maybe create an array_init_size() macro to
be used
in
places where the number of elements can be easily determined?
Thoughts?
I'd volunteer to look for places that could be updated and modify
them.
:-)Thanks,
MattP.S. I guess the array stuff applies for objects also? But I don't
know
much about their internals...--
Ilia Alshanetsky