Lazy keys comparison during hash lookups

9 years ago by Andone — view source

unread

Hi Guys,

I would like to propose a small change into the DJBX33A hash function algorithm which will make easier the key matching validations in hash lookup functions.

The change addresses the modulo 8 tailing bytes of the key. For these bytes we can use an 8 bit shift instead of a 5 bit shift; we also need to replace the ADD by XOR, in order to avoid byte level overflows. This change ensures the uniqueness of the hash function transformation for the tailing bytes: supposing two strings have same partial hash value for the first Nx8 bytes, different combinations of tailing characters (with the same tail size) will always generate different keys.
We have the following consequences:
If two strings have:

same hash value,
same length,
same bytes for the first Nx8 positions,
then they are equal, and the tailing bytes can be skipped during comparison.

There is a visible performance gain if we apply this approach as we can use a lightweight memcmp() implementation based on longs comparison and completely free of the complexity incurred by tailing bytes. For Mediawiki I have a 1.7% performance gain while Wordpress reports 1.2% speedup on Haswell-EP.

Let’s take a small example:
Suppose we have a key=”this_is_a_key_value”.
The hash function for the first N x 8 byes are computed in the original way; suppose “this_is_a_key_va” (16bytes) will return a partial hash value h1; the final hash value will be computed by the following sequence:
h = ((h1<<8) ^ h1) ^ ‘l’;
h = ((h<<8) ^ h) ^ ‘u’;
h = ((h<<8) ^ h) ^ ‘e’;
or, in only one operation:
h = (h1<<24) ^ (h1<<16) ^ (h1<<8) ^ h1 ^ (‘l’<<16) ^ ((‘l’^‘u’)<<8) ^ (‘l’^’u’^‘e’)
We can see that ht=(‘l’<<16) ^ ((‘l’^‘u’)<<8) ^ (‘l’^’u’^‘e’) cannot be obtained by any other 3 characters long tail. The statement is not true if we use ADD instead of XOR, as extended ASCII characters might generate overflows affecting the LSB of the higher byte in the hash value.

I pushed a pull request here: https://github.com/php/php-src/pull/1793. Unfortunately it does not pass the travis tests because “htmlspecialchars etc use a generated table that assumes the current hash function” as noticed by Nikita.

Let me know your thoughts on this idea.
Thanks,
Bogdan

9 years ago by Nikita Popov — view source

unread

On Tue, Mar 8, 2016 at 2:01 PM, Andone, Bogdan bogdan.andone@intel.com
wrote:

Hi Guys,

I would like to propose a small change into the DJBX33A hash function
algorithm which will make easier the key matching validations in hash
lookup functions.

The change addresses the modulo 8 tailing bytes of the key. For these
bytes we can use an 8 bit shift instead of a 5 bit shift; we also need to
replace the ADD by XOR, in order to avoid byte level overflows. This change
ensures the uniqueness of the hash function transformation for the tailing
bytes: supposing two strings have same partial hash value for the first Nx8
bytes, different combinations of tailing characters (with the same tail
size) will always generate different keys.
We have the following consequences:
If two strings have:

same hash value,

same length,

same bytes for the first Nx8 positions,
then they are equal, and the tailing bytes can be skipped during
comparison.

There is a visible performance gain if we apply this approach as we can
use a lightweight memcmp() implementation based on longs comparison and
completely free of the complexity incurred by tailing bytes. For Mediawiki
I have a 1.7% performance gain while Wordpress reports 1.2% speedup on
Haswell-EP.

Let’s take a small example:
Suppose we have a key=”this_is_a_key_value”.
The hash function for the first N x 8 byes are computed in the original
way; suppose “this_is_a_key_va” (16bytes) will return a partial hash value
h1; the final hash value will be computed by the following sequence:
h = ((h1<<8) ^ h1) ^ ‘l’;
h = ((h<<8) ^ h) ^ ‘u’;
h = ((h<<8) ^ h) ^ ‘e’;
or, in only one operation:
h = (h1<<24) ^ (h1<<16) ^ (h1<<8) ^ h1 ^ (‘l’<<16) ^ ((‘l’^‘u’)<<8) ^
(‘l’^’u’^‘e’)
We can see that ht=(‘l’<<16) ^ ((‘l’^‘u’)<<8) ^ (‘l’^’u’^‘e’) cannot be
obtained by any other 3 characters long tail. The statement is not true if
we use ADD instead of XOR, as extended ASCII characters might generate
overflows affecting the LSB of the higher byte in the hash value.

I pushed a pull request here: https://github.com/php/php-src/pull/1793.
Unfortunately it does not pass the travis tests because “htmlspecialchars
etc use a generated table that assumes the current hash function” as
noticed by Nikita.

Let me know your thoughts on this idea.

Hey Bogdan,

This looks like an interesting idea! I'm somewhat apprehensive about
coupling this to a change of the hash function, for two reasons:
a) This will make it more problematic if we want to change the hash
function in the future, e.g. if we want to switch to SipHash.
b) The quality of the new hash distribution is not immediately clear, but
likely non-trivially weaker.

So I'm wondering if we can keep the concept of using a zend_ulong aligned
memcmp while leaving the hash function alone: The zend_string allocation
policy already allocates the string data aligned and padded to zend_ulong
boundaries. If we were to additionally explicitly zero out the last byte
(to avoid valgrind warnings) we should be able to compare the character
data of two zend_strings using a zend_ulong memcmp. This would have the
additional benefit that it works for normal string comparisons (unrelated
to hashtables) as well. On the other hand, this is only possible for
zend_string to zend_string comparisons, not for comparisons with static
strings.

Regards,
Nikita

9 years ago by Nikita Popov — view source

unread

On Tue, Mar 8, 2016 at 2:01 PM, Andone, Bogdan bogdan.andone@intel.com
wrote:

Hi Guys,

I would like to propose a small change into the DJBX33A hash function
algorithm which will make easier the key matching validations in hash
lookup functions.

The change addresses the modulo 8 tailing bytes of the key. For these
bytes we can use an 8 bit shift instead of a 5 bit shift; we also need to
replace the ADD by XOR, in order to avoid byte level overflows. This change
ensures the uniqueness of the hash function transformation for the tailing
bytes: supposing two strings have same partial hash value for the first Nx8
bytes, different combinations of tailing characters (with the same tail
size) will always generate different keys.
We have the following consequences:
If two strings have:

same hash value,

same length,

same bytes for the first Nx8 positions,
then they are equal, and the tailing bytes can be skipped during
comparison.

There is a visible performance gain if we apply this approach as we can
use a lightweight memcmp() implementation based on longs comparison and
completely free of the complexity incurred by tailing bytes. For Mediawiki
I have a 1.7% performance gain while Wordpress reports 1.2% speedup on
Haswell-EP.

Let’s take a small example:
Suppose we have a key=”this_is_a_key_value”.
The hash function for the first N x 8 byes are computed in the original
way; suppose “this_is_a_key_va” (16bytes) will return a partial hash value
h1; the final hash value will be computed by the following sequence:
h = ((h1<<8) ^ h1) ^ ‘l’;
h = ((h<<8) ^ h) ^ ‘u’;
h = ((h<<8) ^ h) ^ ‘e’;
or, in only one operation:
h = (h1<<24) ^ (h1<<16) ^ (h1<<8) ^ h1 ^ (‘l’<<16) ^ ((‘l’^‘u’)<<8) ^
(‘l’^’u’^‘e’)
We can see that ht=(‘l’<<16) ^ ((‘l’^‘u’)<<8) ^ (‘l’^’u’^‘e’) cannot be
obtained by any other 3 characters long tail. The statement is not true if
we use ADD instead of XOR, as extended ASCII characters might generate
overflows affecting the LSB of the higher byte in the hash value.

I pushed a pull request here: https://github.com/php/php-src/pull/1793.
Unfortunately it does not pass the travis tests because “htmlspecialchars
etc use a generated table that assumes the current hash function” as
noticed by Nikita.

Let me know your thoughts on this idea.

Hey Bogdan,

This looks like an interesting idea! I'm somewhat apprehensive about
coupling this to a change of the hash function, for two reasons:
a) This will make it more problematic if we want to change the hash
function in the future, e.g. if we want to switch to SipHash.
b) The quality of the new hash distribution is not immediately clear, but
likely non-trivially weaker.

So I'm wondering if we can keep the concept of using a zend_ulong aligned
memcmp while leaving the hash function alone: The zend_string allocation
policy already allocates the string data aligned and padded to zend_ulong
boundaries. If we were to additionally explicitly zero out the last byte
(to avoid valgrind warnings) we should be able to compare the character
data of two zend_strings using a zend_ulong memcmp. This would have the
additional benefit that it works for normal string comparisons (unrelated
to hashtables) as well. On the other hand, this is only possible for
zend_string to zend_string comparisons, not for comparisons with static
strings.

s/zero out the last byte/zero out the last zend_ulong

I'd like to add another issue with relying on the hash for this which I
just remembered: We currently always set the top bit of the hash for
strings (see http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_string.h#351), in
order to ensure that hashes are never zero. This makes the hash non-unique.

Nikita

9 years ago by Andone — view source

unread

From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Tuesday, March 08, 2016 3:43 PM
To: Andone, Bogdan
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Lazy keys comparison during hash lookups

On Tue, Mar 8, 2016 at 2:01 PM, Andone, Bogdan
bogdan.andone@intel.com wrote:
Hi Guys,

I would like to propose a small change into the DJBX33A hash function
algorithm which will make easier the key matching validations in hash
lookup functions.

The change addresses the modulo 8 tailing bytes of the key. For these
bytes we can use an 8 bit shift instead of a 5 bit shift; we also need
to replace the ADD by XOR, in order to avoid byte level overflows. This
change ensures the uniqueness of the hash function transformation for
the tailing bytes: supposing two strings have same partial hash value
for the first Nx8 bytes, different combinations of tailing characters
(with the same tail size) will always generate different keys.
We have the following consequences:
If two strings have:

same hash value,

same length,

same bytes for the first Nx8 positions,
then they are equal, and the tailing bytes can be skipped during
comparison.

There is a visible performance gain if we apply this approach as we
can use a lightweight memcmp() implementation based on longs comparison
and completely free of the complexity incurred by tailing bytes. For
Mediawiki I have a 1.7% performance gain while Wordpress reports 1.2%
speedup on Haswell-EP.

Let’s take a small example:
Suppose we have a key=”this_is_a_key_value”.
The hash function for the first N x 8 byes are computed in the
original way; suppose “this_is_a_key_va” (16bytes) will return a partial
hash value h1; the final hash value will be computed by the following
sequence:
h = ((h1<<8) ^ h1) ^ ‘l’;
h = ((h<<8) ^ h) ^ ‘u’;
h = ((h<<8) ^ h) ^ ‘e’;
or, in only one operation:
h = (h1<<24) ^ (h1<<16) ^ (h1<<8) ^ h1 ^ (‘l’<<16) ^ ((‘l’^‘u’)<<8) ^
(‘l’^’u’^‘e’)
We can see that ht=(‘l’<<16) ^ ((‘l’^‘u’)<<8) ^ (‘l’^’u’^‘e’) cannot
be obtained by any other 3 characters long tail. The statement is not
true if we use ADD instead of XOR, as extended ASCII characters might
generate overflows affecting the LSB of the higher byte in the hash
value.

I pushed a pull request here: https://github.com/php/php-
src/pull/1793. Unfortunately it does not pass the travis tests because
“htmlspecialchars etc use a generated table that assumes the current
hash function” as noticed by Nikita.

Let me know your thoughts on this idea.

Hey Bogdan,
This looks like an interesting idea! I'm somewhat apprehensive about
coupling this to a change of the hash function, for two reasons:
a) This will make it more problematic if we want to change the hash
function in the future, e.g. if we want to switch to SipHash.
b) The quality of the new hash distribution is not immediately clear,
but likely non-trivially weaker.
So I'm wondering if we can keep the concept of using a zend_ulong
aligned memcmp while leaving the hash function alone: The zend_string
allocation policy already allocates the string data aligned and padded
to zend_ulong boundaries. If we were to additionally explicitly zero out
the last byte (to avoid valgrind warnings) we should be able to compare
the character data of two zend_strings using a zend_ulong memcmp. This
would have the additional benefit that it works for normal string
comparisons (unrelated to hashtables) as well. On the other hand, this
is only possible for zend_string to zend_string comparisons, not for
comparisons with static strings.
Regards,
s/zero out the last byte/zero out the last zend_ulong
I'd like to add another issue with relying on the hash for this which I
just remembered: We currently always set the top bit of the hash for
strings (see http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_string.h#351),
in order to ensure that hashes are never zero. This makes the hash non-
unique.
Nikita
Yeah... I missed the top bit set, but I think it could be solved somehow. Imho the strongest reason against the patch is the dependency between the hash function and the key validation method but I have the impression that this is not be the most dirty hack deployed ever in PHP; and I don't know very well what is the degree of compromise acceptable for 1% speedup :-)
On the other hand, your idea to zero the last zend_ulong in the zend_string looks nice. There could be an additional potential gain also from lightweight implementation of memcmp(). I will give it a try to see if it gets better results than the current proposal.
I assume all zend_string allocation functions are located in zend_string.h, correct?

Thanks,
Bogdan

9 years ago by Andone — view source

unread

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Tuesday, March 08, 2016 4:19 PM
To: 'Nikita Popov'
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Tuesday, March 08, 2016 3:43 PM
To: Andone, Bogdan
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Lazy keys comparison during hash lookups

On Tue, Mar 8, 2016 at 2:18 PM, Nikita Popov nikita.ppv@gmail.com
wrote:

On Tue, Mar 8, 2016 at 2:01 PM, Andone, Bogdan
bogdan.andone@intel.com wrote:
Hi Guys,

I would like to propose a small change into the DJBX33A hash
function
algorithm which will make easier the key matching validations in hash
lookup functions.

The change addresses the modulo 8 tailing bytes of the key. For
these
bytes we can use an 8 bit shift instead of a 5 bit shift; we also need
to replace the ADD by XOR, in order to avoid byte level overflows.
This
change ensures the uniqueness of the hash function transformation for
the tailing bytes: supposing two strings have same partial hash value
for the first Nx8 bytes, different combinations of tailing characters
(with the same tail size) will always generate different keys.
We have the following consequences:
If two strings have:

same hash value,

same length,

same bytes for the first Nx8 positions,
then they are equal, and the tailing bytes can be skipped during
comparison.

There is a visible performance gain if we apply this approach as we
can use a lightweight memcmp() implementation based on longs
comparison
and completely free of the complexity incurred by tailing bytes. For
Mediawiki I have a 1.7% performance gain while Wordpress reports 1.2%
speedup on Haswell-EP.

Let’s take a small example:
Suppose we have a key=”this_is_a_key_value”.
The hash function for the first N x 8 byes are computed in the
original way; suppose “this_is_a_key_va” (16bytes) will return a
partial
hash value h1; the final hash value will be computed by the following
sequence:
h = ((h1<<8) ^ h1) ^ ‘l’;
h = ((h<<8) ^ h) ^ ‘u’;
h = ((h<<8) ^ h) ^ ‘e’;
or, in only one operation:
h = (h1<<24) ^ (h1<<16) ^ (h1<<8) ^ h1 ^ (‘l’<<16) ^ ((‘l’^‘u’)<<8)
^
(‘l’^’u’^‘e’)
We can see that ht=(‘l’<<16) ^ ((‘l’^‘u’)<<8) ^
(‘l’^’u’^‘e’) cannot
be obtained by any other 3 characters long tail. The statement is not
true if we use ADD instead of XOR, as extended ASCII characters might
generate overflows affecting the LSB of the higher byte in the hash
value.

I pushed a pull request here: https://github.com/php/php-
src/pull/1793. Unfortunately it does not pass the travis tests because
“htmlspecialchars etc use a generated table that assumes the current
hash function” as noticed by Nikita.

Let me know your thoughts on this idea.

Hey Bogdan,
This looks like an interesting idea! I'm somewhat apprehensive about
coupling this to a change of the hash function, for two reasons:
a) This will make it more problematic if we want to change the hash
function in the future, e.g. if we want to switch to SipHash.
b) The quality of the new hash distribution is not immediately
clear,
but likely non-trivially weaker.
So I'm wondering if we can keep the concept of using a zend_ulong
aligned memcmp while leaving the hash function alone: The zend_string
allocation policy already allocates the string data aligned and padded
to zend_ulong boundaries. If we were to additionally explicitly zero
out
the last byte (to avoid valgrind warnings) we should be able to
compare
the character data of two zend_strings using a zend_ulong memcmp. This
would have the additional benefit that it works for normal string
comparisons (unrelated to hashtables) as well. On the other hand, this
is only possible for zend_string to zend_string comparisons, not for
comparisons with static strings.
Regards,
s/zero out the last byte/zero out the last zend_ulong
I'd like to add another issue with relying on the hash for this which
I
just remembered: We currently always set the top bit of the hash for
strings (see
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_string.h#351),
in order to ensure that hashes are never zero. This makes the hash
non-
unique.
Nikita
Yeah... I missed the top bit set, but I think it could be solved
somehow.
Actually setting the top bit is not an issue. The transformation unicity shall be ensured for the tailing bytes only which are shifted to the left maximum 7(on 64bits)/3(on 32bits) positions; so they will never reach the top byte and setting the top bit will not harm in terms of tailing bytes.

Imho the strongest reason against the patch is the dependency
between the hash function and the key validation method but I have the
impression that this is not be the most dirty hack deployed ever in PHP;
and I don't know very well what is the degree of compromise acceptable
for 1% speedup :-)
Otherwise the concerns about the quality of the hash function and the intention to change it with SipHash or something else, I am afraid it will not bring much in terms of collisions reductions; the issue here is the fact that hash bucket selection uses only the last few bits of the hash value. Not sure how much could help here SipHash...
Just for defending the technical feasibility of this POC I think that injective transformation for last tailing bytes could be deployed over theoretically any hash function.

Thanks,
Bogdan

On the other hand, your idea to zero the last zend_ulong in the
zend_string looks nice. There could be an additional potential gain also
from lightweight implementation of memcmp(). I will give it a try to see
if it gets better results than the current proposal.
I assume all zend_string allocation functions are located in
zend_string.h, correct?

Thanks,
Bogdan

9 years ago by Andone — view source

unread

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Wednesday, March 09, 2016 9:33 AM
To: 'Nikita Popov'
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Tuesday, March 08, 2016 4:19 PM
To: 'Nikita Popov'
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Tuesday, March 08, 2016 3:43 PM
To: Andone, Bogdan
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Lazy keys comparison during hash lookups

On Tue, Mar 8, 2016 at 2:18 PM, Nikita Popov nikita.ppv@gmail.com
wrote:

On Tue, Mar 8, 2016 at 2:01 PM, Andone, Bogdan
bogdan.andone@intel.com wrote:
Hi Guys,

I would like to propose a small change into the DJBX33A hash
function
algorithm which will make easier the key matching validations in
hash
lookup functions.

The change addresses the modulo 8 tailing bytes of the key. For
these
bytes we can use an 8 bit shift instead of a 5 bit shift; we also
need
to replace the ADD by XOR, in order to avoid byte level overflows.
This
change ensures the uniqueness of the hash function transformation
for
the tailing bytes: supposing two strings have same partial hash
value
for the first Nx8 bytes, different combinations of tailing
characters
(with the same tail size) will always generate different keys.
We have the following consequences:
If two strings have:

same hash value,

same length,

same bytes for the first Nx8 positions,
then they are equal, and the tailing bytes can be skipped during
comparison.

There is a visible performance gain if we apply this approach as
we
can use a lightweight memcmp() implementation based on longs
comparison
and completely free of the complexity incurred by tailing bytes. For
Mediawiki I have a 1.7% performance gain while Wordpress reports
1.2%
speedup on Haswell-EP.

Let’s take a small example:
Suppose we have a key=”this_is_a_key_value”.
The hash function for the first N x 8 byes are computed in the
original way; suppose “this_is_a_key_va” (16bytes) will return a
partial
hash value h1; the final hash value will be computed by the
following
sequence:
h = ((h1<<8) ^ h1) ^ ‘l’;
h = ((h<<8) ^ h) ^ ‘u’;
h = ((h<<8) ^ h) ^ ‘e’;
or, in only one operation:
h = (h1<<24) ^ (h1<<16) ^ (h1<<8) ^ h1 ^ (‘l’<<16) ^
((‘l’^‘u’)<<8)
^
(‘l’^’u’^‘e’)
We can see that ht=(‘l’<<16) ^ ((‘l’^‘u’)<<8) ^
(‘l’^’u’^‘e’) cannot
be obtained by any other 3 characters long tail. The statement is
not
true if we use ADD instead of XOR, as extended ASCII characters
might
generate overflows affecting the LSB of the higher byte in the hash
value.

I pushed a pull request here: https://github.com/php/php-
src/pull/1793. Unfortunately it does not pass the travis tests
because
“htmlspecialchars etc use a generated table that assumes the current
hash function” as noticed by Nikita.

Let me know your thoughts on this idea.

Hey Bogdan,
This looks like an interesting idea! I'm somewhat apprehensive
about
coupling this to a change of the hash function, for two reasons:
a) This will make it more problematic if we want to change the
hash
function in the future, e.g. if we want to switch to SipHash.
b) The quality of the new hash distribution is not immediately
clear,
but likely non-trivially weaker.
So I'm wondering if we can keep the concept of using a zend_ulong
aligned memcmp while leaving the hash function alone: The
zend_string
allocation policy already allocates the string data aligned and
padded
to zend_ulong boundaries. If we were to additionally explicitly zero
out
the last byte (to avoid valgrind warnings) we should be able to
compare
the character data of two zend_strings using a zend_ulong memcmp.
This
would have the additional benefit that it works for normal string
comparisons (unrelated to hashtables) as well. On the other hand,
this
is only possible for zend_string to zend_string comparisons, not for
comparisons with static strings.
Regards,
s/zero out the last byte/zero out the last zend_ulong
I'd like to add another issue with relying on the hash for this
which
I
just remembered: We currently always set the top bit of the hash for
strings (see
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_string.h#351),
in order to ensure that hashes are never zero. This makes the hash
non-
unique.
Nikita
Yeah... I missed the top bit set, but I think it could be solved
somehow.
Actually setting the top bit is not an issue. The transformation unicity
shall be ensured for the tailing bytes only which are shifted to the
left maximum 7(on 64bits)/3(on 32bits) positions; so they will never
reach the top byte and setting the top bit will not harm in terms of
tailing bytes.

Imho the strongest reason against the patch is the dependency
between the hash function and the key validation method but I have the
impression that this is not be the most dirty hack deployed ever in
PHP;
and I don't know very well what is the degree of compromise acceptable
for 1% speedup :-)
Otherwise the concerns about the quality of the hash function and the
intention to change it with SipHash or something else, I am afraid it
will not bring much in terms of collisions reductions; the issue here is
the fact that hash bucket selection uses only the last few bits of the
hash value. Not sure how much could help here SipHash...
Just for defending the technical feasibility of this POC I think that
injective transformation for last tailing bytes could be deployed over
theoretically any hash function.

Thanks,
Bogdan

On the other hand, your idea to zero the last zend_ulong in the
zend_string looks nice. There could be an additional potential gain
also
from lightweight implementation of memcmp(). I will give it a try to
see
if it gets better results than the current proposal.
I assume all zend_string allocation functions are located in
zend_string.h, correct?

Thanks,
Bogdan
I worked a little bit of adapting the zend string allocation functions for ensuring long size alignment for both Zend and standard allocation, and for zeroing out the end of the strings.
The sad surprise was to discover that there are other >100 situations, in extensions mainly, where the zend strings are truncated by directly inserting '0' terminators in the middle of strings; all these situations needs to be revised for zeroing out the entire memory word enclosing the end of the string.
It will probably take a while for revising most of these places before having a correct real life run for being able to validate the POC.
Optimistically speaking, supposing POC will show motivating results, we will probably need to enforce the Zend interface with respect to string truncation - at least a getter/setter pair for ZSTR_LEN where the setter should ensure also synchronizing the string termination.
I suppose it will be an important effort for the community and I am not sure how much willingness will be in it. On the other side, the originally proposed solution does not have all this drawbacks and it offers 1% speedup by a localized change which could be easily reverted in the case we will get better results by a better string alignment in future.

Thanks,
Bogdan

9 years ago by Andone — view source

unread

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Friday, March 18, 2016 11:58 AM
To: 'Nikita Popov' nikita.ppv@gmail.com
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Wednesday, March 09, 2016 9:33 AM
To: 'Nikita Popov'
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Tuesday, March 08, 2016 4:19 PM
To: 'Nikita Popov'
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Tuesday, March 08, 2016 3:43 PM
To: Andone, Bogdan
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Lazy keys comparison during hash lookups

On Tue, Mar 8, 2016 at 2:18 PM, Nikita Popov
nikita.ppv@gmail.com
wrote:

On Tue, Mar 8, 2016 at 2:01 PM, Andone, Bogdan
bogdan.andone@intel.com wrote:
Hi Guys,

I would like to propose a small change into the DJBX33A hash
function
algorithm which will make easier the key matching validations in
hash
lookup functions.

The change addresses the modulo 8 tailing bytes of the key. For
these
bytes we can use an 8 bit shift instead of a 5 bit shift; we also
need
to replace the ADD by XOR, in order to avoid byte level overflows.
This
change ensures the uniqueness of the hash function transformation
for
the tailing bytes: supposing two strings have same partial hash
value
for the first Nx8 bytes, different combinations of tailing
characters
(with the same tail size) will always generate different keys.
We have the following consequences:
If two strings have:

same hash value,

same length,

same bytes for the first Nx8 positions, then they are equal,
and the tailing bytes can be skipped during
comparison.

There is a visible performance gain if we apply this approach
as
we
can use a lightweight memcmp() implementation based on longs
comparison
and completely free of the complexity incurred by tailing bytes.
For Mediawiki I have a 1.7% performance gain while Wordpress
reports
1.2%
speedup on Haswell-EP.

Let’s take a small example:
Suppose we have a key=”this_is_a_key_value”.
The hash function for the first N x 8 byes are computed in the
original way; suppose “this_is_a_key_va” (16bytes) will return a
partial
hash value h1; the final hash value will be computed by the
following
sequence:
h = ((h1<<8) ^ h1) ^ ‘l’;
h = ((h<<8) ^ h) ^ ‘u’;
h = ((h<<8) ^ h) ^ ‘e’;
or, in only one operation:
h = (h1<<24) ^ (h1<<16) ^ (h1<<8) ^ h1 ^ (‘l’<<16) ^
((‘l’^‘u’)<<8)
^
(‘l’^’u’^‘e’)
We can see that ht=(‘l’<<16) ^ ((‘l’^‘u’)<<8) ^
(‘l’^’u’^‘e’) cannot
be obtained by any other 3 characters long tail. The statement is
not
true if we use ADD instead of XOR, as extended ASCII characters
might
generate overflows affecting the LSB of the higher byte in the
hash value.

I pushed a pull request here: https://github.com/php/php-
src/pull/1793. Unfortunately it does not pass the travis tests
because
“htmlspecialchars etc use a generated table that assumes the
current hash function” as noticed by Nikita.

Let me know your thoughts on this idea.

Hey Bogdan,
This looks like an interesting idea! I'm somewhat apprehensive
about
coupling this to a change of the hash function, for two reasons:
a) This will make it more problematic if we want to change the
hash
function in the future, e.g. if we want to switch to SipHash.
b) The quality of the new hash distribution is not immediately
clear,
but likely non-trivially weaker.
So I'm wondering if we can keep the concept of using a
zend_ulong
aligned memcmp while leaving the hash function alone: The
zend_string
allocation policy already allocates the string data aligned and
padded
to zend_ulong boundaries. If we were to additionally explicitly
zero
out
the last byte (to avoid valgrind warnings) we should be able to
compare
the character data of two zend_strings using a zend_ulong memcmp.
This
would have the additional benefit that it works for normal string
comparisons (unrelated to hashtables) as well. On the other hand,
this
is only possible for zend_string to zend_string comparisons, not
for comparisons with static strings.
Regards,
s/zero out the last byte/zero out the last zend_ulong I'd like to
add another issue with relying on the hash for this
which
I
just remembered: We currently always set the top bit of the hash
for strings (see
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_string.h#351),
in order to ensure that hashes are never zero. This makes the hash
non-
unique.
Nikita
Yeah... I missed the top bit set, but I think it could be solved
somehow.
Actually setting the top bit is not an issue. The transformation
unicity shall be ensured for the tailing bytes only which are shifted
to the left maximum 7(on 64bits)/3(on 32bits) positions; so they will
never reach the top byte and setting the top bit will not harm in
terms of tailing bytes.

Imho the strongest reason against the patch is the dependency
between the hash function and the key validation method but I have
the impression that this is not be the most dirty hack deployed ever
in
PHP;
and I don't know very well what is the degree of compromise
acceptable for 1% speedup :-)
Otherwise the concerns about the quality of the hash function and the
intention to change it with SipHash or something else, I am afraid it
will not bring much in terms of collisions reductions; the issue here
is the fact that hash bucket selection uses only the last few bits of
the hash value. Not sure how much could help here SipHash...
Just for defending the technical feasibility of this POC I think that
injective transformation for last tailing bytes could be deployed over
theoretically any hash function.

Thanks,
Bogdan

On the other hand, your idea to zero the last zend_ulong in the
zend_string looks nice. There could be an additional potential gain
also
from lightweight implementation of memcmp(). I will give it a try to
see
if it gets better results than the current proposal.
I assume all zend_string allocation functions are located in
zend_string.h, correct?

Thanks,
Bogdan
I worked a little bit of adapting the zend string allocation functions for ensuring
long size alignment for both Zend and standard allocation, and for zeroing out
the end of the strings.
The sad surprise was to discover that there are other >100 situations, in
extensions mainly, where the zend strings are truncated by directly inserting '0'
terminators in the middle of strings; all these situations needs to be revised for
zeroing out the entire memory word enclosing the end of the string.
It will probably take a while for revising most of these places before having a
correct real life run for being able to validate the POC.
Optimistically speaking, supposing POC will show motivating results, we will
probably need to enforce the Zend interface with respect to string truncation -
at least a getter/setter pair for ZSTR_LEN where the setter should ensure also
synchronizing the string termination.
I suppose it will be an important effort for the community and I am not sure how
much willingness will be in it. On the other side, the originally proposed solution
does not have all this drawbacks and it offers 1% speedup by a localized change
which could be easily reverted in the case we will get better results by a better
string alignment in future.

Thanks,
Bogdan
I finalized the POC for ensuring strings granularity to zend_long size; you can find it here: https://gist.github.com/bogdanandone/cfde764f64137b57ce2a6f4a9b52e757
Unfortunately, the effort to zero-out the end of strings generates an overhead higher than the gain incurred by simplified zend_memcpy_aligned memcpy() & memcmp() implementations - I defined a simpler zend_memcpy_aligned() function used in zend_string.h and zend_ulong_memeq() used in zend_hash.c inherited from my original place. Overall I see ~1% loss on Worpress on my platform. I didn't spend time in fine tuning each place needing changes but I don't expect a real difference.
Probably this is not the way to go and I finalize with a last buzz for the original patch proposal :)

Thanks,
Bogdan

9 years ago by Andone — view source

unread

-----Original Message-----
From: Matt Wilmas [mailto:php_lists@realplain.com]
Sent: Wednesday, April 27, 2016 5:44 PM
To: Andone, Bogdan bogdan.andone@intel.com; 'Nikita Popov'
nikita.ppv@gmail.com
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Lazy keys comparison during hash lookups

Hi Bogdan, all,

----- Original Message -----
From: "Andone, Bogdan"
Sent: Monday, April 25, 2016

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Friday, March 18, 2016 11:58 AM
To: 'Nikita Popov' nikita.ppv@gmail.com
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Wednesday, March 09, 2016 9:33 AM
To: 'Nikita Popov'
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

-----Original Message-----
From: Andone, Bogdan [mailto:bogdan.andone@intel.com]
Sent: Tuesday, March 08, 2016 4:19 PM
To: 'Nikita Popov'
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] Lazy keys comparison during hash lookups

From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Tuesday, March 08, 2016 3:43 PM
To: Andone, Bogdan
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Lazy keys comparison during hash lookups

On Tue, Mar 8, 2016 at 2:18 PM, Nikita Popov
nikita.ppv@gmail.com
wrote:

On Tue, Mar 8, 2016 at 2:01 PM, Andone, Bogdan
bogdan.andone@intel.com wrote:
Hi Guys,

I would like to propose a small change into the DJBX33A hash
function
algorithm which will make easier the key matching validations
in
hash
lookup functions.

The change addresses the modulo 8 tailing bytes of the key.
For
these
bytes we can use an 8 bit shift instead of a 5 bit shift; we
also
need
to replace the ADD by XOR, in order to avoid byte level overflows.
This
change ensures the uniqueness of the hash function
transformation
for
the tailing bytes: supposing two strings have same partial hash
value
for the first Nx8 bytes, different combinations of tailing
characters
(with the same tail size) will always generate different keys.
We have the following consequences:
If two strings have:

same hash value,

same length,

same bytes for the first Nx8 positions, then they are
equal, and the tailing bytes can be skipped during
comparison.

There is a visible performance gain if we apply this
approach as
we
can use a lightweight memcmp() implementation based on longs
comparison
and completely free of the complexity incurred by tailing bytes.
For Mediawiki I have a 1.7% performance gain while Wordpress
reports
1.2%
speedup on Haswell-EP.

Let's take a small example:
Suppose we have a key="this_is_a_key_value".
The hash function for the first N x 8 byes are computed in
the
original way; suppose "this_is_a_key_va" (16bytes) will return
a
partial
hash value h1; the final hash value will be computed by the
following
sequence:
h = ((h1<<8) ^ h1) ^ 'l';
h = ((h<<8) ^ h) ^ 'u';
h = ((h<<8) ^ h) ^ 'e';
or, in only one operation:
h = (h1<<24) ^ (h1<<16) ^ (h1<<8) ^ h1 ^ ('l'<<16) ^
(('l'^'u')<<8)
^
('l'^'u'^'e')
We can see that ht=('l'<<16) ^ (('l'^'u')<<8) ^
('l'^'u'^'e') cannot
be obtained by any other 3 characters long tail. The statement
is
not
true if we use ADD instead of XOR, as extended ASCII characters
might
generate overflows affecting the LSB of the higher byte in the
hash value.

I pushed a pull request here: https://github.com/php/php-
src/pull/1793. Unfortunately it does not pass the travis tests
because
"htmlspecialchars etc use a generated table that assumes the
current hash function" as noticed by Nikita.

Let me know your thoughts on this idea.

Hey Bogdan,
This looks like an interesting idea! I'm somewhat
apprehensive
about
coupling this to a change of the hash function, for two reasons:
a) This will make it more problematic if we want to change
the
hash
function in the future, e.g. if we want to switch to SipHash.
b) The quality of the new hash distribution is not
immediately
clear,
but likely non-trivially weaker.
So I'm wondering if we can keep the concept of using a
zend_ulong
aligned memcmp while leaving the hash function alone: The
zend_string
allocation policy already allocates the string data aligned and
padded
to zend_ulong boundaries. If we were to additionally explicitly
zero
out
the last byte (to avoid valgrind warnings) we should be able to
compare
the character data of two zend_strings using a zend_ulong memcmp.
This
would have the additional benefit that it works for normal
string comparisons (unrelated to hashtables) as well. On the
other hand,
this
is only possible for zend_string to zend_string comparisons,
not for comparisons with static strings.
Regards,
s/zero out the last byte/zero out the last zend_ulong I'd like
to add another issue with relying on the hash for this
which
I
just remembered: We currently always set the top bit of the
hash for strings (see
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_string.h#351),
in order to ensure that hashes are never zero. This makes the
hash
non-
unique.
Nikita
Yeah... I missed the top bit set, but I think it could be solved
somehow.
Actually setting the top bit is not an issue. The transformation
unicity shall be ensured for the tailing bytes only which are
shifted to the left maximum 7(on 64bits)/3(on 32bits) positions; so
they will never reach the top byte and setting the top bit will not
harm in terms of tailing bytes.

Imho the strongest reason against the patch is the dependency
between the hash function and the key validation method but I
have the impression that this is not be the most dirty hack
deployed ever in
PHP;
and I don't know very well what is the degree of compromise
acceptable for 1% speedup :-)
Otherwise the concerns about the quality of the hash function and
the intention to change it with SipHash or something else, I am
afraid it will not bring much in terms of collisions reductions;
the issue here is the fact that hash bucket selection uses only the
last few bits of the hash value. Not sure how much could help here
SipHash...
Just for defending the technical feasibility of this POC I think
that injective transformation for last tailing bytes could be
deployed over theoretically any hash function.

Thanks,
Bogdan

On the other hand, your idea to zero the last zend_ulong in the
zend_string looks nice. There could be an additional potential
gain
also
from lightweight implementation of memcmp(). I will give it a try
to
see
if it gets better results than the current proposal.
I assume all zend_string allocation functions are located in
zend_string.h, correct?

Thanks,
Bogdan
I worked a little bit of adapting the zend string allocation
functions for ensuring long size alignment for both Zend and standard
allocation, and for zeroing out the end of the strings.
The sad surprise was to discover that there are other >100
situations, in extensions mainly, where the zend strings are
truncated by directly inserting '0'
terminators in the middle of strings; all these situations needs to
be revised for zeroing out the entire memory word enclosing the end
of the string.
It will probably take a while for revising most of these places
before having a correct real life run for being able to validate the
POC.
Optimistically speaking, supposing POC will show motivating results,
we will probably need to enforce the Zend interface with respect to
string truncation - at least a getter/setter pair for ZSTR_LEN where
the setter should ensure also synchronizing the string termination.
I suppose it will be an important effort for the community and I am
not sure how much willingness will be in it. On the other side, the
originally proposed solution does not have all this drawbacks and it
offers 1% speedup by a localized change which could be easily
reverted in the case we will get better results by a better string
alignment in future.

Thanks,
Bogdan
I finalized the POC for ensuring strings granularity to zend_long
size; you can find it here:
https://gist.github.com/bogdanandone/cfde764f64137b57ce2a6f4a9b52e757
Unfortunately, the effort to zero-out the end of strings generates an
overhead higher than the gain incurred by simplified
zend_memcpy_aligned
memcpy() & memcmp() implementations - I defined a simpler
zend_memcpy_aligned() function used in zend_string.h and
zend_ulong_memeq() used in zend_hash.c inherited from my original place.
Overall I see ~1% loss on Worpress on my platform. I didn't spend time
in fine tuning each place needing changes but I don't expect a real
difference.
Probably this is not the way to go and I finalize with a last buzz for
the original patch proposal :)

I thought, "Oh crap" when I saw your message, after you went to all that
trouble! I think there's a simple way to check the trailing bytes, but I thought
Valgrind would complain about it. However, just about 18 hours before your
mail, I found out that's not the case, so it seems fine. :-O I'll come back to this
last...

First, I wanted to thank you for the original idea -- because that led me to look
into PHP 7 hash stuff because I thought I wasn't familiar with enough of it.
Turned out I knew more than I realized, but I hope now my knowledge is much
more thorough... Just because of your e-mail. :^)

At first, I thought of, "Hey, SSE compares!" (starting with length, into the string,
at once). Ignoring even more alignment/trailing issues, I tested standalone
(using 16-1xx chars, Haswell). But _mm_empeq_epi8 +
_mm_movemask_epi8 were exactly the same as 64-bit scalar (don't think I tried
ints/-m32). The pmovmskb latency I guess kills any gain from checking 16-at-a-
time.

I guess for longer strings, could do 32 (64?) byte chunks, with 2 (4?) cmpeq's +
por + movemask to get closer to what memcmp() would do...?

Yeah, checking memcmp() standalone, the call overhead is high for common,
short strings. I'm guessing that's where most of the 1% came from?
Your feeling is right. My solution targets scenarios with predominant short keys.
What I observed on real life workloads like Wordpress is that hash table keys are usually short strings: I did a profiling on WP one year ago and if I remember correctly, ~%80 of keys have sizes < 32bytes; few isolated cases of keys with sizes >200bytes.
For short strings the time spent in memcmp() in dealing with trailing bytes is much higher than the two-three loops for copying 8bytes chunks. This has also consequences at branch prediction level as trailing bytes involves more conditional branches which are more miss-predicted that loop related branches.
My previous impression was that these facts above explain the 1% gain but, after the painful experiment of trying to round up the string sizes, I should say that part of the gain could be explained also by the fact that lazy strings involve less data accesses with some reduction of data misses.
My initial thought was also to enable SSE based implementation but, as you also observed, it does not worth on short strings.

Then, after learning about some other hash stuff, I came up with some ideas
for a new HashTable implementation in PHP! :^) I hope to start and have a
working version shortly (see how that works, before further improvements).
I think I'll send a new message about it first (I don't think anyone else is
planning any major changes, though).

Now, about your original "lazy" idea, specifically changing the hash
function to xor the trailing bytes with the hash value. I think there's a
major problem: Consider a "specially" (easily!) crafted string where the
trailing bytes can clear/set almost all hash bits. So almost the entire
hash value can be set by those trailing bytes. :-O This may not be a
problem if using a seeded hash function; not sure.
Not sure I understood your point.
For any string there is always a combination of trailing bytes for clearing/setting all hash bytes; this combination will be unique for any string. But I don't see any problem if this is not a common situation that will generate collisions. Your statement applies also to the current hash function, I think.
An example would probably help...

Finally, the [branchless] way to check trailing bytes instead? A few weeks
ago, I thought of using xor to compare, then shift out the unwanted bytes
(left/right for little/big endian, I guess). But Valgrind will complain
about about the uninitialized parts, I thought. Until I came across some
info. [1] Sat. night suggesting that Valgrind tracks values on a bit
level. So I had to try it, outside of PHP, and it worked fine!

[1]
https://www.usenix.org/legacy/publications/library/proceedings/usenix05/tech
/general/full_papers/seward/seward_html/usenix2005.html

Here's what I thought, which could even be run uncondtionally, without
checking for trailing bytes, since there's always another long-sized chunk:

mov (str1), %r
xor (str2), %r
and $7, %ecx # % 8
shl $3, %ecx # * 8
xor %any, %any # clear flags, for free, in case %cl is 0
shl %cl, %r
jne ...

The idea is nice but I think it needs additional tuning.
I assume %ecx keeps the string size, right? If this is true, supposing the trailing bytes count is 3, the code will shift out 3 bytes and it will return the comparison result for 5 bytes which is not correct; I think more instructions will be involved for doing the right shift.
Otherwise, it seems that your proposal works only for the trailing bytes. You cannot apply the same steps for the full chunks; in this case also only the number of trailing bytes will be compared in each chunk. Not talking about the fact that the code should still preserve the chuck index from an iteration to another, decrement it and so on. For being short you still have to deal with two parts of the implementation: one for entire chunks and one for trailing bytes and a conditional logic for jumping from one to another.

Should only cost the latency of the variable shift (slower than AMD :-P).

But the compilers aren't smart enough (?) to use the shift result for the
jump, and put an extra "test %r, %r." Well it will probably fuse with the
jump, so no extra micro-ops anyway. :-)

And my "ideal" instructions cause false errors with Valgrind anyway, when
the shift moves uninitialized bits to the carry flag. (%eflags is tracked
as a whole.) Adding a free "clear carry" doesn't help...

So I guess a check for trailing bytes is needed in C. But same idea, with
the full background. ;-)
Why bothering about carry flag? I assume Valgrind cries because the code accesses bytes that are not allocated and that some sanity allocation overhead is needed as you say, probably not at the complexity I've tried to deploy. By the way, the statement that there's always another long-sized chunk is not true for the cases where strings are not allocated with the zend allocator but with the system malloc().

Thanks,
Bogdan

Thoughts?

Thanks,
Bogdan

Matt