Improvements to Hastable Bucket structure

10 years ago by Benjamin Coutu — view source

unread

Hi,

Currently a hashtable bucket has to store both, the numeric index "h" and a potential pointer to a string key "key".

There is room for improvement here because "h" and "key" are conceptually mutually exclusive. I therefore propose to unionize "h" and "key" to effectively save the overhead of having to reserve space for both.

Now, in order to still be able to distinguish between an integer key and a pointer to a string key, one could use either of two approaches.

=== 1. Use flags of embedded zval (maybe _zval_struct.u1.v.*) ===

If the is-key-flag is set in the embedded zval then its a string key, if not then its an integer. This is of course only possible if _zval_struct.u1 is usable for this (I could not quickly tell).

=== 2. Pointer tagging ===

On a 64/32-bit machine all pointers will be aligned to 8/4 bytes meaning that their last 3/2 bits will always be zero. We can exploit this to effectively store information about the pointer in those bits.

We will use the last bit to distinguish between a pointer and an 63/31-bit integer, and the second last bit to distinguish between a pointer to a string key or pointer to the overflowing integer (64/32 bit). Here is a (8-bit) sample:

LLLLLLL1: integer (effectively 63/31)
PPPPPP00: pointer to zend_string (string key)
PPPPPP10: pointer to zend_ulong (in case integer key is > 63/31 bit)

One will then be able to use simple shifting and bitwise operations to extract the correct meaning.

Please let me know your thoughts!

Cheers,

Ben

Benjamin Coutu
Zeyon Technologies Inc.
http://www.zeyos.com

10 years ago by Paul Dragoonis — view source

unread

Hi,

Currently a hashtable bucket has to store both, the numeric index "h" and
a potential pointer to a string key "key".

There is room for improvement here because "h" and "key" are conceptually
mutually exclusive. I therefore propose to unionize "h" and "key" to
effectively save the overhead of having to reserve space for both.

This conceptually makes sense if you I deed just need one or the other but
not both.

Have you implemented this yet ? Have you benchmarked it ?

Now, in order to still be able to distinguish between an integer key and
a pointer to a string key, one could use either of two approaches.

=== 1. Use flags of embedded zval (maybe _zval_struct.u1.v.*) ===

If the is-key-flag is set in the embedded zval then its a string key, if
not then its an integer. This is of course only possible if _zval_struct.u1
is usable for this (I could not quickly tell).

=== 2. Pointer tagging ===

On a 64/32-bit machine all pointers will be aligned to 8/4 bytes meaning
that their last 3/2 bits will always be zero. We can exploit this to
effectively store information about the pointer in those bits.

We will use the last bit to distinguish between a pointer and an
63/31-bit integer, and the second last bit to distinguish between a pointer
to a string key or pointer to the overflowing integer (64/32 bit). Here is
a (8-bit) sample:

LLLLLLL1: integer (effectively 63/31)
PPPPPP00: pointer to zend_string (string key)
PPPPPP10: pointer to zend_ulong (in case integer key is > 63/31 bit)

One will then be able to use simple shifting and bitwise operations to
extract the correct meaning.

Please let me know your thoughts!

Cheers,

Ben

--

Benjamin Coutu
Zeyon Technologies Inc.
http://www.zeyos.com