Hello everyone,
I've identified a few more array/hash use cases where it might make sense to introduce special short circuit logic for packed arrays.
Specifically, there is an additional property of certain packed arrays (apart from being packed obviously) that we can utilize: A packed array with nNumUsed equal to nNumOfElements must be consecutively indexed from zero onward without any gaps (no IS_UNDEF). Let's call this special pure form of a packed array a "fully packed array".
Earlier I have posted on how this convenient property could speed things up for array_slice, even if preserved_keys=true (for the offset=0 case), see https://marc.info/?l=php-internals&m=147048569215717&w=2
I therefore propose to introduce the following two new macros in /Zend/zend_types.h in order to make it easier for developers to introduce special packed (or fully packed) array logic:
#define HT_IS_PACKED(ht) ((ht)->u.flags & HASH_FLAG_PACKED)
#define HT_IS_FULLY_PACKED(ht) (HT_IS_PACKED(HT) && (ht)->nNumUsed == (ht)->nNumOfElements)
With this we can easily speed things up (and more importantly preserve the packed array characteristics) for the common case of array_slice($packed_array, $offset=0, $length, $preserved_keys=true) by using the follwing snippet for https://github.com/php/php-src/blob/master/ext/standard/array.c#L2906 :
if (preserve_keys ? (offset == 0 && HT_IS_FULLY_PACKED(Z_ARRVAL_P(input))) : HT_IS_PACKED(Z_ARRVAL_P(input)))
... ZEND_HASH_FILL_PACKED ...
Another example where this could come in handy involves the encoding of JSON. For every array that it encodes, json_encode must first detemine wheter the array contains string keys or is not consecutively indexed, so that it can decide wheter to use a JSON object or a plain JSON array. That is accomplished through php_json_determine_array_type in /ext/json/json_encoder.c. That code currently has to iterate through the array until it finds a string key or a non-consecutive index (which in the worst case would mean iterating through the entire array).
Now, for fully packed arrays we can short circuit things and jump directly to the conclusion of it being a real plain old array, if it is packed and we can prove it has no gaps. That can of course easily be done via the new HT_IS_FULLY_PACKED macro. We can improve this code for a large amount of arrays with the following simple snippet for https://github.com/php/php-src/blob/master/ext/json/json_encoder.c#L38 :
if (HT_IS_FULLY_PACKED(myht))
return PHP_JSON_OUTPUT_ARRAY;
I'll be happy to make an effort to screen the entire code base in search for more cases where this could be useful once this is picked up by a lead core developer (maybe Xinchen?) who is willing to commit something on the lines of the above.
Any thoughts?
Benjamin
--
Bejamin Coutu
ben.coutu@zeyos.com
ZeyOS, Inc.
http://www.zeyos.com
Hi Benjamin,
These are interesting optimisations. I definitely see the usefulness of
detecting packed arrays and short-circuiting: I've done that in my patch
to fix object/array casting, in order to avoid wasting time checking for
the existence of non-string keys, even if (object)[1, 2, 3] is probably
quite a rare case.
Personally, having macros for this in the Zend API might bother me
slightly, because it would make it easier to rely on what is essentially
an implementation detail, and therefore perhaps slightly complicate
future changes to the hashtable internals.
However, so long as they're only used as hints about the shape of the
array, I guess this wouldn't be an issue. In any case, I don't know if
my concern here is worth worrying about.
Two further thoughts. First, are these the best names for the macros? I
don't know if the meaning of “fully packed array” would be clear without
an explanatory comment. Second, do you know if any other PHP functions
do a similar check to JSON's php_json_determine_array_type for whether
an array is free of string keys and consecutively indexed? I wonder if
that could be abtracted into a zend_hash.c function.
Thanks!
Andrea Faulds
https://ajf.me/
Hi again,
Andrea Faulds wrote:
Second, do you know if any other PHP functions
do a similar check to JSON's php_json_determine_array_type for whether
an array is free of string keys and consecutively indexed? I wonder if
that could be abtracted into a zend_hash.c function.
It seems Dmitry was way ahead of me here. My bad. :)
--
Andrea Faulds
https://ajf.me/