Hello,
For the past couple years I've been working with a PHP code base that
at times deals with quite large payloads memory wise. This has made me
pay more attention to some array operations in PHP that are rather
frustrating to deal with in userland PHP, but could perhaps be
optimized more in PHP core.
A common pattern that I've seen that could dearly use PHP internal
optimization, if possible, would be:
foreach (array_keys($array) as $key) {
}
The problem with this pattern, of course, is the fact that it
needlessly duplicates the array passed to foreach, as can be seen from
this example: https://3v4l.org/MRSv6
I would be ever so grateful, if it would be possible to improve the
PHP engine to detect that fully qualified function name array_keys is
used with foreach, in which case it would simply perform a foreach
over the keys of the array without creating a copy. Optimizing this
wouldn't even require any userland changes. Not sure if the PHP engine
makes it at all feasible, though.
Of course, you could just be using something like this in code:
foreach ($array as $key => $_) {
}
Which has actually become a pattern for us in some memory sensitive
places, but using array_keys inside foreach is a very intuitive and
common approach and doesn't require the unused variable, so it would
be nice to see the usage enshired.
Another similar problem with creating array copies is the detection of
"indexed" arrays (as opposed to associative arrays). Particularly when
dealing with JSON, it's a common need to detect if an array has keys
from 0 to n-1 and in that order. My understanding is that at least in
some cases this would be trivial and fast to tell internally in PHP,
but the functionality is not exposed to userland.
Current common practices include for example:
array_keys($array) === range(0, count($array) - 1)
Memory optimized way of dealing with this is via foreach, but it's
quite cumbersome and again, you must not use array_keys in the
foreach. The following example demonstrates that the worst case
scenario triples the memory usage using range: https://3v4l.org/FiWdk
Interestingly, using "array_values($array) === $array" is the fastest
and most optimized way in best case scenarios, since php just returns
the array itself in cases it's "packed" and "without holes". However,
this could get hairy in worst case scenarios since it starts comparing
the values as well.
So, it would be nice to have a core PHP function implementing this
test, because the userland way of doing it is unnecessarily
unoptimized. I don't know what the function should be called. In our
code base the function is called is_indexed_array, but PHP doesn't
really have a standard term for this, afaik.
I regret my lack of C skills so I can't really propose
implementations, but I would be truly appreciative if these
suggestions would gain some traction.
--
Riikka Kalliomäki
https://github.com/Riimu
Another similar problem with creating array copies is the detection of
"indexed" arrays (as opposed to associative arrays).
I'm looking forward to an answer from someone proficient in this area
because I think the engine internally keeps track of this for
performance optimizations, though I don't think it exposes it.
- Markus
NB: You have SplFixedArray which can cover some use-cases.
Another similar problem with creating array copies is the detection of
"indexed" arrays (as opposed to associative arrays).I'm looking forward to an answer from someone proficient in this area
because I think the engine internally keeps track of this for
performance optimizations, though I don't think it exposes it.
- Markus
--
To unsubscribe, visit: https://www.php.net/unsub.php
Optimizing foreach (array_keys($arr) as $k) is very important, not only
because of memory, but because of speed when not all elements needs to
be iterated, like:
foreach (array_keys($arr) as $k) {
if ($k some condition) {
break;
}
}
please, can someone send a PR for this?
With kind regards / Mit freundlichen Grüßen / S přátelským pozdravem,
Michael Voříšek
Hello,
For the past couple years I've been working with a PHP code base that
at times deals with quite large payloads memory wise. This has made me
pay more attention to some array operations in PHP that are rather
frustrating to deal with in userland PHP, but could perhaps be
optimized more in PHP core.A common pattern that I've seen that could dearly use PHP internal
optimization, if possible, would be:foreach (array_keys($array) as $key) {
}The problem with this pattern, of course, is the fact that it
needlessly duplicates the array passed to foreach, as can be seen from
this example: https://3v4l.org/MRSv6I would be ever so grateful, if it would be possible to improve the
PHP engine to detect that fully qualified function name array_keys is
used with foreach, in which case it would simply perform a foreach
over the keys of the array without creating a copy. Optimizing this
wouldn't even require any userland changes. Not sure if the PHP engine
makes it at all feasible, though.Of course, you could just be using something like this in code:
foreach ($array as $key => $_) {
}Which has actually become a pattern for us in some memory sensitive
places, but using array_keys inside foreach is a very intuitive and
common approach and doesn't require the unused variable, so it would
be nice to see the usage enshired.Another similar problem with creating array copies is the detection of
"indexed" arrays (as opposed to associative arrays). Particularly when
dealing with JSON, it's a common need to detect if an array has keys
from 0 to n-1 and in that order. My understanding is that at least in
some cases this would be trivial and fast to tell internally in PHP,
but the functionality is not exposed to userland.Current common practices include for example:
array_keys($array) === range(0, count($array) - 1)
Memory optimized way of dealing with this is via foreach, but it's
quite cumbersome and again, you must not use array_keys in the
foreach. The following example demonstrates that the worst case
scenario triples the memory usage using range: https://3v4l.org/FiWdkInterestingly, using "array_values($array) === $array" is the fastest
and most optimized way in best case scenarios, since php just returns
the array itself in cases it's "packed" and "without holes". However,
this could get hairy in worst case scenarios since it starts comparing
the values as well.So, it would be nice to have a core PHP function implementing this
test, because the userland way of doing it is unnecessarily
unoptimized. I don't know what the function should be called. In our
code base the function is called is_indexed_array, but PHP doesn't
really have a standard term for this, afaik.I regret my lack of C skills so I can't really propose
implementations, but I would be truly appreciative if these
suggestions would gain some traction.
On Mon, Aug 31, 2020 at 11:53 PM Michael Voříšek - ČVUT FEL <
vorismi3@fel.cvut.cz> wrote:
Optimizing foreach (array_keys($arr) as $k) is very important, not only
because of memory, but because of speed when not all elements needs to
be iterated, like:foreach (array_keys($arr) as $k) {
if ($k some condition) { break; }
}
Can an iterator be the answer? E.g.
foreach (new ArrayKeysIterator($arr) as $k) { ... }
--
Best regards,
Max Semenik
I would highly prefer php optimalization for it. ArrayKeysIterator can
not even used if the input is iterable instead of pure array.
With kind regards / Mit freundlichen Grüßen / S přátelským pozdravem,
Michael Voříšek
On Mon, Aug 31, 2020 at 11:53 PM Michael Voříšek - ČVUT FEL vorismi3@fel.cvut.cz wrote:
Optimizing foreach (array_keys($arr) as $k) is very important, not only
because of memory, but because of speed when not all elements needs to
be iterated, like:foreach (array_keys($arr) as $k) {
if ($k some condition) {
break;
}
}
Can an iterator be the answer? E.g.
foreach (new ArrayKeysIterator($arr) as $k) { ... }
--
Best regards,
Max Semenik
Hi Riikka,
A common pattern that I've seen that could dearly use PHP internal
optimization, if possible, would be:foreach (array_keys($array) as $key) {
}
I have a draft RFC (https://wiki.php.net/rfc/foreach_void) and patch (https://github.com/jbafford/php-src/tree/foreachvoid against php 7.0, I think) that helps with this, by allowing the following syntax:
foreach($iterable as $key => void) { ... }
This would iterate over the keys of the iterable, and does not retrieve the values at all.
In theory, this should be a general performance win any time one needs to iterate over only the keys of an iterable, because it does not require the use of an O(n) iteration and building of the array that array_keys()
would, plus it works on non-array types (such as generators or iterators). It also is more performant than foreach($iterable as $key => $_) {}, because it omits the opcode that instructs the engine to retrieve the value. (Presumably, a future direction could include using this request to inform generators or iterators to only return keys, and not values, which could further improve performance, especially if value generation is expensive. But that is out of scope for this RFC.)
If this is something we'd like for PHP 8.1 and there are no major objections to the idea, then after 8.0 is released, I can move the RFC out of Draft and into Under Discussion, and presuming a vote passes, I'll update the patch as necessary to work against 8.0. But my time is limited and I'm not willing to put further time into the code unless an RFC vote passes.
Thoughts anyone?
-John
Hi Riikka Kalliomäki,
Another similar problem with creating array copies is the detection of
"indexed" arrays (as opposed to associative arrays). Particularly when
dealing with JSON, it's a common need to detect if an array has keys
from 0 to n-1 and in that order. My understanding is that at least in
some cases this would be trivial and fast to tell internally in PHP,
but the functionality is not exposed to userland.Current common practices include for example:
array_keys($array) === range(0, count($array) - 1)
Memory optimized way of dealing with this is via foreach, but it's
quite cumbersome and again, you must not use array_keys in the
foreach. The following example demonstrates that the worst case
scenario triples the memory usage using range: https://3v4l.org/FiWdk
This was brought up in https://externals.io/message/109760#109795
there's a PR against php-src but there was some discussion about
whether php should strive to add an actual "list" type instead
and what it should be named - https://github.com/php/php-src/pull/4886
I'm fairly certain is_list() would have to go through the RFC process given
the proposals for alternate approaches or alternate names
Cheers,
- Tyson