Hi,
this post is a fork of the "[PHP-DEV] Fixing strange foreach behavior" thread. It proposes a more efficient for-each mechanism (that does NOT change the conceptual behaviour).
Currently, on for-each the engine has to copy the array if that array is visible anywhere else in the program, because the loop resets the internal position pointer (which is part of the underlying hashtable structure) and another part of the program might rely on that pointer.
Essentially, the array gets duplicated prematurely, only because of the internal position pointer. Of course it might have to be duplicated within the for-each loop anyway, but if (and only if) it is actually altered. In most cases one just iterates over it without altering it. Please consider the following sample, taken from my recent post:
$arr = $obj->arr; // property "arr" is an array
foreach ($arr as $val) ...;
This will currently copy the array, because $arr is also visible through $obj->arr, although this is not really necessary unless the array is actually changed during iteration.
If one used an external position variable that is initialized in FE_RESET (TEMPVAR) and then incremented in FE_FETCH, one could just increment the ref_count of the array while it is being traversed, without the initial need to perform copy-on-write.
Now, if the hashtable is in any way altered during the traversal then the usual copy-on-write would kick in because for-each initialization made sure that ref_count was incremented before starting traversal. In that case PHP would - just like currently - have to duplicate, but only on first actual alteration, not prematurely on for-each initialization.
So in 90% (just a guess) of the cases, when you just traverse without altering, you get the full benefit of no copy being necessary, while in the other cases you basically have the previous performance penalty of duplication, but at least postponed to the first alteration (which might be inside a branch that is not even taken).
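To make the idea above a bit more concrete, here is a rough toy model in C. It is NOT the actual engine code: toy_array, toy_separate, toy_foreach_copies and the copies_made counter are names I made up for illustration, and a plain int array stands in for the real hashtable. It only shows the bookkeeping: the loop bumps the ref_count and keeps its own position, and a copy happens on the first write, if there is one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a refcounted PHP array; not the real hashtable. */
typedef struct {
    int refcount;
    size_t len;
    int *items;
} toy_array;

static int copies_made = 0; /* counts duplications, for the demo only */

static toy_array *toy_new(const int *src, size_t len) {
    toy_array *a = malloc(sizeof *a);
    assert(a != NULL);
    a->refcount = 1;
    a->len = len;
    a->items = malloc(len * sizeof *a->items);
    assert(a->items != NULL);
    memcpy(a->items, src, len * sizeof *a->items);
    return a;
}

static void toy_release(toy_array *a) {
    if (--a->refcount == 0) {
        free(a->items);
        free(a);
    }
}

/* Copy-on-write: duplicate only if somebody else also holds the array. */
static toy_array *toy_separate(toy_array *a) {
    if (a->refcount == 1) {
        return a;
    }
    a->refcount--;
    copies_made++;
    return toy_new(a->items, a->len);
}

/* Traverse a three-element array the proposed way. If write_at is a valid
 * index, the variable is written to at that step. Returns how many times
 * the array had to be duplicated. */
int toy_foreach_copies(int write_at) {
    const int src[] = {10, 20, 30};
    toy_array *var = toy_new(src, 3); /* plays the role of $arr */
    toy_array *iter = var;            /* "FE_RESET": share the array ... */
    size_t pos = 0;                   /* ... with an external position */
    int sum = 0;

    iter->refcount++;                 /* only the ref_count is touched */
    copies_made = 0;

    while (pos < iter->len) {         /* "FE_FETCH" */
        sum += iter->items[pos];
        if ((int)pos == write_at) {
            var = toy_separate(var);  /* first write pays for the copy */
            var->items[pos] = -1;
        }
        pos++;
    }

    assert(sum == 60);                /* the loop still saw the original values */
    toy_release(iter);
    toy_release(var);
    return copies_made;
}
```

Calling toy_foreach_copies(-1) models the read-only case and performs no duplication, while toy_foreach_copies(1) models a write in the second iteration and duplicates exactly once, at that point rather than at loop start.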
Nested for-each loops would not have to revert to copy-on-write either, because they have their own pointer.
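As a small sketch of the nested case (again a toy in C with made-up names toy_cursor, cursor_reset, cursor_fetch and toy_nested_sum, not engine code): each loop gets its own cursor over the same shared data, so the inner loop resetting its position cannot disturb the outer one and nothing needs to be copied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-loop cursor: the position lives here, not in the array. */
typedef struct {
    const int *items;
    size_t len;
    size_t pos;
} toy_cursor;

/* "FE_RESET": start a fresh traversal without touching the array itself. */
static toy_cursor cursor_reset(const int *items, size_t len) {
    toy_cursor c;
    c.items = items;
    c.len = len;
    c.pos = 0;
    return c;
}

/* "FE_FETCH": yield the next element; returns 0 once the end is reached. */
static int cursor_fetch(toy_cursor *c, int *out) {
    assert(c != NULL && out != NULL);
    if (c->pos >= c->len) {
        return 0;
    }
    *out = c->items[c->pos++];
    return 1;
}

/* Two nested traversals of the same array; sums a * b over all pairs. */
int toy_nested_sum(void) {
    static const int data[] = {1, 2, 3};
    int sum = 0;
    int a, b;

    toy_cursor outer = cursor_reset(data, 3);
    while (cursor_fetch(&outer, &a)) {
        /* The inner loop resets only its own cursor. */
        toy_cursor inner = cursor_reset(data, 3);
        while (cursor_fetch(&inner, &b)) {
            sum += a * b;
        }
    }
    return sum;
}
```

All nine pairs are visited here, which would not work if both loops had to share one position stored inside the array.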
This would effectively speed up most for-each operations and would have the extra benefit of not having to store an internal pointer in the hashtable structure.
Please let me know your thoughts!
Cheers,
Ben
--
Benjamin Coutu
Zeyon Technologies Inc.
http://www.zeyos.com