Hi Dmitry,
One could use an external pointer for the iteration and flag the hashtable while being iterated.
In order to preserve the constraint that for-each is conceptually working on a copy, one would have to duplicate (zend_array_dup) the hastable everytime one alters it (e.g. adding/changing buckets) if (and only if) the iteration flag is set.
Nested for-each would also have to check for the flag and duplicate if necessary. Appart from saving the space for the internal pointer it would also mean less copy-on-write semantics kicking in than with the current implementation, because duplication is only performed if the array is actually altered, not merely because it is traversed.
Example:
$arr = $obj->arr; // property "arr" is an array
foreach ($arr as $val) ...; // this will CURRENTLY copy the array, because $arr is also visible through $obj->arr although this is not really necessary unless the array is actually changed during iteration
Cheers,
Ben
========== Original ==========
From: Dmitry Stogov dmitry@zend.com
To: Yasuo Ohgaki yohgaki@ohgaki.net, Nikita Popov nikita.ppv@gmail.com
Date: Wed, 21 Jan 2015 14:57:59 +0100
Subject: Re: [PHP-DEV] Fixing strange foreach behavior.
Hi,
Yeah, I think changing foreach behaviour in more consistent and efficient
way may make sense.
If we won't use HashTable.nInternalPointer we won't need to copy immutable
arrays.
The same for nested foreach on the same array.
We could also eliminate all the HashPosition magic introduced to keep PHP5
behavior.
On the other hand some apps may relay on current weird behavior.
I remember, long time ago Nikita made some related proposal.
Nikita, could you please send a link.
Thanks. Dmitry.
Hi Rowan,
On Mon, Jan 19, 2015 at 12:05 AM, Rowan Collins rowan.collins@gmail.com
wrote:Hi Rowan,
On Sat, Jan 17, 2015 at 8:43 PM, Rowan Collins rowan.collins@gmail.com
wrote:My concern is, at what cost? Given how rarely used the internal pointer
is,
are we carrying around a chunk of extra memory with every array just on
the
off-chance that it will be used, or is there some magic that makes it
zero-width until it's needed?Reusing it for foreach makes perfect sense if it increases performance
for
the majority of cases, and the only way to cover every edge-case like
the
one at the top of this thread would be to ban that optimisation
outright.
If there's some way of separating things such that the rarely used
constructs take the performance hit, I'm all for it, though.External position pointer needs a little more memory for sure. However,
external position pointer is used all over the place in standard and
other
modules.I agree that some benchmark should be taken before we proceed.
I meant more that the internal pointer takes up memory which is unused by
99% of PHP applications, so we might as well get some benefit from it, but
it comes to the same thing.It might be a good idea. I agree that more than 99% of PHP codes do not
need internal position pointer.
Dmitry might be interested in this area or he might take some benchmarks
already. However, PHP does not
support bock scope, so getting rid of internal hash position pointer may
not be feasible even for simple loops.
There may be workarounds in the engine, though.Besides scope, user may call
current()
/next()/reset() anywhere for an
array in code. External position
pointer is required to support this. Obvious workaround is to have array
position resource which is attached
to specific array.$pos = array_fetch_position($array); // Get array position resource of
$array. Like external hash position in C.
reset($pos); // Reset position;
$var = current($array, $pos);
$var = next($array, $pos)
$var = next($array, $pos)This requires script modification, though.
External position resource would be useful for complex array operations.
It could be implemented in standard
module. It may be used to traverse depth first, breadth first,etc. I might
work on this.Alternatively, we could achieve consistency the other way round, by
making
foreach() reset the internal pointer even when it doesn't use it, and
documenting that fact like we do for certain array functions. Code
relying
on the current behaviour when mixing them is probably buggy anyway,
because
it is not well-defined, so the BC concern should be low.IIRC, there were discussions when foreach is added that point out
internal position
pointer usage. Before foreach, everyone was using
current()
/next()/reset(). I think
the reason foreach sets internal position pointer to the end was the
code used in
those days. It was not a technical/performance reasons, wasn't it? I
don't remember
well because it was more than 10 years ago. Anyone?IIRC, foreach was introduced by Andi. I think foreach used external
position pointer and
didn't touch internal position pointer at all with his first proposal.
Please correct me if I'm
wrong.As for nesting, I think PHP is doing the right thing for plain arrays and
non-rewindable integrators, because I would expect a for each loop to
start
at the beginning, thus have an implicit reset/rewind. The rewindable
behaviour is awkward, though - intuitively, the iteraror needs to be
able
to track multiple positions simultaneously, which isn't guaranteed by
the
interface design. Maybe an error should be issued if the integrator is
already subject of a foreach loop?Iterator is headache, I agree.
Iterator nesting may be detected by a flag easily, but it may be better
to generate
independent iterators for each "foreach".I don't think that's an either/or situation: in most cases, it would be
up to the user to create that independent iterator, because there's no
general algorithm for cloning one that the engine could use. (Think of an
iterator proxying a database cursor, for instance - it could be rewindable
directly, but cloning it would imply issuing a new query to create a new
cursor with separate state.) So the engine needs to issue an error telling
the user that they're reusing an object in an dangerous way. An extended
interface could be created later for those cases that can handle the
situation, although I doubt it would be that common a use case.I think if users need nested database cursor based iterator or like, they
should create new iterator object by themselves.
i.e. User should have __clone() that opens new cursor.It might be good to have un-rewindable iterator also. e.g. If
rewind()
method returns FALSE, then it's not rewindable.Anyway, position independent array iteration with foreach might be good
feature for PHP 7.This behavior is reasonable.
http://3v4l.org/lJTQG
but this is not reasonable now. (It was used to be)
http://3v4l.org/HbVndRegards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net