Hi all,
Take a look at
foreach should not affect internal(zval) array position, but it does.
I'm not sure why foreach works this way, but HHVM behavior is
reasonable. IMHO.
PHP 7 would be perfect opportunity fix this behavior.
Any comments?
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Yasuo Ohgaki wrote on 16/01/2015 08:40:
Hi all,
Take a look at
foreach should not affect internal(zval) array position, but it does.
I'm not sure why foreach works this way, but HHVM behavior is
reasonable. IMHO.PHP 7 would be perfect opportunity fix this behavior.
Any comments?
I seem to remember somebody mentioning that the pointer is reused in
some situations and not others. I think the reason is efficiency - we
have a pointer associated with each array, which is rarely used, so why
not borrow it for foreach? (Answer: because it causes the oddity you
just spotted. :P)
For instance, if you put a reset()
in the middle of the foreach loop, it
doesn't interrupt the progress of the loop, but does cause it to be
tracked separately and output "Apple" after the loop finishes:
http://3v4l.org/KTCrI
There are other weird cases, like nesting two foreach loops over the
same variable:
- with a true array, the loops are independent (both loops see the full
set of values): http://3v4l.org/lJTQG - with a rewindable Traversable, such as SimpleXMLElement, they interact
(the inner loop sees all values, but the outer loop doesn't continue
after the inner one completes): http://3v4l.org/fs2Jv - with a non-rewindable Traversable, such as a Generator, PHP tries to
rewind and gives a fatal error, while HHVM simply uses up the last
values of the generator on the inner loop (possibly a bug?):
http://3v4l.org/BYOmQ
My conclusion: don't try and nest foreach or any related constructs
using the same variable, because the behaviour is not guaranteed to be
in any way sane.
Regards,
Rowan Collins
[IMSoP]
Hi Rowan,
On Sat, Jan 17, 2015 at 1:22 AM, Rowan Collins rowan.collins@gmail.com
wrote:
Yasuo Ohgaki wrote on 16/01/2015 08:40:
Hi all,
Take a look at
foreach should not affect internal(zval) array position, but it does.
I'm not sure why foreach works this way, but HHVM behavior is
reasonable. IMHO.PHP 7 would be perfect opportunity fix this behavior.
Any comments?I seem to remember somebody mentioning that the pointer is reused in some
situations and not others. I think the reason is efficiency - we have a
pointer associated with each array, which is rarely used, so why not borrow
it for foreach? (Answer: because it causes the oddity you just spotted. :P)
Right. foreach should not affect internal(zval) position pointer, but it
does.
For instance, if you put a
reset()
in the middle of the foreach loop, it
doesn't interrupt the progress of the loop, but does cause it to be tracked
separately and output "Apple" after the loop finishes:
http://3v4l.org/KTCrIThere are other weird cases, like nesting two foreach loops over the same
variable:
- with a true array, the loops are independent (both loops see the full
set of values): http://3v4l.org/lJTQG- with a rewindable Traversable, such as SimpleXMLElement, they interact
(the inner loop sees all values, but the outer loop doesn't continue after
the inner one completes): http://3v4l.org/fs2Jv- with a non-rewindable Traversable, such as a Generator, PHP tries to
rewind and gives a fatal error, while HHVM simply uses up the last values
of the generator on the inner loop (possibly a bug?):
http://3v4l.org/BYOmQMy conclusion: don't try and nest foreach or any related constructs using
the same variable, because the behaviour is not guaranteed to be in any way
sane.
Nesting foreach is strange, too. There are two issues
- array nesting does not work as it seems
- generator nesting does not work at all
Besides generator, current behavior is strange.
It's matter of using external position pointer for arrays, isn't it?
How about make foreach behaves sane manner by PHP 7?
It's rare that program uses internal position pointer, but
it's a basic language construct. Basic language construct
should behave intuitive way. IMHO.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Rowan,
On Sat, Jan 17, 2015 at 1:22 AM, Rowan Collins rowan.collins@gmail.com
wrote:Yasuo Ohgaki wrote on 16/01/2015 08:40:
Hi all,
Take a look at
foreach should not affect internal(zval) array position, but it does.
I'm not sure why foreach works this way, but HHVM behavior is
reasonable. IMHO.PHP 7 would be perfect opportunity fix this behavior.
Any comments?I seem to remember somebody mentioning that the pointer is reused in some
situations and not others. I think the reason is efficiency - we have a
pointer associated with each array, which is rarely used, so why not borrow
it for foreach? (Answer: because it causes the oddity you just spotted. :P)Right. foreach should not affect internal(zval) position pointer, but it
does.For instance, if you put a
reset()
in the middle of the foreach loop, it
doesn't interrupt the progress of the loop, but does cause it to be tracked
separately and output "Apple" after the loop finishes:
http://3v4l.org/KTCrIThere are other weird cases, like nesting two foreach loops over the same
variable:
- with a true array, the loops are independent (both loops see the full
set of values): http://3v4l.org/lJTQG- with a rewindable Traversable, such as SimpleXMLElement, they interact
(the inner loop sees all values, but the outer loop doesn't continue after
the inner one completes): http://3v4l.org/fs2Jv- with a non-rewindable Traversable, such as a Generator, PHP tries to
rewind and gives a fatal error, while HHVM simply uses up the last values
of the generator on the inner loop (possibly a bug?):
http://3v4l.org/BYOmQMy conclusion: don't try and nest foreach or any related constructs using
the same variable, because the behaviour is not guaranteed to be in any way
sane.Nesting foreach is strange, too. There are two issues
- array nesting does not work as it seems
- generator nesting does not work at all
Besides generator, current behavior is strange.
It's matter of using external position pointer for arrays, isn't it?How about make foreach behaves sane manner by PHP 7?
It's rare that program uses internal position pointer, but
it's a basic language construct. Basic language construct
should behave intuitive way. IMHO.
My concern is, at what cost? Given how rarely used the internal pointer is,
are we carrying around a chunk of extra memory with every array just on the
off-chance that it will be used, or is there some magic that makes it
zero-width until it's needed?
Reusing it for foreach makes perfect sense if it increases performance for
the majority of cases, and the only way to cover every edge-case like the
one at the top of this thread would be to ban that optimisation outright.
If there's some way of separating things such that the rarely used
constructs take the performance hit, I'm all for it, though.
Alternatively, we could achieve consistency the other way round, by making
foreach() reset the internal pointer even when it doesn't use it, and
documenting that fact like we do for certain array functions. Code relying
on the current behaviour when mixing them is probably buggy anyway, because
it is not well-defined, so the BC concern should be low.
As for nesting, I think PHP is doing the right thing for plain arrays and
non-rewindable integrators, because I would expect a for each loop to start
at the beginning, thus have an implicit reset/rewind. The rewindable
behaviour is awkward, though - intuitively, the iteraror needs to be able
to track multiple positions simultaneously, which isn't guaranteed by the
interface design. Maybe an error should be issued if the integrator is
already subject of a foreach loop?
--
Rowan Collins
[IMSoP]
Hi Rowan,
On Sat, Jan 17, 2015 at 8:43 PM, Rowan Collins rowan.collins@gmail.com
wrote:
My concern is, at what cost? Given how rarely used the internal pointer is,
are we carrying around a chunk of extra memory with every array just on the
off-chance that it will be used, or is there some magic that makes it
zero-width until it's needed?Reusing it for foreach makes perfect sense if it increases performance for
the majority of cases, and the only way to cover every edge-case like the
one at the top of this thread would be to ban that optimisation outright.
If there's some way of separating things such that the rarely used
constructs take the performance hit, I'm all for it, though.
External position pointer needs a little more memory for sure. However,
external position pointer is used all over the place in standard and other
modules.
I agree that some benchmark should be taken before we proceed.
Alternatively, we could achieve consistency the other way round, by making
foreach() reset the internal pointer even when it doesn't use it, and
documenting that fact like we do for certain array functions. Code relying
on the current behaviour when mixing them is probably buggy anyway, because
it is not well-defined, so the BC concern should be low.
IIRC, there were discussions when foreach is added that point out internal
position
pointer usage. Before foreach, everyone was using current()
/next()/reset().
I think
the reason foreach sets internal position pointer to the end was the code
used in
those days. It was not a technical/performance reasons, wasn't it? I don't
remember
well because it was more than 10 years ago. Anyone?
IIRC, foreach was introduced by Andi. I think foreach used external
position pointer and
didn't touch internal position pointer at all with his first proposal.
Please correct me if I'm
wrong.
As for nesting, I think PHP is doing the right thing for plain arrays and
non-rewindable integrators, because I would expect a for each loop to start
at the beginning, thus have an implicit reset/rewind. The rewindable
behaviour is awkward, though - intuitively, the iteraror needs to be able
to track multiple positions simultaneously, which isn't guaranteed by the
interface design. Maybe an error should be issued if the integrator is
already subject of a foreach loop?
Iterator is headache, I agree.
Iterator nesting may be detected by a flag easily, but it may be better to
generate
independent iterators for each "foreach".
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Rowan,
On Sat, Jan 17, 2015 at 8:43 PM, Rowan Collins rowan.collins@gmail.com
wrote:My concern is, at what cost? Given how rarely used the internal pointer
is,
are we carrying around a chunk of extra memory with every array just on
the
off-chance that it will be used, or is there some magic that makes it
zero-width until it's needed?Reusing it for foreach makes perfect sense if it increases performance for
the majority of cases, and the only way to cover every edge-case like the
one at the top of this thread would be to ban that optimisation outright.
If there's some way of separating things such that the rarely used
constructs take the performance hit, I'm all for it, though.External position pointer needs a little more memory for sure. However,
external position pointer is used all over the place in standard and other
modules.I agree that some benchmark should be taken before we proceed.
I meant more that the internal pointer takes up memory which is unused by
99% of PHP applications, so we might as well get some benefit from it, but
it comes to the same thing.
Alternatively, we could achieve consistency the other way round, by making
foreach() reset the internal pointer even when it doesn't use it, and
documenting that fact like we do for certain array functions. Code relying
on the current behaviour when mixing them is probably buggy anyway,
because
it is not well-defined, so the BC concern should be low.IIRC, there were discussions when foreach is added that point out internal
position
pointer usage. Before foreach, everyone was using
current()
/next()/reset(). I think
the reason foreach sets internal position pointer to the end was the code
used in
those days. It was not a technical/performance reasons, wasn't it? I don't
remember
well because it was more than 10 years ago. Anyone?IIRC, foreach was introduced by Andi. I think foreach used external
position pointer and
didn't touch internal position pointer at all with his first proposal.
Please correct me if I'm
wrong.As for nesting, I think PHP is doing the right thing for plain arrays and
non-rewindable integrators, because I would expect a for each loop to
start
at the beginning, thus have an implicit reset/rewind. The rewindable
behaviour is awkward, though - intuitively, the iteraror needs to be able
to track multiple positions simultaneously, which isn't guaranteed by the
interface design. Maybe an error should be issued if the integrator is
already subject of a foreach loop?Iterator is headache, I agree.
Iterator nesting may be detected by a flag easily, but it may be better to
generate
independent iterators for each "foreach".
I don't think that's an either/or situation: in most cases, it would be up
to the user to create that independent iterator, because there's no general
algorithm for cloning one that the engine could use. (Think of an iterator
proxying a database cursor, for instance - it could be rewindable directly,
but cloning it would imply issuing a new query to create a new cursor with
separate state.) So the engine needs to issue an error telling the user
that they're reusing an object in an dangerous way. An extended interface
could be created later for those cases that can handle the situation,
although I doubt it would be that common a use case.
--
Rowan Collins
[IMSoP]
Hi Rowan,
On Mon, Jan 19, 2015 at 12:05 AM, Rowan Collins rowan.collins@gmail.com
wrote:
Hi Rowan,
On Sat, Jan 17, 2015 at 8:43 PM, Rowan Collins rowan.collins@gmail.com
wrote:My concern is, at what cost? Given how rarely used the internal pointer
is,
are we carrying around a chunk of extra memory with every array just on
the
off-chance that it will be used, or is there some magic that makes it
zero-width until it's needed?Reusing it for foreach makes perfect sense if it increases performance
for
the majority of cases, and the only way to cover every edge-case like the
one at the top of this thread would be to ban that optimisation outright.
If there's some way of separating things such that the rarely used
constructs take the performance hit, I'm all for it, though.External position pointer needs a little more memory for sure. However,
external position pointer is used all over the place in standard and other
modules.I agree that some benchmark should be taken before we proceed.
I meant more that the internal pointer takes up memory which is unused by
99% of PHP applications, so we might as well get some benefit from it, but
it comes to the same thing.
It might be a good idea. I agree that more than 99% of PHP codes do not
need internal position pointer.
Dmitry might be interested in this area or he might take some benchmarks
already. However, PHP does not
support bock scope, so getting rid of internal hash position pointer may
not be feasible even for simple loops.
There may be workarounds in the engine, though.
Besides scope, user may call current()
/next()/reset() anywhere for an array
in code. External position
pointer is required to support this. Obvious workaround is to have array
position resource which is attached
to specific array.
$pos = array_fetch_position($array); // Get array position resource of
$array. Like external hash position in C.
reset($pos); // Reset position;
$var = current($array, $pos);
$var = next($array, $pos)
$var = next($array, $pos)
This requires script modification, though.
External position resource would be useful for complex array operations. It
could be implemented in standard
module. It may be used to traverse depth first, breadth first,etc. I might
work on this.
Alternatively, we could achieve consistency the other way round, by making
foreach() reset the internal pointer even when it doesn't use it, and
documenting that fact like we do for certain array functions. Code
relying
on the current behaviour when mixing them is probably buggy anyway,
because
it is not well-defined, so the BC concern should be low.IIRC, there were discussions when foreach is added that point out
internal position
pointer usage. Before foreach, everyone was using
current()
/next()/reset(). I think
the reason foreach sets internal position pointer to the end was the code
used in
those days. It was not a technical/performance reasons, wasn't it? I
don't remember
well because it was more than 10 years ago. Anyone?IIRC, foreach was introduced by Andi. I think foreach used external
position pointer and
didn't touch internal position pointer at all with his first proposal.
Please correct me if I'm
wrong.As for nesting, I think PHP is doing the right thing for plain arrays and
non-rewindable integrators, because I would expect a for each loop to
start
at the beginning, thus have an implicit reset/rewind. The rewindable
behaviour is awkward, though - intuitively, the iteraror needs to be able
to track multiple positions simultaneously, which isn't guaranteed by the
interface design. Maybe an error should be issued if the integrator is
already subject of a foreach loop?Iterator is headache, I agree.
Iterator nesting may be detected by a flag easily, but it may be better
to generate
independent iterators for each "foreach".I don't think that's an either/or situation: in most cases, it would be up
to the user to create that independent iterator, because there's no general
algorithm for cloning one that the engine could use. (Think of an iterator
proxying a database cursor, for instance - it could be rewindable directly,
but cloning it would imply issuing a new query to create a new cursor with
separate state.) So the engine needs to issue an error telling the user
that they're reusing an object in an dangerous way. An extended interface
could be created later for those cases that can handle the situation,
although I doubt it would be that common a use case.
I think if users need nested database cursor based iterator or like, they
should create new iterator object by themselves.
i.e. User should have __clone() that opens new cursor.
It might be good to have un-rewindable iterator also. e.g. If rewind()
method returns FALSE, then it's not rewindable.
Anyway, position independent array iteration with foreach might be good
feature for PHP 7.
This behavior is reasonable.
http://3v4l.org/lJTQG
but this is not reasonable now. (It was used to be)
http://3v4l.org/HbVnd
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi,
Yeah, I think changing foreach behaviour in more consistent and efficient
way may make sense.
If we won't use HashTable.nInternalPointer we won't need to copy immutable
arrays.
The same for nested foreach on the same array.
We could also eliminate all the HashPosition magic introduced to keep PHP5
behavior.
On the other hand some apps may relay on current weird behavior.
I remember, long time ago Nikita made some related proposal.
Nikita, could you please send a link.
Thanks. Dmitry.
Hi Rowan,
On Mon, Jan 19, 2015 at 12:05 AM, Rowan Collins rowan.collins@gmail.com
wrote:Hi Rowan,
On Sat, Jan 17, 2015 at 8:43 PM, Rowan Collins rowan.collins@gmail.com
wrote:My concern is, at what cost? Given how rarely used the internal pointer
is,
are we carrying around a chunk of extra memory with every array just on
the
off-chance that it will be used, or is there some magic that makes it
zero-width until it's needed?Reusing it for foreach makes perfect sense if it increases performance
for
the majority of cases, and the only way to cover every edge-case like
the
one at the top of this thread would be to ban that optimisation
outright.
If there's some way of separating things such that the rarely used
constructs take the performance hit, I'm all for it, though.External position pointer needs a little more memory for sure. However,
external position pointer is used all over the place in standard and
other
modules.I agree that some benchmark should be taken before we proceed.
I meant more that the internal pointer takes up memory which is unused by
99% of PHP applications, so we might as well get some benefit from it, but
it comes to the same thing.It might be a good idea. I agree that more than 99% of PHP codes do not
need internal position pointer.
Dmitry might be interested in this area or he might take some benchmarks
already. However, PHP does not
support bock scope, so getting rid of internal hash position pointer may
not be feasible even for simple loops.
There may be workarounds in the engine, though.Besides scope, user may call
current()
/next()/reset() anywhere for an
array in code. External position
pointer is required to support this. Obvious workaround is to have array
position resource which is attached
to specific array.$pos = array_fetch_position($array); // Get array position resource of
$array. Like external hash position in C.
reset($pos); // Reset position;
$var = current($array, $pos);
$var = next($array, $pos)
$var = next($array, $pos)This requires script modification, though.
External position resource would be useful for complex array operations.
It could be implemented in standard
module. It may be used to traverse depth first, breadth first,etc. I might
work on this.Alternatively, we could achieve consistency the other way round, by
making
foreach() reset the internal pointer even when it doesn't use it, and
documenting that fact like we do for certain array functions. Code
relying
on the current behaviour when mixing them is probably buggy anyway,
because
it is not well-defined, so the BC concern should be low.IIRC, there were discussions when foreach is added that point out
internal position
pointer usage. Before foreach, everyone was using
current()
/next()/reset(). I think
the reason foreach sets internal position pointer to the end was the
code used in
those days. It was not a technical/performance reasons, wasn't it? I
don't remember
well because it was more than 10 years ago. Anyone?IIRC, foreach was introduced by Andi. I think foreach used external
position pointer and
didn't touch internal position pointer at all with his first proposal.
Please correct me if I'm
wrong.As for nesting, I think PHP is doing the right thing for plain arrays and
non-rewindable integrators, because I would expect a for each loop to
start
at the beginning, thus have an implicit reset/rewind. The rewindable
behaviour is awkward, though - intuitively, the iteraror needs to be
able
to track multiple positions simultaneously, which isn't guaranteed by
the
interface design. Maybe an error should be issued if the integrator is
already subject of a foreach loop?Iterator is headache, I agree.
Iterator nesting may be detected by a flag easily, but it may be better
to generate
independent iterators for each "foreach".I don't think that's an either/or situation: in most cases, it would be
up to the user to create that independent iterator, because there's no
general algorithm for cloning one that the engine could use. (Think of an
iterator proxying a database cursor, for instance - it could be rewindable
directly, but cloning it would imply issuing a new query to create a new
cursor with separate state.) So the engine needs to issue an error telling
the user that they're reusing an object in an dangerous way. An extended
interface could be created later for those cases that can handle the
situation, although I doubt it would be that common a use case.I think if users need nested database cursor based iterator or like, they
should create new iterator object by themselves.
i.e. User should have __clone() that opens new cursor.It might be good to have un-rewindable iterator also. e.g. If
rewind()
method returns FALSE, then it's not rewindable.Anyway, position independent array iteration with foreach might be good
feature for PHP 7.This behavior is reasonable.
http://3v4l.org/lJTQG
but this is not reasonable now. (It was used to be)
http://3v4l.org/HbVndRegards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi,
Yeah, I think changing foreach behaviour in more consistent and efficient
way may make sense.
If we won't use HashTable.nInternalPointer we won't need to copy immutable
arrays.
The same for nested foreach on the same array.
We could also eliminate all the HashPosition magic introduced to keep PHP5
behavior.On the other hand some apps may relay on current weird behavior.
I remember, long time ago Nikita made some related proposal.
Nikita, could you please send a link.Thanks. Dmitry.
Original proposal is the patch linked at the end of
https://bugs.php.net/bug.php?id=53405. However that was aimed at making
external HashPosition iteration to work like the internal pointer, so we
could make it fully independent while keeping most of the current behavior.
I fear it would make the implementation even more complicated than it
already is.
For PHP 7 we have the option to alter the iteration semantics for edge
cases a bit (especially interaction of iteration and changes to the array).
I'll have to try out some ideas in that direction and report back.
Nikita
Hi Nikita,
Hi,
Yeah, I think changing foreach behaviour in more consistent and efficient
way may make sense.
If we won't use HashTable.nInternalPointer we won't need to copy
immutable arrays.
The same for nested foreach on the same array.
We could also eliminate all the HashPosition magic introduced to keep
PHP5 behavior.On the other hand some apps may relay on current weird behavior.
I remember, long time ago Nikita made some related proposal.
Nikita, could you please send a link.Thanks. Dmitry.
Original proposal is the patch linked at the end of
https://bugs.php.net/bug.php?id=53405. However that was aimed at making
external HashPosition iteration to work like the internal pointer, so we
could make it fully independent while keeping most of the current behavior.
I fear it would make the implementation even more complicated than it
already is.For PHP 7 we have the option to alter the iteration semantics for edge
cases a bit (especially interaction of iteration and changes to the array).
I'll have to try out some ideas in that direction and report back.
Great to hear this.
I'm looking forward outcomes!
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Yasuo Ohgaki wrote on 21/01/2015 02:48:
I don't think that's an either/or situation: in most cases, it would be up to the user to create that independent iterator, because there's no general algorithm for cloning one that the engine could use. (Think of an iterator proxying a database cursor, for instance - it could be rewindable directly, but cloning it would imply issuing a new query to create a new cursor with separate state.) So the engine needs to issue an error telling the user that they're reusing an object in an dangerous way. An extended interface could be created later for those cases that can handle the situation, although I doubt it would be that common a use case.
I think if users need nested database cursor based iterator or like,
they should create new iterator object by themselves.
i.e. User should have __clone() that opens new cursor.
I agree, and that's why I think nesting should be disabled by default.
It needs to be a run-time flag, as nesting could be arbitrarily indirect:
function iterator_dump($iter) {
foreach ( $iter as $x ) {
var_dump($x);
}
}
foreach ( $some_iterator as $foo ) {
dump ( $foo );
iterator_dump( $some_iterator ); // ERROR: Iterator cannot be used
within nested foreach loops.
iterator_dump( clone $some_iterator ); // OK, up to creator of
iterator to implement __clone()
}
Perhaps foreach ( $some_iterator as $foo ) { ... } should be translated
something like this:
- if check_foreach_flag( $some_iterator ) == 1 then raise error
- set_foreach_flag( $some_iterator )
- if check_initialised( $some_iterator ) then $some_iterator->rewind()
- (if rewind fails, raise error)
--- perform loop until iterator exhausted or break statement encountered --- - clear_foreach_flag( $some_iterator )
Where check_initialised() is some way of checking if the iterator has
emitted values somewhere else, so needs rewinding. Without that, you
couldn't use foreach on any iterator that couldn't be rewound, which
would be pretty useless.
I've no idea how feasible such flags are in the current implementation,
though.
Regards,
Rowan Collins
[IMSoP]