Hi Nikita,
I would suggest using the proposal for the by-value case and sticking with the current behavior for the by-reference case as you suggested. Granted, we then cannot remove the internal pointer all together, but we would just use it for the less common by-reference case as well as for the legacy reset/next functions, of course. This would still improve most for-each traversals. At least we then wouldn't have to find a replacement solution for rest/prev/next/end. Moreover we can then move the internal position pointer out of the hashtable structure into zend_array because it is only used for userland arrays, not other hash tables (such as object properties or internal structures that build upon the hashtable struct).
Thanks,
Ben
========== Original ==========
From: Nikita Popov nikita.ppv@gmail.com
To: Benjamin Coutu ben.coutu@zeyos.com
Date: Thu, 22 Jan 2015 10:53:19 +0100
Subject: Re: [PHP-DEV] Improvements to for-each implementation
Doing this was the idea I had in mind as well, i.e. change the semantics of
foreach to say that it will always work on a copy for by-value iteration
(which ironically avoids having to actually copy it). Note that this will
differ from the current behavior in a number of ways. In particular it
means that changes to arrays that were references prior to iteration will
not influence the iteration.
The real question is what we should do in the by-reference case. Given that
we need to acquire references to elements of the original array we can't
reasonably work with copy-semantics (at least I don't see how). So would we
just stick with the previous behavior (using the hash position hack) for
that?
Nikita
Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4
Only few PHPT tests had to be changed to confirm new behavior and actually
they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to be
improved to support different array modifications.
The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)
Thanks. Dmitry.
Hi Nikita,
I would suggest using the proposal for the by-value case and sticking with
the current behavior for the by-reference case as you suggested. Granted,
we then cannot remove the internal pointer all together, but we would just
use it for the less common by-reference case as well as for the legacy
reset/next functions, of course. This would still improve most for-each
traversals. At least we then wouldn't have to find a replacement solution
for rest/prev/next/end. Moreover we can then move the internal position
pointer out of the hashtable structure into zend_array because it is only
used for userland arrays, not other hash tables (such as object properties
or internal structures that build upon the hashtable struct).Thanks,
Ben
========== Original ==========
From: Nikita Popov nikita.ppv@gmail.com
To: Benjamin Coutu ben.coutu@zeyos.com
Date: Thu, 22 Jan 2015 10:53:19 +0100
Subject: Re: [PHP-DEV] Improvements to for-each implementationDoing this was the idea I had in mind as well, i.e. change the semantics of
foreach to say that it will always work on a copy for by-value iteration
(which ironically avoids having to actually copy it). Note that this will
differ from the current behavior in a number of ways. In particular it
means that changes to arrays that were references prior to iteration will
not influence the iteration.The real question is what we should do in the by-reference case. Given that
we need to acquire references to elements of the original array we can't
reasonably work with copy-semantics (at least I don't see how). So would we
just stick with the previous behavior (using the hash position hack) for
that?Nikita
The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)
Awesome!
Hi Dmitry,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and actually
they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to be
improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)
Great!
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and actually
they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to
be improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)Thanks. Dmitry.
I quickly looked over the patch, some notes:
- 171: Can directly use GET_OP1_ZVAL_PTR_DEREF
- For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this
enough? If the object is a TMP_VAR, don't we have to free it as well? - For objects it will still be possible to modify the hashtable during
iteration even in the _R case, correct? I assume we just don't care about
this edge case. - 315: In RESET_RW you now always create a reference for VAR|CV operands.
However for the get_iterator case we don't need this, right? - 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going to
change the IAP, this is probably necessary in this case as well. - For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably leaks
in the TMP case. (Same for the "invalid argument" case.)
What concerns me a bit is the new FETCH_RW implementation, because it no
longer checks the internal pointer against the external one. This
effectively means that foreach by reference behavior will be influenced a
lot by details of the hashtable implementation. An example:
$array = [0, 1, 2, 3, 4, 5, 6, 7];
unset($array[0], $array[1], $array[2], $array[3]);
foreach ($array as &$ref) {
var_dump($ref);
//$array[42] = 42;
}
Without the commented line this will just print the numbers 4, 5, 6, 7. If
you uncomment the line it will only print 4. (Because the append caused a
rehash and arData was compacted, so the position now points past all
elements). The previous output would have been 4, 5, 6, 7, 42, which is
closer to what you would expect. Maybe better to keep the pos !=
nInternalPointer code?
Thanks,
Nikita
Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and
actually they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to
be improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)Thanks. Dmitry.
I quickly looked over the patch, some notes:
- 171: Can directly use GET_OP1_ZVAL_PTR_DEREF
Right.
- For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this
enough? If the object is a TMP_VAR, don't we have to free it as well?
For IS_TMP_VAR I try to avoid refcounter incrementation/decrementation. So
the OP1 zval has to be freed in ZEND_FREE or in HANDLE_EXCEPTION. Hawever,
I'm not 100% sure. It need to be checked more careful.
- For objects it will still be possible to modify the hashtable during
iteration even in the _R case, correct? I assume we just don't care about
this edge case.
I think you are right. I missed this and it may make us problems :(
- 315: In RESET_RW you now always create a reference for VAR|CV operands.
However for the get_iterator case we don't need this, right?
Right. Good catch.
- 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going
to change the IAP, this is probably necessary in this case as well.
In this case SEPARATE_ARRAY is done in FE_FETCH_RW.
- For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably
leaks in the TMP case. (Same for the "invalid argument" case.)
The same as above, but needs to be checked.
What concerns me a bit is the new FETCH_RW implementation, because it no
longer checks the internal pointer against the external one. This
effectively means that foreach by reference behavior will be influenced a
lot by details of the hashtable implementation. An example:$array = [0, 1, 2, 3, 4, 5, 6, 7];
unset($array[0], $array[1], $array[2], $array[3]);
foreach ($array as &$ref) {
var_dump($ref);
//$array[42] = 42;
}Without the commented line this will just print the numbers 4, 5, 6, 7.
If you uncomment the line it will only print 4. (Because the append caused
a rehash and arData was compacted, so the position now points past all
elements). The previous output would have been 4, 5, 6, 7, 42, which is
closer to what you would expect. Maybe better to keep the pos !=
nInternalPointer code?
Yes. This is the main problem, however, the old implementation wasn't good
enough as well. It caught many (but not all cases). It's the reason why we
had few tests XFAILEd. This need to be solved in some other way, similar to
your old proposal but in more efficient way.
Thanks for review, it's very useful.
I'll probably able to provide next version of PoC in a couple of days.
Dmitry.
Thanks,
Nikita
I've created a branch and pull request for easier collaboration and
tracking.
https://github.com/php/php-src/pull/1034
It's the same diff I published yesterday. Nothing changed or added yet.
Thanks. Dmitry.
Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and
actually they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to
be improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)Thanks. Dmitry.
I quickly looked over the patch, some notes:
- 171: Can directly use GET_OP1_ZVAL_PTR_DEREF
- For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this
enough? If the object is a TMP_VAR, don't we have to free it as well?- For objects it will still be possible to modify the hashtable during
iteration even in the _R case, correct? I assume we just don't care about
this edge case.- 315: In RESET_RW you now always create a reference for VAR|CV operands.
However for the get_iterator case we don't need this, right?- 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going to
change the IAP, this is probably necessary in this case as well.- For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably
leaks in the TMP case. (Same for the "invalid argument" case.)What concerns me a bit is the new FETCH_RW implementation, because it no
longer checks the internal pointer against the external one. This
effectively means that foreach by reference behavior will be influenced a
lot by details of the hashtable implementation. An example:$array = [0, 1, 2, 3, 4, 5, 6, 7];
unset($array[0], $array[1], $array[2], $array[3]);
foreach ($array as &$ref) {
var_dump($ref);
//$array[42] = 42;
}Without the commented line this will just print the numbers 4, 5, 6, 7. If
you uncomment the line it will only print 4. (Because the append caused a
rehash and arData was compacted, so the position now points past all
elements). The previous output would have been 4, 5, 6, 7, 42, which is
closer to what you would expect. Maybe better to keep the pos !=
nInternalPointer code?Thanks,
Nikita
Hi,
I've solved most of the reported problems and started working on RFC.
Nikta, please take a look into:
https://wiki.php.net/rfc/php7_foreach
https://github.com/php/php-src/pull/1034
and especially commit 15a23b1
Only iteration by value over plain objects is not completely consistent now.
However, the implementation may have bugs and other problems (your code
review may be very helpful).
I'm still working...
Thanks. Dmitry.
I've created a branch and pull request for easier collaboration and
tracking.https://github.com/php/php-src/pull/1034
It's the same diff I published yesterday. Nothing changed or added yet.
Thanks. Dmitry.
On Wed, Jan 28, 2015 at 1:36 AM, Nikita Popov nikita.ppv@gmail.com
wrote:Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and
actually they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to
be improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)Thanks. Dmitry.
I quickly looked over the patch, some notes:
- 171: Can directly use GET_OP1_ZVAL_PTR_DEREF
- For iterator failures (exception) you use FREE_OP1_IF_VAR(). Is this
enough? If the object is a TMP_VAR, don't we have to free it as well?- For objects it will still be possible to modify the hashtable during
iteration even in the _R case, correct? I assume we just don't care about
this edge case.- 315: In RESET_RW you now always create a reference for VAR|CV operands.
However for the get_iterator case we don't need this, right?- 328: In the non VAR|CV case SEPARTE_ARRAY is not used. As we're going
to change the IAP, this is probably necessary in this case as well.- For RW iterator failures FREE_OP1_VAR_PTR() is used. This probably
leaks in the TMP case. (Same for the "invalid argument" case.)What concerns me a bit is the new FETCH_RW implementation, because it no
longer checks the internal pointer against the external one. This
effectively means that foreach by reference behavior will be influenced a
lot by details of the hashtable implementation. An example:$array = [0, 1, 2, 3, 4, 5, 6, 7];
unset($array[0], $array[1], $array[2], $array[3]);
foreach ($array as &$ref) {
var_dump($ref);
//$array[42] = 42;
}Without the commented line this will just print the numbers 4, 5, 6, 7.
If you uncomment the line it will only print 4. (Because the append caused
a rehash and arData was compacted, so the position now points past all
elements). The previous output would have been 4, 5, 6, 7, 42, which is
closer to what you would expect. Maybe better to keep the pos !=
nInternalPointer code?Thanks,
Nikita
Hey:
Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and actually
they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to be
improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)
I just have got an idea:
Droping use of ZEND_OP_DATA make it possible to generate better
foreach loop opcodes
previously FE_RESET and FE_FETCH share one same OP_DATA based on offset.
0 FE_RESET
1 FE_FETCH fail-> 5
2 OP_DATA
3 ..... statements
4 JMP 1
5....
without depending on that OP_DATA, we can change this to
0 FE_RESET
1 JMP 3
2 ....statements
3 FE_FETCH success->2
4 ....
that will reduce on JMP in every foreach loop
could you please also try this?
thanks
Thanks. Dmitry.
Hi Nikita,
I would suggest using the proposal for the by-value case and sticking with
the current behavior for the by-reference case as you suggested. Granted, we
then cannot remove the internal pointer all together, but we would just use
it for the less common by-reference case as well as for the legacy
reset/next functions, of course. This would still improve most for-each
traversals. At least we then wouldn't have to find a replacement solution
for rest/prev/next/end. Moreover we can then move the internal position
pointer out of the hashtable structure into zend_array because it is only
used for userland arrays, not other hash tables (such as object properties
or internal structures that build upon the hashtable struct).Thanks,
Ben
========== Original ==========
From: Nikita Popov nikita.ppv@gmail.com
To: Benjamin Coutu ben.coutu@zeyos.com
Date: Thu, 22 Jan 2015 10:53:19 +0100
Subject: Re: [PHP-DEV] Improvements to for-each implementationDoing this was the idea I had in mind as well, i.e. change the semantics
of
foreach to say that it will always work on a copy for by-value iteration
(which ironically avoids having to actually copy it). Note that this will
differ from the current behavior in a number of ways. In particular it
means that changes to arrays that were references prior to iteration will
not influence the iteration.The real question is what we should do in the by-reference case. Given
that
we need to acquire references to elements of the original array we can't
reasonably work with copy-semantics (at least I don't see how). So would
we
just stick with the previous behavior (using the hash position hack) for
that?Nikita
--
--
Xinchen Hui
@Laruence
http://www.laruence.com/
Hey:
Hey:
Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and actually
they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to be
improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)I just have got an idea:
Droping use of ZEND_OP_DATA make it possible to generate better
foreach loop opcodespreviously FE_RESET and FE_FETCH share one same OP_DATA based on offset.
0 FE_RESET
1 FE_FETCH fail-> 5
2 OP_DATA
3 ..... statements
4 JMP 1
5....without depending on that OP_DATA, we can change this to
0 FE_RESET
1 JMP 3
2 ....statements
3 FE_FETCH success->2
4 ....that will reduce on JMP in every foreach loop
could you please also try this?
I mean, dropping both ZEND_FE_RESET and FE_FETCH share one ZEND_OP_DATA.
the previously example should be:
0 FE_RESET
1 JMP 3
2 ....statements
3 FE_FETCH success->2
4 ZEND_OP_DATA
5 ....
thanks
thanks
Thanks. Dmitry.
Hi Nikita,
I would suggest using the proposal for the by-value case and sticking with
the current behavior for the by-reference case as you suggested. Granted, we
then cannot remove the internal pointer all together, but we would just use
it for the less common by-reference case as well as for the legacy
reset/next functions, of course. This would still improve most for-each
traversals. At least we then wouldn't have to find a replacement solution
for rest/prev/next/end. Moreover we can then move the internal position
pointer out of the hashtable structure into zend_array because it is only
used for userland arrays, not other hash tables (such as object properties
or internal structures that build upon the hashtable struct).Thanks,
Ben
========== Original ==========
From: Nikita Popov nikita.ppv@gmail.com
To: Benjamin Coutu ben.coutu@zeyos.com
Date: Thu, 22 Jan 2015 10:53:19 +0100
Subject: Re: [PHP-DEV] Improvements to for-each implementationDoing this was the idea I had in mind as well, i.e. change the semantics
of
foreach to say that it will always work on a copy for by-value iteration
(which ironically avoids having to actually copy it). Note that this will
differ from the current behavior in a number of ways. In particular it
means that changes to arrays that were references prior to iteration will
not influence the iteration.The real question is what we should do in the by-reference case. Given
that
we need to acquire references to elements of the original array we can't
reasonably work with copy-semantics (at least I don't see how). So would
we
just stick with the previous behavior (using the hash position hack) for
that?Nikita
--
--
Xinchen Hui
@Laruence
http://www.laruence.com/
--
Xinchen Hui
@Laruence
http://www.laruence.com/
Hi,
I have this in mind. May be It's even possible to remove JMP.
FE_RESET op1=<array> op2=<label 2> result=<var>
OP_DATA op1=<iterator> result=<key>
1: statements...
FE_FETCH op1=<iterator> op2=<label 1> result=<var>
OP_DATA result=<key>
2: FREE <iterator>
Anyway, it's not a big problem implementing this, but eliminating of array
duplication is more important.
We still have to solve more serious problems first.
Thanks. Dmitry.
Hey:
Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and
actually
they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to
be
improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)I just have got an idea:
Droping use of ZEND_OP_DATA make it possible to generate better
foreach loop opcodespreviously FE_RESET and FE_FETCH share one same OP_DATA based on offset.
0 FE_RESET
1 FE_FETCH fail-> 5
2 OP_DATA
3 ..... statements
4 JMP 1
5....without depending on that OP_DATA, we can change this to
0 FE_RESET
1 JMP 3
2 ....statements
3 FE_FETCH success->2
4 ....that will reduce on JMP in every foreach loop
could you please also try this?
thanks
Thanks. Dmitry.
On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.coutu@zeyos.com
wrote:Hi Nikita,
I would suggest using the proposal for the by-value case and sticking
with
the current behavior for the by-reference case as you suggested.
Granted, we
then cannot remove the internal pointer all together, but we would just
use
it for the less common by-reference case as well as for the legacy
reset/next functions, of course. This would still improve most for-each
traversals. At least we then wouldn't have to find a replacement
solution
for rest/prev/next/end. Moreover we can then move the internal position
pointer out of the hashtable structure into zend_array because it is
only
used for userland arrays, not other hash tables (such as object
properties
or internal structures that build upon the hashtable struct).Thanks,
Ben
========== Original ==========
From: Nikita Popov nikita.ppv@gmail.com
To: Benjamin Coutu ben.coutu@zeyos.com
Date: Thu, 22 Jan 2015 10:53:19 +0100
Subject: Re: [PHP-DEV] Improvements to for-each implementationDoing this was the idea I had in mind as well, i.e. change the semantics
of
foreach to say that it will always work on a copy for by-value iteration
(which ironically avoids having to actually copy it). Note that this
will
differ from the current behavior in a number of ways. In particular it
means that changes to arrays that were references prior to iteration
will
not influence the iteration.The real question is what we should do in the by-reference case. Given
that
we need to acquire references to elements of the original array we can't
reasonably work with copy-semantics (at least I don't see how). So would
we
just stick with the previous behavior (using the hash position hack) for
that?Nikita
--
--
Xinchen Hui
@Laruence
http://www.laruence.com/
Hey:
Hi,
I have this in mind. May be It's even possible to remove JMP.
FE_RESET op1=<array> op2=<label 2> result=<var> OP_DATA op1=<iterator> result=<key>
1: statements...
FE_FETCH op1=<iterator> op2=<label 1> result=<var>
OP_DATA result=<key>
2: FREE <iterator>Anyway, it's not a big problem implementing this, but eliminating of array
duplication is more important.
We still have to solve more serious problems first.
of course, I just was curious maybe this could give us more
improvement added to that ~1%.
thanks
Thanks. Dmitry.
Hey:
Hi,
I'm working on a PoC, implementing the proposed behavior -
https://gist.github.com/dstogov/a311e8b0b2cabee4dab4Only few PHPT tests had to be changed to confirm new behavior and
actually
they started to behave as HHVM.
2 tests are still unfixed XFAILed. Foreach by reference is still need to
be
improved to support different array modifications.The patch makes ~1% improvement on Wordpress-3.6 (saving duplication and
destruction of 200 arrays on each request)I just have got an idea:
Droping use of ZEND_OP_DATA make it possible to generate better
foreach loop opcodespreviously FE_RESET and FE_FETCH share one same OP_DATA based on offset.
0 FE_RESET
1 FE_FETCH fail-> 5
2 OP_DATA
3 ..... statements
4 JMP 1
5....without depending on that OP_DATA, we can change this to
0 FE_RESET
1 JMP 3
2 ....statements
3 FE_FETCH success->2
4 ....that will reduce on JMP in every foreach loop
could you please also try this?
thanks
Thanks. Dmitry.
On Thu, Jan 22, 2015 at 1:38 PM, Benjamin Coutu ben.coutu@zeyos.com
wrote:Hi Nikita,
I would suggest using the proposal for the by-value case and sticking
with
the current behavior for the by-reference case as you suggested.
Granted, we
then cannot remove the internal pointer all together, but we would just
use
it for the less common by-reference case as well as for the legacy
reset/next functions, of course. This would still improve most for-each
traversals. At least we then wouldn't have to find a replacement
solution
for rest/prev/next/end. Moreover we can then move the internal position
pointer out of the hashtable structure into zend_array because it is
only
used for userland arrays, not other hash tables (such as object
properties
or internal structures that build upon the hashtable struct).Thanks,
Ben
========== Original ==========
From: Nikita Popov nikita.ppv@gmail.com
To: Benjamin Coutu ben.coutu@zeyos.com
Date: Thu, 22 Jan 2015 10:53:19 +0100
Subject: Re: [PHP-DEV] Improvements to for-each implementationDoing this was the idea I had in mind as well, i.e. change the
semantics
of
foreach to say that it will always work on a copy for by-value
iteration
(which ironically avoids having to actually copy it). Note that this
will
differ from the current behavior in a number of ways. In particular it
means that changes to arrays that were references prior to iteration
will
not influence the iteration.The real question is what we should do in the by-reference case. Given
that
we need to acquire references to elements of the original array we
can't
reasonably work with copy-semantics (at least I don't see how). So
would
we
just stick with the previous behavior (using the hash position hack)
for
that?Nikita
--
--
Xinchen Hui
@Laruence
http://www.laruence.com/
--
Xinchen Hui
@Laruence
http://www.laruence.com/