??= and function calls

2 years ago by Ilija Tovilo — view source

unread

Hi everyone

I recently discovered some unfortunate behavior of the coalesce
assignment operator (??=) in combination with function calls. Here's
the TL;DR:

foo()['bar'] ??= 42;

Currently, this code calls foo() twice. This seems rather unexpected.
The technical reason as to why this happens is not straight-forward,
but I will attempt to explain below. The behavior was not specified in
the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is
completely untested, and as such I don't believe it is by design. My
proposal is to change it so that foo() is only called once.

This is what is happening in detail.

??= is special in that it needs to evaluate the lhs (left hand side)
twice. At first, we need to check if the offset exists, then
conditionally execute the rhs (right hand side), re-fetch the offset
and assign the rhs value to it. The reason for the re-fetching of the
offset is that the evaluation of the rhs may invalidate the offset.
This is explained in the following blog post:
https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety
Essentially, the offset may be a pointer into an array element or
object property. If the rhs frees the array or object, or grows the
array causing a reallocation (meaning it is moved to some other place
in memory), the pointer is no longer valid. For this reason, PHP makes
sure no user code may execute between the fetching of an offset and
the assignment to it. Normally, that just means evaluating the rhs
before fetching the offset. In this case, we need to evaluate the lhs
first to know if we even should evaluate the rhs.

Naively evaluating the lhs again poses a problem for expressions with
side-effects. For example:

$array[$x++] ??= 42;

We do not want to re-evaluate the entire expression because $x++ will
lead to a different array offset the second time around. The way this
is solved is by "memoizing" any compiled expression in the lhs that is
not a variable, meaning not part of the offset that may be
invalidated. Internally, a variable is considered anything that may be
written to, i.e. local variables ($foo), properties ($foo->bar,
Foo::$bar), array offsets ($foo['bar']), and function calls (foo(),
$foo->bar(), Foo::bar(), $foo(), as they may return a modifiable
reference). The fact that function calls are included in that list
leads to the problem presented above. It is not actually necessary to
exclude them from memoization because their result may not be
invalidated.

Another inconsistency is that function call arguments will be
re-evaluated, but only if they are not part of some other expression.

a. foo(bar())['baz'] ??= 42;
b. foo(bar() + 0)['baz'] ??= 42;

a calls both foo() and bar() twice. b however calls foo() twice but
bar() only once. That is because the expression bar() + 0 is not
considered a variable and as such gets memoized.

I propose to unconditionally memoize calls (in all forms) when they
appear in the lhs of a coalesce expression. This will ensure that
calls are only executed once, including function arguments and the lhs
of method calls. Consequently, the assignment will be performed on the
same offset that was previously tested, even if the expression
contains a function call with side-effects.

The implementation for this change is simple:
https://github.com/php/php-src/pull/11592

Let me know if you have any concerns. I'm planning on merging this for
master if there is consensus on the semantics.

Ilija

2 years ago by Dmitry Stogov — view source

unread

Hi everyone

I recently discovered some unfortunate behavior of the coalesce
assignment operator (??=) in combination with function calls. Here's
the TL;DR:

foo()['bar'] ??= 42;

Currently, this code calls foo() twice. This seems rather unexpected.
The technical reason as to why this happens is not straight-forward,
but I will attempt to explain below. The behavior was not specified in
the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is
completely untested, and as such I don't believe it is by design. My
proposal is to change it so that foo() is only called once.

This is what is happening in detail.

??= is special in that it needs to evaluate the lhs (left hand side)
twice. At first, we need to check if the offset exists, then
conditionally execute the rhs (right hand side), re-fetch the offset
and assign the rhs value to it. The reason for the re-fetching of the
offset is that the evaluation of the rhs may invalidate the offset.
This is explained in the following blog post:

https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety
Essentially, the offset may be a pointer into an array element or
object property. If the rhs frees the array or object, or grows the
array causing a reallocation (meaning it is moved to some other place
in memory), the pointer is no longer valid. For this reason, PHP makes
sure no user code may execute between the fetching of an offset and
the assignment to it. Normally, that just means evaluating the rhs
before fetching the offset. In this case, we need to evaluate the lhs
first to know if we even should evaluate the rhs.

Naively evaluating the lhs again poses a problem for expressions with
side-effects. For example:

$array[$x++] ??= 42;

We do not want to re-evaluate the entire expression because $x++ will
lead to a different array offset the second time around. The way this
is solved is by "memoizing" any compiled expression in the lhs that is
not a variable, meaning not part of the offset that may be
invalidated. Internally, a variable is considered anything that may be
written to, i.e. local variables ($foo), properties ($foo->bar,
Foo::$bar), array offsets ($foo['bar']), and function calls (foo(),
$foo->bar(), Foo::bar(), $foo(), as they may return a modifiable
reference). The fact that function calls are included in that list
leads to the problem presented above. It is not actually necessary to
exclude them from memoization because their result may not be
invalidated.

Another inconsistency is that function call arguments will be
re-evaluated, but only if they are not part of some other expression.

a. foo(bar())['baz'] ??= 42;
b. foo(bar() + 0)['baz'] ??= 42;

a calls both foo() and bar() twice. b however calls foo() twice but
bar() only once. That is because the expression bar() + 0 is not
considered a variable and as such gets memoized.

This is definitely a bug in the original implementation.
In case a function is evaluated twice and returns different values, we
check one value, but assign to another.

I propose to unconditionally memoize calls (in all forms) when they
appear in the lhs of a coalesce expression. This will ensure that
calls are only executed once, including function arguments and the lhs
of method calls. Consequently, the assignment will be performed on the
same offset that was previously tested, even if the expression
contains a function call with side-effects.

The implementation for this change is simple:
https://github.com/php/php-src/pull/11592

Let me know if you have any concerns. I'm planning on merging this for
master if there is consensus on the semantics.

Thanks. Dmitry.

Ilija

--

To unsubscribe, visit: https://www.php.net/unsub.php

2 years ago by flaviohbatista@gmail.com — view source

unread

Great catch Ilija!

Do you mind sharing how did you stumble upon it?

Thank you!

On Wed, Jul 5, 2023 at 1:15 AM Ilija Tovilo tovilo.ilija@gmail.com
wrote:

Hi everyone

I recently discovered some unfortunate behavior of the coalesce
assignment operator (??=) in combination with function calls. Here's
the TL;DR:

foo()['bar'] ??= 42;

Currently, this code calls foo() twice. This seems rather unexpected.
The technical reason as to why this happens is not straight-forward,
but I will attempt to explain below. The behavior was not specified in
the RFC (https://wiki.php.net/rfc/null_coalesce_equal_operator) and is
completely untested, and as such I don't believe it is by design. My
proposal is to change it so that foo() is only called once.

This is what is happening in detail.

??= is special in that it needs to evaluate the lhs (left hand side)
twice. At first, we need to check if the offset exists, then
conditionally execute the rhs (right hand side), re-fetch the offset
and assign the rhs value to it. The reason for the re-fetching of the
offset is that the evaluation of the rhs may invalidate the offset.
This is explained in the following blog post:

https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety

Essentially, the offset may be a pointer into an array element or
object property. If the rhs frees the array or object, or grows the
array causing a reallocation (meaning it is moved to some other place
in memory), the pointer is no longer valid. For this reason, PHP makes
sure no user code may execute between the fetching of an offset and
the assignment to it. Normally, that just means evaluating the rhs
before fetching the offset. In this case, we need to evaluate the lhs
first to know if we even should evaluate the rhs.

Naively evaluating the lhs again poses a problem for expressions with
side-effects. For example:

$array[$x++] ??= 42;

We do not want to re-evaluate the entire expression because $x++ will
lead to a different array offset the second time around. The way this
is solved is by "memoizing" any compiled expression in the lhs that is
not a variable, meaning not part of the offset that may be
invalidated. Internally, a variable is considered anything that may be
written to, i.e. local variables ($foo), properties ($foo->bar,
Foo::$bar), array offsets ($foo['bar']), and function calls (foo(),
$foo->bar(), Foo::bar(), $foo(), as they may return a modifiable
reference). The fact that function calls are included in that list
leads to the problem presented above. It is not actually necessary to
exclude them from memoization because their result may not be
invalidated.

Another inconsistency is that function call arguments will be
re-evaluated, but only if they are not part of some other expression.

a. foo(bar())['baz'] ??= 42;
b. foo(bar() + 0)['baz'] ??= 42;

a calls both foo() and bar() twice. b however calls foo() twice but
bar() only once. That is because the expression bar() + 0 is not
considered a variable and as such gets memoized.

This is definitely a bug in the original implementation.
In case a function is evaluated twice and returns different values, we
check one value, but assign to another.

I propose to unconditionally memoize calls (in all forms) when they
appear in the lhs of a coalesce expression. This will ensure that
calls are only executed once, including function arguments and the lhs
of method calls. Consequently, the assignment will be performed on the
same offset that was previously tested, even if the expression
contains a function call with side-effects.

The implementation for this change is simple:
https://github.com/php/php-src/pull/11592

Let me know if you have any concerns. I'm planning on merging this for
master if there is consensus on the semantics.

+1

Thanks. Dmitry.

Ilija

--

To unsubscribe, visit: https://www.php.net/unsub.php

2 years ago by Ilija Tovilo — view source

unread

Hi Flávio

On Wed, Jul 5, 2023 at 12:17 PM Flávio Heleno
flaviohbatista@gmail.com wrote:

I recently discovered some unfortunate behavior of the coalesce
assignment operator (??=) in combination with function calls.

Great catch Ilija!

Do you mind sharing how did you stumble upon it?

Oh, it's unspectacular. oss-fuzz found a related problem
(https://github.com/php/php-src/pull/11581) which made me look into
the implementation of ??= which is when I noticed the strange
behavior.

Ilija

2 years ago by mickmackusa — view source

unread

I didn't even realize that there was a difference between preincrementation
and postincrementation when used in array access.

I guess this is because my default preference is preincrementation and also
that I probably don't use a lot of incrementation while accessing array
data.

I held a false belief that the different syntaxes only mattered when
echoing.

$x = 0;
$array1[1] = 43;
$array1[++$x] ??= 42;
var_dump($x, $array1);

echo "\n---\n";

$y = 0;
$array2[0] = 43;
$array2[$y++] ??= 42;
var_dump($y, $array2);

Thanks Ilija

mickmackusa