Discussion: making continue and break into an expression

1 year ago by rokas.sleinius@gmail.com

Hello internals,

Now that throwing is an expression, it allows for some very concise
programming. What are your thoughts on making a break/continue into an
expression as well?

Instead of:

while(true) {
...
if(is_null($arr['var'])) continue;
if($something) continue; else break;
...
}

You could write

while(true) {
...
$arr['var'] ?? continue;
$something ? continue : break;
...
}

Robert Landers
Software Engineer
Utrecht NL

Personally, I would love to see return receive the same treatment too!

(for a SerenityOS-like pattern of returning "ErrorOr"-like objects)

P.S. Sorry for the simple "me too" reply.

1 year ago by Ilija Tovilo

Hi Robert

On Thu, Jan 25, 2024 at 10:16 AM Robert Landers
landers.robert@gmail.com wrote:

Now that throwing is an expression, it allows for some very concise
programming. What are your thoughts on making a break/continue into an
expression as well?

Instead of:

while(true) {
...
if(is_null($arr['var'])) continue;
if($something) continue; else break;
...
}

You could write

while(true) {
...
$arr['var'] ?? continue;
$something ? continue : break;
...
}

This leads to very similar issues as break/continue inside blocks. See:
https://wiki.php.net/rfc/match_blocks#technical_implications_of_control_statements

I'll try to explain.

The VM works with temporary variables. For the expression foo() +
bar() two temporary variables for the result of foo() and bar() will
be created, which are then used for the + operation. Normally, + will
consume both operands, i.e. use and then free them. However, with
break/continue etc. being expressions, it would become possible to
skip over the consuming instructions.

do {
    echo foo() + break;
} while (true);
echo 'Done';

Pseudo opcodes:

0000 V1 = CALL foo
0001 JMP 0005
0002 V2 = ADD V1 false ; false here represents a bottom value that will never actually be used
0003 ECHO V2
0004 JMP 0000
0005 ECHO 'Done'

Since JMP will skip over the ADD instruction, V1 is never consumed and
therefore never freed. A similar problem already exists for
break/continue in foreach itself.

foreach ($foos as $foo) {
   foreach ($bars as $bar) {
       break 2;
   }
}

foreach holds a copy of $bars (in case it gets modified) that normally
gets cleaned up when the loop ends. With break over multiple
loop-boundaries, we can completely skip over this freeing mechanism.
PHP solves this by inserting an explicit FE_FREE instruction before
the break 2, which itself is essentially just a JMP to the end of the
outer loop.
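
The effect of that explicit free can be observed from userland in current PHP. The following is only a sketch, assuming PHP 8.1 or later; the class TrackedBars and the function runNestedLoops are hypothetical names made up for this illustration. The inner loop iterates over an object whose destructor writes to a log, so the log shows when the loop's hidden reference is released.

```php
<?php
declare(strict_types=1);

// Hypothetical iterator whose destructor records when it is released.
final class TrackedBars implements Iterator
{
    /** @var list<string> */
    public static array $log = [];
    private int $i = 0;

    public function __construct(private array $items) {}

    public function current(): mixed { return $this->items[$this->i]; }
    public function key(): mixed { return $this->i; }
    public function next(): void { $this->i++; }
    public function rewind(): void { $this->i = 0; }
    public function valid(): bool { return $this->i < count($this->items); }

    public function __destruct() { self::$log[] = 'bars freed'; }
}

function runNestedLoops(): array
{
    TrackedBars::$log = [];
    foreach ([1, 2] as $foo) {
        // The only reference to the iterator is the one held by the inner foreach.
        foreach (new TrackedBars(['a', 'b']) as $bar) {
            TrackedBars::$log[] = "inner $bar";
            break 2; // leaves both loops at once
        }
    }
    TrackedBars::$log[] = 'after loops';
    return TrackedBars::$log;
}

print_r(runNestedLoops());
```

If the inner loop's value were not freed before the jump, 'bars freed' would not appear before 'after loops' in the log.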

Hopefully it's now more evident why this is a problem:

while (true) {
   foo() && break;
}

foo() returns a value that would normally be consumed by the &&
operation. However, with break, we may skip over the && operation
entirely. As such, the break itself becomes responsible for freeing
these values. This requires significant changes in the compiler to
track variables that are currently "live" (i.e. haven't been consumed
yet), and emitting FREE opcodes for them as needed. I've implemented
this for match blocks here:

https://github.com/php/php-src/compare/master...iluuu1994:php-src:match-blocks-var-tracking

However, note that due to complexity, I've decided to disallow using
break/continue and the likes in such contexts to avoid this issue
completely, which isn't possible for what you are suggesting.
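
To make the bookkeeping concrete, here is a toy interpreter for the pseudo opcodes used in this message. It is not the Zend VM: the function runToyVm, the opcode encoding, and the FREE instruction as modelled here are assumptions made purely for illustration. It runs the `echo foo() + break;` program once as written and once with a FREE emitted before the jump, and reports which temporaries were never released.

```php
<?php
declare(strict_types=1);

/**
 * Toy interpreter (NOT the Zend VM; opcode names and encoding are invented).
 * Returns what was echoed and which temporaries were never freed.
 */
function runToyVm(array $ops): array
{
    $temps = [];   // live temporaries, name => value
    $output = [];

    // Read an operand. A temporary is consumed (freed) when read;
    // anything that is not a known temporary is treated as a literal.
    $take = static function (mixed $operand) use (&$temps): mixed {
        if (is_string($operand) && array_key_exists($operand, $temps)) {
            $value = $temps[$operand];
            unset($temps[$operand]);
            return $value;
        }
        return $operand;
    };

    for ($pc = 0; $pc < count($ops);) {
        $op = $ops[$pc++];
        switch ($op[0]) {
            case 'CALL': // produce a temporary
                $temps[$op[1]] = $op[2];
                break;
            case 'ADD':  // consume both operands, produce the result
                $sum = (int) $take($op[2]) + (int) $take($op[3]);
                $temps[$op[1]] = $sum;
                break;
            case 'ECHO': // consume the operand
                $output[] = $take($op[1]);
                break;
            case 'FREE': // release a temporary without using it
                $take($op[1]);
                break;
            case 'JMP':
                $pc = $op[1];
                break;
        }
    }

    return ['output' => $output, 'leaked' => array_keys($temps)];
}

// do { echo foo() + break; } while (true); echo 'Done';
$leaky = [
    ['CALL', 'V1', 1],           // 0000 V1 = CALL foo
    ['JMP', 5],                  // 0001 break
    ['ADD', 'V2', 'V1', false],  // 0002 never reached
    ['ECHO', 'V2'],              // 0003 never reached
    ['JMP', 0],                  // 0004 never reached
    ['ECHO', 'Done'],            // 0005
];

// Same program with a FREE for the live temporary emitted before the jump.
$fixed = [
    ['CALL', 'V1', 1],
    ['FREE', 'V1'],
    ['JMP', 6],
    ['ADD', 'V2', 'V1', false],
    ['ECHO', 'V2'],
    ['JMP', 0],
    ['ECHO', 'Done'],
];

print_r(runToyVm($leaky)['leaked']); // V1 is still allocated
print_r(runToyVm($fixed)['leaked']); // nothing is left over
```

The hard part in a real compiler is knowing, at the point where the jump is emitted, which temporaries are live; the toy sidesteps that by having the FREE written by hand.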

There's another related issue.

foo(bar(), break);

Function calls in PHP consist of multiple instructions, namely an
INIT_CALL, 0-n SEND and a DO_CALL opcode. INIT_CALL creates a stack
frame, SEND pushes arguments onto the stack frame, and DO_CALL starts
the execution of the function and frees both arguments and stack frame
when the function ends. If prior to a SEND opcode we break, we skip
over the DO_CALL, so the stack frame needs to be freed manually.

The patch linked above solves this by inserting CLEAN_UNFINISHED_CALLS
opcodes that do as the name suggests. This mechanism is already used
for exceptions. This should work for you, but was insufficient for
match blocks, for reasons I won't get into here.
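
Because throw is already an expression, the exception side of this mechanism can be tried in current PHP. The snippet below is a sketch assuming PHP 8.0 or later; TrackedArg, bar, foo and callWithThrowingArgument are hypothetical names chosen to mirror the foo(bar(), break) example. The second argument throws after the first one has already been evaluated, so the call never starts.

```php
<?php
declare(strict_types=1);

// Hypothetical argument object whose destructor records when it is released.
final class TrackedArg
{
    /** @var list<string> */
    public static array $log = [];

    public function __destruct() { self::$log[] = 'argument freed'; }
}

function bar(): TrackedArg { return new TrackedArg(); }

function foo(TrackedArg $first, mixed $second): void
{
    TrackedArg::$log[] = 'foo called';
}

function callWithThrowingArgument(): array
{
    TrackedArg::$log = [];
    try {
        // bar() is evaluated and passed first; the throw then abandons the call.
        foo(bar(), throw new RuntimeException('abort the call'));
    } catch (RuntimeException $e) {
        TrackedArg::$log[] = 'caught: ' . $e->getMessage();
    }
    return TrackedArg::$log;
}

print_r(callWithThrowingArgument());
```

The log should contain 'argument freed' but never 'foo called': the argument that was already passed is released even though the function body never runs.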

All this to say: Don't expect the implementation here to be trivial.

Regards,
Ilija

1 year ago by Larry Garfield

I'm curious, how did throw expressions manage to avoid these issues? Or was it just "Ilija did the hard work of tracking down the weirdness?"

--Larry Garfield

1 year ago by Ilija Tovilo

Hi Larry

This leads to very similar issues as break/continue inside blocks. See:
https://wiki.php.net/rfc/match_blocks#technical_implications_of_control_statements

I'm curious, how did throw expressions manage to avoid these issues? Or was it just "Ilija did the hard work of tracking down the weirdness?"

Can't really take the credit for this. This issue went over my head,
as this was my first RFC.

Exceptions work a bit differently, in that they use something called
live-ranges. Essentially, we look at the generated opcodes and figure
out which variables are "live" (i.e. valid and unfreed) during which
opcodes. For something like echo foo() + bar():

Pseudo opcodes:

0000 V1 = CALL foo
0001 V2 = CALL bar
0002 V3 = ADD V1 V2
0003 ECHO V3

V1 would be live for 0000-0002, V2 for 0001-0002, V3 for 0002-0003. If
an exception is thrown (or rethrown across function boundaries) the VM
checks which temporary variables are currently live and frees them. So
if CALL bar were to throw, we'd see that V1 is currently live and
needs to be freed. For something like foo() + throw new Exception(),
if you replace the second CALL with a throw, you'll see that the
live-range for V1 doesn't change, and so this "just works".
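
This can be checked from userland. The following sketch assumes PHP 8.0 or later; TrackedTemp, foo and addThenThrow are hypothetical names for illustration. foo() returns an object that plays the role of V1, and its destructor records when the VM frees it.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the temporary V1.
final class TrackedTemp
{
    /** @var list<string> */
    public static array $log = [];

    public function __destruct() { self::$log[] = 'V1 freed'; }
}

function foo(): TrackedTemp { return new TrackedTemp(); }

function addThenThrow(): array
{
    TrackedTemp::$log = [];
    try {
        // The throw runs before the addition, so the ADD never consumes foo()'s result.
        $sum = foo() + throw new RuntimeException('boom');
    } catch (RuntimeException $e) {
        TrackedTemp::$log[] = 'caught';
    }
    return TrackedTemp::$log;
}

print_r(addThenThrow());
```

On a stock PHP 8 build the log should read 'V1 freed' followed by 'caught', meaning the temporary is released during exception handling, before the catch block runs.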

There was, however, a related issue with the optimizer.

echo foo() + throw new Exception();

0000 V1 = CALL foo
0001 THROW
0002 V3 = ADD V1 false
0003 ECHO V3

Where the optimizer would remove the dead instructions after the
throw, breaking live-range analysis.

0000 V1 = CALL foo
0001 THROW

V1 no longer had a consuming opcode, and as such the algorithm could
no longer determine the live-range of V1. This would cause V1 to leak.
The solution was simply to disable dead code elimination for this
case. The solution was suggested by Tyson Andre and implemented by
Nikita.
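
A rough model of why removing the dead instructions breaks the analysis: the sketch below derives a live range for each temporary from the opcode that defines it and the last opcode that uses it. It is a simplification written for this explanation, not the algorithm in php-src; computeLiveRanges and the array encoding of opcodes are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Simplified live-range computation (an illustration, not php-src's algorithm).
 *
 * @param list<array{def: ?string, uses: list<string>}> $ops
 * @return array<string, array{start: int, end: ?int}>
 */
function computeLiveRanges(array $ops): array
{
    $ranges = [];
    foreach ($ops as $pc => $op) {
        // A use extends the range of an already defined temporary up to this opcode.
        foreach ($op['uses'] as $temp) {
            $ranges[$temp]['end'] = $pc;
        }
        // A definition opens a new range whose end is not known yet.
        if ($op['def'] !== null) {
            $ranges[$op['def']] = ['start' => $pc, 'end' => null];
        }
    }
    return $ranges;
}

// echo foo() + throw new Exception();
$full = [
    ['def' => 'V1', 'uses' => []],      // V1 = CALL foo
    ['def' => null, 'uses' => []],      // THROW
    ['def' => 'V3', 'uses' => ['V1']],  // V3 = ADD V1 false
    ['def' => null, 'uses' => ['V3']],  // ECHO V3
];

// What is left after the dead instructions behind THROW are removed.
$afterDce = array_slice($full, 0, 2);

print_r(computeLiveRanges($full)['V1']);     // the range ends at the ADD
print_r(computeLiveRanges($afterDce)['V1']); // the end is unknown (null)
```

With the ADD present, V1 gets a closed range that covers the THROW, so exception handling knows to free it. Without the ADD, the range has no end, which corresponds to the leak described above.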

In theory, break/continue expressions might try to re-use live-ranges.
I recall thinking about this, but I can't seem to remember if there
was a reason not to do it.

Ilija

5 months ago by Dmitry Derepko

Hello internals,

Now that throwing is an expression, it allows for some very concise
programming. What are your thoughts on making a break/continue into an
expression as well?

Hi!
I had a similar idea: make break, continue and return expressions instead of statements, to simplify almost the same cases that Robert described above.

The grammar changes are in the PR: https://github.com/php/php-src/pull/17647
Ilija pointed out the memory leak problems there as well.

Thinking about Ilija's memory leak case:
new Foo + return 1

I think there may be a workaround: allow these constructs only at specific points:

<point> as a statement, as it is now
$cond ? <point> : <point>;
match ($v) { … => <point> }

That would reject cases such as:

operand OPERATOR <point> (1 + return; $cond && break; etc.)

Wouldn't that prevent the memory leak problems?

I’m writing an RFC: https://wiki.php.net/rfc/return_break_continue_expressions
I’ll start a new discussion once it is ready.

Best regards,
Dmitrii Derepko.
@xepozz

5 months ago by Ilija Tovilo

Hi Dmitrii

Note that technical discussions don't need to happen on the internals
mailing list. It has a lot of recipients and they are usually not
interesting to the majority of people. GH is a better place for that.

This alone will not be sufficient, because the ?: and match
expressions themselves may be nested in other expressions that create
temporary variables. E.g. new Foo + ($cond ? $x : break) will create
the same issue. Restricting control flow statements fully to a
combination of these expressions would work, but it would also be less
useful and thus less persuasive. In my experience, incomprehensible
limitations are generally not well received by internals.
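
As a sketch of what "restricting fully to a combination of these expressions" could mean, the function below walks a small expression tree and rejects any control-flow node that sits, directly or through nested ?: branches, inside an operand of another operator. The tree encoding and the name controlFlowAllowed are hypothetical; this is neither the PR's implementation nor PHP's real AST, and match is left out for brevity.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check: is every break/continue/return in this tree at a
 * position where no enclosing operator is holding a temporary?
 */
function controlFlowAllowed(array $node, bool $insideOperand = false): bool
{
    switch ($node['type']) {
        case 'leaf':     // variable, literal, call, ...
            return true;
        case 'control':  // break, continue or return
            return !$insideOperand;
        case 'binary':   // conservatively treat both operands as unsafe positions
            return controlFlowAllowed($node['left'], true)
                && controlFlowAllowed($node['right'], true);
        case 'ternary':  // branches inherit the context of the whole ?: expression
            return controlFlowAllowed($node['cond'], true)
                && controlFlowAllowed($node['then'], $insideOperand)
                && controlFlowAllowed($node['else'], $insideOperand);
        default:
            throw new InvalidArgumentException("Unknown node type: {$node['type']}");
    }
}

$leaf = ['type' => 'leaf'];
$break = ['type' => 'control'];

// $cond ? $x : break;
$ternary = ['type' => 'ternary', 'cond' => $leaf, 'then' => $leaf, 'else' => $break];

// new Foo + ($cond ? $x : break)
$nested = ['type' => 'binary', 'left' => $leaf, 'right' => $ternary];

var_dump(controlFlowAllowed($ternary)); // accepted at statement level
var_dump(controlFlowAllowed($nested));  // rejected: the ?: is an operand of +
```

The check has to carry context down through every enclosing expression, which is why a rule that only looks at the immediate parent of the control-flow keyword is not enough.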

Ilija

5 months ago by Rob Landers

Oh, this was a fun one!

I ended up rewriting the AST during compilation (pass 2) so that the return/break happens on the next statement instead of within the current one. That let me get around the issue, but I guessed nobody would like it and that it probably wouldn’t pass on technical grounds.

In other words, yes, it was an expression at the grammar level, but it was compiled as a statement.

— Rob

5 months ago by Rob Landers

Hello again,

I just finished reviewing the diff and didn't notice any tests that show what these expressions actually evaluate to. In my diff, I did something like this:

$this->x = return $y; // $this->x === $y

Which is basically shorthand for:

$this->x = $y;
return $y;

An empty return, break, or continue (well, only one of these can carry a value) evaluates to NULL:

$this->x = break; // $this->x === null

Shorthand for:

$this->x = null;
break;

I'm trying to figure out which branch this was on ... but it was likely on my old computer, since I originally sent the email from my Gmail address.

— Rob