Hello internals,
Now that throwing is an expression, it allows for some very concise
programming. What are your thoughts on making a break/continue into an
expression as well?
Instead of:
while(true) {
...
if(is_null($arr['var'])) continue;
if($something) continue; else break;
...
}
You could write
while(true) {
...
$arr['var'] ?? continue;
$something ? continue : break;
...
}
Robert Landers
Software Engineer
Utrecht NL
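For reference, a minimal sketch of what is already expressible today (PHP 8.0 or later): throw works on the right-hand side of ??, while continue and break still need the statement form. The function names requireVar and collectVars and the 'stop' sentinel are hypothetical, chosen only for this illustration.

```php
<?php
declare(strict_types=1);

// throw is an expression since PHP 8.0, so it can sit behind ??.
function requireVar(array $arr): mixed
{
    return $arr['var'] ?? throw new InvalidArgumentException("missing 'var'");
}

// continue/break are still statements, so the loop needs explicit ifs.
function collectVars(array $rows): array
{
    $out = [];
    foreach ($rows as $row) {
        if (is_null($row['var'] ?? null)) {
            continue; // the proposal would allow: $row['var'] ?? continue;
        }
        if ($row['var'] === 'stop') {
            break; // hypothetical sentinel that ends the loop
        }
        $out[] = $row['var'];
    }
    return $out;
}
```

requireVar() either returns the value or throws, in a single expression; collectVars() is the shape of code the proposal would shorten.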
Personally, I would love to see return receive such treatment too!
(for a SerenityOS-like pattern of returning "ErrorOr"-like objects)
P.S. Sorry for a simple "me too" reply.
Hi Robert
On Thu, Jan 25, 2024 at 10:16 AM Robert Landers
landers.robert@gmail.com wrote:
Now that throwing is an expression, it allows for some very concise
programming. What are your thoughts on making a break/continue into an
expression as well?
Instead of:
while(true) {
...
if(is_null($arr['var'])) continue;
if($something) continue; else break;
...
}
You could write
while(true) {
...
$arr['var'] ?? continue;
$something ? continue : break;
...
}
This leads to very similar issues as break/continue inside blocks. See:
https://wiki.php.net/rfc/match_blocks#technical_implications_of_control_statements
I'll try to explain.
The VM works with temporary variables. For the expression foo() +
bar() two temporary variables for the result of foo() and bar() will
be created, which are then used for the + operation. Normally, + will
consume both operands, i.e. use and then free them. However, with
break/continue etc. being expressions, it would become possible to
skip over the consuming instructions.
do {
echo foo() + break;
} while (true);
echo 'Done';
Pseudo opcodes:
0000 V1 = CALL foo
0001 JMP 0005
0002 V2 = ADD V1 false ; false here represents a bottom value
                       ; that will never actually be used
0003 ECHO V2
0004 JMP 0000
0005 ECHO 'Done'
Since JMP will skip over the ADD instruction, V1 remains unused. A
similar problem already exists for break/continue in foreach itself.
foreach ($foos as $foo) {
foreach ($bars as $bar) {
break 2;
}
}
foreach holds a copy of $bars (in case it gets modified) that normally
gets cleaned up when the loop ends. With break over multiple
loop-boundaries, we can completely skip over this freeing mechanism.
PHP solves this by inserting an explicit FE_FREE instruction before
the break 2, which itself is essentially just a JMP to the end of the
outer loop.
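The user-visible half of this can be checked from plain PHP; the FE_FREE instruction itself is internal and not observable from userland. A small sketch, with the hypothetical helper names iterateWhileAppending and countBeforeBreak2, showing that a by-value foreach iterates over the array as it was when the loop started, and that break 2 leaves both loops at once:

```php
<?php
declare(strict_types=1);

function iterateWhileAppending(array $bars): array
{
    $seen = [];
    foreach ($bars as $bar) {
        $bars[] = $bar * 10; // changes the variable, not what foreach iterates over
        $seen[] = $bar;
    }
    return ['seen' => $seen, 'bars' => $bars];
}

function countBeforeBreak2(array $foos, array $bars): int
{
    $iterations = 0;
    foreach ($foos as $foo) {
        foreach ($bars as $bar) {
            $iterations++;
            break 2; // leaves the inner and the outer loop together
        }
    }
    return $iterations;
}
```

With non-empty inputs, countBeforeBreak2() runs the inner body exactly once, which is the situation where the inner loop's own end-of-loop cleanup is never reached.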
Hopefully it's now more evident why this is a problem:
while (true) {
foo() && break;
}
foo() returns a value that would normally be consumed by the &&
operation. However, with break, we may skip over the && operation
entirely. As such, the break itself becomes responsible for freeing
these values. This requires significant changes in the compiler to
track variables that are currently "live" (i.e. haven't been consumed
yet), and emitting FREE opcodes for them as needed. I've implemented
this for match blocks here:
https://github.com/php/php-src/compare/master...iluuu1994:php-src:match-blocks-var-tracking
However, note that due to the complexity, I've decided to disallow
break/continue and the like in such contexts to avoid this issue
completely, which isn't possible for what you are suggesting.
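To make the bookkeeping concrete, here is a rough userland model of the idea, not the php-src implementation. The opcode array shape and the names pseudoOp, freesBeforeJumps, consumerIsSkipped and doWhileExample are all assumptions made for this sketch, and it only handles forward jumps in straight-line code.

```php
<?php
declare(strict_types=1);

// Hypothetical pseudo-opcode record; not a real Zend structure.
function pseudoOp(string $op, ?string $result = null, array $operands = [], ?int $target = null): array
{
    return ['op' => $op, 'result' => $result, 'operands' => $operands, 'target' => $target];
}

// True if some opcode strictly between $from and $to consumes $temp.
function consumerIsSkipped(array $ops, string $temp, int $from, int $to): bool
{
    for ($j = $from + 1; $j < $to; $j++) {
        if (in_array($temp, $ops[$j]['operands'], true)) {
            return true;
        }
    }
    return false;
}

// Returns [jumpIndex => temporaries that would need a FREE before that jump].
function freesBeforeJumps(array $ops): array
{
    $frees = [];
    $live = []; // temporaries produced but not yet consumed
    foreach ($ops as $i => $op) {
        foreach ($op['operands'] as $operand) {
            unset($live[$operand]); // consumed by this opcode
        }
        if ($op['op'] === 'JMP' && $op['target'] > $i) {
            foreach (array_keys($live) as $temp) {
                if (consumerIsSkipped($ops, $temp, $i, $op['target'])) {
                    $frees[$i][] = $temp;
                }
            }
        }
        if ($op['result'] !== null) {
            $live[$op['result']] = true;
        }
    }
    return $frees;
}

// The "echo foo() + break;" listing from above, indices 0 to 5.
function doWhileExample(): array
{
    return [
        pseudoOp('CALL', 'V1'),
        pseudoOp('JMP', null, [], 5),
        pseudoOp('ADD', 'V2', ['V1', 'false']),
        pseudoOp('ECHO', null, ['V2']),
        pseudoOp('JMP', null, [], 0),
        pseudoOp('ECHO'),
    ];
}
```

For the listing above, freesBeforeJumps(doWhileExample()) reports that V1 needs freeing before the jump at index 1, which is the extra FREE the compiler would have to emit.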
There's another related issue.
foo(bar(), break);
Function calls in PHP consist of multiple instructions, namely an
INIT_CALL, 0-n SEND and a DO_CALL opcode. INIT_CALL creates a stack
frame, SEND pushes arguments onto the stack frame, and DO_CALL starts
the execution of the function and frees both arguments and stack frame
when the function ends. If prior to a SEND opcode we break, we skip
over the DO_CALL, so the stack frame needs to be freed manually.
The patch linked above solves this by inserting CLEAN_UNFINISHED_CALLS
opcodes that do as the name suggests. This mechanism is already used
for exceptions. This should work for you, but was insufficient for
match blocks, for reasons I won't get into here.
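As a loose illustration of the call sequence described above, the following toy model mirrors the pseudo opcodes as methods. The class CallStackModel and the function simulateFooBarBreak are hypothetical and only mimic the frame bookkeeping; they do not reflect how the VM stores frames.

```php
<?php
declare(strict_types=1);

final class CallStackModel
{
    /** @var list<array{name: string, args: array}> */
    private array $frames = [];

    public function initCall(string $name): void
    {
        $this->frames[] = ['name' => $name, 'args' => []]; // INIT_CALL: new frame
    }

    public function send(mixed $arg): void
    {
        if ($this->frames === []) {
            throw new LogicException('SEND without a pending call');
        }
        $this->frames[array_key_last($this->frames)]['args'][] = $arg; // SEND
    }

    public function doCall(): string
    {
        $frame = array_pop($this->frames); // DO_CALL: run and release the frame
        return $frame['name'] . '/' . count($frame['args']);
    }

    /** Drops every frame that never reached its DO_CALL. */
    public function cleanUnfinishedCalls(): array
    {
        $dropped = [];
        while ($this->frames !== []) {
            $dropped[] = array_pop($this->frames)['name'];
        }
        return $dropped;
    }

    public function depth(): int
    {
        return count($this->frames);
    }
}

// Models foo(bar(), break): the break happens before foo's DO_CALL.
function simulateFooBarBreak(): array
{
    $vm = new CallStackModel();
    $vm->initCall('foo');
    $vm->initCall('bar');
    $vm->send($vm->doCall()); // bar() completes, its result is sent to foo
    return $vm->cleanUnfinishedCalls(); // the break: foo's frame is still pending
}
```

In this model the only frame left behind at the break is foo's, which is what a cleanup step would have to release.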
All this to say: Don't expect the implementation here to be trivial.
Regards,
Ilija
I'm curious, how did throw expressions manage to avoid these issues?
Or was it just "Ilija did the hard work of tracking down the weirdness?"
--Larry Garfield
Hi Larry
This leads to very similar issues as break/continue inside blocks. See:
https://wiki.php.net/rfc/match_blocks#technical_implications_of_control_statements
I'm curious, how did throw expressions manage to avoid these issues?
Or was it just "Ilija did the hard work of tracking down the weirdness?"
Can't really take the credit for this. This issue went over my head,
as this was my first RFC.
Exceptions work a bit differently, in that they use something called
live-ranges. Essentially, we look at the generated opcodes and figure
out which variables are "live" (i.e. valid and unfreed) during which
opcodes. For something like echo foo() + bar():
Pseudo opcodes:
0000 V1 = CALL foo
0001 V2 = CALL bar
0003 V3 = ADD V1 V2
0004 ECHO V3
V1 would be live for 0000-0003, V2 for 0001-0003, V3 for 0003-0004. If
an exception is thrown (or rethrown across function boundaries) the VM
checks which temporary variables are currently live and frees them. So
if CALL bar were to throw, we'd see that V1 is currently live and
needs to be freed. For something like foo() + throw new Exception(),
if you replace the second CALL with a throw, you'll see that the
live-range for V1 doesn't change, and so this "just works".
There was, however, a related issue with the optimizer.
echo foo() + throw new Exception();
0000 V1 = CALL foo
0001 THROW
0003 V3 = ADD V1 false
0004 ECHO V3
Where the optimizer would remove the dead instructions after the
throw, breaking live-range analysis.
0000 V1 = CALL foo
0001 THROW
V1 no longer had a consuming opcode, and as such the algorithm could
no longer determine the live-range of V1. This would cause V1 to leak.
The solution was simply to disable dead code elimination for this
case. The solution was suggested by Tyson Andre and implemented by
Nikita.
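The live-range idea and the optimizer problem can be modelled in a few lines of userland PHP. This is a simplified sketch, not the engine's algorithm: the opcode shape and the names tempOp, computeLiveRanges and tempsToFree are assumptions, opcode numbers follow the listings above (including the skipped 0002), and a temporary is treated as needing a free only strictly between its defining and its consuming opcode.

```php
<?php
declare(strict_types=1);

// Hypothetical record: which temporary an opcode produces and which it consumes.
function tempOp(?string $result, array $operands = []): array
{
    return ['result' => $result, 'operands' => $operands];
}

// Returns [temp => ['start' => defining opcode, 'end' => consuming opcode or null]].
function computeLiveRanges(array $ops): array
{
    $ranges = [];
    foreach ($ops as $num => $op) {
        foreach ($op['operands'] as $operand) {
            if (isset($ranges[$operand])) {
                $ranges[$operand]['end'] = $num; // consumer found
            }
        }
        if ($op['result'] !== null) {
            $ranges[$op['result']] = ['start' => $num, 'end' => null];
        }
    }
    return $ranges;
}

// Temporaries to free if the opcode numbered $throwingOp throws.
function tempsToFree(array $ranges, int $throwingOp): array
{
    $live = [];
    foreach ($ranges as $name => $range) {
        // Without a consuming opcode there is no known range: this models the leak.
        if ($range['end'] !== null && $range['start'] < $throwingOp && $throwingOp < $range['end']) {
            $live[] = $name;
        }
    }
    return $live;
}

// echo foo() + throw new Exception(); before dead code elimination.
$withConsumer = [0 => tempOp('V1'), 1 => tempOp(null), 3 => tempOp('V3', ['V1']), 4 => tempOp(null, ['V3'])];
// The same code after the instructions behind THROW were removed.
$afterDce = [0 => tempOp('V1'), 1 => tempOp(null)];
```

With the consumer still present, tempsToFree(computeLiveRanges($withConsumer), 1) yields V1; for $afterDce it yields nothing, so V1 would never be freed in this model.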
In theory, break/continue expressions might try to re-use live-ranges.
I recall thinking about this, but I can't seem to remember if there
was a reason not to do it.
Ilija