Pipe precedence challenges

2 months ago by Larry Garfield — view source — reply

unread

Hi folks. We have discovered a subtle issue with the pipes implementation that needs to be addressed, although we're not entirely sure how. Specifically, Derick found it while trying to ensure Xdebug plays nice with pipes.

The problem has to do with the operator precedence of short closures vs pipes. For example:

$result = $arr
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O');

It's logical to assume that would be parsed as

$result = $arr
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'));

Which then compiles into, essentially:

$temp = (fn($x) => array_map(strtoupper(...), $x))($arr);
$temp = (fn($x) => array_filter($x, fn($v) => $v != 'O'))($temp);
$result = $temp;

$result = (fn($x) => array_filter($x, fn($v) => $v != 'O'))((fn($x) => array_map(strtoupper(...), $x))($arr));

(depending on how you want to visualize it.)

That was the intent of the RFC.

However, because short closures are "greedy" and assume anything before the next semi-colon is part of the closure body, it's actually getting parsed like this:

$result = $arr
|> fn($x) => (
array_map(strtoupper(...), $x)
|> fn($x) => (
array_filter($x, fn($v) => $v != 'O')
)
)
;

Which would compile into something like:

$result = (fn($x) => (fn($x) => array_filter($x, fn($v) => $v != 'O'))(array_map(strtoupper(...), $x)))($arr);

Which is not the intent.

Humorously, if all the functions and closures involved are pure, this parsing difference usually doesn't matter. The result is still computed as intended. That's why it wasn't caught during earlier reviews or by automated tests. However, there are cases where it matters. For example:

42
|> fn($x) => $x < 42
|> fn($x) => var_dump($x);

One would expect that to evaluate to false in the first segment, then var_dump() false. But it actually var_dump()s 42, because it gets interpreted as (42 |> fn($x) => var_dump($x)) first.

The incorrect wrapping also makes debugging vastly harder, even if the computed result is the same, as the expression is broken up "wrong" into multiple nested closures, stack traces are different than one would expect, etc.

The challenge is conflicting requirements. Closures have extremely low precedence right now, specifically so that they will grab everything that comes after them as a single expression. However, we also want pipes to allow a step to be a closure; that would typically mean making pipe bind even lower than closures, but that's not viable because it would result in

$result = 'x' |> fn ($x) => strtoupper($x)

being interpreted as

($result = 'x') |> (fn ($x) => strtoupper($x))

Which would be rather pointless.

So far, the best suggestion that's been put forward (though we've not tried implementing it yet) is to disallow a pipe inside a short-closure body, unless the body is surrounded by (). So this:

fn($x) => $x
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O');

Today, that would run somewhat by accident, as the outer-most closure would claim everything after it. With the new approach, it would be interpreted as passing fn($x) => $x as the argument to the first pipe segment, which would then be mapped over, which would fail. You'd instead need to do this:

fn($x) => ($x
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'))
);

Which is not wonderful, but it's not too bad, either. That's probably the only case where pipes inside a short-closure body would be useful anyway. And if PFA (https://wiki.php.net/rfc/partial_function_application_v2) and similar closure improvements pass, it will greatly reduce the need for mixing short closures and pipes together in either precedence, so it won't come up very often.

There are a few other operators that bind lower than pipe (see https://github.com/php/php-src/blob/fd8dfe1bfda62a3bd9dd1ff7c0577da75db02fcf/Zend/zend_language_parser.y#L56-L73), which would therefore need wrapping parentheses. For many of them we do want them to be lower than pipe, so just moving pipe's priority down isn't viable. However, most of those are unlikely to be used inside a pipe segment, so are less likely to come up. The most likely would be a bail-out exception:

$value = null;
$value
|> fn ($x) => $x ?? throw new Exception('Value may not be null')
|> fn ($x) => var_dump($x);

Which would currently be interpreted something like:

$c = function ($x) {
$c = function ($x) {
return var_dump($x);
};
return $x ?? throw $c(new Exception('Value may not be null'));
};
$c(null);

This would not throw the exception as expected, unless parentheses are added. It would var_dump() an exception and then try to throw the return of varl_dump(), which would fatal.

RM approval to address this during the 8.5 beta phase has been given, but we still want to have some discussion to make sure we have a good solution.

So, the question:

Does this seem like a good solution, or is there a problem we've not spotted yet?
Does anyone have a better solution to suggest?

Thanks to Derick, Ilija, and Tim for tracking down this annoying edge case.

--
Larry Garfield
larry@garfieldtech.com

2 months ago by Derick Rethans — view source — reply

unread

So, the question:

Does this seem like a good solution, or is there a problem we've not spotted yet?

Yes, I think this will be a good (temporary) solution until PFAs make it into the language, when it becomes less of an issue.

cheers
Derick

2 months ago by Rob Landers — view source — reply

unread

Hi folks. We have discovered a subtle issue with the pipes implementation that needs to be addressed, although we're not entirely sure how. Specifically, Derick found it while trying to ensure Xdebug plays nice with pipes.

The problem has to do with the operator precedence of short closures vs pipes. For example:

$result = $arr
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O');

It's logical to assume that would be parsed as

$result = $arr
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'));

Which then compiles into, essentially:

$temp = (fn($x) => array_map(strtoupper(...), $x))($arr);
$temp = (fn($x) => array_filter($x, fn($v) => $v != 'O'))($temp);
$result = $temp;

Or

$result = (fn($x) => array_filter($x, fn($v) => $v != 'O'))((fn($x) => array_map(strtoupper(...), $x))($arr));

(depending on how you want to visualize it.)

That was the intent of the RFC.

However, because short closures are "greedy" and assume anything before the next semi-colon is part of the closure body, it's actually getting parsed like this:

$result = $arr
|> fn($x) => (
array_map(strtoupper(...), $x)
|> fn($x) => (
array_filter($x, fn($v) => $v != 'O')
)
)
;

Which would compile into something like:

$result = (fn($x) => (fn($x) => array_filter($x, fn($v) => $v != 'O'))(array_map(strtoupper(...), $x)))($arr);

Which is not the intent.

Humorously, if all the functions and closures involved are pure, this parsing difference usually doesn't matter. The result is still computed as intended. That's why it wasn't caught during earlier reviews or by automated tests. However, there are cases where it matters. For example:

42
|> fn($x) => $x < 42
|> fn($x) => var_dump($x);

One would expect that to evaluate to false in the first segment, then var_dump() false. But it actually var_dump()s 42, because it gets interpreted as (42 |> fn($x) => var_dump($x)) first.

The incorrect wrapping also makes debugging vastly harder, even if the computed result is the same, as the expression is broken up "wrong" into multiple nested closures, stack traces are different than one would expect, etc.

The challenge is conflicting requirements. Closures have extremely low precedence right now, specifically so that they will grab everything that comes after them as a single expression. However, we also want pipes to allow a step to be a closure; that would typically mean making pipe bind even lower than closures, but that's not viable because it would result in

$result = 'x' |> fn ($x) => strtoupper($x)

being interpreted as

($result = 'x') |> (fn ($x) => strtoupper($x))

Which would be rather pointless.

So far, the best suggestion that's been put forward (though we've not tried implementing it yet) is to disallow a pipe inside a short-closure body, unless the body is surrounded by (). So this:

fn($x) => $x
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O');

Today, that would run somewhat by accident, as the outer-most closure would claim everything after it. With the new approach, it would be interpreted as passing fn($x) => $x as the argument to the first pipe segment, which would then be mapped over, which would fail. You'd instead need to do this:

fn($x) => ($x
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'))
);

Which is not wonderful, but it's not too bad, either. That's probably the only case where pipes inside a short-closure body would be useful anyway. And if PFA (https://wiki.php.net/rfc/partial_function_application_v2) and similar closure improvements pass, it will greatly reduce the need for mixing short closures and pipes together in either precedence, so it won't come up very often.

There are a few other operators that bind lower than pipe (see https://github.com/php/php-src/blob/fd8dfe1bfda62a3bd9dd1ff7c0577da75db02fcf/Zend/zend_language_parser.y#L56-L73), which would therefore need wrapping parentheses. For many of them we do want them to be lower than pipe, so just moving pipe's priority down isn't viable. However, most of those are unlikely to be used inside a pipe segment, so are less likely to come up. The most likely would be a bail-out exception:

$value = null;
$value
|> fn ($x) => $x ?? throw new Exception('Value may not be null')
|> fn ($x) => var_dump($x);

Which would currently be interpreted something like:

$c = function ($x) {
$c = function ($x) {
return var_dump($x);
};
return $x ?? throw $c(new Exception('Value may not be null'));
};
$c(null);

This would not throw the exception as expected, unless parentheses are added. It would var_dump() an exception and then try to throw the return of varl_dump(), which would fatal.

RM approval to address this during the 8.5 beta phase has been given, but we still want to have some discussion to make sure we have a good solution.

So, the question:

Does this seem like a good solution, or is there a problem we've not spotted yet?

Does anyone have a better solution to suggest?

Thanks to Derick, Ilija, and Tim for tracking down this annoying edge case.

--
Larry Garfield
larry@garfieldtech.com

Hi Larry,

What would happen to this:

$x = fn($y) => $y |> strtoupper(…) |> var_dump(…);

with this change? I would expect $x to be a chain — which is what it currently is. But if I understand correctly, it will become:

$x = (fn($y) => $y) |> strtoupper(...) |> var_dump(...);

which would be interesting... but this that you shared as the current "correct" way:

fn($x) => ($x
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'))
);

This looks correct to me… but I’ve been using pipes since they were merged. So, I might be biased. I would have written the above like this though:

fn($x) => $x
|> __(array_map(...))(strtoupper(...), __)
|> (array_filter(...))(, fn($v) => $v != 'O');

or if I was feeling wordy:

$array_map = __(array_map(...);
$array_filter = __(array_filter(...);
fn($x) => $x => $array_map(strtoupper(...), ) |> $array_filter(, fn($v) => $v != 'O');

This suggested change would completely break that if I'm understanding correctly. Then PFA would reallow this syntax? I dunno ... changing it makes it sound inconsistent to me.

FWIW, I like it how it is for writing reusable pipes chains.

PS. The code above is implemented like so:

function __(...$steps): Closure {
$first = array_shift($steps) ?? throw new LogicException();

return function(...$steps) use ($first) {
    return function($x) use ($first, $steps) {
        foreach($steps as &$val) {
            if ($val === __) {
                $val = $x;
            }
        }
        return $first(...$steps);
    };
};

}
define('__', __(...));

ish.

— Rob

2 months ago by Hans Krentel — view source — reply

unread

Does anyone have a better solution to suggest?

I've been following the discussion on the pipe/short-closure precedence
issue, and while the parentheses-based solution works, it does lead to
syntactic noise that undermines the readability benefits of pipes.

I'd like to suggest an alternative that turns this problem into an
opportunity for a more elegant syntax.

Inspired by prior art like Haskell's do notation, I propose what I
call "short closure canned pipes." The idea is to leverage the greedy
nature of short closures to define a clean, linear pipe chain within a
closure context. Instead of fighting the precedence, we embrace it to
create a new form:

Current problematic example (even with parentheses):

fn($x) => ($x
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'))
);

Proposed "canned pipes" syntax:

fn($x) =>
|> array_map(strtoupper(...), $x)
|> array_filter($x, fn($v) => $v != 'O')
;

This syntax:

Resolves the ambiguity: The fn($x) => explicitly "cans" the value
for the entire pipe chain, making the precedence clear without
parentheses.
Reduces boilerplate: It eliminates the repetition of fn($x) => and
nested parentheses, making the code more readable and maintainable.
Aligns with intent: It reads as a simple sequence of transformations,
which is the core goal of the pipe operator.

This approach effectively provides syntactic sugar for the common case
of processing a value through a pipe chain within a closure. It turns
the late-discovered defect into a win for developer ergonomics and
future-proofing.

I understand that implementation feasibility needs to be assessed, but
if possible, this could be a more satisfying solution than mandatory
parentheses. Thanks to the team for highlighting this issue -- it's
sparked a creative way to enhance the feature.

Would love to hear thoughts on this.

Best,
hakre

2 months ago by Larry Garfield — view source — reply

unread

Does anyone have a better solution to suggest?

I've been following the discussion on the pipe/short-closure precedence
issue, and while the parentheses-based solution works, it does lead to
syntactic noise that undermines the readability benefits of pipes.

I'd like to suggest an alternative that turns this problem into an
opportunity for a more elegant syntax.

Inspired by prior art like Haskell's do notation, I propose what I
call "short closure canned pipes." The idea is to leverage the greedy
nature of short closures to define a clean, linear pipe chain within a
closure context. Instead of fighting the precedence, we embrace it to
create a new form:

Current problematic example (even with parentheses):
fn($x) => ($x
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'))
);
Proposed "canned pipes" syntax:
fn($x) =>
|> array_map(strtoupper(...), $x)
|> array_filter($x, fn($v) => $v != 'O')
;
This syntax:

Resolves the ambiguity: The fn($x) => explicitly "cans" the value
for the entire pipe chain, making the precedence clear without
parentheses.

Reduces boilerplate: It eliminates the repetition of fn($x) => and
nested parentheses, making the code more readable and maintainable.

Aligns with intent: It reads as a simple sequence of transformations,
which is the core goal of the pipe operator.

This approach effectively provides syntactic sugar for the common case
of processing a value through a pipe chain within a closure. It turns
the late-discovered defect into a win for developer ergonomics and
future-proofing.

I understand that implementation feasibility needs to be assessed, but
if possible, this could be a more satisfying solution than mandatory
parentheses. Thanks to the team for highlighting this issue -- it's
sparked a creative way to enhance the feature.

Would love to hear thoughts on this.

Best,
hakre

That's an interesting idea, but I don't think it solves the problem.

It would make the syntax for pipes-in-closures nicer, yes. However:

Ideally, I'm hoping such a pipe-in-closure is replaced with a Composition operator soon enough (https://wiki.php.net/rfc/function-composition), so that wouldn't even be necessary.
The main problem is not when a pipe is inside a closure, but when a closure is inside a pipe, which is likely the far more common case.

$input
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O')
;

The problem is this gets interpreted "backwards", with the first closure wrapping around second, rather than being a sibling of it in a pipe chain. A short-hand for pipes-in-closures wouldn't address this issue, which is the more pressing concern.

The annoying extra parens there would also be resolved using PFA, which side steps the arrow function in the common case. As the issue is the syntactic parsing of arrow functions, not having arrow functions would also resolve the issue. (As would using higher-order functions instead, which frankly is what I am likely to do 90% of the time myself.)

--Larry Garfield

2 months ago by Rob Landers — view source — reply

unread

Does anyone have a better solution to suggest?

I've been following the discussion on the pipe/short-closure precedence
issue, and while the parentheses-based solution works, it does lead to
syntactic noise that undermines the readability benefits of pipes.

I'd like to suggest an alternative that turns this problem into an
opportunity for a more elegant syntax.

Inspired by prior art like Haskell's do notation, I propose what I
call "short closure canned pipes." The idea is to leverage the greedy
nature of short closures to define a clean, linear pipe chain within a
closure context. Instead of fighting the precedence, we embrace it to
create a new form:

Current problematic example (even with parentheses):
fn($x) => ($x
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'))
);
Proposed "canned pipes" syntax:
fn($x) =>
|> array_map(strtoupper(...), $x)
|> array_filter($x, fn($v) => $v != 'O')
;
This syntax:

Resolves the ambiguity: The fn($x) => explicitly "cans" the value
for the entire pipe chain, making the precedence clear without
parentheses.

Reduces boilerplate: It eliminates the repetition of fn($x) => and
nested parentheses, making the code more readable and maintainable.

Aligns with intent: It reads as a simple sequence of transformations,
which is the core goal of the pipe operator.

This approach effectively provides syntactic sugar for the common case
of processing a value through a pipe chain within a closure. It turns
the late-discovered defect into a win for developer ergonomics and
future-proofing.

I understand that implementation feasibility needs to be assessed, but
if possible, this could be a more satisfying solution than mandatory
parentheses. Thanks to the team for highlighting this issue -- it's
sparked a creative way to enhance the feature.

Would love to hear thoughts on this.

Best,
hakre
That's an interesting idea, but I don't think it solves the problem.

It would make the syntax for pipes-in-closures nicer, yes. However:

Ideally, I'm hoping such a pipe-in-closure is replaced with a Composition operator soon enough (https://wiki.php.net/rfc/function-composition), so that wouldn't even be necessary.

The main problem is not when a pipe is inside a closure, but when a closure is inside a pipe, which is likely the far more common case.

$input
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O')
;

The problem is this gets interpreted "backwards", with the first closure wrapping around second, rather than being a sibling of it in a pipe chain. A short-hand for pipes-in-closures wouldn't address this issue, which is the more pressing concern.

The annoying extra parens there would also be resolved using PFA, which side steps the arrow function in the common case. As the issue is the syntactic parsing of arrow functions, not having arrow functions would also resolve the issue. (As would using higher-order functions instead, which frankly is what I am likely to do 90% of the time myself.)

--Larry Garfield

I think the point Hans was making is that the fn($x) becomes implicit so you don't even need it, so your example would be something like this in this case:

$input |> fn($x) => |> array_map(strtoupper(...), $x) |> array_filter($x, fn($v) => $v != 'O');

— Rob

2 months ago by Hans Krentel — view source — reply

unread

I think the point Hans was making is that the fn($x) becomes implicit
so you don't even need it, so your example would be something like
this in this case:

$input |> fn($x) => |> array_map(strtoupper(...), $x) |> array_filter
($x, fn($v) => $v != 'O');

Hi Rob, thanks for jumping in and thinking about this! I really
appreciate it.

You've correctly grasped the goal: reducing the fn($x) => boilerplate.
However, the syntax I had in mind is a bit different and is designed
specifically to avoid the parsing ambiguity altogether by making the
structure unambiguous.

The proposal is for a self-contained "canned pipe" where the fn($x) =>
applies to the entire subsequent pipeline, like this:

// 'Canned pipe' creating a reusable function
$processor = fn($x) =>
    |> array_map(strtoupper(...), $x)
    |> array_filter($x, fn($v) => $v != 'O');

// This $processor can then be used anywhere, including in a pipe:
$result = $input |> $processor;

The key difference is that the |> lines are not inside the closure's
expression body. Instead, they are the body, in a dedicated gram-
matical form. This is why it avoids the greedy parsing issue—the parser
would see fn($x) => |> as a single unit defining a pipeline, not a
closure that contains pipe expressions.

Your example mixes the outer pipe $input |> with the inner pipe
|> array_map(...), which is a really interesting idea for a more
advanced integration! However, that likely reintroduces the very pre-
cedence conflict we're trying to solve, as the parser would have to un-
tangle which |> belongs to which level.

By keeping the proposal focused on this self-contained form, we get a
clean win for the "pipes-in-closure" case without adding new complexity
to the grammar for parsing nested pipes. It's very much inspired by how
Haskell's do notation provides a clean syntax for a sequence of
actions within a context.

Thanks again for the feedback—it helped me clarify the concept!

Best,
Hans

2 months ago by Hans Krentel — view source — reply

unread

The main problem is not when a pipe is inside a closure, but when a
closure is inside a pipe, which is likely the far more common case.

...The annoying extra parens there would also be resolved using PFA
... (As would using higher-order functions instead...)

Thank you for pointing to the PFA RFC. It indeed highlights the very
verbosity problem that leads to the current parsing conflict. It's clear
that PFA is the ideal, holistic solution to making functions pipe-
compatible.

Given that PFA is not part of the current release cycle, we are
effectively in a transitional state where we have the Pipe operator but
not its ideal companion. The mandatory parentheses are a necessary fix
for the grammar, but they accentuate the very syntactic noise we aim to
reduce.

This precedence issue has led me to look back at the history of this
feature, and I was fascinated to find Sara Golemon's original 2016 RFC.
It's striking that the very first proposal for a pipe operator in PHP
included a placeholder syntax ($$) as a core part of the design.

This suggests that the challenge of cleanly integrating multi-argument
functions into a pipeline was identified from the very beginning. The
current situation—where we have the pipe operator but not its intended
companion (PFA)—is an intermediate state that the original design sought
to avoid.

My 'canned pipes' suggestion is an attempt to address this same funda-
mental need for a placeholder within the current grammatical constraints.
It provides a path to recapture that original vision of clean, readable
pipelines today, while we work towards the broader solution of general-
purpose PFA in the future.

It's a proposal I'm happy with whatever the outcome—however it yields,
and whether or not we can capture the return value with or without
parentheses.

Looking forward to seeing the fix evolve and thanks again for your work
on this!

Best,
hakre

2 months ago by Larry Garfield — view source — reply

unread

So far, the best suggestion that's been put forward (though we've not
tried implementing it yet) is to disallow a pipe inside a short-closure
body, unless the body is surrounded by (). So this:

fn($x) => $x
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O');

Today, that would run somewhat by accident, as the outer-most closure
would claim everything after it. With the new approach, it would be
interpreted as passing fn($x) => $x as the argument to the first pipe
segment, which would then be mapped over, which would fail. You'd
instead need to do this:

fn($x) => ($x
|> (fn($x) => array_map(strtoupper(...), $x))
|> (fn($x) => array_filter($x, fn($v) => $v != 'O'))
);

Which is not wonderful, but it's not too bad, either. That's probably
the only case where pipes inside a short-closure body would be useful
anyway. And if PFA
(https://wiki.php.net/rfc/partial_function_application_v2) and similar
closure improvements pass, it will greatly reduce the need for mixing
short closures and pipes together in either precedence, so it won't
come up very often.

There are a few other operators that bind lower than pipe (see
https://github.com/php/php-src/blob/fd8dfe1bfda62a3bd9dd1ff7c0577da75db02fcf/Zend/zend_language_parser.y#L56-L73),
which would therefore need wrapping parentheses. For many of them we
do want them to be lower than pipe, so just moving pipe's priority down
isn't viable. However, most of those are unlikely to be used inside a
pipe segment, so are less likely to come up. The most likely would be
a bail-out exception:

$value = null;
$value
|> fn ($x) => $x ?? throw new Exception('Value may not be null')
|> fn ($x) => var_dump($x);

Which would currently be interpreted something like:

$c = function ($x) {
$c = function ($x) {
return var_dump($x);
};
return $x ?? throw $c(new Exception('Value may not be null'));
};
$c(null);

This would not throw the exception as expected, unless parentheses are
added. It would var_dump() an exception and then try to throw the
return of varl_dump(), which would fatal.

RM approval to address this during the 8.5 beta phase has been given,
but we still want to have some discussion to make sure we have a good
solution.

So, the question:

Does this seem like a good solution, or is there a problem we've not
spotted yet?

Does anyone have a better solution to suggest?

Thanks to Derick, Ilija, and Tim for tracking down this annoying edge case.

Ilija has implemented the above solution, and it has now been merged. It's not ideal, but it seems like the best option at present. Hopefully, we can land both composition and PFA in 8.6, which should make this situation a fairly rare occurrence anyway.

Thanks, Ilija!

--Larry Garfield