[RFC] Pipe Operator (again)

5 months ago by Oladoyinbo Vincent

PHP codebases in general are quite unreadable due to the robust way of doing
things. A pipe operator might make things even more complicated...

But after reading the RFC, something came to my mind, a way to simplify
this stuff

What if we implement it this way:


$pipe = " hello world "
           |> strtoupper(self)
           |> trim(self, ' ')
           |> htmlentities(self)
           |> fn (self): string => ....

Maybe 'self' or '$this' can be used as the keyword param.

It's a suggestion anyways.

On Fri, 7 Feb 2025, 5:58 am Larry Garfield, larry@garfieldtech.com wrote:

Hi folks. A few years ago I posted an RFC for a pipe operator, as seen in
many other languages. At the time it didn't pass, in no small part because
the implementation was a bit shaky and it was right before freeze.
Nonetheless, there are now even more (bad) user-space implementations in
the wild, as it gets brought up frequently in "what do you want in PHP?"
threads (though nowhere near generics or better async, of course), so it
seems clear there is demand in the market for it.

It is now back with a better implementation (many thanks to Ilija for his
help and guidance in that), and it's nowhere close to freeze, so here we go
again:

https://wiki.php.net/rfc/pipe-operator-v3

Of particular note, since the last RFC I have concluded that a compose
operator is a necessary complement to a pipe operator. However, it's also
going to be notably more work, and the two operators don't actually
interact at all at the code level, so since people keep saying "Small
RFCs!", here's a small RFC. :-)

--
Larry Garfield
larry@garfieldtech.com

5 months ago by Eugene Sidelnyk

Hi, Larry, That's super! I hope it will pass!

Oladoyinbo, IMO the way it is described right now (e.g. explicit closures)
is much more elegant than a new way of doing things that's not so obvious
and will be necessary to keep in mind and support anyway.

If it becomes necessary to simplify things, like passing a particular
parameter from the input pipe into the function at a particular position,
I think it would be possible to do that with partial function application,
which I hope to see in the future (e.g. binding the callback for array_map
to make a new function for the pipe that accepts a single parameter, the
input array).

Thank you

5 months ago by Larry Garfield

Both of you, please don't top post. :-)

That said, Eugene is correct. Hack (Facebook's PHP fork) had a pipe operator that took an expression with a magic placeholder on the right, rather than a callable. Every other language splits it into two parts: a pipe that takes a function on the right, and some way to do easy partial application. I am firmly of the belief that Hack is wrong on this one, and that two separate features that dovetail together are a superior design to a single pipe syntax that is less flexible. Especially with FCC (first-class callable syntax) now, any purpose-built unary function will be trivial to use, and a higher-order function that returns a unary function is also trivial to write.
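
To make the "two separate features" design concrete, here is a minimal user-space sketch of the pipeline from the start of the thread, written with callables only. The names pipe() and trimChars() are hypothetical and are not part of the RFC; pipe() merely stands in for |> so the snippet runs on PHP 8.1 or later. The input string has angle brackets added so that htmlentities() visibly does something.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the proposed operator: feeds $value through
// each unary callable in order, left to right.
function pipe(mixed $value, callable ...$steps): mixed
{
    foreach ($steps as $step) {
        $value = $step($value);
    }
    return $value;
}

// Hypothetical higher-order function: returns a unary closure that trims
// the given characters, so a two-argument function fits into a pipe step.
function trimChars(string $chars): Closure
{
    return fn(string $s): string => trim($s, $chars);
}

$result = pipe(
    ' hello <world> ',
    strtoupper(...),   // first-class callable syntax, already unary
    trimChars(' '),    // higher-order function returning a unary closure
    htmlentities(...), // unary in practice: the other parameters are optional
);

echo $result, PHP_EOL; // HELLO &lt;WORLD&gt;
```

With the operator from the RFC, the same chain would presumably read ' hello <world> ' |> strtoupper(...) |> trimChars(' ') |> htmlentities(...), with no placeholder keyword needed.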

As noted in Future Scope, I do want to revisit the PFA RFC at some point, but I need a collaborator who can help with the implementation as that is definitely over my head. (I have ideas for how to simplify the implementation, in concept, but my engine skill is too low to do it myself.)

--Larry Garfield

5 months ago by Rob Landers

Hey Larry,

Maybe I missed it, but what happens here?

[1,2] |> add(...)

Is the array deconstructed or passed as-is? Further, if it is passed as-is (my gut is telling me it will be), then what is the error? Is it the normal “missing second parameter when calling add()” error or a new error specific to pipes?

If it is passed as-is, would the following be legal?

...[1,2] |> add(...)

— Rob

5 months ago by tim@bastelstu.be

Hi

On 2025-02-07 05:57, Larry Garfield wrote:

It is now back with a better implementation (many thanks to Ilija for
his help and guidance in that), and it's nowhere close to freeze, so
here we go again:

https://wiki.php.net/rfc/pipe-operator-v3

There's some editorial issues:

Status: Draft needs to be updated.
The RFC needs to be added to the overview page.
List formatting issues in “Future Scope” and “Patches and Tests”.

Would also help having a closed voting widget in the “Proposed Voting
Choices” section to be crystal clear on what is being voted on (see
below the next quote).

Regarding the contents:

“That is, the following two code fragments are also exactly
equivalent:”.

I do not believe this is true (specifically referring to the “exactly”
word in there), since the second code fragment does not have the short
closures, which likely results in an observable behavioral difference
when throwing Exceptions (in the stack trace) and also for debuggers. Or
is the implementation able to elide the extra closure? (Of course
there's also the difference between the temporary variable existing,
which would be observable for get_defined_vars() and possibly
destructors / object lifetimes).

The “References” (as in reference variables) section would do well
with an example of what doesn't work.

In the “Compose” section: The section always uses the word
“callables”, but doesn't explain how it resolves the ambiguity of
[Foo::class, 'bar'] + [Bar::class, 'foo'].

Should it read “Closures” instead of “callables”?

In the “Compose” section: It would be useful to explicitly spell out
in which order the individual callables are called.

Will (strrev(...) + ucfirst(...))("foo") result in ooF or will it
result in Oof?

In the “Compose” section: The RFC says that “ComposedClosure” is not
quite equivalent, but it doesn't go into detail what is not quite
equivalent.

Specifically: Is the result actually limited to a single argument? Using
the ooF evaluation order, (strlen(...) + str_replace(...))("o", "", "foo") could reasonably result in 1.

In the “Why in the engine?” section: The RFC makes a claim about
performance.

Do you have any numbers?

Of particular note, since the last RFC I have concluded that a compose
operator is a necessary complement to a pipe operator.

The RFC lists “Compose” as part of the “Proposal” section, but also the
“Future Scope”. Should the part in “Proposal” be removed?

However, it's also going to be notably more work, and the two operators
don't actually interact at all at the code level, so since people keep
saying "Small RFCs!", here's a small RFC. :-)

I like this.

Best regards
Tim Düsterhus

5 months ago by Larry Garfield

Merging a few replies together here, since they overlap. Also reordering a few of Tim's comments...

Hi

On 2025-02-07 05:57, Larry Garfield wrote:

It is now back with a better implementation (many thanks to Ilija for
his help and guidance in that), and it's nowhere close to freeze, so
here we go again:

https://wiki.php.net/rfc/pipe-operator-v3

There's some editorial issues:

Status: Draft needs to be updated.

The RFC needs to be added to the overview page.

List formatting issues in “Future Scope” and “Patches and Tests”.

Would also help having a closed voting widget in the “Proposed Voting
Choices” section to be crystal clear on what is being voted on (see
below the next quote).

I split pipes off from the Composition RFC late last night right before posting; I guess I missed a few things while doing so. :-/ Most notably, the Compose section is now removed from pipes, as it is not in scope for this RFC. (As noted, it's going to be more work so has its own RFC.) Sorry for the confusion. I think it should all be handled now.

The “References” (as in reference variables) section would do well
with an example of what doesn't work.

Example block added.

In the “Why in the engine?” section: The RFC makes a claim about
performance.

Do you have any numbers?

Not currently. The statements here are based on simply counting the number of function calls necessary, and PHP function calls are sadly non-cheap. In previous benchmarks of my own libraries using my Crell/fp library, I did find that the number of function calls involved in some tight pipe operations was both a performance and debugging concern, but I don't have any hard numbers laying about at present to share.

If you think that's critical, please advise on how to best get meaningful numbers here.

Regarding the equivalency of pipes:

Tim Düsterhus wrote:

“That is, the following two code fragments are also exactly
equivalent:”.

I do not believe this is true (specifically referring to the “exactly”
word in there), since the second code fragment does not have the short
closures, which likely results in an observable behavioral difference
when throwing Exceptions (in the stack trace) and also for debuggers. Or
is the implementation able to elide the extra closure? (Of course
there's also the difference between the temporary variable existing,
which would be observable for get_defined_vars() and possibly
destructors / object lifetimes).

Thomas Hruska wrote:

The repeated assignment to $temp in your second example is not
actually equal to the earlier example as you claim. The second example
with all of the $temp variables should, IMO, just be:

$temp = "Hello World";
$result = array_filter(array_map('strtoupper',
str_split(htmlentities($temp))), fn($v) => $v != 'O');

Juris Evertovskis wrote:

Does the implementation actually turn 1 |> f(...) |> g(...) into
$π = f(1); g($π)? Is g(f(1)) not performanter? Or is the engine
clever enough with the var reuse anyways?

There's some subtlety here on these points. The v2 RFC used the lexer to mutate $a |> $b |> $c into the same AST as $c($b($a)), which would then compile as though that had been written in the first place. However, that made addressing references much harder, and there's an important caveat around order of operations. (See below.) The v3 RFC instead uses a compile function to take the AST of $a |> $b |> $c and produce opcodes that are effectively equivalent to $t = $b($a); $t = $c($t); I have not compared to see if they are precisely the same opcodes, but the net effect is the same. So "effectively equivalent" may be a more accurate statement.

In particular, Tim is correct that, technically, the short lambdas would be used as-is, so you'd end up with the equivalent of:

$temp = (fn($x) => array_map(strtoupper(...), $x))($temp);

I'm not sure if there's a good way to automatically unwrap the closure there. (If someone knows of one, please share; I'm fine with including it.) However, the intent is that it would be largely unnecessary in the future with a revised PFA implementation, which would obviate the need for the explicit wrapping closure. You would instead write

$a |> array_map(strtoupper(...), ?);

Alternatively, one can use higher order user-space functions already. In trivial cases:

function amap(Closure $fn): Closure {
    return fn(array $x) => array_map($fn, $x);
}

$a |> amap(strtoupper(...));

Which I am already using in Crell/fp and several libraries that leverage it, and it's quite ergonomic.

There's a whole bunch of such simple higher order functions here:
https://github.com/Crell/fp/blob/master/src/array.php
https://github.com/Crell/fp/blob/master/src/string.php

Which leads to the subtle difference between that and the v2 implementation, and why Thomas' statement is incorrect. If the expression on the right side that produces a Closure has side effects (output, DB interaction, etc.), then the order in which those side effects happen may change with the different restructuring. With all pure functions, that won't make a practical difference, and normally one should be using pure functions, but that's not something PHP can enforce.
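
The ordering difference can be seen without the new operator by writing both shapes out by hand. This is a minimal sketch, assuming (per the description above) that v3 behaves like the temp-variable form; step() is a hypothetical helper that logs when a closure is created and when it is called, and none of this has been checked against the actual patch.

```php
<?php
declare(strict_types=1);

// Hypothetical factory with a visible side effect: it records when the
// closure is created and, separately, when the closure is called.
function step(string $name, array &$log): Closure
{
    $log[] = "make $name";
    return function (int $x) use ($name, &$log): int {
        $log[] = "call $name";
        return $x + 1;
    };
}

// Nested form, the shape $c($b($a)): PHP evaluates the outer callee
// expression before it evaluates the argument.
$log = [];
$nested = step('c', $log)(step('b', $log)(1));
$nestedOrder = $log;

// Temp-variable form, the shape $t = $b($a); $t = $c($t);
$log = [];
$t = step('b', $log)(1);
$t = step('c', $log)($t);
$tempOrder = $log;

var_dump($nested === $t);                  // bool(true): same final value
echo implode(', ', $nestedOrder), PHP_EOL; // make c, make b, call b, call c
echo implode(', ', $tempOrder), PHP_EOL;   // make b, call b, make c, call c
```

The final value is identical in both forms; only the point at which each right-hand expression is evaluated moves, which matters only when producing the closure has side effects.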

I don't think there would be an appreciable performance difference between the two compiled versions, either way, but using the temp-var approach makes dealing with references easier, so it's what we're doing.

Juris Evertovskis wrote:

Do you think it would be hard to add some shorthand for |> $condition ? $callable : fn($😐) => $😐?

I'm not sure I follow here. Assuming you're talking about "branch in the next step", the standard way of doing that is with a higher order user-space function. Something like:

function cond(bool $cond, Closure $t, Closure $f): Closure {
    return $cond ? $t : $f;
}

$a |> cond($config > 10, bigval(...), smallval(...)) |> otherstuff(...);

I think it's premature to try and bake that logic into the language, especially when I don't know of any other function-composition-having language that does so at the language level rather than the standard library level. (There are a number of fun operations people build into pipelines, but they are all generally done in user space.)
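
For the specific shorthand asked about (apply a callable when a condition holds, otherwise pass the value through unchanged), a small user-space helper is enough. The name when() is hypothetical, chosen here for illustration, and is not part of the RFC. The sketch uses plain calls so that it runs on PHP 8.1 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns $fn when $condition is true, otherwise an
// identity closure, so the pipeline step becomes a no-op.
function when(bool $condition, Closure $fn): Closure
{
    return $condition ? $fn : fn(mixed $x): mixed => $x;
}

$shout = true;

// With the proposed operator this would presumably read:
//   'hello' |> when($shout, strtoupper(...)) |> strrev(...)
$step   = when($shout, strtoupper(...));
$result = strrev($step('hello'));

echo $result, PHP_EOL; // OLLEH
```

Because the condition is evaluated when when() is called, not when the returned closure runs, it behaves like the cond() example above: the branch is chosen while the pipeline is being set up.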

--Larry Garfield

5 months ago by Rob Landers

Put another way, what is the order of operations for this new operator?

For example, what is the output of

$x ? $y |> strlen(...) : $z

$x + $y |> sqrt(...) . EOL

Etc.

I noticed this seems to be missing from the RFC. As a new operator, I think it should be important to specify that.

— Rob

5 months ago by Larry Garfield

Pipe deliberately binds fairly low, so most other operators will happen first, including +, ??, and ? :, for which there are tests:

https://github.com/php/php-src/pull/17118/files#diff-81789df7e324801626ef4ef8f629cc95dceed4c09073a2b58b70c811bf776904

https://github.com/php/php-src/pull/17118/files#diff-56cbcf85bd7f68fa7a1f837eb15dcc536576986f366976f9642ad20867c471fd

https://github.com/php/php-src/pull/17118/files#diff-775c14f54cd1a27719d30bfab62024aeb1625bc3f3621fa0e7c16fb1c7957fdd

So in the examples above, the second would add $x and $y first, then square-root the result. The first, I think, would probably need parens to be valid, but I'd have to try it to be sure.

--Larry Garfield

5 months ago by Rob Landers

It might be good to specify it in the RFC so that if there is any strange behavior decades from now, there will be a documented intent to help figure out whether it is a feature or a bug.

As to the ternary, it is the difference between that example being valid and this $x |> $x > 3 ? foo(...) : bar(...) |> baz(...) making sense or not. Personally, I wouldn’t write this code and would use parens to disambiguate, but it’d be handy to know when doing code reviews of authors who don’t.

— Rob

5 months ago by Christoph M. Becker

Put another way, what is the order of operations for this new operator?

For example, what is the output of

$x ? $y |> strlen(...) : $z

$x + $y |> sqrt(...) . EOL

Etc.

According to the reference implementation[1], that would be equivalent to

$x ? ($y |> strlen(...)) : $z

($x + $y) |> (sqrt(...) . EOL)

I noticed this seems to be missing from the RFC. As a new operator, I think it should be important to specify that.

Indeed, precedence and associativity need to be mentioned in the RFC.

[1] https://github.com/php/php-src/pull/17118

Christoph

5 months ago by Larry Garfield

I've added a precedence section, using examples from the tests and this thread.

--Larry Garfield

5 months ago by tim@bastelstu.be

Hi

Indeed, precedence and associativity need to be mentioned in the RFC.

I've added a precedence section, using examples from the tests and this thread.

Associativity is not explicitly spelled out (though only left
associativity makes sense).

And for the ternary conditional, the phrasing is pretty non-technical:

it will likely need to be enclosed in () or else it will be misinterpreted.

What does “misinterpreted” mean in concrete terms? In the stated example
there is only one possible way to interpret it as a legal PHP program.
Does this mean it will syntax error without the parentheses? Explicitly
state the error message then.

Best regards
Tim Düsterhus

5 months ago by tim@bastelstu.be

Hi

I split pipes off from the Composition RFC late last night right before posting; I guess I missed a few things while doing so. :-/ Most notably, the Compose section is now removed from pipes, as it is not in scope for this RFC. (As noted, it's going to be more work so has its own RFC.) Sorry for the confusion. I think it should all be handled now.

The “Introduction” section still talks about function composition rather
than the pipe operator, I believe.

The “References” (as in reference variables) section would do well
with an example of what doesn't work.

Example block added.

I don't understand that example. If I would write this as regular
function calls it works fine. Did you mean to compare against:

 inc_print(['a' => 'A', 'b' => 'B']);

i.e.

 ['a' => 'A', 'b' => 'B'] |> inc_print(...);

? If not, then you will need to expand on “breaks” which is a
non-technical term.

In the “Why in the engine?” section: The RFC makes a claim about
performance.

Do you have any numbers?

Not currently. The statements here are based on simply counting the number of function calls necessary, and PHP function calls are sadly non-cheap. In previous benchmarks of my own libraries using my Crell/fp library, I did find that the number of function calls involved in some tight pipe operations was both a performance and debugging concern, but I don't have any hard numbers laying about at present to share.

If you think that's critical, please advise on how to best get meaningful numbers here.

Not sure if I missed the dedicated performance section on my first read
through the RFC or if it is actually new. It also claims:

The result is that pipe has virtually no runtime overhead.

Which given your claim that “function calls are non-cheap” and combined
with the intermediate closure for calls taking more than one parameter
is contradictory.

Generally speaking, if your RFC makes a claim (about performance), then
it needs to back this up by evidence and not with feelings.

Regarding the “How”:

A hyperfine
(https://tideways.com/profiler/blog/how-we-use-hyperfine-to-measure-php-engine-performance)
comparison for a release build comparing:

1. An implementation based on regular function calls without
   intermediate variables.
2. An implementation based on regular function calls with an
   intermediate temporary variable.
3. A performance-optimized userland pipe operator implementation.
4. The pipe operator RFC.

would certainly be appropriate to gain a first insight.

Having an OPcode dump to compare (1) against (4) would help gain more
insights as to where the performance differences come from.
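
A rough in-process sketch of the first three variants follows. It is not the hyperfine comparison itself: all function names are hypothetical, the userland pipe is a simple loop rather than a tuned implementation, and the fourth variant is omitted because it needs a build with the RFC patch applied. Timings from a script like this are only indicative.

```php
<?php
declare(strict_types=1);

// Variant 1: regular nested function calls, no intermediate variables.
function variantNested(string $in): array
{
    return array_map(strtoupper(...), str_split(htmlentities($in)));
}

// Variant 2: regular function calls with an intermediate temporary variable.
function variantTemp(string $in): array
{
    $t = htmlentities($in);
    $t = str_split($t);
    $t = array_map(strtoupper(...), $t);
    return $t;
}

// Variant 3: a simple userland pipe helper (hypothetical, not tuned).
function userlandPipe(mixed $value, callable ...$steps): mixed
{
    foreach ($steps as $step) {
        $value = $step($value);
    }
    return $value;
}

function variantUserland(string $in): array
{
    return userlandPipe(
        $in,
        htmlentities(...),
        str_split(...),
        fn(array $a): array => array_map(strtoupper(...), $a),
    );
}

// Calls $fn $iterations times and returns the elapsed nanoseconds.
function timeIt(callable $fn, int $iterations): int
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn('Hello World');
    }
    return (int) (hrtime(true) - $start);
}

foreach (['variantNested', 'variantTemp', 'variantUserland'] as $name) {
    printf("%-16s %d ns\n", $name, timeIt($name, 100_000));
}
```

For the comparison described above, each variant would more likely live in its own script so that an external tool can time whole runs against a release build.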

If the expression on the right side that produces a Closure has side effects (output, DB interaction, etc.), then the order in which those side effects happen may change with the different restructuring.

That is a good point. I see you added a precedence section, but this
does not fully explain the order of operations in face of side-effects
and more generally with regard to “short-circuiting” behavior. An OPcode
dump would explain that.

Specifically for:

 function foo()     { echo __FUNCTION__, PHP_EOL; return 1; }
 function bar()     { echo __FUNCTION__, PHP_EOL; return false; }
 function baz($in)  { echo __FUNCTION__, PHP_EOL; return $in; }
 function quux($in) { echo __FUNCTION__, PHP_EOL; return $in; }

 foo()
     |> (bar() ? baz(...) : quux(...))
     |> var_dump(...);

What will the output be?

but using the temp-var approach makes dealing with references easier

I thought the RFC said that references were disallowed?

Best regards
Tim Düsterhus

4 months ago by tim@bastelstu.be

Hi

If the expression on the right side that produces a Closure has side effects (output, DB interaction, etc.), then the order in which those side effects happen may change with the different restructuring.

That is a good point. I see you added a precedence section, but this
does not fully explain the order of operations in face of side-effects
and more generally with regard to “short-circuiting” behavior. An OPcode
dump would explain that.

Specifically for:

  function foo()     { echo __FUNCTION__, PHP_EOL; return 1; }
  function bar()     { echo __FUNCTION__, PHP_EOL; return false; }
  function baz($in)  { echo __FUNCTION__, PHP_EOL; return $in; }
  function quux($in) { echo __FUNCTION__, PHP_EOL; return $in; }

  foo()
      |> (bar() ? baz(...) : quux(...))
      |> var_dump(...);

What will the output be?

This is unresolved.

Best regards
Tim Düsterhus

5 months ago by Juris Evertovskis

Great feature! Three questions and a comment from me.

1. Do you think it would be hard to add some shorthand for
   |> $condition ? $callable : fn($😐) => $😐?
2. Is compose in scope or not? You mention it in both the main RFC
   body and the future scope. Or are those different composes?
3. Does the implementation actually turn 1 |> f(...) |> g(...) into
   $π = f(1); g($π)? Would g(f(1)) not be more performant? Or is the
   engine clever enough with the var reuse anyway?

I don't think Laravel's pipeline is relevant here. In it each callback
is responsible for invoking the rest of the chain. Thus it allows early
returns and interacting with the return value of the following chain
(return 5 + $next($v)). More like a middleware chaining tool, not a
pipe in the same meaning as in this RFC.
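
To illustrate the distinction, here is a minimal sketch of the two shapes. It is not Laravel's actual API: run_middleware() and the stage closures are hypothetical. In the middleware shape each stage receives the value plus a $next callable and decides whether and how to call it. In the plain pipe shape each stage only sees the previous result.

```php
<?php
// Middleware-style chaining: every stage is called as $stage($value, $next).
// Hypothetical helper, written only to show the shape.
function run_middleware(array $stages, callable $core): callable {
    return array_reduce(
        array_reverse($stages),   // wrap from the inside out
        fn(callable $next, callable $stage) => fn($v) => $stage($v, $next),
        $core
    );
}

$chain = run_middleware([
    fn($v, $next) => 5 + $next($v),           // works with the downstream result
    fn($v, $next) => $v < 0 ? 0 : $next($v),  // may return early, skipping the rest
], fn($v) => $v * 2);

$a = $chain(3);    // 5 + (3 * 2)
$b = $chain(-1);   // 5 + 0, the core is never reached

// Plain pipe-style chaining: each stage receives only the previous result.
$stages = [fn($v) => $v * 2, fn($v) => $v + 5];
$c = array_reduce($stages, fn($carry, callable $fn) => $fn($carry), 3);
```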

BR,
Juris

5 months ago by Christoph M. Becker

Hi folks. A few years ago I posted an RFC for a pipe operator, as seen in many other languages. At the time it didn't pass, in no small part because the implementation was a bit shaky and it was right before freeze. Nonetheless, there are now even more (bad) user-space implementations in the wild, as it gets brought up frequently in "what do you want in PHP?" threads (though nowhere near generics or better async, of course), so it seems clear there is demand in the market for it.

It is now back with a better implementation (many thanks to Ilija for his help and guidance in that), and it's nowhere close to freeze, so here we go again:

https://wiki.php.net/rfc/pipe-operator-v3

Thank you! I very much appreciate the simplicity (and efficiency) of
the implementation.

Of particular note, since the last RFC I have concluded that a compose operator is a necessary complement to a pipe operator. However, it's also going to be notably more work, and the two operators don't actually interact at all at the code level, so since people keep saying "Small RFCs!", here's a small RFC. :-)

Fair enough. And with the pipe operator, one might live without a
compose operator, e.g.

$f1 = fn($x) => 2 * $x;
$f2 = fn($x) => $x + 3;
// $f3 = $f2 ∘ $f1
$f3 = fn($x) => $x |> $f1 |> $f2;
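
The same emulation can be sketched without the new operator, for anyone who wants to run it today. The compose() helper below is hypothetical, not a proposed API. It applies its arguments left to right, which matches the pipe order, so compose($f1, $f2) corresponds to $f2 ∘ $f1 in the comment above.

```php
<?php
// Hypothetical helper: returns a function that feeds its input through
// each callable in turn, left to right.
function compose(callable ...$fns): callable {
    return fn($x) => array_reduce(
        $fns,
        fn($carry, callable $fn) => $fn($carry),
        $x
    );
}

$f1 = fn($x) => 2 * $x;
$f2 = fn($x) => $x + 3;

$f3     = compose($f1, $f2);        // built by the helper
$nested = fn($x) => $f2($f1($x));   // written out by hand

$viaHelper = $f3(5);
$viaNested = $nested(5);
```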

Christoph

5 months ago by Larry Garfield

Of particular note, since the last RFC I have concluded that a compose operator is a necessary complement to a pipe operator. However, it's also going to be notably more work, and the two operators don't actually interact at all at the code level, so since people keep saying "Small RFCs!", here's a small RFC. :-)

Fair enough. And with the pipe operator, one might live without a
compose operator, e.g.

$f1 = fn($x) => 2 * $x;
$f2 = fn($x) => $x + 3;
// $f3 = $f2 ∘ $f1
$f3 = fn($x) => $x |> $f1 |> $f2;

Christoph

The v2 RFC took that position, that compose was easy enough to emulate via pipe. Indeed, pipe and compose can both be implemented in terms of each other. However, since the previous RFC I've concluded[1] that both are sufficiently useful that we really ought to include both of them. Pipes are just way easier to implement in practice. :-)

--Larry Garfield

[1] https://peakd.com/hive-168588/@crell/aoc2021-review

5 months ago by Thomas Hruska

Hi folks. A few years ago I posted an RFC for a pipe operator, as seen in many other languages. At the time it didn't pass, in no small part because the implementation was a bit shaky and it was right before freeze. Nonetheless, there are now even more (bad) user-space implementations in the wild, as it gets brought up frequently in "what do you want in PHP?" threads (though nowhere near generics or better async, of course), so it seems clear there is demand in the market for it.

It is now back with a better implementation (many thanks to Ilija for his help and guidance in that), and it's nowhere close to freeze, so here we go again:

https://wiki.php.net/rfc/pipe-operator-v3

Of particular note, since the last RFC I have concluded that a compose operator is a necessary complement to a pipe operator. However, it's also going to be notably more work, and the two operators don't actually interact at all at the code level, so since people keep saying "Small RFCs!", here's a small RFC. :-)

There's a song in here somewhere that goes:

♪♫♬ PHP continues turning into...symbol SOUUUUUUUP! [Oh no.] ♪♫♬

The main example provided in the RFC makes its own excellent argument
against the proposed feature:

$result = "Hello World"
|> 'htmlentities'
|> str_split(...)
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O');

Symbols make languages harder to grok. I don't want a language like
COBOL where things that should sensibly be symbols are words but I also
don't want a code golfing language like APL that is just all the symbols
all day long. Language features should be able to be easily found via
search and every new symbol (or combination of symbols) is inherently
unsearchable on most/all search engines. That includes the search
engine on php.net. Go try searching for '...' or '=>' or '!=' operators
on php.net and you get...nothing! "Texture is the conductor of flavor."
-- French Chef Jean-Pierre. Balancing out symbols (liquids like water
which have no flavor) and words (meat and veggies packed with flavor) is
a language author's core responsibility in the language design soup kitchen.

While I'm not against adding symbols that serve a valuable purpose,
there is nothing to be gained by encouraging bad coding habits at the
outset. When a limitation is established up front such as "Functions
with more than one required parameter are not allowed" then users will
find ways to bypass the limitation such that it will kill performance in
favor of their perceived and flawed idea of "convenience." This
proposal will minimally result in creating an anonymous function to
call any basic function with more than one required parameter but also
encourage abuse of the splat operator which should be used exceedingly
sparingly. What I mean by that is: Users will construct arrays
(expensive) to pass to anonymous functions with one parameter
(expensive) and then use the splat operator inside the anonymous
functions to unpack the input array to call the actual function (VERY
expensive). Whatever performance gains made by moving bad application
design into PHP core will be far outweighed by the abuse that naturally
follows to circumvent limitations. In fact, your own contrived example
usage includes two anonymous functions that call functions with more
than one required parameter! You are already working around the known
limitations of your own proposed feature 🤦!! Why in the world would
you ever advertise that?! If that's not enough to kill an RFC before it
even goes to a vote, I don't know what is.

The repeated assignment to $temp in your second example is not
actually equal to the earlier example as you claim. The second example
with all of the $temp variables should, IMO, just be:

$temp = "Hello World";
$result = array_filter(array_map('strtoupper',
    str_split(htmlentities($temp))), fn($v) => $v != 'O');

By storing the result into $temp for each modification just so that you
can have multiline code, you are actually making the engine work harder
whereas a single statement saves the engine some unnecessary
refcounting/allocation/free work but accomplishes the same objective.
I'm nitpicking the clearly contrived second code example that didn't at
all improve my impression of the first example and where your own
example usage ended up exposing the fundamental flaws in the RFC. I
also consider the above compact code to be plenty readable and not
particularly necessary to span multiple lines, but that's obviously
subjective.

Just because someone can do something doesn't mean that they should.
More than likely, users trying to do pipe-like operations in PHP
shouldn't be doing them in the first place.

--
Thomas Hruska
CubicleSoft President

5 months ago by Faizan Akram Dar

The repeated assignment to $temp in your second example is not
actually equal to the earlier example as you claim. The second example
with all of the $temp variables should, IMO, just be:

$temp = "Hello World";
$result = array_filter(array_map('strtoupper',
    str_split(htmlentities($temp))), fn($v) => $v != 'O');

Tbh, this is unreadable. Larry's example with an intermediate variable is an
order of magnitude more readable. This is exactly why we need a pipe operator.

I also consider the above compact code to be plenty readable and not
particularly necessary to span multiple lines, but that's obviously
subjective.

It is not. The functions are applied from the inside out (or right to
left), which becomes harder to read with each added function. The pipe
operator makes this natural, as they are applied from left to right, which
is how you read code: literally 0 cognitive load.

Just because someone can do something doesn't mean that they should.
More than likely, users trying to do pipe-like operations in PHP
shouldn't be doing them in the first place.

Why not? It clearly makes code more readable, and in future, with PFA (🤞),
it will allow composing non-unary functions.
PHP is and always has been a multi-paradigm language; there is no reason
not to add features that make the functional paradigm easier to use.

Kind regards,
Faizan

4 months ago by come@chilliet.eu

Hello,

I’m also wondering when I see code examples in the RFC like:

Wouldn't this perform far better as a single foreach?
I feel like this pipe operator encourages coders to use array_* functions with closures, which often perform terribly compared to a loop.

How would the performance of the above compare with:

$profit = 0;
foreach (loadMany($input) as $item) {
    $widget = makeWidget($item);
    if (!isOnSale($widget)) {
        continue;
    }
    $profit += sellWidget($widget);
}

Côme

4 months ago by Rowan Tommins [IMSoP]

Wouldn't this perform far better as a single foreach?
I feel like this pipe operator encourages coders to use array_* functions with closures, which often perform terribly compared to a loop.

I think this highlights something that has been mentioned a few times over the years: PHP badly needs more native functions for working with iterators. If each stage of the pipeline is lazily consuming an iterator and yielding each value in turn, one major source of performance impact goes away, because we don't have to repeatedly allocate intermediate arrays. It also makes it much easier to work with infinite inputs, which obviously can't be flattened to an array.

It also highlights why just letting all array functions accept iterable would not be the right approach: array_map(iterable):array would still have to eagerly iterate its input, so we need a separate iter_map(iterable):NonRewindableIterator (or whatever name). Even iter_sum() might shortcut if an invalid value was defined as an Error rather than Warning.

This feels like one of those cases where different proposals complement rather than blocking each other: iterator functions make pipes more efficient to use, and pipes make iterator functions more pleasant to use. I'd like both please. :)
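
As a rough sketch of what lazy stages could look like in user space, the generator functions below process one value at a time and never build an intermediate array. The names iter_map(), iter_filter(), iter_take() and naturals() are hypothetical; PHP does not ship functions with these names. Because each stage pulls values on demand, the chain works even on an infinite input.

```php
<?php
// Hypothetical lazy helpers built on generators.
function iter_map(iterable $it, callable $fn): Generator {
    foreach ($it as $k => $v) {
        yield $k => $fn($v);
    }
}

function iter_filter(iterable $it, callable $fn): Generator {
    foreach ($it as $k => $v) {
        if ($fn($v)) {
            yield $k => $v;
        }
    }
}

function iter_take(iterable $it, int $n): Generator {
    if ($n <= 0) {
        return;
    }
    foreach ($it as $k => $v) {
        yield $k => $v;
        if (--$n === 0) {
            return;   // stop pulling from upstream once enough values were seen
        }
    }
}

// An infinite input: 1, 2, 3, ...
function naturals(): Generator {
    for ($i = 1; ; $i++) {
        yield $i;
    }
}

// First three even squares, computed without materialising any array.
$evenSquares = iterator_to_array(
    iter_take(
        iter_filter(
            iter_map(naturals(), fn($n) => $n * $n),
            fn($n) => $n % 2 === 0
        ),
        3
    ),
    false   // discard keys so the result is a plain list
);
```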

Rowan Tommins
[IMSoP]

5 months ago by Gina P. Banyard

Hi folks. A few years ago I posted an RFC for a pipe operator, as seen in many other languages. At the time it didn't pass, in no small part because the implementation was a bit shaky and it was right before freeze. Nonetheless, there are now even more (bad) user-space implementations in the wild, as it gets brought up frequently in "what do you want in PHP?" threads (though nowhere near generics or better async, of course), so it seems clear there is demand in the market for it.

It is now back with a better implementation (many thanks to Ilija for his help and guidance in that), and it's nowhere close to freeze, so here we go again:

https://wiki.php.net/rfc/pipe-operator-v3

Of particular note, since the last RFC I have concluded that a compose operator is a necessary complement to a pipe operator. However, it's also going to be notably more work, and the two operators don't actually interact at all at the code level, so since people keep saying "Small RFCs!", here's a small RFC. :-)

I'm very much in favour of this RFC; it will make writing functional and data pipeline code less cumbersome.
I was curious how the blocking of by-ref parameters is done, and was pleasantly surprised that it is done at run-time, so "prefer-by-ref" parameters work without issues.
This is good motivation for me to go back and push the by-value sort() RFC [1], as it uses that mechanism.
I've also submitted a PR [2] to add such a test case.
It is probably a good idea to specify this in the RFC.

Best regards,

Gina P. Banyard

[1] https://wiki.php.net/rfc/array-sort-return-array
[2] https://github.com/Crell/php-src/pull/1

5 months ago by tim@bastelstu.be

Hi

On 2025-02-07 05:57, Larry Garfield wrote:

https://wiki.php.net/rfc/pipe-operator-v3

After also having taken a look at the implementation and then the
updated “Precedence” section, I'd like to argue in favor of moving |>
to have a higher precedence than the comparison operators (i.e. between
string concatenation and <). This would mean that |> has higher
precedence than ??, but looking at the following examples, that
appears to be the more useful default anyways.

I'm rather interested in handling a null pipe result:

 $user = $request->get('id')
     |> $database->fetchUser(...)
     ?? new GuestUser();

Than handling a null callback (using the RFC example, because I can't
even think of a real-world use-case):

 $res1 = 5
     |> $null_func ?? defaultFunc(...);

To give some more examples of what would be possible without parentheses
then:

 $containsNotOnlyZero = $someString
     |> fn ($str) => str_replace('0', '', $str)
     |> strlen(...)
     > 0;

Which is not particularly pretty, but appears to be more useful than
either passing a boolean into a single-argument function or piping into
a boolean (which would error).
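
To make the two groupings concrete without relying on the new operator, here is a small sketch that spells each reading out as ordinary calls. The $fetchUser closure and the string stand-ins for user objects are hypothetical, chosen only so the example runs on its own.

```php
<?php
// Hypothetical lookup: returns a user name, or null when the id is unknown.
$fetchUser = fn(int $id): ?string => $id === 1 ? 'alice' : null;
$guest     = 'guest';

// Reading 1: |> binds tighter than ??, so the fallback applies to the
// RESULT of the pipe, i.e. ($id |> $fetchUser) ?? $guest.
$user1 = $fetchUser(2) ?? $guest;

// Reading 2: ?? binds tighter than |>, so the fallback applies to the
// CALLABLE, i.e. $id |> ($fetchUser ?? $fallback). The callable is not
// null here, so the fallback is never used and the null result leaks out.
$callable = $fetchUser ?? fn($id) => $guest;
$user2    = $callable(2);

// The comparison example under reading 1: (pipe result) > 0.
$stripZeros       = fn(string $str) => str_replace('0', '', $str);
$containsNonZero  = fn(string $s): bool => strlen($stripZeros($s)) > 0;
$onlyZeros        = $containsNonZero('000');
$mixed            = $containsNonZero('010');
```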

Best regards
Tim Düsterhus

4 months ago by Larry Garfield

Hi

On 2025-02-07 05:57, Larry Garfield wrote:

https://wiki.php.net/rfc/pipe-operator-v3

After also having taken a look at the implementation and then the
updated “Precedence” section, I'd like to argue in favor of moving |>
to have a higher precedence than the comparison operators (i.e. between
string concatenation and <). This would mean that |> has higher
precedence than ??, but looking at the following examples, that
appears to be the more useful default anyways.

I'm rather interested in handling a null pipe result:
 $user = $request->get('id')
     |> $database->fetchUser(...)
     ?? new GuestUser();
Than handling a null callback (using the RFC example, because I can't
even think of a real-world use-case):
 $res1 = 5
     |> $null_func ?? defaultFunc(...);
To give some more examples of what would be possible without parentheses
then:
 $containsNotOnlyZero = $someString
     |> fn ($str) => str_replace('0', '', $str)
     |> strlen(...)
     > 0;
Which is not particularly pretty, but appears to be more useful than
either passing a boolean into a single-argument function or piping into
a boolean (which would error).

Best regards
Tim Düsterhus

I have updated the patch and RFC accordingly. I think you're right, it does make a bit more sense this way.

--Larry Garfield

4 months ago by tim@bastelstu.be

Hi

I have updated the patch and RFC accordingly. I think you're right, it does make a bit more sense this way.

Is this paragraph in the RFC a left-over from before the change? It
appears redundant with the paragraph before:

The pipe operator has a deliberately low binding order, so that most surrounding operators will execute first. In particular, arithmetic operations, null coalesce, and ternaries all have higher binding priority, allowing for the RHS to have arbitrarily complex expressions in it that will still evaluate to a callable. For example:

Best regards
Tim Düsterhus

3 months ago by Ilija Tovilo

Hi Larry

Sorry for the late response.

https://wiki.php.net/rfc/pipe-operator-v3

We have already discussed this topic extensively off-list, so let me
bring the list up-to-date.

The current pipes proposal is elegantly simple. This has many upsides,
but it comes with an obvious limitation:
It only works well when the called function takes only a single argument.

$sourceCode |> lexer(...) |> parser(...) |> compiler(...) |> vm(...)

Such code is nice, but is also quite niche. I have argued off-list
that the predominant use-case for pipes are arrays and iterators
(including strings immediately split into chunks), and it seems most
agree. However, most array/iterator functions (e.g. filter, map,
reduce, first, all, etc.) don't fall into the one-parameter category.

A slightly simplified example from the RFC:

$result = "Hello World"
|> str_split(...)
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O');

IMO, this is harder to understand than the alternative of using
multiple statements with a temporary variable.

$tmp = "Hello World";
$tmp = str_split($tmp);
$tmp = array_map(strtoupper(...), $tmp);
$result = array_filter($tmp, fn($v) => $v != 'O');

The RFC has a solution for this: Partial function application [1].

$result = "Hello World"
|> str_split(...)
|> array_map(strtoupper(...), ?)
|> array_filter(?, fn($v) => $v != 'O');

This still causes more cognitive overhead than it should, at least to me.

The placement of ? is hard to detect, especially when it's not the
first argument.

The user now has to think about immediately-invoked closures that
exist solely for argument-reordering. The closure can be elided
through the optimizer, but we cannot elide the additional cognitive
overhead in the user.

The implementation of ? is significantly more complex than that of
pipes, making the supposed simplicity of pipes somewhat misleading.

If my assumption is correct that the primary use-case for pipes are
arrays, it might be worth investigating the possibility of introducing
a new iterator API, which has been proposed before [2], optimized for
pipes. Specifically, this API would ensure consistent placement of the
subject, i.e. the iterable in this case, as the first argument. Pipes
would no longer have the form of expr |> expr, where the
right-hand-side is expected to return a callable. Instead, it would
have the form of expr |> function_call, where the left-hand-side is
implicitly inserted as the first parameter of the call.

namespace Iter {
function map(iterable $iterable, \Closure $callback): \Iterator;
function filter(iterable $iterable, \Closure $callback): \Iterator;
}

namespace {
use function Iter\{map, filter};

$result = "Hello World"
    |> str_split()
    |> map(strtoupper(...))
    |> filter(fn($v) => $v != 'O');

}

This is the same approach taken by Elixir [3]. It has a few benefits:

We don't need to think about closures that are immediately invoked,
because there are none. The code is exactly the same as if you had
written it through nested function calls. This simplifies things
significantly for both the engine and the user.

It closely resembles code that would be written in an
object-oriented manner, making it more familiar.

It is the shortest and most readable of all the proposed options.
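
For illustration, the nested-call form that the first-arg pipe would stand for can be written in today's PHP. This is a sketch under assumptions: the map() and filter() functions below are hypothetical generator-based stand-ins for the proposed Iter\map and Iter\filter, declared in the global namespace so the example is self-contained.

```php
<?php
// Hypothetical subject-first helpers: the iterable is always the first parameter.
function map(iterable $iterable, Closure $callback): Iterator {
    foreach ($iterable as $key => $value) {
        yield $key => $callback($value);
    }
}

function filter(iterable $iterable, Closure $callback): Iterator {
    foreach ($iterable as $key => $value) {
        if ($callback($value)) {
            yield $key => $value;
        }
    }
}

// What "Hello World" |> str_split() |> map(...) |> filter(...) would mean
// under first-arg insertion: each left-hand side becomes the first argument.
$result = filter(
    map(str_split("Hello World"), strtoupper(...)),
    fn($v) => $v != 'O'
);

// Materialise the lazy result as a plain list for inspection.
$chars = iterator_to_array($result, false);
```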

As with everything, there are downsides.

It only works well for subject-first APIs. There are not an
insignificant number of existing functions that do not follow this
convention (e.g. explode(), preg_match(), etc.). That said,
explode(' ', $s) |> filter($c1) |> map($c2) still composes well, given
explode() is usually first in the chain, while preg_match() is rarely
chained at all.

People have voiced concerns for potential confusion regarding the
right-hand-side. It may not be any arbitrary expression, but is
restricted to a function call. Hence, $param |> $myClosure is not
valid code, requiring additional parentheses: $param |> $myClosure().

This approach resembles the -> operator, where at least conceptually,
the left-hand-side is implicitly passed as a $this parameter. However,
the spaces between |> do not signal this fact as well, making it look
like the right-hand-side is evaluated separately. Potentially, a
different symbol might work better.

Internal reactions to this idea were mixed, so I'm interested to hear
what the community thinks about it.

Ilija

[1] https://wiki.php.net/rfc/partial_function_application
[2] https://externals.io/message/118896
[3] https://elixirschool.com/en/lessons/basics/pipe_operator

3 months ago by ojschmidt@kde.org

Hi Ilija and Larry,

thank you so much for your great work bringing PHP forward. I have been passively reading this list for a while and would like to chime in with two thoughts.

Pipes would no longer have the form of expr |> expr, where the right-hand-side is expected to return a callable. Instead, it would have the form of expr |> function_call, where the left-hand-side is implicitly inserted as the first parameter of the call.

namespace Iter {
function map(iterable $iterable, \Closure $callback): \Iterator;
function filter(iterable $iterable, \Closure $callback): \Iterator;
}

namespace {
use function Iter\{map, filter};

$result = "Hello World"
|> str_split()
|> map(strtoupper(...))
|> filter(fn($v) => $v != 'O');
}

With named parameters, you could even make this approach work without the suggested (but still useful) new Iterator API:

$result = "Hello World"
|> str_split()
|> array_map(callback: strtoupper(...))
|> array_filter(callback: fn($v) => $v != 'O');

or

$result = "Hello World"
|> str_split()
|> array_map(callback: strtoupper(...))
|> array_filter(fn($v) => $v != 'O');

I am also wondering whether |> and -> should have the same operator precedence.

Best regards,

Olaf Schmidt-Wischhöfer

3 months ago by Larry Garfield

Hi Larry

Sorry for the late response.

https://wiki.php.net/rfc/pipe-operator-v3

We have already discussed this topic extensively off-list, so let me
bring the list up-to-date.

The current pipes proposal is elegantly simple. This has many upsides,
but it comes with an obvious limitation:
It only works well when the called function takes only a single argument.

$sourceCode |> lexer(...) |> parser(...) |> compiler(...) |> vm(...)

Such code is nice, but is also quite niche. I have argued off-list
that the predominant use-case for pipes are arrays and iterators
(including strings immediately split into chunks), and it seems most
agree. However, most array/iterator functions (e.g. filter, map,
reduce, first, all, etc.) don't fall into the one-parameter category.

A slightly simplified example from the RFC:

$result = "Hello World"
|> str_split(...)
|> fn($x) => array_map(strtoupper(...), $x)
|> fn($x) => array_filter($x, fn($v) => $v != 'O');

IMO, this is harder to understand than the alternative of using
multiple statements with a temporary variable.

$tmp = "Hello World";
$tmp = str_split($tmp);
$tmp = array_map(strtoupper(...), $tmp);
$result = array_filter($tmp, fn($v) => $v != 'O');

The RFC has a solution for this: Partial function application [1].

$result = "Hello World"
|> str_split(...)
|> array_map(strtoupper(...), ?)
|> array_filter(?, fn($v) => $v != 'O');

This still causes more cognitive overhead than it should, at least to me.

The placement of ? is hard to detect, especially when it's not the
first argument.

The user now has to think about immediately-invoked closures that
exist solely for argument-reordering. The closure can be elided
through the optimizer, but we cannot elide the additional cognitive
overhead in the user.

The implementation of ? is significantly more complex than that of
pipes, making the supposed simplicity of pipes somewhat misleading.

If my assumption is correct that the primary use-case for pipes are
arrays, it might be worth investigating the possibility of introducing
a new iterator API, which has been proposed before [2], optimized for
pipes. Specifically, this API would ensure consistent placement of the
subject, i.e. the iterable in this case, as the first argument. Pipes
would no longer have the form of expr |> expr, where the
right-hand-side is expected to return a callable. Instead, it would
have the form of expr |> function_call, where the left-hand-side is
implicitly inserted as the first parameter of the call.

namespace Iter {
function map(iterable $iterable, \Closure $callback): \Iterator;
function filter(iterable $iterable, \Closure $callback): \Iterator;
}

namespace {
use function Iter\{map, filter};
$result = "Hello World"
    |> str_split()
    |> map(strtoupper(...))
    |> filter(fn($v) => $v != 'O');
}

This is the same approach taken by Elixir [3]. It has a few benefits:

We don't need to think about closures that are immediately invoked,
because there are none. The code is exactly the same as if you had
written it through nested function calls. This simplifies things
significantly for both the engine and the user.

It closely resembles code that would be written in an
object-oriented manner, making it more familiar.

It is the shortest and most readable of all the proposed options.

As with everything, there are downsides.

It only works well for subject-first APIs. There are not an
insignificant number of existing functions that do not follow this
convention (e.g. explode(), preg_match(), etc.). That said,
explode(' ', $s) |> filter($c1) |> map($c2) still composes well, given
explode() is usually first in the chain, while preg_match() is rarely
chained at all.

People have voiced concerns for potential confusion regarding the
right-hand-side. It may not be any arbitrary expression, but is
restricted to a function call. Hence, $param |> $myClosure is not
valid code, requiring additional parentheses: $param |> $myClosure().

This approach resembles the -> operator, where at least conceptually,
the left-hand-side is implicitly passed as a $this parameter. However,
the spaces between |> do not signal this fact as well, making it look
like the right-hand-side is evaluated separately. Potentially, a
different symbol might work better.

Internal reactions to this idea were mixed, so I'm interested to hear
what the community thinks about it.

Ilija

[1] https://wiki.php.net/rfc/partial_function_application
[2] https://externals.io/message/118896
[3] https://elixirschool.com/en/lessons/basics/pipe_operator

To clarify my stance on the above: I am open to this, and I agree with Ilija that in the typical case it would be more convenient. The argument that it would be confusing to have a "hidden" first param is valid, but as with any new feature I think it's obvious once you know it, so that's a small issue. I didn't propose it originally as I suspected folks would balk at the added complexity, but I do like the concept.

Part of Ilija's proposal does include offering $val |> ($expr) (or similar) to allow arbitrary expressions on the right, which would need to return a unary function. Basically the () would make it the same as what the RFC is doing now.

However, it also received significant pushback off-list from folks who felt it was too much magic. I don't want to torpedo pipes on over-reaching. But without feedback from other voters, I don't know if this is over-reaching. Is it? Please, someone tell me which approach you'd be more willing to vote for. :-)

One concern of this approach is that it gets even closer to "real" extension functions. But real extension functions (which let you write code that looks like you're adding arbitrary methods to arbitrary objects, even though under the hood it's just a plain function that takes an object as a parameter) also run into a lot of additional complexity. Chief among them, they don't handle name collisions, so you can have only one "map" function rather than one-per-class. Unless you have an alternate syntax for the extension functions to specify the type they work on (which is what Kotlin does), but then you run into questions around inheritance and polymorphism that are hard to resolve in a runtime-centric environment. I haven't fully thought through all of these details.

It's also been proposed to use +> as an operator for extension functions and/or first-param pipes like Elixir. I'm not sure how I feel about that; my main concern is which one it would apply to, since as noted above full extension functions introduce a lot of extra considerations.

But I really don't want to hold up pipes on speculation on multiple future maybe-features. As the RFC notes, there are a number of follow ups that I want to try and get at least some of into the same release.

So, consider this me begging for voters to actually speak up on this issue and give feedback on a way forward, because right now I have no idea what to do with it.

--Larry Garfield

3 months ago by Rowan Tommins [IMSoP]

However, it also received significant pushback off-list from folks who felt it was too much magic. I don't want to torpedo pipes on over-reaching. But without feedback from other voters, I don't know if this is over-reaching. Is it? Please, someone tell me which approach you'd be more willing to vote for. :-)

At first, I thought Ilija's example looked pretty neat, but having
thought about it a bit more, I think the "first-arg" approach makes a
handful of cases nicer at the cost of a lot of magic, and making other
cases worse.

The right-hand side is magic in two ways:

it looks like an expression, but actually has to be a syntactic
function call for the engine to inject an argument into

it looks like it's calling a function with the wrong arguments

If we have a special case where the right-hand side is an expression,
evaluated as a single-argument callable/Closure, that's even more scope
for confusion. [cf my thoughts in the async thread about keeping the
right-hand side of "spawn" consistent]

The cases it makes nicer are where you are chaining existing functions
with the placeholder as first (but not only) parameter. If you want to
pipe into a non-first parameter, you have a few options:

a) Write a new function or explicit wrapper - equally possible with
either option

// for first-arg chaining:
function swapped_explode(string $string, string $separator): array {
    return explode($separator, $string);
}
$someChain |> swapped_explode(':');

// for only-arg chaining:
function curried_explode(string $separator): callable {
    return fn(string $string) => explode($separator, $string);
}
$someChain |> curried_explode(':');

b) Use an immediate closure as the wrapper - only-arg chaining seems better

// first-arg chaining
$someChain |> ( fn($string) => explode(':', $string) )();

// first-arg chaining with special case syntax for closures
$someChain |> ( fn($string) => explode(':', $string) );

// for only-arg chaining:
$someChain |> fn($string) => explode(':', $string);

c) Use a new partial application syntax - same problem as immediate closure

// for first-arg chaining
$someChain |> explode(':', ?)();

// or with overloaded syntax
$someChain |> ( explode(':', ?) );

// for only-arg chaining
$someChain |> explode(':', ?);

It's also quite easy to write a helper for the special-case of
"partially apply all except the first argument":

function partial_first(callable $fn, mixed ...$fixedArgs): callable {
    return fn(mixed $firstArg) => $fn($firstArg, ...$fixedArgs);
}

// first-arg chaining
$someChain |> array_filter(fn($v, $k) => $k === $v, ARRAY_FILTER_USE_BOTH);

// native partial application
$someChain |> array_filter(?, fn($v, $k) => $k === $v, ARRAY_FILTER_USE_BOTH);

// workaround
$someChain |> partial_first(array_filter(...), fn($v, $k) => $k === $v, ARRAY_FILTER_USE_BOTH);
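
For anyone who wants to try the helper today, here is a minimal sketch that runs without the proposed |> operator by calling the resulting closure directly. The sample data is made up for illustration, and the sketch assumes PHP 8.1 or later for the array_filter(...) syntax.

```php
<?php
// Helper from the message above: fix every argument except the first.
function partial_first(callable $fn, mixed ...$fixedArgs): callable {
    return fn(mixed $firstArg) => $fn($firstArg, ...$fixedArgs);
}

// Illustrative input: keep only the entries whose key equals their value.
$input = [0 => 0, 1 => 5, 2 => 2, 'a' => 'a'];

$keepMatching = partial_first(
    array_filter(...),
    fn($v, $k) => $k === $v,
    ARRAY_FILTER_USE_BOTH
);

// With only-arg pipes this would read: $input |> $keepMatching
$result = $keepMatching($input);
var_export($result);
```

The closure returned by partial_first() is an ordinary unary callable, which is all that only-arg chaining needs on its right-hand side.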

--
Rowan Tommins
[IMSoP]

3 months ago by Larry Garfield

Writing higher order functions to simulate first-arg is indeed quite straightforward. The RFC has some simple examples, and I've written a whole bunch of more robust ones here:

https://github.com/Crell/fp/blob/master/src/array.php
https://github.com/Crell/fp/blob/master/src/string.php

The issue is performance. With foo(...), foo(?, 'bar'), or implicit first-arg, it's fairly straightforward to compile it down to a normal function call so there's no runtime cost. If you have an expression that produces a callable that gets used, that cannot be optimized away.

So we could get this resulting syntax with either higher order user-space functions or with auto-first-arg:

$foo
    |> map($fn1)
    |> filter($fn2)
    |> implode(',');

However, if map() is a higher order function that returns a unary callable, there are two function invocations involved. If it's custom syntax that turns into map($foo, $fn1), then it's only one function invocation.
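
The difference in call count can be sketched in current PHP without the |> operator. The map() function below is a hypothetical user-space helper, not a proposed API, and array_map() stands in for whatever a direct first-arg call would compile to.

```php
<?php
// Hypothetical user-space higher-order map(): returns a unary callable.
function map(callable $fn): callable {
    return static fn(array $items): array => array_map($fn, $items);
}

$double = fn(int $x): int => $x * 2;
$foo = [1, 2, 3];

// Higher-order style: map($double) runs first, then the closure it returned
// is invoked with the piped value. That is two invocations per pipe step.
$viaClosure = map($double)($foo);

// What auto-first-arg (or an optimised placeholder call) could reduce to:
// a single direct call with the piped value already in position.
$direct = array_map($double, $foo);

var_export([$viaClosure, $direct]);
```

Both forms produce the same result; the only difference is the extra closure creation and call in the higher-order version.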

So if we expect higher order functions to be common (and I would probably mainly use them myself), then it would be wise to figure out some way to make them more efficient. Auto-first-arg is one way. "Suck it up and use PFA with the ?" is another way that would work, but be less ergonomic. I'm not sure of other options offhand.

--Larry Garfield

3 months ago by Rowan Tommins [IMSoP]

So if we expect higher order functions to be common (and I would probably mainly use them myself), then it would be wise to figure out some way to make them more efficient. Auto-first-arg is one way.

From this angle, auto-first-arg is a very limited compiler optimisation
for partial application.

With auto-first-arg, you have a parser rule that matches this:

$foo |> bar($baz);

and results in the same AST/opcodes as this:

bar($foo, $baz);

With PFA and one-arg-callable pipes, you could add a parser rule that
matches this, with the same output:

$foo |> bar(?, $baz);

But you'd also be able to do this:

$baz |> bar($foo, ?);

And maybe the compiler could optimise that case too.
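
As a rough illustration in current PHP, with arrow functions standing in for the placeholder syntax (bar() here is a made-up two-argument function):

```php
<?php
// Hypothetical two-argument function standing in for bar().
function bar(string $first, string $second): string {
    return $first . '-' . $second;
}

$foo = 'foo';
$baz = 'baz';

// `$foo |> bar(?, $baz)`: the piped value fills the first slot.
$fillFirst = fn(string $x): string => bar($x, $baz);

// `$baz |> bar($foo, ?)`: the piped value fills the second slot instead.
$fillSecond = fn(string $x): string => bar($foo, $x);

// Either way, the result matches the plain call bar($foo, $baz).
$a = $fillFirst($foo);
$b = $fillSecond($baz);
var_export([$a, $b]);
```

In both cases exactly one argument position is left open, which is what would let a compiler rewrite the pipe step into the direct call.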

Neither helps with the performance of higher order functions which are
doing more than partial application, like map and filter themselves. I
understand there's a high cost to context-switching between C and PHP;
presumably if there was an easy solution for that someone would have
done it already.

To me, pipes improve readability when they behave like methods, i.e.
they perform some operation on a subject. This resembles Swift's
protocol extensions or Rust's trait default implementations, except
using a different "method" call operator.
[...]
If we decide not to add an iterator API that works well with
first-arg, then I agree that this is not the right approach. But if we
do, then neither of your examples are problematic.

I guess those two things go together quite well as a mental model: pipes
as a way to implement extension methods, and new functions designed for
use as extension methods.

I think I'd be more welcoming of it if we actually implemented extension
methods instead of pipes, and then the new iterator API was
extension-method-only. It feels less like "one of the arguments is
missing" if that argument is always expressed as the left-hand side of
an arrow of some sort.

--
Rowan Tommins
[IMSoP]

3 months ago by Larry Garfield

So if we expect higher order functions to be common (and I would probably mainly use them myself), then it would be wise to figure out some way to make them more efficient. Auto-first-arg is one way.

From this angle, auto-first-arg is a very limited compiler optimisation
for partial application.

I'd say it has the dual benefit of optimization and ergonomics. (Though see discussion below.)

With PFA and one-arg-callable pipes, you could add a parser rule that
matches this, with the same output:

$foo |> bar(?, $baz);

But you'd also be able to do this:

$baz |> bar($foo, ?);

And maybe the compiler could optimise that case too.

From what Arnaud has told me, any PFA that has a single, fixed-position-number argument remaining should be optimizable. (Though that's a task for whenever PFA is next worked on, if it is next worked on.)

Neither helps with the performance of higher order functions which are
doing more than partial application, like map and filter themselves. I
understand there's a high cost to context-switching between C and PHP;
presumably if there was an easy solution for that someone would have
done it already.

On 03/04/2025 18:39, Ilija Tovilo wrote:

To me, pipes improve readability when they behave like methods, i.e.
they perform some operation on a subject. This resembles Swift's
protocol extensions or Rust's trait default implementations, except
using a different "method" call operator.
[...]
If we decide not to add an iterator API that works well with
first-arg, then I agree that this is not the right approach. But if we
do, then neither of your examples are problematic.

I guess those two things go together quite well as a mental model:
pipes as a way to implement extension methods, and new functions
designed for use as extension methods.

I think I'd be more welcoming of it if we actually implemented
extension methods instead of pipes, and then the new iterator API was
extension-method-only. It feels less like "one of the arguments is
missing" if that argument is always expressed as the left-hand side
of an arrow of some sort.

As I've noted, classic pipes (current RFC, unary function only) and extension functions are not mutually exclusive, and I see no reason we couldn't add both. Auto-partialing first-arg pipes and dedicated extension functions step on each other's toes a bit more, however.

To address both this and Ilija's email, I was toying with extension functions as a concept a while back. I also did extensive research into "collections" in other languages last year with Derick. (See the discussion in a previous PHP Foundation report[1].) That led me to a number of conclusions that I still hold:

1. A new iterable API is absolutely a good thing and we should do it.
2. That said, we need to split Sequence, Set, and Dictionary into separate types. We are the only language I reviewed that didn't have them as separate constructs with their own APIs.
3. The use of the same construct (arrays and iterables) for all three types is a fundamental and core flaw in PHP's design that we should not double-down on. It's ergonomically awful, it's bad for performance, and it invites major security holes. (The "Drupageddon" remote exploit was caused by using an array and assuming it was sequential when it was actually a map.)

So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable $it, callable $fn) style functions would not be the right way to do it. That would be easy, but also ineffective.

The behavior of even basic operations like map and filter is subtly different depending on which type you're dealing with. Whether the input is lazy or not is the least of the concerns. The bigger issue is when to pass keys to the $fn; probably always in Dict, probably never in Seq, and certainly never in Set (as there are no meaningful keys). Similarly, when filtering a Dict, you would want keys preserved. When filtering a Seq, you'd want the indexes re-zeroed. (Or to seem like it, give or take implementation details.) And then, yes, there's the laziness question.
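
The key-handling difference is visible with today's arrays. A small sketch using only existing functions (array_is_list() assumes PHP 8.1 or later):

```php
<?php
$isEven = fn(int $x): bool => $x % 2 === 0;

// Treated as a dictionary: keys are meaningful, so filtering keeps them.
$dict = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4];
$filteredDict = array_filter($dict, $isEven);

// Treated as a sequence: array_filter() still preserves the original
// indexes, leaving gaps, so the result is no longer a list.
$seq = [1, 2, 3, 4];
$filteredRaw = array_filter($seq, $isEven);

// A sequence-aware filter would have to re-zero the indexes.
$filteredSeq = array_values($filteredRaw);

var_export([$filteredDict, $filteredRaw, $filteredSeq]);
```

The same array_filter() call is right for the dictionary and subtly wrong for the sequence, and nothing in the type tells the caller which case applies.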

So we'd effectively want three different versions of map(), filter(), etc. if we didn't want to perpetuate and further entrench the design flaw and security hole that is "sequences and hashes are the same thing if you squint." And... frankly I'd probably vote against an iterable/collections API that didn't address that issue.

However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same split for filter, and probably a few other things. That seems ergonomically suspect, at best, and still wouldn't really address the issue since you would have no way to ensure you're using the "right" version of each function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the other types would take only one.

So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make easy is... probably not the iterable API we want anyway. There may well be other cases for Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this form.

Which brings us then to extension functions. Pipes and higher order functions, or first-arg pipes, can act as a sort of "junior" extension functions, but for the reasons listed above fall short of being real extension functions.

For comparison, extension functions in Kotlin look like this:

fun SomeType.foo(a: Int) {
    // a is a variable. "this" is the SomeType the function was called on.
    // However, this is still "external" scope so only public members are usable.
}

val s = SomeType()
s.foo(5)

(Kotlin doesn't have a "new" keyword; the above is how you instantiate an object.)

Arguably, Go is entirely built as extension functions. It looks like this:

func (st SomeType) foo(a int) {
    // st and a are both variables here. Do as you will.
}

Notably for us, the same function can be defined multiple times against different types. That allows the system to differentiate between A.foo() and B.foo(). You can also attach extension functions to interfaces. In fact, most of Kotlin's collections (list, set, map) API is implemented as extension functions on interfaces, of which they have many.

However, both Go and Kotlin are compiled languages, which means the compiler has a complete view of the code at compile time, and can sort out which extension function to use in a given situation statically. That is, of course, not the case in PHP.

That means even if we figure out a way to define multiple foo() functions that apply to different types, and can agree that doing so is not evil (some have argued it's too close to function/method overloading, which they claim is evil; I disagree with both points), there is still a very non-trivial task of figuring out how to resolve the function to call at runtime, probably somehow leveraging autoloading, which also then runs us up against function autoloading, etc. I hope that is a solvable problem, but I don't currently know how to solve it.
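
To make the runtime-resolution cost concrete, here is a purely illustrative user-space sketch. ExtensionRegistry and its methods are invented for this example; it ignores autoloading entirely (the hard part named above) and is not a suggestion for how the engine would do it.

```php
<?php
// Hypothetical registry: [function name => [type name => closure]].
final class ExtensionRegistry
{
    /** @var array<string, array<string, Closure>> */
    private array $extensions = [];

    public function register(string $type, string $name, Closure $fn): void
    {
        $this->extensions[$name][$type] = $fn;
    }

    // Resolve at call time by checking the subject against each registered type.
    public function call(object $subject, string $name, mixed ...$args): mixed
    {
        foreach ($this->extensions[$name] ?? [] as $type => $fn) {
            if ($subject instanceof $type) {
                return $fn($subject, ...$args);
            }
        }
        throw new BadFunctionCallException("No extension {$name}() for " . $subject::class);
    }
}

final class A {}
final class B {}

$registry = new ExtensionRegistry();
$registry->register(A::class, 'foo', fn(A $a, int $n): string => "A.foo($n)");
$registry->register(B::class, 'foo', fn(B $b, int $n): string => "B.foo($n)");

// The same name resolves to a different function depending on the subject.
$fromA = $registry->call(new A(), 'foo', 5);
$fromB = $registry->call(new B(), 'foo', 5);
var_export([$fromA, $fromB]);
```

Every call pays for a lookup that a compiled language would settle statically, and the registry only knows about extensions that have already been loaded.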

So "real" extension functions are an epic unto themselves, even though I really really want them. (They are fantastically ergonomic for converting from one representation to another, like from an ORM entity to a minimal struct to serialize as JSON, and vice versa. I quite miss them from Kotlin.)

It would be really nice if we could follow Kotlin's example and build 3 different collection types (likely via objects), and then build most of the API for them in extension functions rather than as methods. However, that sounds harder every time I dig into it.

As a side note to Yakov[2], a Uniform Function Call Syntax in PHP would have all the same problems as extension functions, even before we get into the issue that Rowan, Tim, and others have brought up that PHP is wildly inconsistent in having the "subject" first in a function call. Without that consistency, UFCS doesn't make much sense. While I appreciate the elegance of it, in practice, figuring out extension functions as a dedicated syntax (akin to Kotlin or Go above) is probably the best we could do, if we can even do that.

All of which is to say... I think I may have talked myself back around to just using basic unary function pipes and "suck it up" on the extra call for higher order functions for now, unless someone can show a fair number of non-iterable use cases where it would be helpful. That then would unblock the other incremental improvements listed in the RFC (compose, PFA, and $$->foo()). True extension functions could then be explored later (likely by people with way more engine knowledge than me) as their own thing, whether using ->, +>, or something else entirely. We just need to agree that the existence of pipes does not render extension functions moot.

Thoughts?

--Larry Garfield

[1] https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
[2] https://externals.io/message/127037

3 months ago by Ilija Tovilo

Hi Larry

Sorry again for the delay.

A new iterable API is absolutely a good thing and we should do it.

That said, we need to split Sequence, Set, and Dictionary into separate types. We are the only language I reviewed that didn't have them as separate constructs with their own APIs.

The use of the same construct (arrays and iterables) for all three types is a fundamental and core flaw in PHP's design that we should not double-down on. It's ergonomically awful, it's bad for performance, and it invites major security holes. (The "Drupageddon" remote exploit was caused by using an array and assuming it was sequential when it was actually a map.)

So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable $it, callable $fn) style functions would not be the right way to do it. That would be easy, but also ineffective.

The behavior of even basic operations like map and filter is subtly different depending on which type you're dealing with. Whether the input is lazy or not is the least of the concerns. The bigger issue is when to pass keys to the $fn; probably always in Dict, probably never in Seq, and certainly never in Set (as there are no meaningful keys). Similarly, when filtering a Dict, you would want keys preserved. When filtering a Seq, you'd want the indexes re-zeroed. (Or to seem like it, give or take implementation details.) And then, yes, there's the laziness question.

So we'd effectively want three different versions of map(), filter(), etc. if we didn't want to perpetuate and further entrench the design flaw and security hole that is "sequences and hashes are the same thing if you squint." And... frankly I'd probably vote against an iterable/collections API that didn't address that issue.

I fundamentally disagree with this assessment. In most languages,
including PHP, iterators are simply a sequence of values that can be
consumed. Usually, the consumer should not be concerned with the data
structure of the iterated value; this is abstracted away through the
iterator. For most languages, both Sequences and Sets are translated
1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
Dictionaries usually result in a tuple, combining both the key and
value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
is a bit different in that all iterators require a key. Semantically,
this makes sense for both Sequences (which are logically indexed by
the element's position in the sequence, so Sequence<T> => Iterator<int,
T>) and Dicts (which have an explicit key, so Dict<T, U> =>
Iterator<T, U>). Sets don't technically have a logical key, but IMO
this is not enough of a reason to fundamentally change how iterators
work. A sequential number would be fine, which is also what yield
without providing a key does. If we really wanted to avoid it, we can
make it return null, as this is already allowed for generators.
https://3v4l.org/LvIjP
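
For readers who don't want to follow the link, a minimal sketch of both generator behaviours (the function names are made up for this example):

```php
<?php
// yield without a key: the generator supplies sequential integer keys.
function autoKeys(): Generator {
    yield 'a';
    yield 'b';
}

// yield with an explicit null key, which generators permit.
function nullKeys(): Generator {
    yield null => 'a';
    yield null => 'b';
}

// Collect keys with foreach so they are not coerced into array offsets.
function keysOf(iterable $it): array {
    $keys = [];
    foreach ($it as $key => $_) {
        $keys[] = $key;
    }
    return $keys;
}

$auto = keysOf(autoKeys());
$nulls = keysOf(nullKeys());
var_export([$auto, $nulls]);
```

Either convention would give a Set iterator something to put in the key slot without changing how iteration works.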

The big upside of treating all iterators the same, regardless of their
data source, is that 1. the code becomes more generic: you don't need three
variants of a value map() function when one works on all of them,
and 2. you can populate any of the data structures from a generic
iterator without any data shuffling.

$users
    |> Iter\mapKeys(fn($u) => $u->getId())
    |> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some
other key. Actually, it works for any Traversable. If mapKeys() only
applied to Dict iterators you would necessarily have to create a
temporary dictionary first, or just not use the iterator API at all.
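
A rough user-space approximation of that pipeline, written as nested calls so it runs without the |> operator. iter_map_keys() and iter_to_dict() are hypothetical stand-ins for Iter\mapKeys() and Iter\toDict(), which do not exist yet, and a plain array stands in for the Dict type.

```php
<?php
// Hypothetical stand-in for Iter\mapKeys(): lazily re-keys each element,
// ignoring whatever key the source iterator supplied.
function iter_map_keys(iterable $it, callable $fn): Generator {
    foreach ($it as $value) {
        yield $fn($value) => $value;
    }
}

// Hypothetical stand-in for Iter\toDict(): an array plays the Dict here.
function iter_to_dict(iterable $it): array {
    $dict = [];
    foreach ($it as $key => $value) {
        $dict[$key] = $value;
    }
    return $dict;
}

final class User {
    public function __construct(private int $id, public string $name) {}
    public function getId(): int { return $this->id; }
}

$users = [new User(7, 'Ada'), new User(3, 'Grace')];
$byId = fn(User $u): int => $u->getId();

// Works the same for an array and for any other Traversable source.
$fromArray = iter_to_dict(iter_map_keys($users, $byId));
$fromIterator = iter_to_dict(iter_map_keys(new ArrayIterator($users), $byId));
var_export(array_keys($fromArray));
```

Neither helper needs to know what kind of collection produced the values, which is the genericity being argued for.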

However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same split for filter, and probably a few other things. That seems ergonomically suspect, at best, and still wouldn't really address the issue since you would have no way to ensure you're using the "right" version of each function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the other types would take only one.

So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make easy is... probably not the iterable API we want anyway. There may well be other cases for Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this form.

After having talked to you directly, it seemed to me that there is
some confusion about the iterator API vs. the API offered by the data
structure itself. For example:

$l = new List(1, 2, 3);
$l2 = $l |> map(fn($x) => $x * 2);

What is the type of $l2? I would expect it to be a List, but there's currently
no way to write a map() that statically guarantees that. (And that's before we
get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same
terminology) but an iterator, specifically Iterator<int, int>. If you
want to get back a sequence, you need to populate a new sequence from
the iterator using Iter\toSeq(). We may also decide to introduce a
Sequence::map() method that maps directly to a new sequence, which may
be more efficient for single transformations. That said, the nice
thing about the iterator API is that it generically applies to all
data structures implementing Traversable. For example, an Iter\max()
function would not need to care about the implementation details of
the underlying data structure, nor do all data structures need to
reimplement their own versions of max().
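
A minimal sketch of that idea in current PHP. iter_max() and iter_map() are hypothetical names standing in for Iter\max() and a lazy Iter\map(); they are not an existing API.

```php
<?php
// Hypothetical Iter\max(): works on anything iterable, whatever produced it.
function iter_max(iterable $it): mixed {
    $max = null;
    $first = true;
    foreach ($it as $value) {
        if ($first || $value > $max) {
            $max = $value;
            $first = false;
        }
    }
    return $max;
}

// Hypothetical lazy Iter\map(): returns an iterator, not the source type.
function iter_map(iterable $it, callable $fn): Generator {
    foreach ($it as $key => $value) {
        yield $key => $fn($value);
    }
}

$fromArray = iter_max([3, 9, 4]);
$fromObject = iter_max(new ArrayIterator([3, 9, 4]));

// The mapped result is a Generator, regardless of what the input was.
$mapped = iter_map([1, 2, 3], fn(int $x): int => $x * 2);
$isGenerator = $mapped instanceof Generator;
$fromGenerator = iter_max($mapped);
var_export([$fromArray, $fromObject, $isGenerator, $fromGenerator]);
```

Converting back to a concrete collection is then an explicit final step, which is the role Iter\toSeq() plays in the description above.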

Which brings us then to extension functions.

I have largely changed my mind on extension functions. Extension
functions that are exclusively local, static and detached from the
type system are rather useless. Looking at an example:

function PointEntity.toMessage(): PointMessage {
    return new PointMessage($this->x, $this->y);
}

$result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity,
there's arguably no benefit of $point->toMessage() over $point |> PointEntityExtension\toMessage() (with an optional import to make it
almost as short). All the extension really achieves is changing the
syntax, but we would already have the pipe operator for this.
Technically, you can use such extensions for untyped, local
polymorphism, but this does not seem like a good approach.

function PointEntity.toMessage(): PointMessage { ... }
function RectEntity.toMessage(): RectMessage { ... }

$entities = [new Point, new Rect];

foreach ($entities as $e) {
    // Technically works, but the type system is entirely unaware.
    $e->toMessage();
    // This breaks, because Point and Rect don't actually implement
    // the ToMessage interface.
    takesToMessage($e);
}

Where extensions would really shine is if they could hook into the
type system by implementing interfaces on types that aren't in your
control. Rust and Swift are two examples that take this approach.

implement ToMessage for Rect { ... }

takesToMessage(new Rect); // Now this actually works.

However, this becomes even harder to implement than extension
functions already would. I won't go into detail because this e-mail is
already too long, but I'm happy to discuss it further off-list. All
this to say, I don't think extensions will work well in PHP, but I
also don't think they are necessary for the iterator API.

Regards,
Ilija

3 months ago by Rob Landers

Hi Ilija and Larry,

This got me thinking: what if instead of "magically" passing a first value to a function, or partial applications, we create a new interface; something like:

interface PipeCompatible {
    function receiveContext(mixed $lastValue): void;
}

If the callable on the right-hand side implements this interface, it will receive the last value via receiveContext() before being called.

This would then force userland to implement a bunch of functionality to take true advantage of the pipe operator, but at the same time, allow for extensions (or core / SPL) to also take full advantage of them.
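
One possible reading of this idea, sketched in user space. Only the PipeCompatible interface comes from the message above; the Explode class and the pipe_step() helper are invented here to imitate what the engine might do at each |>.

```php
<?php
interface PipeCompatible {
    public function receiveContext(mixed $lastValue): void;
}

// Hypothetical stage: stores the piped value, then acts on it when invoked.
final class Explode implements PipeCompatible {
    private string $subject = '';

    public function __construct(private string $separator) {}

    public function receiveContext(mixed $lastValue): void {
        $this->subject = (string) $lastValue;
    }

    public function __invoke(): array {
        return explode($this->separator, $this->subject);
    }
}

// User-space stand-in for a single pipe step under this proposal.
function pipe_step(mixed $value, callable $stage): mixed {
    if ($stage instanceof PipeCompatible) {
        // Hand over the previous value first, then call with no arguments.
        $stage->receiveContext($value);
        return $stage();
    }
    // Anything else is treated as a plain unary callable.
    return $stage($value);
}

$parts = pipe_step('a:b:c', new Explode(':'));
$trimmed = pipe_step('  x  ', 'trim');
var_export([$parts, $trimmed]);
```

One wrinkle this sketch exposes is that every stage object becomes stateful between receiveContext() and the call, so a stage instance could not safely be shared across pipelines.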

I have no idea if such a thing works in practice, so I'm just spitballing here.

— Rob

3 months ago by Larry Garfield

Hi Larry

Sorry again for the delay.

A new iterable API is absolutely a good thing and we should do it.

That said, we need to split Sequence, Set, and Dictionary into separate types. We are the only language I reviewed that didn't have them as separate constructs with their own APIs.

The use of the same construct (arrays and iterables) for all three types is a fundamental and core flaw in PHP's design that we should not double-down on. It's ergonomically awful, it's bad for performance, and it invites major security holes. (The "Drupageddon" remote exploit was caused by using an array and assuming it was sequential when it was actually a map.)

So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable $it, callable $fn) style functions would not be the right way to do it. That would be easy, but also ineffective.

The behavior of even basic operations like map and filter is subtly different depending on which type you're dealing with. Whether the input is lazy or not is the least of the concerns. The bigger issue is when to pass keys to the $fn: probably always in Dict, probably never in Seq, and certainly never in Set (as there are no meaningful keys). Similarly, when filtering a Dict, you would want keys preserved. When filtering a Seq, you'd want the indexes re-zeroed. (Or to seem like it, give or take implementation details.) And then, yes, there's the laziness question.
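
To make that difference concrete, here is a minimal sketch of the two filter behaviors using plain arrays. The names seqFilter() and dictFilter() are hypothetical, invented for illustration only; they are not part of any RFC or library.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: filter a sequence. The callback sees values only,
// and the result is re-indexed from zero.
function seqFilter(array $seq, callable $fn): array
{
    return array_values(array_filter($seq, $fn));
}

// Hypothetical helper: filter a dictionary. The callback sees value and key,
// and the original keys are preserved.
function dictFilter(array $dict, callable $fn): array
{
    return array_filter($dict, $fn, ARRAY_FILTER_USE_BOTH);
}

$evens = seqFilter([1, 2, 3, 4], fn(int $v): bool => $v % 2 === 0);
// $evens === [2, 4], with keys 0 and 1 rather than 1 and 3

$adults = dictFilter(
    ['ann' => 31, 'bob' => 17],
    fn(int $age, string $name): bool => $age >= 18
);
// $adults === ['ann' => 31]
```

Nothing stops a caller from handing a dictionary to seqFilter() and silently losing its keys, which is the kind of mix-up described above.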

So we'd effectively want three different versions of map(), filter(), etc. if we didn't want to perpetuate and further entrench the design flaw and security hole that is "sequences and hashes are the same thing if you squint." And... frankly I'd probably vote against an iterable/collections API that didn't address that issue.

I fundamentally disagree with this assessment. In most languages,
including PHP, iterators are simply a sequence of values that can be
consumed. Usually, the consumer should not be concerned with the data
structure of the iterated value; this is abstracted away through the
iterator. For most languages, both Sequences and Sets are translated
1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
Dictionaries usually result in a tuple, combining both the key and
value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
is a bit different in that all iterators require a key. Semantically,
this makes sense for both Sequences (which are logically indexed by
the element's position in the sequence, so Sequence<T> => Iterator<int,
T>) and Dicts (which have an explicit key, so Dict<T, U> =>
Iterator<T, U>). Sets don't technically have a logical key, but IMO
this is not enough of a reason to fundamentally change how iterators
work. A sequential number would be fine, which is also what yield
without providing a key does. If we really wanted to avoid it, we can
make it return null, as this is already allowed for generators.
https://3v4l.org/LvIjP
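
As a small self-contained sketch of the two key behaviors mentioned above (the function names are made up for illustration):

```php
<?php
declare(strict_types=1);

// yield without a key: the generator assigns sequential integer keys.
function autoKeyed(): Generator
{
    yield 'a';
    yield 'b';
}

// yield with an explicit null key, which generators already permit.
function nullKeyed(): Generator
{
    yield null => 'a';
    yield null => 'b';
}

$autoKeys = [];
foreach (autoKeyed() as $key => $value) {
    $autoKeys[] = $key;
}
// $autoKeys === [0, 1]

$nullKeys = [];
foreach (nullKeyed() as $key => $value) {
    $nullKeys[] = $key;
}
// $nullKeys === [null, null]
```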

The big upside of treating all iterators the same, regardless of their
data source, is that 1. the code becomes more generic: you don't need three
variants of a map() function when one works on all of them, and
2. you can populate any of the data structures from a generic
iterator without any data shuffling.

$users
|> Iter\mapKeys(fn($u) => $u->getId())
|> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some
other key. Actually, it works for any Traversable. If mapKeys() only
applied to Dict iterators you would necessarily have to create a
temporary dictionary first, or just not use the iterator API at all.
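
A rough sketch of how such functions could be written in userland today. Iter\mapKeys() and Iter\toDict() do not exist; the mapKeys() and toDict() below are hypothetical stand-ins, the User class is invented for the example, and a keyed array stands in for a Dict type.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for Iter\mapKeys(): lazily replaces each key
// with the result of $fn($value). Works on any iterable.
function mapKeys(iterable $it, callable $fn): Generator
{
    foreach ($it as $value) {
        yield $fn($value) => $value;
    }
}

// Hypothetical stand-in for Iter\toDict(): collects into a keyed array,
// since PHP has no dedicated Dict type today.
function toDict(iterable $it): array
{
    $dict = [];
    foreach ($it as $key => $value) {
        $dict[$key] = $value;
    }
    return $dict;
}

// Invented example class.
final class User
{
    public function __construct(private int $id, public string $name) {}

    public function getId(): int
    {
        return $this->id;
    }
}

// Any Traversable works as the source; an ArrayIterator is used here.
$users = new ArrayIterator([new User(7, 'Ann'), new User(9, 'Bob')]);

// With first-arg pipes this would read:
//   $users |> mapKeys(fn($u) => $u->getId()) |> toDict();
$byId = toDict(mapKeys($users, fn(User $u): int => $u->getId()));
// array_keys($byId) === [7, 9]
```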

However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same split for filter, and probably a few other things. That seems ergonomically suspect, at best, and still wouldn't really address the issue since you would have no way to ensure you're using the "right" version of each function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the other types would take only one.

So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make easy is... probably not the iterable API we want anyway. There may well be other cases for Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this form.

After having talked to you directly, it seemed to me that there is
some confusion about the iterator API vs. the API offered by the data
structure itself. For example:

$l = new List(1, 2, 3);
$l2 = $l |> map(fn($x) => $x*2);

What is the type of $l2? I would expect it to be a List, but there's currently
no way to write a map() that statically guarantees that. (And that's before we
get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same
terminology) but an iterator, specifically Iterator<int, int>. If you
want to get back a sequence, you need to populate a new sequence from
the iterator using Iter\toSeq(). We may also decide to introduce a
Sequence::map() method that maps directly to a new sequence, which may
be more efficient for single transformations. That said, the nice
thing about the iterator API is that it generically applies to all
data structures implementing Traversable. For example, an Iter\max()
function would not need to care about the implementation details of
the underlying data structure, nor do all data structures need to
reimplement their own versions of max().
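
For instance, a generic max could look roughly like the sketch below. Iter\max() is not an existing function; iterMax() is a hypothetical name, and throwing on empty input is an assumption made for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for Iter\max(): works on arrays, generators and
// any other Traversable, without knowing the underlying data structure.
function iterMax(iterable $it): mixed
{
    $max = null;
    $seen = false;
    foreach ($it as $value) {
        if (!$seen || $value > $max) {
            $max = $value;
            $seen = true;
        }
    }
    if (!$seen) {
        // Assumption: an empty input is treated as an error.
        throw new InvalidArgumentException('iterMax() needs at least one element');
    }
    return $max;
}

$fromArray = iterMax([3, 9, 4]);
$fromIterator = iterMax(new ArrayIterator(['x' => 2, 'y' => 1]));
$fromGenerator = iterMax((function () {
    yield 5;
    yield 8;
    yield 6;
})());
// $fromArray === 9, $fromIterator === 2, $fromGenerator === 8
```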

I agree that max() likely would not need multiple versions. My concern is with cases where the signature of the callback changes depending on the type it's on, which is mainly map, filter, and maybe reduce. Possibly sorted as well, if you want to allow sorting by keys.

If I'm following you correctly, you're saying that because PHP is already weird (in that abstract iterators are always keyed), it's not increasing the weird for dedicated collection objects to have implicit keys when used with an abstract iterator API. Yes?

I think that's valid, but I also know just how many times I've been bitten by arrays doing double-duty. Keys getting lost during a transformation when they shouldn't, etc. I am highly skeptical about perpetuating that, and if we're going to revisit collections and iterators I would want to get the kind of guarantees that PHP has never given us, but most languages have always had.

That means, e.g., seq/set/dict values/objects would pretty much have to have their own versions of map, filter, etc. So that means we'd have 4 versions of map: seq::map, set::map, dict::map, and iter\map(). When would you use the latter over the former?

In any case, I fear this question is moot. Basically no one but you and I seems to like the implicit-first-arg approach, so whether it's viable or not sadly doesn't matter.

Unless any voters want to speak up now to correct that impression?

Which brings us then to extension functions.

I have largely changed my mind on extension functions. Extension
functions that are exclusively local, static and detached from the
type system are rather useless. Looking at an example:

function PointEntity.toMessage(): PointMessage {
return new PointMessage($this->x, $this->y);
}

$result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity,
there's arguably no benefit of $point->toMessage() over $point |> PointEntityExtension\toMessage() (with an optional import to make it
almost as short). All the extension really achieves is changing the
syntax, but we would already have the pipe operator for this.
Technically, you can use such extensions for untyped, local
polymorphism, but this does not seem like a good approach.

function PointEntity.toMessage(): PointMessage { ... }
function RectEntity.toMessage(): RectMessage { ... }

$entities = [new Point, new Rect];

foreach ($entities as $e) {
    // Technically works, but the type system is entirely unaware.
    $e->toMessage();
    // This breaks, because Point and Rect don't actually implement
    // the ToMessage interface.
    takesToMessage($e);
}

You wouldn't pass $e directly to takesToMessage(). You'd call takesMessage($e->toMessage()). It's literally just a function that you're reversing the syntax order on. It is not supposed to impact the type signature. If it does, then it's Rust Traits, not extension functions.

Where extensions would really shine is if they could hook into the
type system by implementing interfaces on types that aren't in your
control. Rust and Swift are two examples that take this approach.

implement ToMessage for Rect { ... }

takesToMessage(new Rect); // Now this actually works.

However, this becomes even harder to implement than extension
functions already would. I won't go into detail because this e-mail is
already too long, but I'm happy to discuss it further off-list. All
this to say, I don't think extensions will work well in PHP, but I
also don't think they are necessary for the iterator API.

Regards,
Ilija

Every time I daydream about what my ideal object-type-definition syntax would be, I eventually end up at Rust. :-) And then I get sad that as an interpreted language, PHP makes that basically impossible.

All of the above leads me back around to "well if we don't do first-arg, then we'll want a way to make higher order functions easier to implement." Which I am all for, and have proposed RFCs for in the past, and they've all been rejected. So, yeah. Maybe once pipes get used people will realize the value. :-)

Hi Ilija and Larry,

This got me thinking: what if instead of "magically" passing a first
value to a function, or partial applications, we create a new
interface; something like:

interface PipeCompatible {
function receiveContext(mixed $lastValue): void;
}

If a type implements this interface, it will receive the last value via
the interface before being called.

This would then force userland to implement a bunch of functionality to
take true advantage of the pipe operator, but at the same time, allow
for extensions (or core / SPL) to also take full advantage of them.

I have no idea if such a thing works in practice, so I'm just spitballing here.

— Rob

This approach would only be viable on objects. So you'd have to do

$a |> new B('c') |> ... ;

to get it to work. Most of what we would want to use here are functions or methods, not manually created objects. This would also be slower, as it involves two function calls instead of one.

Besides, that can already be achieved with __invoke().

class B {
public function __construct(private $arg1) {}

public function __invoke($passedValue): Whatever {
// Do stuff with both $arg1 and $passedValue
}
}
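
A runnable variant of the same idea, as a sketch. The Prefixer class is invented for illustration, and the object is called directly here so the example does not depend on the pipe operator being available.

```php
<?php
declare(strict_types=1);

// Invented example: the constructor captures the extra argument,
// and __invoke() receives the value passed along the chain.
final class Prefixer
{
    public function __construct(private string $prefix) {}

    public function __invoke(string $passedValue): string
    {
        return $this->prefix . $passedValue;
    }
}

$a = 'a';

// With the pipe operator this would be written: $a |> new Prefixer('c:')
$stage = new Prefixer('c:');
$result = $stage($a);
// $result === 'c:a'
```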

--Larry Garfield

2 months ago by Larry Garfield

Hello world.

The discussion has been dormant for a while. For now, I'm going to proceed with the simple-callable approach to pipes, rather than Elixir-style auto-partialling. I have also added a discussion of a possible future iterator API built for pipes to the RFC, and another example using stream resources and a few utilities to build lazy, self-cleaning stream processing chains. It actually looks really nice, I think. :-) Neither changes the design or implementation.

Also, since Derick asked off-list, I am 90% certain that the current implementation will still allow Xdebug to "catch" on each step in a pipe chain, since at the opcode level it's just a bunch of function calls with anonymous intermediary values. And on the off chance it's not, I've been advised by other engine devs that the implementation is simple enough to tweak to make that work. So we're debug friendly.
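
To illustrate the "bunch of function calls with intermediary values" point, here is a sketch of what a chain means semantically. It shows the behavior only, not the actual opcodes; the pipe form is left in a comment so the snippet runs without the new operator, and the variable names are made up.

```php
<?php
declare(strict_types=1);

$input = ' hello ';

// Pipe form (needs the operator from the RFC):
//   $out = $input |> trim(...) |> strtoupper(...) |> strlen(...);

// Hand-written equivalent with the intermediary values given names.
// In the pipe form these values are anonymous.
$step1 = trim($input);       // 'hello'
$step2 = strtoupper($step1); // 'HELLO'
$out = strlen($step2);       // 5
```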

Barring any other feedback, I am going to open the vote Monday/Tuesday.

--Larry Garfield

3 months ago by Ilija Tovilo

Hi Rowan

On Thu, Apr 3, 2025 at 1:59 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:

At first, I thought Ilija's example looked pretty neat, but having
thought about it a bit more, I think the "first-arg" approach makes a
handful of cases nicer at the cost of a lot of magic, and making other
cases worse.

I think "handful" is the word to focus on. As noted, I believe the
primary use-case for pipes is iterators. If that's true, then an
implicit first-arg approach should cover the majority of examples,
while complicating the rest. Whether that's a worthwhile trade-off is
for the community to decide.

To me, pipes improve readability when they behave like methods, i.e.
they perform some operation on a subject. This resembles Swift's
protocol extensions or Rust's trait default implementations, except
using a different "method" call operator. With this mental model, the
first-arg approach seems intuitive to me. Once parameters are out of
order, the pipe examples with partial function application cause more
cognitive overhead for me, but this is entirely subjective.

If we have a special case where the right-hand side is an expression,
evaluated as a single-argument callable/Closure, that's even more scope
for confusion. [cf my thoughts in the async thread about keeping the
right-hand side of "spawn" consistent]

To clarify: I'm not in favor of this syntax either. While I originally
mentioned it as a possibility, I later noted that lhs |> {rhs} would
be less ambiguous, given that {} is not legal in the general
expression context, while also resembling the lhs->{rhs} syntax to a
degree. However, because {} is not simpler than lhs |> rhs(), I
mentioned neither in my e-mail.

The cases it makes nicer are where you are chaining existing functions
with the placeholder as first (but not only) parameter.

If we decide not to add an iterator API that works well with
first-arg, then I agree that this is not the right approach. But if we
do, then neither of your examples is problematic.

// first-arg chaining
$someChain |> fn($string) => explode(':', $string)();

As for string functions, I had a quick look through the stubs and
could only find a handful of functions that are not already
subject-first:

preg_/mb_ereg
mb_split
explode

Maybe my search was flawed, let me know if there are any that I
missed. explode() specifically usually appears first in a chain (or
deepest in nested calls), which means it could just remain a normal
function call.

$result = explode(' ', $str) |> filter(...) |> map(...) |> join(' ');

The iterator API would improve the array_filter() example. Admittedly,
you might not always want to use iterators. A single array_map() would
likely be faster than going through the iterator API. But then again,
single calls aren't chains, so they won't benefit much from pipes to
begin with.

Ilija

1 month ago by Dmitry Derepko

Hi Larry!

It's been a long, long way to get this feature, awesome work.

Have you considered adding a compose function that does the same thing but
in the classic PHP function style?

There's not much difference between the new style:

$processor = fn ($data) => htmlentities($data)
    |> str_split(...)
    |> (fn($x) => array_map(strtoupper(...), $x))
    |> (fn($x) => array_filter($x, fn($v) => $v != 'O'));

and the old one:

$processor = compose(
    htmlentities(...),
    str_split(...),
    fn ($x) => array_map(strtoupper(...), $x),
    fn ($x) => array_filter($x, fn ($v) => $v != 'O'),
);

But the classic looks better when you create real pipes.

I’ve created examples with comparison.

https://3v4l.org/jY0Vg

https://3v4l.org/87Sj2

https://3v4l.org/4EE6b

New syntax just makes code shorter, but the compose function still has
benefits:

- it will be possible to add a polyfill for older versions
- it will be possible to write the first function without passing the first
  argument ($data in the "fn ($data) => htmlentities($data)")
- it will be possible to re-use the compose function along with the new
  operator: $data |> compose(...$functions)

Best regards,

Dmitrii Derepko.

@xepozz

1 month ago by Larry Garfield

Hi Larry!

It's been a long, long way to get this feature, awesome work.

Have you considered adding a compose function that does the same thing
but in the classic PHP function style?

There's not much difference between the new style:

$processor = fn ($data) => htmlentities($data)
    |> str_split(...)
    |> (fn($x) => array_map(strtoupper(...), $x))
    |> (fn($x) => array_filter($x, fn($v) => $v != 'O'));

and the old one:

$processor = compose(
    htmlentities(...),
    str_split(...),
    fn ($x) => array_map(strtoupper(...), $x),
    fn ($x) => array_filter($x, fn ($v) => $v != 'O'),
);

But the classic looks better when you create real pipes.

I’ve created examples with comparison.

https://3v4l.org/jY0Vg

https://3v4l.org/87Sj2

https://3v4l.org/4EE6b

New syntax just makes code shorter, but the compose function still has
benefits:

- it will be possible to add a polyfill for older versions
- it will be possible to write the first function without passing the
  first argument ($data in the "fn ($data) => htmlentities($data)")
- it will be possible to re-use the compose function along with the new
  operator: $data |> compose(...$functions)

Pipe and compose are importantly different operations. I've had user-space implementations of both available for years in crell/fp: https://github.com/Crell/fp/blob/master/src/composition.php

I'd love to have a compose operator natively in PHP, too. The RFC for that is already written, just needs code. I hope to formally propose it soon: https://wiki.php.net/rfc/function-composition
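
A minimal sketch of the difference, for anyone following along. The pipe() and compose() helpers here are hypothetical and written for this example; they are not the crell/fp implementations, and left-to-right application order is an assumption of the sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: runs $value through each callable right away
// and returns the final result.
function pipe(mixed $value, callable ...$fns): mixed
{
    foreach ($fns as $fn) {
        $value = $fn($value);
    }
    return $value;
}

// Hypothetical helper: returns a new function. Nothing runs until
// that function is called with a value.
function compose(callable ...$fns): Closure
{
    return static fn(mixed $value): mixed => pipe($value, ...$fns);
}

// pipe: a value goes in, a value comes out.
$once = pipe(' hi ', trim(...), strtoupper(...));

// compose: functions go in, a reusable function comes out.
$shout = compose(trim(...), strtoupper(...));
$again = $shout(' yo ');
// $once === 'HI', $again === 'YO'
```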

--Larry Garfield

1 month ago by Dmitry Derepko

Pipe and compose are importantly different operations. I've had
user-space implementations of both available for years in crell/fp:
https://github.com/Crell/fp/blob/master/src/composition.php

I'd love to have a compose operator natively in PHP, too. The RFC for
that is already written, just needs code. I hope to formally propose it
soon: https://wiki.php.net/rfc/function-composition

Hmm, that's great, but looks like we are missing something.

Would you propose the RFC to 8.5? I think it should be proposed with the
new pipe operator anyway.

RFC early feedback

There is no big difference in DX with the new operator "+" for closures:

- operator may be overridden in some extensions and it also may implement
  __invoke. What's the expected behavior? It will be totally unclear
- code examples from the rfc: https://3v4l.org/n7UB0 vs
  https://3v4l.org/tOlft. The first approach is better for me because it may
  be batch-processed / combined / filtered / modified easily. So taking my
  first message there are not so many changes: https://3v4l.org/ncpEE

Just try to imagine how to work with the composition and how it perfectly
works now: https://3v4l.org/ArK2O

By the way, RFC describes userland "compose" function performance problems,
but there is no suggestion to make it natively, why so?

--

Best regards,
Dmitrii Derepko.
@xepozz

1 month ago by Larry Garfield

Pipe and compose are importantly different operations. I've had user-space implementations of both available for years in crell/fp: https://github.com/Crell/fp/blob/master/src/composition.php

I'd love to have a compose operator natively in PHP, too. The RFC for that is already written, just needs code. I hope to formally propose it soon: https://wiki.php.net/rfc/function-composition

Hmm, that's great, but looks like we are missing something.

Would you propose the RFC to 8.5? I think it should be proposed with
the new pipe operator anyway.

I am working with someone on the implementation. As soon as that's done I want to post it. Whether it manages to get into 8.5 at this point is an open question.

RFC early feedback

There is no big difference in DX with the new operator "+" for closures:

operator may be overridden in some extensions and it also may
implement __invoke. What's expected behavior? It will be totally unclear

I expect the overlap there to be tiny, so it will rarely be encountered. As to which "wins", I'd think probably the extension.

code examples from the rfc: https://3v4l.org/n7UB0 vs
https://3v4l.org/tOlft the first approach is better for me because it
may be batch-processed / combined / filtered / modified easily. So
taking my first message there are not so many changes:
https://3v4l.org/ncpEE

Just try to imagine how to work with the composition and how it
perfectly works now: https://3v4l.org/ArK2O

If you want to do it that way, Crell/fp has you covered, have fun. But the whole point of the RFC is to provide a native operator for concatenating functions. Decomposing/deconcatenating an already composed function chain is... not a thing.

By the way, RFC describes userland "compose" function performance
problems, but there is no suggestion to make it natively, why so?

Uh. That's the entire point of the RFC? Make a native compose operator that isn't even a function.

--Larry Garfield