Newsgroups: php.internals
From: rob@bottled.codes ("Rob Landers")
To: internals@lists.php.net
Date: Wed, 09 Apr 2025 07:56:57 +0200
Subject: Re: [PHP-DEV] [RFC] Pipe Operator (again)

On Wed, Apr 9, 2025, at 01:29, Ilija Tovilo wrote:
Hi Larry

Sorry again for the delay.

On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield <larry@garfieldtech.com> wrote:
>
> * A new iterable API is absolutely a good thing and we should do it.
> * That said, we *need* to split Sequence, Set, and Dictionary into separate types. We are the only language I reviewed that didn't have them as separate constructs with their own APIs.
> * The use of the same construct (arrays and iterables) for all three types is a fundamental and core flaw in PHP's design that we should not double down on. It's ergonomically awful, it's bad for performance, and it invites major security holes. (The "Drupageddon" remote exploit was caused by using an array and assuming it was sequential when it was actually a map.)
>
> So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable $it, callable $fn) style functions would not be the right way to do it. That would be easy, but also ineffective.
>
> The behavior of even basic operations like map and filter is subtly different depending on which type you're dealing with. Whether the input is lazy or not is the least of the concerns. The bigger issue is when to pass keys to the $fn; probably always in Dict, probably never in Seq, and certainly never in Set (as there are no meaningful keys). Similarly, when filtering a Dict, you would want keys preserved. When filtering a Seq, you'd want the indexes re-zeroed. (Or to seem like it, give or take implementation details.) And then, yes, there's the laziness question.
>
> So we'd effectively want three different versions of map(), filter(), etc. if we didn't want to perpetuate and further entrench the design flaw and security hole that is "sequences and hashes are the same thing if you squint." And... frankly I'd probably vote against an iterable/collections API that didn't address that issue.
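
The filtering difference described above is already visible with today's arrays. A minimal sketch, using only built-in functions (PHP 8.1+ for array_is_list()) and nothing from the proposed API: array_filter() preserves keys, which suits a dictionary but leaves gaps in a sequence until it is re-indexed.

```php
<?php
declare(strict_types=1);

// A "sequence": keys are 0, 1, 2, 3.
$seq = [10, 15, 20, 25];

// array_filter() preserves keys, so only keys 0 and 2 survive.
$filtered = array_filter($seq, fn(int $n): bool => $n % 10 === 0);

// Still an array, but no longer a list.
$isList = array_is_list($filtered);

// Re-zero the indexes to get sequence semantics back.
$reindexed = array_values($filtered);

// The difference is observable: the gapped array encodes as a JSON
// object, the re-indexed one as a JSON array.
$asObject = json_encode($filtered);
$asArray  = json_encode($reindexed);
```

Whether a dedicated Seq type should do the re-indexing automatically is exactly the design question under discussion; the sketch only shows why the two cases diverge.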

I fundamentally disagree with this assessment. In most languages,
including PHP, iterators are simply a sequence of values that can be
consumed. Usually, the consumer should not be concerned with the data
structure of the iterated value; this is abstracted away through the
iterator. For most languages, both Sequences and Sets are translated
1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
Dictionaries usually result in a tuple, combining both the key and
value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
is a bit different in that all iterators require a key. Semantically,
this makes sense for both Sequences (which are logically indexed by
the element's position in the sequence, so Sequence<T> => Iterator<int,
T>) and Dicts (which have an explicit key, so Dict<T, U> =>
Iterator<T, U>). Sets don't technically have a logical key, but IMO
this is not enough of a reason to fundamentally change how iterators
work. A sequential number would be fine, which is also what yield
without providing a key does. If we really wanted to avoid it, we can
make it return null, as this is already allowed for generators.
https://3v4l.org/LvIjP
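
The two key behaviours mentioned above can be checked with plain generators. This is a small illustrative sketch, not the code behind the link; the function names autoKeys() and nullKeys() are made up for the example.

```php
<?php
declare(strict_types=1);

// yield without a key: PHP assigns sequential integer keys.
function autoKeys(): Generator {
    yield 'a';
    yield 'b';
}

// yield with an explicit null key: allowed for generators.
function nullKeys(): Generator {
    yield null => 'a';
    yield null => 'b';
}

// Collect [key, value] pairs so the keys can be inspected.
$auto = [];
foreach (autoKeys() as $key => $value) {
    $auto[] = [$key, $value];
}

$nulls = [];
foreach (nullKeys() as $key => $value) {
    $nulls[] = [$key, $value];
}
```

Here $auto holds the keys 0 and 1, while every key in $nulls is null, which is the option suggested for Sets.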

The big upside of treating all iterators the same, regardless of their
data source, is that 1. the code becomes more generic: you don't need
three variants of a value map() function when one works on all of them.
And 2. you can populate any of the data structures from a generic
iterator without any data shuffling.

$users
    |> Iter\mapKeys(fn($u) => $u->getId())
    |> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some
other key. Actually, it works for any Traversable. If mapKeys() only
applied to Dict iterators you would necessarily have to create a
temporary dictionary first, or just not use the iterator API at all.
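
To make the "works for any Traversable" claim concrete, here is a rough sketch of how such helpers could look. mapKeys() and toDict() below are hypothetical global stand-ins for Iter\mapKeys() and Iter\toDict(), the User class is invented for the example, and nested calls are used instead of the pipe operator so the code runs on current PHP versions.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for Iter\mapKeys(): lazily re-keys any iterable.
function mapKeys(iterable $it, callable $fn): Generator {
    foreach ($it as $value) {
        yield $fn($value) => $value;
    }
}

// Hypothetical stand-in for Iter\toDict(): collects key/value pairs.
// A later duplicate of a key overwrites the earlier entry.
function toDict(iterable $it): array {
    $dict = [];
    foreach ($it as $key => $value) {
        $dict[$key] = $value;
    }
    return $dict;
}

// Invented example type with the getId() method used in the mail.
final class User {
    public function __construct(private int $id, public string $name) {}
    public function getId(): int { return $this->id; }
}

// The source could equally be a plain array or a generator.
$users = new ArrayIterator([new User(7, 'Ada'), new User(3, 'Lin')]);

// No temporary dictionary is needed before re-keying.
$byId = toDict(mapKeys($users, fn(User $u): int => $u->getId()));
```

Because mapKeys() only relies on foreach, it never needs to know whether its input started life as a sequence, a set, or a dictionary.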

> However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same split for filter, and probably a few other things. That seems ergonomically suspect, at best, and still wouldn't really address the issue since you would have no way to ensure you're using the "right" version of each function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the other types would take only one.
>
> So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make easy is... probably not the iterable API we want anyway. There may well be other cases for Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this form.

After having talked to you directly, it seemed to me that there is
some confusion about the iterator API vs. the API offered by the data
structure itself. For example:

> $l = new List(1, 2, 3);
> $l2 = $l |> map(fn($x) => $x*2);
>
> What is the type of $l2? I would expect it to be a List, but there's currently
> no way to write a map() that statically guarantees that. (And that's before we
> get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same
terminology) but an iterator, specifically Iterator<int, int>. If you
want to get back a sequence, you need to populate a new sequence from
the iterator using Iter\toSeq(). We may also decide to introduce a
Sequence::map() method that maps directly to a new sequence, which may
be more efficient for single transformations. That said, the nice
thing about the iterator API is that it generically applies to all
data structures implementing Traversable. For example, an Iter\max()
function would not need to care about the implementation details of
the underlying data structure, nor do all data structures need to
reimplement their own versions of max().
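
The split between "iterator out" and "collection back" can be sketched as follows. All three functions are hypothetical stand-ins (map() for the piped map(), toSeq() for Iter\toSeq(), iterMax() for Iter\max()), and ArrayIterator merely plays the role of a List/Sequence type, which does not exist yet.

```php
<?php
declare(strict_types=1);

// Hypothetical lazy map(): accepts any iterable and returns an iterator,
// not the original collection type.
function map(iterable $it, callable $fn): Generator {
    foreach ($it as $key => $value) {
        yield $key => $fn($value);
    }
}

// Hypothetical stand-in for Iter\toSeq(): drops keys, re-indexes from 0.
function toSeq(iterable $it): array {
    $seq = [];
    foreach ($it as $value) {
        $seq[] = $value;
    }
    return $seq;
}

// Hypothetical stand-in for Iter\max(): needs nothing but iteration.
function iterMax(iterable $it): mixed {
    $max = null;
    $first = true;
    foreach ($it as $value) {
        if ($first || $value > $max) {
            $max = $value;
            $first = false;
        }
    }
    return $max; // null when the input is empty
}

$list = new ArrayIterator([1, 2, 3]); // stands in for a List/Sequence
$doubled = map($list, fn(int $x): int => $x * 2);

$isGenerator = $doubled instanceof Generator; // an iterator, not a List
$seq = toSeq($doubled);                       // explicit conversion back
$largest = iterMax(new ArrayIterator(['a' => 4, 'b' => 9, 'c' => 2]));
```

The conversion back to a concrete collection is an explicit final step, which is the distinction drawn above between the iterator API and the data structure's own API.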

> Which brings us then to extension functions.

I have largely changed my mind on extension functions. Extension
functions that are exclusively local, static and detached from the
type system are rather useless. Looking at an example:

> function PointEntity.toMessage(): PointMessage {
>     return new PointMessage($this->x, $this->y);
> }
>
> $result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity,
there's arguably no benefit of $point->toMessage() over `$point |>
PointEntityExtension\toMessage()` (with an optional import to make it
almost as short). All the extension really achieves is changing the
syntax, but we would already have the pipe operator for this.
Technically, you can use such extensions for untyped, local
polymorphism, but this does not seem like a good approach.

function PointEntity.toMessage(): PointMessage { ... }
function RectEntity.toMessage(): RectMessage { ... }

$entities = [new Point, new Rect];

foreach ($entities as $e) {
    // Technically works, but the type system is entirely unaware.
    $e->toMessage();
    // This breaks, because Point and Rect don't actually implement
    // the ToMessage interface.
    takesToMessage($e);
}
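
The same limitation can be reproduced in current PHP with a free function in place of the extension syntax. A minimal sketch, where Point, pointToMessage() and the array-shaped message are invented for the example: the free function works, but the type system still rejects the object wherever the interface is required.

```php
<?php
declare(strict_types=1);

interface ToMessage {
    public function toMessage(): array;
}

// A type we pretend not to control: it does not implement ToMessage.
final class Point {
    public function __construct(public int $x, public int $y) {}
}

// Free-function "extension": fine to call directly (or to pipe into).
function pointToMessage(Point $p): array {
    return ['x' => $p->x, 'y' => $p->y];
}

function takesToMessage(ToMessage $m): array {
    return $m->toMessage();
}

$point = new Point(1, 2);
$message = pointToMessage($point);

// The type system knows nothing about the free function, so passing
// the object where ToMessage is required raises a TypeError.
$rejected = false;
try {
    takesToMessage($point);
} catch (TypeError $e) {
    $rejected = true;
}
```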

Where extensions would really shine is if they could hook into the
type system by implementing interfaces on types that aren't in your
control. Rust and Swift are two examples that take this approach.

implement ToMessage for Rect { ... }

takesToMessage(new Rect); // Now this actually works.
However, this becomes even harder to implement than extension
functions already would. I won't go into detail because this e-mail is
already too long, but I'm happy to discuss it further off-list. All
this to say, I don't think extensions will work well in PHP, but I
also don't think they are necessary for the iterator API.

Regards,
Ilija


Hi Ilija and Larry,

This got me thinking: what if, instead of "magically" passing a first value to a function or relying on partial application, we create a new interface? Something like:

interface PipeCompatible {
    function receiveContext(mixed $lastValue): void;
}

If the type on the right-hand side of the pipe implements this interface, it will receive the last value via the interface before being called.
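
A rough userland emulation of the idea, to show the mechanics rather than a finished design. Everything beyond the PipeCompatible interface itself is an assumption: the pipe() helper stands in for the operator, Multiply is an invented stage, and the stage being invoked with no arguments after receiveContext() is only one possible reading of "before being called".

```php
<?php
declare(strict_types=1);

interface PipeCompatible {
    public function receiveContext(mixed $lastValue): void;
}

// Invented example stage: stores the piped value, then acts on it.
final class Multiply implements PipeCompatible {
    private mixed $input = null;

    public function __construct(private int $factor) {}

    public function receiveContext(mixed $lastValue): void {
        $this->input = $lastValue;
    }

    // Assumption: the stage is invoked without arguments afterwards.
    public function __invoke(): int {
        return $this->input * $this->factor;
    }
}

// Hypothetical helper emulating the operator in userland.
function pipe(mixed $value, callable ...$stages): mixed {
    foreach ($stages as $stage) {
        if ($stage instanceof PipeCompatible) {
            $stage->receiveContext($value); // hand over the last value
            $value = $stage();
        } else {
            $value = $stage($value);        // ordinary single-arg callable
        }
    }
    return $value;
}

$result = pipe(5, new Multiply(3), fn(int $x): int => $x + 1);
```

In this sketch $result is 16: the Multiply stage receives 5 through the interface and returns 15, and the plain closure then adds 1.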

This would then force userland to implement a bunch of functionality to take true advantage of the pipe operator, but at the same time, allow for extensions (or core / SPL) to also take full advantage of it.

I have no idea if such a thing works in practice, so I'm just spitballing here.

-- Rob