Newsgroups: php.internals
Date: Thu, 10 Apr 2025 11:00:21 -0500
From: larry@garfieldtech.com ("Larry Garfield")
To: "php internals"
Subject: Re: [PHP-DEV] [RFC] Pipe Operator (again)

On Wed, Apr 9, 2025, at 12:56 AM, Rob Landers wrote:
> On Wed, Apr 9, 2025, at 01:29, Ilija Tovilo wrote:
>> Hi Larry
>>
>> Sorry again for the delay.
>>
>> On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield wrote:
>> >
>> > * A new iterable API is absolutely a good thing and we should do it.
>> > * That said, we *need* to split Sequence, Set, and Dictionary into
>> >   separate types. We are the only language I reviewed that didn't
>> >   have them as separate constructs with their own APIs.
>> > * The use of the same construct (arrays and iterables) for all three
>> >   types is a fundamental and core flaw in PHP's design that we should
>> >   not double-down on. It's ergonomically awful, it's bad for
>> >   performance, and it invites major security holes.
>> >   (The "Drupageddon" remote exploit was caused by using an array and
>> >   assuming it was sequential when it was actually a map.)
>> >
>> > So while I want a new iterable API, the more I think on it, the more
>> > I think a bunch of map(iterable $it, callable $fn) style functions
>> > would not be the right way to do it. That would be easy, but also
>> > ineffective.
>> >
>> > The behavior of even basic operations like map and filter is subtly
>> > different depending on which type you're dealing with. Whether the
>> > input is lazy or not is the least of the concerns. The bigger issue
>> > is when to pass keys to the $fn; probably always in Dict, probably
>> > never in Seq, and certainly never in Set (as there are no meaningful
>> > keys). Similarly, when filtering a Dict, you would want keys
>> > preserved. When filtering a Seq, you'd want the indexes re-zeroed.
>> > (Or to seem like it, give or take implementation details.) And then,
>> > yes, there's the laziness question.
>> >
>> > So we'd effectively want three different versions of map(),
>> > filter(), etc. if we didn't want to perpetuate and further entrench
>> > the design flaw and security hole that is "sequences and hashes are
>> > the same thing if you squint." And... frankly I'd probably vote
>> > against an iterable/collections API that didn't address that issue.
>>
>> I fundamentally disagree with this assessment. In most languages,
>> including PHP, iterators are simply a sequence of values that can be
>> consumed. Usually, the consumer should not be concerned with the data
>> structure of the iterated value; this is abstracted away through the
>> iterator. For most languages, both Sequences and Sets are translated
>> 1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>).
>> Dictionaries usually result in a tuple, combining both the key and
>> value into a single value pair (Dict<T, U> => Iterator<(T, U)>). PHP
>> is a bit different in that all iterators require a key.
>> Semantically, this makes sense for both Sequences (which are
>> logically indexed by the element's position in the sequence, so
>> Sequence<T> => Iterator<int, T>) and Dicts (which have an explicit
>> key, so Dict<T, U> => Iterator<T, U>). Sets don't technically have a
>> logical key, but IMO this is not enough of a reason to fundamentally
>> change how iterators work. A sequential number would be fine, which
>> is also what yield without providing a key does. If we really wanted
>> to avoid it, we can make it return null, as this is already allowed
>> for generators.
>> https://3v4l.org/LvIjP
>>
>> The big upside of treating all iterators the same, regardless of
>> their data source, is 1. the code becomes more generic: you don't
>> need three variants of a value map() function when one works on all
>> of them. And 2. you can populate any of the data structures from a
>> generic iterator without any data shuffling.
>>
>> $users
>>     |> Iter\mapKeys(fn($u) => $u->getId())
>>     |> Iter\toDict();
>>
>> This will work if $users is a Sequence, Set or existing Dict with
>> some other key. Actually, it works for any Traversable. If mapKeys()
>> only applied to Dict iterators you would necessarily have to create a
>> temporary dictionary first, or just not use the iterator API at all.
>>
>> > However, a simple "first arg" pipe wouldn't allow for that. Or
>> > rather, we'd need to implement seqMap(iterable $it, callable $fn),
>> > setMap(iterable $it, callable $fn), and
>> > dictMap(iterable $it, callable $fn). And the same split for filter,
>> > and probably a few other things. That seems ergonomically suspect,
>> > at best, and still wouldn't really address the issue since you would
>> > have no way to ensure you're using the "right" version of each
>> > function. Similarly, a dict version of implode() would likely need
>> > to take 2 separators, whereas the other types would take only one.
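Ilija's `Iter\mapKeys()` / `Iter\toDict()` pipeline can be approximated in today's PHP. This is a minimal sketch, not a proposed API: `iter_map_keys()`, `iter_to_dict()`, the `User` class, and the `users()` / `colors()` generators are hypothetical stand-ins, a plain array plays the role of the dictionary, and the calls are nested so the example does not depend on the `|>` operator under discussion. It also shows the two generator key behaviours Ilija mentions: `yield` without a key produces 0, 1, ..., and `null` is accepted as an explicit key.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for Iter\mapKeys(): lazily re-keys any iterable
// (array, Iterator, IteratorAggregate, generator) with $fn($value).
function iter_map_keys(iterable $it, callable $fn): Generator
{
    foreach ($it as $value) {
        yield $fn($value) => $value;
    }
}

// Hypothetical stand-in for Iter\toDict(): collects key => value pairs.
// A plain array plays the dictionary here, so keys must be int|string,
// and a repeated key overwrites the earlier value.
function iter_to_dict(iterable $it): array
{
    $dict = [];
    foreach ($it as $key => $value) {
        $dict[$key] = $value;
    }
    return $dict;
}

// Hypothetical value type for the example.
final class User
{
    public function __construct(private int $id, public string $name) {}

    public function getId(): int
    {
        return $this->id;
    }
}

// Yields without keys, so PHP assigns 0, 1, ... automatically.
function users(): Generator
{
    yield new User(42, 'ada');
    yield new User(7, 'bob');
}

// Yields an explicit null key for every element, as a Set-like source might.
function colors(): Generator
{
    yield null => 'red';
    yield null => 'blue';
}

$byUserId = fn (User $u): int => $u->getId();

// The same pipeline over three different kinds of source.
$fromGenerator = iter_to_dict(iter_map_keys(users(), $byUserId));
$fromArray     = iter_to_dict(iter_map_keys([new User(1, 'cy')], $byUserId));
$fromIterator  = iter_to_dict(iter_map_keys(new ArrayIterator([new User(3, 'di')]), $byUserId));

// Record the keys each generator produces on its own.
$implicitKeys = [];
foreach (users() as $key => $_) {
    $implicitKeys[] = $key;
}

$nullKeys = [];
foreach (colors() as $key => $_) {
    $nullKeys[] = $key;
}
```

Because both helpers only require `iterable`, the same call accepts an array, an `ArrayIterator`, or a generator, which is the genericity argument made above; whether the result should be a dedicated Dict type rather than a plain array is the open question in the thread.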
>> >
>> > So the more I think on it, the more I think the sort of iterable API
>> > that first-arg pipes would make easy is... probably not the iterable
>> > API we want anyway. There may well be other cases for Elixir-style
>> > first-arg pipes, but a new iterable API isn't one of them, at least
>> > not in this form.
>>
>> After having talked to you directly, it seemed to me that there is
>> some confusion about the iterator API vs. the API offered by the data
>> structure itself. For example:
>>
>> > $l = new List(1, 2, 3);
>> > $l2 = $l |> map(fn($x) => $x*2);
>> >
>> > What is the type of $l2? I would expect it to be a List, but there's
>> > currently no way to write a map() that statically guarantees that.
>> > (And that's before we get into generics.)
>>
>> $l2 wouldn't be a List (or Sequence, to stick with the same
>> terminology) but an iterator, specifically Iterator. If you
>> want to get back a sequence, you need to populate a new sequence from
>> the iterator using Iter\toSeq(). We may also decide to introduce a
>> Sequence::map() method that maps directly to a new sequence, which may
>> be more efficient for single transformations. That said, the nice
>> thing about the iterator API is that it generically applies to all
>> data structures implementing Traversable. For example, an Iter\max()
>> function would not need to care about the implementation details of
>> the underlying data structure, nor do all data structures need to
>> reimplement their own versions of max().

I agree that max() likely would not need multiple versions. My concern
is with cases where the signature of the callback changes depending on
the type it's on, which is mainly map, filter, and maybe reduce.
Possibly sorted as well, if you want to allow sorting by keys.
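The key-handling concern with map and filter is already visible in today's arrays, which do double duty as sequence and dictionary. This sketch uses only built-in functions (`array_filter()`, `array_values()`, `json_encode()`, and `array_is_list()`, which needs PHP 8.1 or later): filtering preserves keys, which suits a dictionary but quietly turns a list into a map, and the filter callback only sees keys when a flag asks for them.

```php
<?php
declare(strict_types=1);

$seq  = [10, 15, 20, 25];                  // intended as a sequence (list)
$dict = ['a' => 10, 'b' => 15, 'c' => 20]; // intended as a dictionary

$isEven = fn (int $n): bool => $n % 2 === 0;

// array_filter() always preserves keys, which is what a Dict wants...
$evenDict = array_filter($dict, $isEven);

// ...but a filtered "sequence" silently stops being a list (keys 0 and 2).
$evenSeqRaw = array_filter($seq, $isEven);
$stillList  = array_is_list($evenSeqRaw);

// Seq semantics need an explicit re-zeroing step.
$evenSeq = array_values(array_filter($seq, $isEven));

// The callback receives keys only on request: with ARRAY_FILTER_USE_BOTH
// it is called as ($value, $key), a per-call change of signature.
$keyed = array_filter(
    $dict,
    fn (int $v, string $k): bool => $k !== 'b' && $v > 10,
    ARRAY_FILTER_USE_BOTH,
);

// The lost "list-ness" leaks out: a non-list encodes as a JSON object.
$rawJson   = json_encode($evenSeqRaw);
$fixedJson = json_encode($evenSeq);
```

Under this reading, a Seq-specific filter would bake in the `array_values()` step and a Dict-specific one would keep the keys, which is the split into per-type functions being debated.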
If I'm following you correctly, you're saying that because PHP is
already weird (in that abstract iterators are always keyed), it's not
increasing the weird for dedicated collection objects to have implicit
keys when used with an abstract iterator API. Yes?

I think that's valid, but I also know just how many times I've been
bitten by arrays doing double duty: keys getting lost during a
transformation when they shouldn't be, etc. I am highly skeptical about
perpetuating that, and if we're going to revisit collections and
iterators I would want to get the kind of guarantees that PHP has never
given us, but most languages have always had.

That means, e.g., seq/set/dict values/objects would pretty much have to
have their own versions of map, filter, etc. So that means we'd have 4
versions of map: seq::map, set::map, dict::map, and iter\map(). When
would you use iter\map() over the other three?

In any case, I fear this question is moot. Basically no one but you and
me seems to like the implicit-first-arg approach, so whether it's viable
or not sadly doesn't matter.

Unless any voters want to speak up now to correct that impression?

>> > Which brings us then to extension functions.
>>
>> I have largely changed my mind on extension functions. Extension
>> functions that are exclusively local, static and detached from the
>> type system are rather useless. Looking at an example:
>>
>> > function PointEntity.toMessage(): PointMessage {
>> >     return new PointMessage($this->x, $this->y);
>> > }
>> >
>> > $result = json_encode($point->toMessage());
>>
>> If for some reason toMessage() cannot be implemented on PointEntity,
>> there's arguably no benefit of $point->toMessage() over
>> `$point |> PointEntityExtension\toMessage()` (with an optional import
>> to make it almost as short). All the extension really achieves is
>> changing the syntax, but we would already have the pipe operator for
>> this.
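The desugaring Ilija describes, an extension method reduced to an ordinary function that takes the receiver as its first argument, might look like this in current PHP. This is a sketch under stated assumptions: `point_entity_to_message()` is a hypothetical stand-in for `PointEntityExtension\toMessage()` (a global function keeps the snippet to one file), and the `PointEntity` / `PointMessage` definitions are filled in only so the example runs.

```php
<?php
declare(strict_types=1);

// Hypothetical definitions, filled in only so the example runs.
final class PointEntity
{
    public function __construct(public int $x, public int $y) {}
}

final class PointMessage
{
    public function __construct(public int $x, public int $y) {}
}

// Hypothetical stand-in for PointEntityExtension\toMessage(): the receiver
// that would be `$this` in `function PointEntity.toMessage()` becomes an
// ordinary first parameter. Like any outside function, it can only reach
// PointEntity's public members.
function point_entity_to_message(PointEntity $point): PointMessage
{
    return new PointMessage($point->x, $point->y);
}

$point  = new PointEntity(3, 4);
$result = json_encode(point_entity_to_message($point));

// PointEntity's own type is unchanged: it gained no method or interface.
$hasMethod = method_exists($point, 'toMessage');
```

Nothing about `PointEntity` itself changes, so an interface check such as the `takesToMessage()` call discussed next would still fail; the only difference an extension-function syntax would add is the call order.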
>> Technically, you can use such extensions for untyped, local
>> polymorphism, but this does not seem like a good approach.
>>
>> function PointEntity.toMessage(): PointMessage { ... }
>> function RectEntity.toMessage(): RectMessage { ... }
>>
>> $entities = [new Point, new Rect];
>>
>> foreach ($entities as $e) {
>>     // Technically works, but the type system is entirely unaware.
>>     $e->toMessage();
>>     // This breaks, because Point and Rect don't actually implement
>>     // the ToMessage interface.
>>     takesToMessage($e);
>> }

You wouldn't pass $e directly to takesToMessage(). You'd call
takesMessage($e->toMessage()). It's literally just a function that
you're reversing the syntax order on. It is not supposed to impact the
type signature. If it does, then it's Rust Traits, not extension
functions.

>> Where extensions would really shine is if they could hook into the
>> type system by implementing interfaces on types that aren't in your
>> control. Rust and Swift are two examples that take this approach.
>>
>> implement ToMessage for Rect { ... }
>>
>> takesToMessage(new Rect); // Now this actually works.
>>
>> However, this becomes even harder to implement than extension
>> functions already would. I won't go into detail because this e-mail
>> is already too long, but I'm happy to discuss it further off-list.
>> All this to say, I don't think extensions will work well in PHP, but
>> I also don't think they are necessary for the iterator API.
>>
>> Regards,
>> Ilija

Every time I daydream about what my ideal object-type-definition syntax
would be, I eventually end up at Rust. :-) And then I get sad that as an
interpreted language, PHP makes that basically impossible.

All of the above leads me back around to "well if we don't do
first-arg, then we'll want a way to make higher-order functions easier
to implement." Which I am all for, and have proposed RFCs for in the
past, and they've all been rejected. So, yeah.
Maybe once pipes get used, people will realize the value. :-)

> Hi Ilija and Larry,
>
> This got me thinking: what if instead of "magically" passing a first
> value to a function, or partial applications, we create a new
> interface; something like:
>
> interface PipeCompatible {
>     function receiveContext(mixed $lastValue): void;
> }
>
> If the implementing type implements this interface, it will receive
> the last value via the interface before being called.
>
> This would then force userland to implement a bunch of functionality
> to take true advantage of the pipe operator, but at the same time,
> allow for extensions (or core / SPL) to also take full advantage of
> them.
>
> I have no idea if such a thing works in practice, so I'm just
> spitballing here.
>
> -- Rob

This approach would only be viable on objects. So you'd have to do

$a |> new B('c') |> ... ;

to get it to work. Most of what we would want to use here are functions
or methods, not manually created objects. This would also be slower, as
it involves two function calls instead of one.

Besides, that can already be achieved with __invoke().

class B {
    public function __construct(private $arg1) {}

    public function __invoke($passedValue): Whatever {
        // Do stuff with both $arg1 and $passedValue
    }
}

--Larry Garfield
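The `__invoke()` alternative can be made concrete and runnable. In this sketch, `PadTo` is a hypothetical stand-in for the `B` class (its constructor captures the extra argument, `__invoke()` receives the piped value), and `pipe()` is a hypothetical helper that applies callables left to right so the chain runs without the `|>` operator; neither name comes from the RFC.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for `B`: the constructor captures the extra
// argument, and __invoke() receives the value coming down the pipe.
final class PadTo
{
    public function __construct(private int $width) {}

    public function __invoke(string $passedValue): string
    {
        // Right-pads with dots up to the configured width.
        return str_pad($passedValue, $this->width, '.');
    }
}

// Hypothetical helper: feeds each stage's result into the next stage,
// standing in for `$value |> $stage1 |> $stage2 ...`.
function pipe(mixed $value, callable ...$stages): mixed
{
    foreach ($stages as $stage) {
        $value = $stage($value);
    }
    return $value;
}

// Calling the invokable object directly...
$padded = (new PadTo(5))('ab');

// ...or mixing it with plain function names in a chain.
$piped = pipe('ab', 'strtoupper', new PadTo(5), 'strrev');
```

Any callable works as a stage here, including strings naming built-in functions and invokable objects, so this style needs no dedicated `PipeCompatible` interface; the cost is the extra object construction per configured stage.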