Newsgroups: php.internals
List-Id: internals.lists.php.net
Date: Wed, 9 Apr 2025 01:29:45 +0200
Subject: Re: [PHP-DEV] [RFC] Pipe Operator (again)
To: php internals
From: tovilo.ilija@gmail.com (Ilija Tovilo)

Hi Larry

Sorry again for the delay.

On Fri, Apr 4, 2025 at 6:37 AM Larry Garfield wrote:
>
> * A new iterable API is absolutely a good thing and we should do it.
> * That said, we *need* to split Sequence, Set, and Dictionary into separate types. We are the only language I reviewed that didn't have them as separate constructs with their own APIs.
> * The use of the same construct (arrays and iterables) for all three types is a fundamental and core flaw in PHP's design that we should not double-down on. It's ergonomically awful, it's bad for performance, and it invites major security holes. (The "Drupageddon" remote exploit was caused by using an array and assuming it was sequential when it was actually a map.)
>
> So while I want a new iterable API, the more I think on it, the more I think a bunch of map(iterable $it, callable $fn) style functions would not be the right way to do it. That would be easy, but also ineffective.
>
> The behavior of even basic operations like map and filter are subtly different depending on which type you're dealing with. Whether the input is lazy or not is the least of the concerns.
> The bigger issue is when to pass keys to the $fn; probably always in Dict, probably never in Seq, and certainly never in Set (as there are no meaningful keys). Similarly, when filtering a Dict, you would want keys preserved. When filtering a Seq, you'd want the indexes re-zeroed. (Or to seem like it, give or take implementation details.) And then, yes, there's the laziness question.
>
> So we'd effectively want three different versions of map(), filter(), etc. if we didn't want to perpetuate and further entrench the design flaw and security hole that is "sequences and hashes are the same thing if you squint." And... frankly I'd probably vote against an iterable/collections API that didn't address that issue.

I fundamentally disagree with this assessment. In most languages, including PHP, iterators are simply a sequence of values that can be consumed. Usually, the consumer should not be concerned with the data structure of the iterated value; this is abstracted away through the iterator. For most languages, both Sequences and Sets are translated 1:1 (i.e. Sequence<T> => Iterator<T>, Set<T> => Iterator<T>). Dictionaries usually result in a tuple, combining both the key and value into a single value pair (Dict<T, U> => Iterator<(T, U)>).

PHP is a bit different in that all iterators require a key. Semantically, this makes sense for both Sequences (which are logically indexed by the element's position in the sequence, so Sequence<T> => Iterator<int, T>) and Dicts (which have an explicit key, so Dict<T, U> => Iterator<T, U>). Sets don't technically have a logical key, but IMO this is not enough of a reason to fundamentally change how iterators work. A sequential number would be fine, which is also what yield without providing a key does. If we really wanted to avoid it, we could make it return null, as this is already allowed for generators.

https://3v4l.org/LvIjP

The big upside of treating all iterators the same, regardless of their data source, is 1.
the code becomes more generic: you don't need three variants of a value map() function when one works on all of them. And 2. you can populate any of the data structures from a generic iterator without any data shuffling.

$users
    |> Iter\mapKeys(fn($u) => $u->getId())
    |> Iter\toDict();

This will work if $users is a Sequence, Set or existing Dict with some other key. Actually, it works for any Traversable. If mapKeys() only applied to Dict iterators you would necessarily have to create a temporary dictionary first, or just not use the iterator API at all.

> However, a simple "first arg" pipe wouldn't allow for that. Or rather, we'd need to implement seqMap(iterable $it, callable $fn), setMap(iterable $it, callable $fn), and dictMap(iterable $it, callable $fn). And the same split for filter, and probably a few other things. That seems ergonomically suspect, at best, and still wouldn't really address the issue since you would have no way to ensure you're using the "right" version of each function. Similarly, a dict version of implode() would likely need to take 2 separators, whereas the other types would take only one.
>
> So the more I think on it, the more I think the sort of iterable API that first-arg pipes would make easy is... probably not the iterable API we want anyway. There may well be other cases for Elixir-style first-arg pipes, but a new iterable API isn't one of them, at least not in this form.

After having talked to you directly, it seemed to me that there is some confusion about the iterator API vs. the API offered by the data structure itself. For example:

> $l = new List(1,2, 3);
> $l2 = $l |> map(fn($x) => $x*2);
>
> What is the type of $l2? I would expect it to be a List, but there's currently
> no way to write a map() that statically guarantees that. (And that's before we
> get into generics.)

$l2 wouldn't be a List (or Sequence, to stick with the same terminology) but an iterator, specifically Iterator<int, int>.
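
To make this concrete, here is a minimal sketch of how such helpers might behave. The functions mapKeys(), map() and toDict() are hypothetical stand-ins for the Iter\ functions named above (no such functions exist in PHP today), the User class is made up for illustration, and a plain array stands in for a Dict. The pipe is written as nested calls so the sketch runs without the operator:

```php
<?php

// Hypothetical stand-in for Iter\mapKeys(): returns a step that lazily
// replaces each key with $fn($value), whatever the source iterable is.
function mapKeys(callable $fn): Closure
{
    return static function (iterable $it) use ($fn): Generator {
        foreach ($it as $value) {
            yield $fn($value) => $value;
        }
    };
}

// Hypothetical stand-in for a value map(): transforms values, keeps keys.
function map(callable $fn): Closure
{
    return static function (iterable $it) use ($fn): Generator {
        foreach ($it as $key => $value) {
            yield $key => $fn($value);
        }
    };
}

// Hypothetical stand-in for Iter\toDict(): collects the key/value pairs.
// A plain array plays the role of the Dict here.
function toDict(): Closure
{
    return static function (iterable $it): array {
        $dict = [];
        foreach ($it as $key => $value) {
            $dict[$key] = $value;
        }
        return $dict;
    };
}

// Illustrative class, not part of any proposal.
final class User
{
    public function __construct(private int $id, public string $name) {}

    public function getId(): int
    {
        return $this->id;
    }
}

// A generator source that yields values without explicit keys.
function userStream(): Generator
{
    yield new User(7, 'Ann');
    yield new User(3, 'Bob');
}

$byId = fn(User $u) => $u->getId();

// With the pipe operator this would read:
//   $users |> mapKeys($byId) |> toDict();
$fromList      = toDict()(mapKeys($byId)([new User(7, 'Ann'), new User(3, 'Bob')]));
$fromGenerator = toDict()(mapKeys($byId)(userStream()));

// Mapping over a list yields an iterator, not a list.
$doubled      = map(fn($x) => $x * 2)([1, 2, 3]);
$isIterator   = $doubled instanceof Traversable;
$doubledItems = iterator_to_array($doubled);
```

The same mapKeys() step serves both the list and the generator, and no intermediate dictionary is built before toDict() runs.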
If you want to get back a sequence, you need to populate a new sequence from the iterator using Iter\toSeq(). We may also decide to introduce a Sequence::map() method that maps directly to a new sequence, which may be more efficient for single transformations.

That said, the nice thing about the iterator API is that it generically applies to all data structures implementing Traversable. For example, an Iter\max() function would not need to care about the implementation details of the underlying data structure, nor do all data structures need to reimplement their own versions of max().

> Which brings us then to extension functions.

I have largely changed my mind on extension functions. Extension functions that are exclusively local, static and detached from the type system are rather useless. Looking at an example:

> function PointEntity.toMessage(): PointMessage {
>     return new PointMessage($this->x, $this->y);
> }
>
> $result = json_encode($point->toMessage());

If for some reason toMessage() cannot be implemented on PointEntity, there's arguably no benefit of $point->toMessage() over `$point |> PointEntityExtension\toMessage()` (with an optional import to make it almost as short). All the extension really achieves is changing the syntax, but we would already have the pipe operator for this.

Technically, you can use such extensions for untyped, local polymorphism, but this does not seem like a good approach.

function PointEntity.toMessage(): PointMessage { ... }
function RectEntity.toMessage(): RectMessage { ... }

$entities = [new Point, new Rect];

foreach ($entities as $e) {
    $e->toMessage(); // Technically works, but the type system is entirely unaware.
    takesToMessage($e); // This breaks, because Point and Rect don't actually implement the ToMessage interface.
}

Where extensions would really shine is if they could hook into the type system by implementing interfaces on types that aren't in your control. Rust and Swift are two examples that take this approach.
implement ToMessage for Rect { ... }

takesToMessage(new Rect); // Now this actually works.

However, this becomes even harder to implement than extension functions already would be. I won't go into detail because this e-mail is already too long, but I'm happy to discuss it further off-list.

All this to say, I don't think extensions will work well in PHP, but I also don't think they are necessary for the iterator API.

Regards,
Ilija