Newsgroups: php.internals
Date: Sun, 30 Oct 2022 11:22:11 -0500
To: "php internals"
Subject: Re: [PHP-DEV] Proposal: Expanded iterable helper functions and aliasing iterator_to_array in `iterable\` namespace
From: larry@garfieldtech.com ("Larry Garfield")

On Fri, Oct 28, 2022, at 8:45 AM, tyson andre wrote:
> Hi internals,
>
> https://wiki.php.net/rfc/iterator_xyz_accept_array recently passed in
> PHP 8.2, fixing a common inconvenience of those functions throwing a
> TypeError for arrays.
>
> However, given the `iterator_` name
> (https://www.php.net/manual/en/class.iterator.php), it's likely to
> become a source of confusion when writing or reviewing code decades
> from now, since the name suggests it only accepts objects (Traversable
> Iterator/IteratorAggregate).
>
> I'm planning on creating an RFC adding the following functions to the
> `iterable\` namespace as aliases of iterator_count/iterator_to_array.
> Those accept iterables
> (https://www.php.net/manual/en/language.types.iterable.php), i.e. both
> Traversable objects and arrays.
>
> Namespaces were chosen after feedback on my previous RFC, and I
> believe `iterable\` follows the guidance from
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions and
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions#core_standard_spl
>
> I plan to create an RFC with the following functionality in the
> `iterable\` namespace, and wanted to see what the preference on naming
> was, or if there was other feedback.
> (Not having enough functionality and wanting a better idea of the
> overall
>
> - `iterable\count(...)` (alias of iterator_count)
> - `iterable\to_array(iterable $iterator, bool $preserve_keys = true): array`
>   (alias of iterator_to_array, so that users can stop using a
>   misleading name)
> - `iterable\any(iterable $input, ?callable $callback = null): bool`:
>   determines whether any value of the iterable satisfies the predicate;
>   likewise all() determines whether all values of the iterable satisfy
>   the predicate.
>
>   This is a different namespace from
>   https://wiki.php.net/rfc/any_all_on_iterable
> - `iterable\none(iterable $input, ?callable $callback = null): bool`
>
>   Returns the opposite of any().
> - `iterable\find(iterable $iterable, callable $callback, mixed $default = null): mixed`
>
>   Returns the first value for which $callback($value) is truthy. On
>   failure, returns $default.
> - `iterable\fold(iterable $iterable, callable $callback, mixed $initial): mixed`
>
>   `fold` and requiring an initial value seems like better practice. See
>   https://externals.io/message/112558#112834 and
>   https://stackoverflow.com/questions/25149359/difference-between-reduce-and-fold
> - `iterable\unique_values(iterable $iterable): array`
>
>   Returns a list of unique values of $iterable.
> - `iterable\includes_value(iterable $iterable, mixed $value): bool`
>
>   Returns true if this iterable includes a value identical to $value (`===`).
>
> There's other functionality that I was less certain about proposing,
> such as `iterable\keys(iterable $iterable): array`, which would work
> similarly to array_keys but also work on Traversables (e.g. to be used
> with userland/internal collections, generators, etc.), or functions to
> get the first()/last() value of an iterable (iterable\first() and
> iterable\last()). Any thoughts on those?
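For illustration only, a minimal userland sketch of several helpers from the quoted list may help pin down their intended semantics. It assumes PHP 8.0+, uses hypothetical `iter_`-prefixed global functions in place of the proposed `iterable\` namespace, treats a missing callback as a plain truthiness test, and compares values with `===` in unique_values; none of this is the TEDS implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical sketches: iter_ prefixes stand in for the proposed iterable\ namespace.
// Assumption: with no callback, each value itself is tested for truthiness.

function iter_any(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        if ($callback === null ? $value : $callback($value)) {
            return true; // stop at the first match
        }
    }
    return false;
}

function iter_all(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        if (!($callback === null ? $value : $callback($value))) {
            return false; // stop at the first failure
        }
    }
    return true;
}

// none() is simply the negation of any().
function iter_none(iterable $input, ?callable $callback = null): bool
{
    return !iter_any($input, $callback);
}

// First value for which $callback($value) is truthy, else $default.
function iter_find(iterable $iterable, callable $callback, mixed $default = null): mixed
{
    foreach ($iterable as $value) {
        if ($callback($value)) {
            return $value;
        }
    }
    return $default;
}

// Left fold with a mandatory initial value.
function iter_fold(iterable $iterable, callable $callback, mixed $initial): mixed
{
    $acc = $initial;
    foreach ($iterable as $value) {
        $acc = $callback($acc, $value);
    }
    return $acc;
}

// Strict (===) membership test.
function iter_includes_value(iterable $iterable, mixed $value): bool
{
    foreach ($iterable as $candidate) {
        if ($candidate === $value) {
            return true;
        }
    }
    return false;
}

// Distinct values in first-seen order; === comparison is an assumption.
function iter_unique_values(iterable $iterable): array
{
    $unique = [];
    foreach ($iterable as $value) {
        if (!in_array($value, $unique, true)) {
            $unique[] = $value;
        }
    }
    return $unique;
}

$nums = new ArrayIterator([3, 1, 3, 2]); // a Traversable, not an array
var_dump(iter_any($nums, fn(int $n): bool => $n > 2));                 // bool(true)
var_dump(iter_find($nums, fn(int $n): bool => $n < 3));                // int(1)
var_dump(iter_fold($nums, fn(int $acc, int $n): int => $acc + $n, 0)); // int(9)
```

Because every helper only uses `foreach`, the same code accepts arrays, generators and other Traversables, which is the point of the `iterable` type in the signatures.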
>
> I also wanted to know if more verbose names such as find_value(),
> fold_values(), any_values(), all_values() were generally preferred
> before proposing this, since I only had feedback on a small number of
> names. My assumption was that short names were generally preferred
> when possible.
>
> See https://github.com/TysonAndre/pecl-teds/blob/main/teds.stub.php for
> documentation of the other functions mentioned here. The functionality
> can be tried out by installing https://pecl.php.net/package/teds
>
> Background
> ----------
>
> In February 2021, I proposed expanded iterable functionality and
> brought it to a vote (https://wiki.php.net/rfc/any_all_on_iterable),
> where feedback was mainly about the scope being too small and the
> choice of naming.
>
> Later, after https://externals.io/message/112558#112780,
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions#proposal was
> created and brought to a vote in April 2021 that passed, offering
> useful recommendations on how to standardize namespaces in future
> proposals of new categories of functionality (e.g. `iterable\any()`
> and `iterable\all()`).
>
> Any comments?

Oh, a topic near and dear to me. :-) I'm going to try to respond to both the OP and some of the other responses together here.

First off, I am generally in favor of improving PHP's iterable story, so consider me on board with the concept.

Second, since Levi mentioned pipe compatibility: I have similar user-space utilities, intended for pipe usage, available in a library. I learned some very important things from that process. Details here:

https://github.com/Crell/fp/blob/master/src/composition.php
https://github.com/Crell/fp/blob/master/src/array.php

Of particular note:

1. Because of PHP's inconsistent handling of excess arguments to functions (user-defined functions silently ignore them, while internal functions throw an ArgumentCountError), there MUST be separate versions of every function that takes a callback: one that passes the key and one that does not. It would be a fatal design flaw to do otherwise.
Yes, this balloons the number of such functions, which sucks, but that's PHP for you.

2. There are ample use cases for most operations to return either an array or a lazy iterable; both needs genuinely exist. I solved that by also having a separate version of each function, e.g., amap() vs. itmap(). The former returns an array; the latter returns a generator that yields the same values lazily. It would be a fatal design flaw not to account for this. Yes, this balloons the number of such functions, which sucks, but that's PHP for you.

So, e.g., I have *four* map functions: amap(), itmap(), amapWithKeys(), itmapWithKeys(). Same for filter. Other operations only needed two variants, e.g., first() and firstWithKeys(), any() and anyWithKeys(), etc. I do not claim that naming pattern is ideal; in fact, I don't particularly like *WithKeys(). We should think carefully about the naming.

A possible alternative would be to always return a lazy iterable and let callers use to_array() or an equivalent on the result if they want an array. (That's effectively what Python 3 does with map() and filter(), which return lazy iterators rather than lists.) However, that could have a non-trivial performance impact, since generators are slower than plain arrays.

3. Feel free to borrow liberally, design-wise, from the above code. There are a few more functions in there that could be of use, too. Note, though, that they are all designed to be used with pipe(), so most of them return a closure that has been manually partially applied with everything except the iterable. That gives you a single-argument function, which is what a pipe() or compose() chain needs.

Third, speaking of pipes, I disagree with Tim that putting the callback first would be easier for pipes/partials. If we ever get partial function application similar to what was previously implemented, then the argument order won't matter. If we get pipes as I've previously proposed, then none of these functions are directly usable, because they're multi-argument.
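The value/key and eager/lazy split from points 1 and 2 can be sketched as follows. This is a hypothetical minimal version assuming PHP 8.0+: the four names mirror the ones mentioned above, but the bodies are not the Crell/fp code, and the iterable-first argument order is an assumption of this sketch. It shows an internal function (strlen()) throwing ArgumentCountError when handed the extra key argument while a userland closure silently ignores it, and the generator variants computing only what is consumed.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: eager (array) vs. lazy (Generator), value-only vs. value-and-key.

// Eager, value only: the callback receives exactly one argument.
function amap(iterable $items, callable $fn): array
{
    $out = [];
    foreach ($items as $k => $v) {
        $out[$k] = $fn($v);
    }
    return $out;
}

// Eager, value and key: the callback receives ($value, $key).
function amapWithKeys(iterable $items, callable $fn): array
{
    $out = [];
    foreach ($items as $k => $v) {
        $out[$k] = $fn($v, $k);
    }
    return $out;
}

// Lazy, value only: nothing runs until the generator is iterated.
function itmap(iterable $items, callable $fn): Generator
{
    foreach ($items as $k => $v) {
        yield $k => $fn($v);
    }
}

// Lazy, value and key.
function itmapWithKeys(iterable $items, callable $fn): Generator
{
    foreach ($items as $k => $v) {
        yield $k => $fn($v, $k);
    }
}

$words = ['a' => 'x', 'b' => 'yy'];

// An internal function works with the value-only variant...
print_r(amap($words, 'strlen'));

// ...but throws when the key is passed as an extra argument (PHP 8+).
try {
    amapWithKeys($words, 'strlen');
} catch (ArgumentCountError $e) {
    echo $e->getMessage(), PHP_EOL;
}

// A userland closure, by contrast, silently ignores the extra key argument.
print_r(amapWithKeys($words, fn(string $v): int => strlen($v)));

// Lazy variants can consume an infinite generator; only consumed items are computed.
function naturals(): Generator
{
    for ($i = 1; ; $i++) {
        yield $i;
    }
}

foreach (itmap(naturals(), fn(int $n): int => $n * $n) as $square) {
    if ($square > 50) {
        break;
    }
    echo $square, PHP_EOL;
}
```

Converting a lazy result back to an array is just `iterator_to_array(itmap($words, 'strlen'))`, which is the "always lazy, convert when needed" alternative, at the cost of generator overhead.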
The alternative I've considered is somewhat inspired by Elixir (assuming I understand the little Elixir I've read), in which a function call after |> is automatically treated as partially applied with everything except the first argument, which is filled in from the left-hand side. So $list |> map($callable) translates to map($list, $callable). I've not yet decided whether that's a good way to avoid needing full partial application or a good way to make horribly confusing code. But if that were to happen, it would only work if all of these functions took the iterable, the "object to be operated on", as their first argument.

The callable, if inlined, is almost always the longest argument. That means it is most readable as the last argument, so there is no need to look past the end of the closure to see whether there are any other arguments. (This is a problem with array_map() currently, which takes the callback first and the array after it.)

So I would instead propose that *all* iterator functions follow the pattern:

name($iterable, other stuff, $callback_if_applicable);

That is easily learnable, most likely to result in clean-ish code, and most likely to be nice with any future pipe or partial implementations. At worst, it would make pipe-ifying all such functions a trivially identical operation for all of them, making my library little more than a series of boring one-liners. (Please make my library little more than a series of boring one-liners.) That does also mean we cannot support variadics or optional arguments. I am OK with that. And if someone really needs a different order, well, we have named arguments now.

Tim raised nesting these functions and what would make that cleanest. What would make it cleanest is not to nest them and instead use proper chaining: my pipe() function, a native pipe operator, or similar. Expecting these functions to nest and not be ugly is a fool's errand, especially when there are vastly better options readily available.

Fourth, I agree with Levi that figuring out the edge-case handling around empty lists is crucial.
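To make the empty-list question concrete, here is a minimal sketch with hypothetical `iter_`-prefixed helpers (PHP 8.0+; names and bodies are illustrative assumptions, not a proposed API). It shows the answers that fall out naturally on empty input: any() is false, all() is vacuously true, and a fold with a required initial value returns that value, whereas array_reduce() without an initial value returns null. It also shows a null-returning first() pairing with `??` but being unable to tell a found null from "not found".

```php
<?php
declare(strict_types=1);

// Hypothetical helpers, used only to show behavior on empty input.
function iter_any(iterable $items, callable $fn): bool
{
    foreach ($items as $v) {
        if ($fn($v)) {
            return true;
        }
    }
    return false; // nothing matched, including when there was nothing at all
}

function iter_all(iterable $items, callable $fn): bool
{
    foreach ($items as $v) {
        if (!$fn($v)) {
            return false;
        }
    }
    return true; // vacuously true for empty input
}

function iter_fold(iterable $items, callable $fn, mixed $initial): mixed
{
    foreach ($items as $v) {
        $initial = $fn($initial, $v);
    }
    return $initial; // empty input: the initial value is the answer
}

// Returns null for "not found", so null cannot double as a found value.
function iter_first(iterable $items): mixed
{
    foreach ($items as $v) {
        return $v;
    }
    return null;
}

$empty = [];

var_dump(iter_any($empty, fn($v): bool => true));                   // bool(false)
var_dump(iter_all($empty, fn($v): bool => false));                  // bool(true)
var_dump(iter_fold($empty, fn(int $a, int $b): int => $a + $b, 0)); // int(0)

// array_reduce()'s optional initial value defaults to null,
// so empty input quietly yields null instead of a sum.
var_dump(array_reduce($empty, fn($a, $b) => $a + $b));              // NULL

// null-for-not-found pairs with ?? for defaults...
var_dump(iter_first($empty) ?? 'default');                          // string(7) "default"
// ...but a genuine leading null is indistinguishable from "not found".
var_dump(iter_first([null, 1]) ?? 'default');                       // string(7) "default"
```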
The more we can design the semantics such that they "fall out" naturally, the better. E.g., first() may return null for "not found", which dovetails nicely with the ?? operator to provide a default. However, that means null cannot be used as a meaningful found value. I'd argue that is *the correct behavior*, but I'm sure some would disagree. An Option type would be nice, but to get that we really need ADTs (algebraic data types) first, and I don't have a timeline on that. Ilija is more interested in fixing core bugs right now than in adding new features, the silly man... :-P (Technically an Option/Maybe object could be implemented with just classes as we have them now, especially if it's done in core, but it would be cleaner and more ergonomic if built on top of an enum.) Also, monads are clunky in a language without first-class support for them. I've written extensively on this topic recently:

https://peakd.com/hive-168588/@crell/much-ado-about-null

I'm not sure of the best way forward here, other than that it should be addressed very carefully and explicitly.

Fifth, I would absolutely include map and filter in the set of operations. They are critical parts of list handling. If we had pipe-compatible map and filter, that would basically give us a list comprehension tool for free. (That's exactly how many languages approach list comprehensions.) In my own list-operation-centric work, I've used map and filter a lot more than any of the other operations in the list above.

I'm on board with the direction, modulo implementation details.

--Larry Garfield
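The third and fifth points combine neatly: if every helper takes the iterable first and the callback last, adapting each one for a pipe() chain is the same one-line closure, and a pipe-compatible map and filter behave like a list comprehension. A minimal sketch, assuming PHP 8.0+ and hypothetical global map(), filter() and pipe() functions (pipe() here is a simple userland stand-in, not a native operator or the Crell/fp implementation):

```php
<?php
declare(strict_types=1);

// Minimal pipe(): passes $value through each single-argument callable in order.
function pipe(mixed $value, callable ...$fns): mixed
{
    foreach ($fns as $fn) {
        $value = $fn($value);
    }
    return $value;
}

// Hypothetical core-style helpers: iterable first, callback last, keys preserved.
function map(iterable $items, callable $fn): array
{
    $out = [];
    foreach ($items as $k => $v) {
        $out[$k] = $fn($v);
    }
    return $out;
}

function filter(iterable $items, callable $fn): array
{
    $out = [];
    foreach ($items as $k => $v) {
        if ($fn($v)) {
            $out[$k] = $v;
        }
    }
    return $out;
}

// Because every helper takes the iterable first, adapting it for pipe() is the
// same wrapper each time: fix the trailing arguments, leave the iterable open.
$oddTimesTen = pipe(
    [1, 2, 3, 4, 5],
    fn(iterable $xs): array => filter($xs, fn(int $n): bool => $n % 2 === 1),
    fn(iterable $xs): array => map($xs, fn(int $n): int => $n * 10),
);

// $oddTimesTen === [0 => 10, 2 => 30, 4 => 50]
print_r($oddTimesTen);
```

The chain computes roughly what the Python comprehension `[n * 10 for n in xs if n % 2 == 1]` would, except that keys are preserved.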