Newsgroups: php.internals
Xref: news.php.net php.internals:114798
Date: Wed, 9 Jun 2021 08:55:30 -0600
From: internals@lists.php.net ("Levi Morrison via internals")
Reply-To: Levi Morrison
To: tyson andre
Cc: "internals@lists.php.net"
Subject: Re: [PHP-DEV] Re: RFC: CachedIterable (rewindable, allows any key&repeating keys)

On Wed, Jun 9, 2021 at 8:12 AM tyson andre wrote:
>
> Hi Levi Morrison,
>
> > > > > Hi internals,
> > > > >
> > > > > I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
> > > > > which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from
> > > > >
> > > > > This has the proposed signature:
> > > > >
> > > > > ```
> > > > > final class CachedIterable implements IteratorAggregate, Countable, JsonSerializable
> > > > > {
> > > > >     public function __construct(iterable $iterator) {}
> > > > >     public function getIterator(): InternalIterator {}
> > > > >     public function count(): int {}
> > > > >     // [[$key1, $value1], [$key2, $value2]]
> > > > >     public static function fromPairs(array $pairs): CachedIterable {}
> > > > >     // [[$key1, $value1], [$key2, $value2]]
> > > > >     public function toPairs(): array{}
> > > > >     public function __serialize(): array {}  // [$k1, $v1, $k2, $v2,...]
> > > > >     public function __unserialize(array $data): void {}
> > > > >
> > > > >     // useful for converting iterables back to arrays for further processing
> > > > >     public function keys(): array {}  // [$k1, $k2, ...]
> > > > >     public function values(): array {}  // [$v1, $v2, ...]
> > > > >     // useful to efficiently get offsets at the middle/end of a long iterable
> > > > >     public function keyAt(int $offset): mixed {}
> > > > >     public function valueAt(int $offset): mixed {}
> > > > >
> > > > >     // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
> > > > >     public function jsonSerialize(): array {}
> > > > >     // dynamic properties are forbidden
> > > > > }
> > > > > ```
> > > > >
> > > > > Currently, PHP does not provide a built-in way to store the state of an arbitrary iterable for reuse later
> > > > > (when the iterable has arbitrary keys, or when keys might be repeated). It would be useful to do so for many use cases, such as:
> > > > >
> > > > > 1. Creating a rewindable copy of a non-rewindable Traversable
> > > > > 2. Generating an IteratorAggregate from a class still implementing Iterator
> > > > > 3. In the future, providing internal or userland helpers such as iterable_flip(iterable $input), iterable_take(iterable $input, int $limit),
> > > > >    iterable_chunk(iterable $input, int $chunk_size), iterable_reverse(), etc. (these are not part of the RFC)
> > > > > 4. Providing memory-efficient random access to both keys and values of arbitrary key-value sequences
> > > > >
> > > > > Having this implemented as an internal class would also allow it to be much more efficient than a userland solution
> > > > > (in terms of time to create, time to iterate over the result, and total memory usage).
> > > > > See https://wiki.php.net/rfc/cachediterable#benchmarks
> > > > >
> > > > > After some consideration, this is being created as a standalone RFC, and going in the global namespace:
> > > > >
> > > > > - Based on early feedback on https://wiki.php.net/rfc/any_all_on_iterable#straw_poll (on the namespace preferred in previous polls)
> > > > >   It seems like it's way too early for me to be proposing namespaces in any RFCs for PHP adding to modules that already exist, when there is no consensus.
> > > > >
> > > > >   An earlier attempt by others at creating a policy for namespaces in general (https://wiki.php.net/rfc/php_namespace_policy#vote) also did not pass.
> > > > >
> > > > >   Having even 40% of voters opposed to introducing a given namespace (in pre-existing modules)
> > > > >   makes it an impractical choice when RFCs require a 2/3 majority to pass.
> > > > > - While some may argue that a different namespace might pass,
> > > > >   https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote had a sharp dropoff in feedback after the 3rd form.
> > > > >   I don't know how to interpret that - e.g. are unranked namespaces preferred even less than the options that were ranked, or just not seen as affecting the final result?
> > > >
> > > > A heads up - I will probably start voting on https://wiki.php.net/rfc/cachediterable this weekend after https://wiki.php.net/rfc/cachediterable_straw_poll is finished.
> > > >
> > > > Any other feedback on CachedIterable?
> > > >
> > > > Thanks,
> > > > Tyson
> > > >
> > > > --
> > > > PHP Internals - PHP Runtime Development Mailing List
> > > > To unsubscribe, visit: https://www.php.net/unsub.php
> > > >
> > >
> > > Based on a recent comment you made on GitHub, it seems like
> > > `CachedIterable` eagerly creates the datastore instead of doing so
> > > on-demand. Is this correct?
> >
> > Sorry, yes, that's correct and pointed out in the RFC.
> >
> > I think that's a significant implementation flaw. I don't see why we'd
I don't see why we'd > > balloon memory usage unnecessarily by being eager -- if an operation > > needs to fetch more data then it can go ahead and do so. > > First, PHP's standard library accommodates a wide variety of use cases, of which I believe eager evaluation is the most common. > There is no reason that an eagerly evaluated CachedIterable and lazily evaluated LazyCachedIterable couldn't be both added at some point > if both had passing RFCs. > > (This is referring to https://en.wikipedia.org/wiki/Lazy_evaluation and https://en.wikipedia.org/wiki/Eager_evaluation) > > As was stated in that GitHub Discussion, > > 1) If a CachedIterable were to be used in the standard library or a user-defined library, > many end users would want the standard library to return something that could be iterated over multiple times. > The limit of a single iteration was a source of bugs in SPL classes > such as https://www.php.net/arrayobject prior to them being switched to IteratorAggregate. > > (This is concerning whether functions such as `*filter` and `*map` should evaluate the result eagerly or lazily if they do get added. > It is possible for a LazyCachedIterable to be implemented that computes values on demand, but see below points.) > > ``` > $foo = map(...); > foreach ($foo as $i => $v1) { > foreach ($foo as $i => $v2) { > if (some_pair_predicate($v1, $v2)) { > // do something > } > } > } > ``` > > 2) Userland library/application authors that are interested in lazy generators could use or implement something > such as https://github.com/nikic/iter instead. My opinion is that the standard library should provide > something that is easy to understand, debug, serialize or represent, etc. > I expect the inner iterable may be hidden entirely in a LazyCachedIterable from var_dump as an implementation detail. 
>
> 3) It would be harder to understand why SomeFrameworkException is thrown in code unrelated to that framework
> when a lazy (instead of eager) iterable is passed to some function that accepts a generic iterable,
> and harder to write correct exception handling for it if done in a lazy generation style.
>
> Many RFCs have been rejected because they were perceived as likely to be misused in userland or
> to make code harder to understand.
>
> 4) It is possible to implement a lazy alternative to CachedIterable that only loads values as needed.
> However, I hadn't proposed it due to doubts that 2/3 of voters would consider it widely useful
> enough to be included in PHP rather than as a userland or PECL library.
>
> Additionally,
>
> CachedIterables are much more memory efficient than existing options such as arrays
> https://wiki.php.net/rfc/cachediterable#cachediterables_are_memory-efficient
> (The only thing more efficient in PHP's core modules is SplFixedArray,
> and that only allows keys `0..n-1`)
>
> Regards,
> Tyson
>

I think you misunderstood my complaint because of the other conversation
on GitHub. CachedIterable should load from the underlying datastore
lazily -- there is hardly any visible impact to the user if this happens,
because for the most part it looks and behaves the same as it does in the
current proposal. The only visible changes are around loading data from
the underlying iterable.

For example, if the user calls the count method on the CachedIterable,
it would then load the remainder of the underlying datastore (and then
drop its reference to it). If the user asks for valueAt($n) and it's
beyond what's already loaded and we haven't finished consuming the
underlying iterable, then it would load until $n is found or the end of
the store is reached.

I understand your concerns with `map`, `filter`, etc.
CachedIterable is different because it holds onto the data and can be
iterated over more than once, including the two nested loops in your
example, _even if it loads data from the underlying iterable on demand_.
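A minimal userland sketch of that on-demand behaviour, assuming PHP 8.0+,
is below. It is only an illustration, not the RFC's implementation: the
class name LazyCachedIterable (borrowed from the hypothetical name in
Tyson's reply) and the small method subset are assumptions. Keys and
values sit in two parallel lists so repeated keys survive; count() drains
the source and then drops it; keyAt()/valueAt() load only as far as the
requested offset; and every getIterator() call walks the cache and pulls
more from the source only when it runs off the end, so nested loops share
a single pass over the underlying iterable.

```php
<?php
declare(strict_types=1);

// Hypothetical userland sketch (not the RFC's implementation): a cache that
// pulls key/value pairs from the wrapped iterable only when they are needed.
final class LazyCachedIterable implements IteratorAggregate, Countable
{
    /** Cached keys and values in parallel lists, so repeated keys are kept. */
    private array $keys = [];
    private array $values = [];
    /** Generator over the wrapped iterable; null once it has been drained. */
    private ?Generator $source;
    private bool $started = false;

    public function __construct(iterable $iterable)
    {
        // `yield from` forwards keys unchanged (including repeated keys),
        // and nothing runs until the generator is first advanced.
        $this->source = (static function (iterable $inner): Generator {
            yield from $inner;
        })($iterable);
    }

    /** Caches one more pair from the source; returns false when none is left. */
    private function loadNext(): bool
    {
        if ($this->source === null) {
            return false;
        }
        if ($this->started) {
            $this->source->next(); // move past the pair cached last time
        }
        $this->started = true;
        if (!$this->source->valid()) { // valid() also starts a fresh generator
            $this->source = null;      // drop the reference once exhausted
            return false;
        }
        $this->keys[] = $this->source->key();
        $this->values[] = $this->source->current();
        return true;
    }

    /** Loads pairs until index $offset is cached, or throws if it never will be. */
    private function loadUntil(int $offset): void
    {
        if ($offset < 0) {
            throw new OutOfBoundsException("Offset $offset is negative");
        }
        while (count($this->values) <= $offset) {
            if (!$this->loadNext()) {
                throw new OutOfBoundsException("Offset $offset is out of range");
            }
        }
    }

    public function keyAt(int $offset): mixed
    {
        $this->loadUntil($offset);
        return $this->keys[$offset];
    }

    public function valueAt(int $offset): mixed
    {
        $this->loadUntil($offset);
        return $this->values[$offset];
    }

    /** Counting needs every element, so this drains the rest of the source. */
    public function count(): int
    {
        while ($this->loadNext()) {
            // keep pulling until the source is exhausted
        }
        return count($this->values);
    }

    /** Each call gets an independent cursor, so nested foreach loops work. */
    public function getIterator(): Generator
    {
        for ($i = 0; $i < count($this->values) || $this->loadNext(); $i++) {
            yield $this->keys[$i] => $this->values[$i];
        }
    }
}

// Usage: a generator with a repeated key.
function letters(): Generator
{
    yield 'x' => 1;
    yield 'y' => 2;
    yield 'x' => 3;
}

$cached = new LazyCachedIterable(letters());
echo $cached->valueAt(1), "\n"; // pulls only the first two pairs; prints 2
echo count($cached), "\n";      // pulls the remaining pair; prints 3
```

One consequence of this design is that an exception thrown by the
underlying iterable surfaces at whichever call first needs the missing
element (a foreach, count(), or valueAt()), which is the trade-off Tyson
raises in his point 3.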