Hi internals,
I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from.
This has the proposed signature:
final class CachedIterable implements IteratorAggregate, Countable, JsonSerializable
{
    public function __construct(iterable $iterator) {}
    public function getIterator(): InternalIterator {}
    public function count(): int {}
    // [[$key1, $value1], [$key2, $value2]]
    public static function fromPairs(array $pairs): CachedIterable {}
    // [[$key1, $value1], [$key2, $value2]]
    public function toPairs(): array {}
    public function __serialize(): array {} // [$k1, $v1, $k2, $v2,...]
    public function __unserialize(array $data): void {}
    // useful for converting iterables back to arrays for further processing
    public function keys(): array {} // [$k1, $k2, ...]
    public function values(): array {} // [$v1, $v2, ...]
    // useful to efficiently get offsets at the middle/end of a long iterable
    public function keyAt(int $offset): mixed {}
    public function valueAt(int $offset): mixed {}
    // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
    public function jsonSerialize(): array {}
    // dynamic properties are forbidden
}
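The sketch below is a minimal userland version of a subset of the signature above. It makes the proposed semantics concrete. It is not the RFC's implementation or its polyfill, and it makes a few assumptions:

- The class name `CachedIterablePolyfill` is hypothetical.
- `getIterator()` returns a `Generator`, because userland code cannot create an `InternalIterator`.
- Throwing `OutOfBoundsException` from `keyAt()`/`valueAt()` is an assumption, since the exception type is not specified here.

```php
<?php
/**
 * Hypothetical userland sketch of a subset of the proposed CachedIterable API.
 * Entries are stored as [key, value] pairs so repeated keys and non-scalar keys survive.
 */
final class CachedIterablePolyfill implements IteratorAggregate, Countable, JsonSerializable
{
    /** @var list<array{0: mixed, 1: mixed}> */
    private array $pairs = [];

    public function __construct(iterable $iterator)
    {
        // Eagerly consume the source once; later iteration never touches it again.
        foreach ($iterator as $key => $value) {
            $this->pairs[] = [$key, $value];
        }
    }

    public static function fromPairs(array $pairs): self
    {
        $result = new self([]);
        foreach ($pairs as [$key, $value]) {
            $result->pairs[] = [$key, $value];
        }
        return $result;
    }

    public function getIterator(): Generator
    {
        // A fresh generator per call, so the object can be iterated any number of times.
        foreach ($this->pairs as [$key, $value]) {
            yield $key => $value;
        }
    }

    public function count(): int { return count($this->pairs); }

    public function toPairs(): array { return $this->pairs; }

    public function keys(): array { return array_map(static fn (array $p): mixed => $p[0], $this->pairs); }

    public function values(): array { return array_map(static fn (array $p): mixed => $p[1], $this->pairs); }

    public function keyAt(int $offset): mixed { return $this->pairAt($offset)[0]; }

    public function valueAt(int $offset): mixed { return $this->pairAt($offset)[1]; }

    // Encodes as [["key1","value1"],...] instead of a JSON object, so repeated keys survive.
    public function jsonSerialize(): array { return $this->pairs; }

    public function __serialize(): array
    {
        // Flat [$k1, $v1, $k2, $v2, ...] layout, as described in the RFC signature.
        $flat = [];
        foreach ($this->pairs as [$key, $value]) {
            $flat[] = $key;
            $flat[] = $value;
        }
        return $flat;
    }

    public function __unserialize(array $data): void
    {
        if (count($data) % 2 !== 0) {
            throw new UnexpectedValueException('Expected an even number of elements');
        }
        $this->pairs = array_chunk(array_values($data), 2);
    }

    private function pairAt(int $offset): array
    {
        // Direct array offset: O(1) access to the middle or end without re-iterating.
        if ($offset < 0 || $offset >= count($this->pairs)) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
        return $this->pairs[$offset];
    }
}
```

Constructing `new CachedIterablePolyfill($generator)` consumes the generator exactly once. The resulting object can then be iterated, counted, indexed with `keyAt()`/`valueAt()`, serialized and JSON-encoded any number of times. A userland version like this has to pay for a PHP array per pair; avoiding that overhead is part of the RFC's case for an internal class.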
Currently, PHP does not provide a built-in way to store the state of an arbitrary iterable for reuse later
(when the iterable has arbitrary keys, or when keys might be repeated). It would be useful to do so for many use cases, such as:
- Creating a rewindable copy of a non-rewindable Traversable
- Generating an IteratorAggregate from a class still implementing Iterator
- In the future, providing internal or userland helpers such as iterable_flip(iterable $input), iterable_take(iterable $input, int $limit),
iterable_chunk(iterable $input, int $chunk_size), iterable_reverse(), etc. (these are not part of the RFC)
- Providing memory-efficient random access to both keys and values of arbitrary key-value sequences
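The limitations described above (repeated keys, non-rewindable Traversables) can be seen with plain PHP 8. In this sketch, `events()` is a hypothetical generator that repeats a key:

- `iterator_to_array()` either overwrites the repeated key or drops all keys.
- A finished generator cannot be traversed again.
- Buffering `[key, value]` pairs keeps both keys and values and can be replayed. This is roughly the kind of copy `CachedIterable` is proposed to hold.

`replay()` is likewise a hypothetical helper.

```php
<?php
// Hypothetical source: generators, unlike arrays, may yield the same key twice.
function events(): Generator
{
    yield 'a' => 1;
    yield 'b' => 2;
    yield 'a' => 3;
}

// Keys are preserved, so the second 'a' overwrites the first: ['a' => 3, 'b' => 2].
$collapsed = iterator_to_array(events());

// With $preserve_keys = false every value is kept, but all keys are lost: [1, 2, 3].
$valuesOnly = iterator_to_array(events(), false);

// A generator is not rewindable: traversing it a second time throws an Exception.
$gen = events();
foreach ($gen as $unused) {
}
$rewindFailed = false;
try {
    foreach ($gen as $unused) {
    }
} catch (Exception $e) {
    $rewindFailed = true;
}

// Buffering [key, value] pairs keeps both the repeated keys and all values.
$pairs = [];
foreach (events() as $key => $value) {
    $pairs[] = [$key, $value];
}

// Hypothetical helper: replays the buffered pairs; it can be called any number of times.
function replay(array $pairs): Generator
{
    foreach ($pairs as [$key, $value]) {
        yield $key => $value;
    }
}
```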
Having this implemented as an internal class would also allow it to be much more efficient than a userland solution
(in terms of time to create, time to iterate over the result, and total memory usage). See https://wiki.php.net/rfc/cachediterable#benchmarks
After some consideration, this is being created as a standalone RFC, and going in the global namespace:
- Based on early feedback on https://wiki.php.net/rfc/any_all_on_iterable#straw_poll (on the namespace preferred in previous polls),
it seems like it's far too early for me to propose namespaces in any RFCs that add to modules that already exist, when there is no consensus.
An earlier attempt by others to create a policy for namespaces in general (https://wiki.php.net/rfc/php_namespace_policy#vote) also did not pass.
Having even 40% of voters opposed to introducing a given namespace (in pre-existing modules)
makes it an impractical choice when RFCs require a 2/3 majority to pass.
- While some may argue that a different namespace might pass,
https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote had a sharp dropoff in feedback after the 3rd form.
I don't know how to interpret that, e.g. whether unranked namespaces are preferred even less than the options that were ranked, or were just not seen as affecting the final result.
Any other feedback unrelated to namespaces?
Thanks,
- Tyson
Hi internals,
After feedback, I have decided to postpone the start of voting on this (or other proposals related to SPL or iterables) until April at the earliest,
to avoid interfering with the ongoing SPL naming policy discussions.
Thanks,
- Tyson
On Thu, Feb 11, 2021 at 5:47 AM tyson andre tysonandre775@hotmail.com wrote:
Hi internals,
I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from.
Any other feedback unrelated to namespaces?
Thanks,
- Tyson
Hi Tyson,
I needed this feature a few years ago. In that case, the source was a
generator that was slowly generating data while fetching it from a
paginated API that had rate limits.
The resulting wrapper iterator was used at runtime by many (hundreds of)
other iterators that processed elements in various ways (technical
analysis indicators on time series), and their outputs were then merged
back together with a MultipleIterator.
Just for reference, this is how the userland implementation looked, and I
was happy with it as a solution:
https://gist.github.com/drealecs/ad720b51219675a8f278b8534e99d7c7
Not sure if it's useful, but I thought I should share it, as I noticed you
mentioned in your example for PolyfillIterator that you chose not to use
an IteratorAggregate because of complexity.
I was wondering how much less efficient this would be compared to the C
implementation.
Also, the ability to be lazy was important for my implementation, and I
think that should be the case here as well, by design, especially as we are
dealing with Generators.
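For comparison with the eager design, the lazy behaviour described above can be sketched as an `IteratorAggregate` with two traits. It advances the source only when some consumer runs past the end of the cache, and every consumer shares the same cache. This is not the code from the gist linked above, and the class name `LazyCachingIterable` is hypothetical.

```php
<?php
/**
 * Hypothetical lazy caching wrapper: the source is advanced only on demand,
 * and every element fetched so far is shared by all iterations.
 */
final class LazyCachingIterable implements IteratorAggregate
{
    private Iterator $source;
    /** @var list<array{0: mixed, 1: mixed}> */
    private array $cache = [];
    private bool $started = false;
    private bool $done = false;

    public function __construct(iterable $source)
    {
        $this->source = match (true) {
            is_array($source) => new ArrayIterator($source),
            $source instanceof Iterator => $source,
            default => new IteratorIterator($source),
        };
    }

    public function getIterator(): Generator
    {
        for ($i = 0; ; $i++) {
            // Pull one more element from the source only when this consumer
            // has caught up with everything cached so far.
            if ($i >= count($this->cache) && !$this->fetchNext()) {
                return;
            }
            yield $this->cache[$i][0] => $this->cache[$i][1];
        }
    }

    private function fetchNext(): bool
    {
        if ($this->done) {
            return false;
        }
        if ($this->started) {
            $this->source->next();
        } else {
            // rewind() is called exactly once, so a fresh single-use generator works.
            $this->source->rewind();
            $this->started = true;
        }
        if (!$this->source->valid()) {
            $this->done = true;
            return false;
        }
        $this->cache[] = [$this->source->key(), $this->source->current()];
        return true;
    }
}
```

This defers work, which suits a slow, rate-limited source such as a paginated API. The trade-offs are:

- Any exception from the source surfaces in whichever loop first reaches it.
- Operations like `count()` or random access would have to drain the source first.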
Regards,
Alex
Hi Alex,
I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from.
Any other feedback unrelated to namespaces?
Hi Tyson,
I needed this feature a few years ago. In that case, the source was a generator that was slowly generating data while fetching it from a paginated API that had rate limits.
The resulting wrapper iterator was used at runtime by many (hundreds of) other iterators that processed elements in various ways (technical analysis indicators on time series), and their outputs were then merged back together with a MultipleIterator.
Just for reference, this is how the userland implementation looked, and I was happy with it as a solution:
https://gist.github.com/drealecs/ad720b51219675a8f278b8534e99d7c7
Not sure if it's useful, but I thought I should share it, as I noticed you mentioned in your example for PolyfillIterator that you chose not to use an IteratorAggregate because of complexity.
I was wondering how much less efficient this would be compared to the C implementation.
That was for simplicity (shortness) of the RFC, for people reading the polyfill.
I don't expect it to affect CPU time or memory usage for large arrays in the polyfill.
Userland lazy iterable implementations could still benefit from having a CachedIterable around,
by replacing the lazy IteratorAggregate with a CachedIterable once the end of iteration is detected.
Also, the implementation having the ability to be lazy was important and I think that should be the case here as well, by design, especially as we are dealing with Generators.
We're dealing with the entire family of iterables, including but not limited to Generators, arrays, user-defined Traversables, etc.
I'd considered that, but decided not to include it in the RFC's scope.
If I were designing that, it would be a separate class, LazyCachedIterable.
Currently, CachedIterable has several useful properties:
- Serialization/unserialization behavior is predictable: if the object was constructed, it can be safely serialized as long as its keys/values can be serialized.
- Iteration has no side effects (e.g. it won't throw).
- keyAt(int $offset) and so on have predictable behavior, good performance, and only one throwable type.
- Memory usage is small; this might also be the case for a LazyIterable, depending on implementation choices/constraints.
Adding lazy iteration support would cause it to lose some of those properties.
While I'd be in favor of that if it were implemented correctly, I don't plan to work on implementing it until I know
whether the addition of CachedIterable to a large family of iterable classes would pass.
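The "iteration has no side effects" property listed above can be illustrated in plain PHP. In this sketch, `snapshotPairs()` is a hypothetical stand-in for eager construction (it is not the RFC's code), and `flakySource()` is a hypothetical generator that fails after its first element. With eager evaluation, the failure happens before any consumer runs. With lazy iteration, it arrives after part of the data has already been processed.

```php
<?php
// Hypothetical helper: eagerly copies any iterable into [key, value] pairs.
function snapshotPairs(iterable $source): array
{
    $pairs = [];
    foreach ($source as $key => $value) {
        $pairs[] = [$key, $value];
    }
    return $pairs;
}

// Hypothetical source that fails partway through.
function flakySource(): Generator
{
    yield 'ok' => 1;
    throw new RuntimeException('source failed after the first element');
}

// Eager: the failure surfaces once, at construction time, before any consumer runs.
$eagerError = null;
try {
    $pairs = snapshotPairs(flakySource());
} catch (RuntimeException $e) {
    $eagerError = $e->getMessage();
}

// Lazy: the consumer has already processed part of the data when the failure arrives.
$processed = [];
$lazyError = null;
try {
    foreach (flakySource() as $key => $value) {
        $processed[] = $key;
    }
} catch (RuntimeException $e) {
    $lazyError = $e->getMessage();
}
```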
CachedIterable has some immediate benefits on problems I was actively working on, such as:
- Being able to represent iterable functions such as iterable_reverse()
- Memory efficiency and time efficiency for iteration
- Being something internal code could return for getIterator(), etc.
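As an illustration of the first point, a reverse helper for an arbitrary iterable has to buffer every key-value pair before it can yield anything. `array_reverse()` only accepts arrays, and an array cannot hold repeated keys. The function name `iterable_reverse_sketch` is hypothetical and is not the `iterable_reverse()` mentioned earlier, which is not part of the RFC. It returns a single-use `Generator`, whereas a `CachedIterable` result could be iterated again.

```php
<?php
/**
 * Hypothetical helper: yields the entries of any iterable in reverse order,
 * keeping every key, including repeated keys.
 */
function iterable_reverse_sketch(iterable $input): Generator
{
    // Buffer [key, value] pairs; a plain array keyed by $key would merge repeated keys.
    $pairs = [];
    foreach ($input as $key => $value) {
        $pairs[] = [$key, $value];
    }
    for ($i = count($pairs) - 1; $i >= 0; $i--) {
        yield $pairs[$i][0] => $pairs[$i][1];
    }
}
```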
Regards,
Tyson