RFC: CachedIterable (rewindable, allows any key & repeating keys)

4 years ago by tyson andre

Hi internals,

I've created a new RFC https://wiki.php.net/rfc/cachediterable adding CachedIterable,
which eagerly evaluates any iterable and contains an immutable copy of the keys and values of the iterable it was constructed from.

This has the proposed signature:
final class CachedIterable implements IteratorAggregate, Countable, JsonSerializable
{
    public function __construct(iterable $iterator) {}
    public function getIterator(): InternalIterator {}
    public function count(): int {}
    // [[$key1, $value1], [$key2, $value2]]
    public static function fromPairs(array $pairs): CachedIterable {}
    // [[$key1, $value1], [$key2, $value2]]
    public function toPairs(): array {}
    public function __serialize(): array {}  // [$k1, $v1, $k2, $v2, ...]
    public function __unserialize(array $data): void {}

    // useful for converting iterables back to arrays for further processing
    public function keys(): array {}  // [$k1, $k2, ...]
    public function values(): array {}  // [$v1, $v2, ...]
    // useful to efficiently get offsets at the middle/end of a long iterable
    public function keyAt(int $offset): mixed {}
    public function valueAt(int $offset): mixed {}

    // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
    public function jsonSerialize(): array {}
    // dynamic properties are forbidden
}
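To make the proposed API concrete, here is a minimal userland sketch of the same surface. The class name UserlandCachedIterable is hypothetical, and the storage (two parallel PHP lists) is an assumption chosen for readability; it is not how the RFC's internal class is implemented, it does not forbid dynamic properties, and per the RFC it would be slower and use more memory than the internal class. It assumes PHP 8.0+.

```php
<?php
// Minimal userland sketch of the proposed API (hypothetical class name).
// Assumes PHP 8.0+; keys and values are stored in two parallel lists.
final class UserlandCachedIterable implements IteratorAggregate, Countable, JsonSerializable
{
    private array $keys = [];   // $keys[$i] is the key of the $i-th entry
    private array $values = []; // $values[$i] is the value of the $i-th entry

    public function __construct(iterable $iterator)
    {
        // Eager evaluation: the source is read exactly once, here.
        foreach ($iterator as $key => $value) {
            $this->keys[] = $key;
            $this->values[] = $value;
        }
    }

    // [[$key1, $value1], [$key2, $value2]]
    public static function fromPairs(array $pairs): self
    {
        $result = new self([]);
        foreach ($pairs as [$key, $value]) {
            $result->keys[] = $key;
            $result->values[] = $value;
        }
        return $result;
    }

    public function getIterator(): Generator
    {
        // A fresh generator per foreach, so the object can be iterated any number of times.
        foreach ($this->keys as $i => $key) {
            yield $key => $this->values[$i];
        }
    }

    public function count(): int { return count($this->keys); }
    public function keys(): array { return $this->keys; }
    public function values(): array { return $this->values; }

    public function keyAt(int $offset): mixed
    {
        $this->checkOffset($offset);
        return $this->keys[$offset];
    }

    public function valueAt(int $offset): mixed
    {
        $this->checkOffset($offset);
        return $this->values[$offset];
    }

    // [[$key1, $value1], [$key2, $value2]]
    public function toPairs(): array
    {
        return array_map(null, $this->keys, $this->values);
    }

    // '[["key1","value1"],["key2","value2"]]' instead of '{...}'
    public function jsonSerialize(): array
    {
        return $this->toPairs();
    }

    public function __serialize(): array
    {
        $data = [];
        foreach ($this->keys as $i => $key) {
            array_push($data, $key, $this->values[$i]); // [$k1, $v1, $k2, $v2, ...]
        }
        return $data;
    }

    public function __unserialize(array $data): void
    {
        for ($i = 0; $i + 1 < count($data); $i += 2) {
            $this->keys[] = $data[$i];
            $this->values[] = $data[$i + 1];
        }
    }

    private function checkOffset(int $offset): void
    {
        if ($offset < 0 || $offset >= count($this->keys)) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
    }
}
```

Because construction copies everything once, the result can be iterated repeatedly even when built from a single-use generator, and repeated keys survive because nothing is stored in a hash table keyed by the original keys.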
Currently, PHP does not provide a built-in way to store the state of an arbitrary iterable for reuse later
(when the iterable has arbitrary keys, or when keys might be repeated). It would be useful to do so for many use cases, such as:

- Creating a rewindable copy of a non-rewindable Traversable
- Generating an IteratorAggregate from a class still implementing Iterator
- In the future, providing internal or userland helpers such as iterable_flip(iterable $input), iterable_take(iterable $input, int $limit),
  iterable_chunk(iterable $input, int $chunk_size), iterable_reverse(), etc. (these are not part of the RFC)
- Providing memory-efficient random access to both keys and values of arbitrary key-value sequences
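The "keys might be repeated" case is easy to hit with generators, and arrays cannot represent it. The sketch below uses a hypothetical generator, repeatedKeys(), to show that collapsing it into an array with iterator_to_array() keeps only the last value per key, while collecting [key, value] pairs (the shape used by toPairs()) keeps every entry in order.

```php
<?php
// Hypothetical generator that yields the same key twice (e.g. repeated query parameters).
function repeatedKeys(): Generator
{
    yield 'tag' => 'php';
    yield 'tag' => 'rfc';
    yield 'page' => 2;
}

// Collapsing into an array overwrites earlier values for a repeated key.
$asArray = iterator_to_array(repeatedKeys(), true);

// Collecting [key, value] pairs keeps every entry and its original order.
$pairs = [];
foreach (repeatedKeys() as $key => $value) {
    $pairs[] = [$key, $value];
}
```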

Having this implemented as an internal class would also allow it to be much more efficient than a userland solution
(in terms of time to create, time to iterate over the result, and total memory usage). See https://wiki.php.net/rfc/cachediterable#benchmarks

After some consideration, this is being created as a standalone RFC, and going in the global namespace:

- Based on early feedback on https://wiki.php.net/rfc/any_all_on_iterable#straw_poll (on the namespace preferred in previous polls),
  it seems way too early for me to be proposing namespaces in RFCs that add to modules that already exist, when there is no consensus.
- An earlier attempt by others to create a general policy for namespaces (https://wiki.php.net/rfc/php_namespace_policy#vote) also did not pass.
- Having even 40% of voters opposed to introducing a given namespace (in pre-existing modules)
  makes it an impractical choice when RFCs require a 2/3 majority to pass.
- While some may argue that a different namespace might pass,
  https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote had a sharp drop-off in feedback after the 3rd form.
  I don't know how to interpret that, e.g. whether unranked namespaces are preferred even less than the ranked options,
  or were just not seen as affecting the final result.

A heads up - I will probably start voting on https://wiki.php.net/rfc/cachediterable this weekend after https://wiki.php.net/rfc/cachediterable_straw_poll is finished.

Any other feedback on CachedIterable?

Thanks,
Tyson

4 years ago by Levi Morrison via internals

Based on a recent comment you made on GitHub, it seems like
CachedIterable eagerly creates the datastore instead of doing so
on-demand. Is this correct?

4 years ago by Levi Morrison via internals

On Tue, Jun 8, 2021 at 10:47 PM Levi Morrison
levi.morrison@datadoghq.com wrote:

Based on a recent comment you made on GitHub, it seems like
CachedIterable eagerly creates the datastore instead of doing so
on-demand. Is this correct?

Sorry, yes, that's correct and pointed out in the RFC.

I think that's a significant implementation flaw. I don't see why we'd
balloon memory usage unnecessarily by being eager -- if an operation
needs to fetch more data then it can go ahead and do so.

4 years ago by tyson andre

Hi Levi Morrison,

First, PHP's standard library accommodates a wide variety of use cases, of which I believe eager evaluation is the most common.
There is no reason that an eagerly evaluated CachedIterable and a lazily evaluated LazyCachedIterable couldn't both be added at some point
if both had passing RFCs.

(This is referring to https://en.wikipedia.org/wiki/Lazy_evaluation and https://en.wikipedia.org/wiki/Eager_evaluation)

As was stated in that GitHub Discussion,

If a CachedIterable were to be used in the standard library or a user-defined library,
many end users would want the standard library to return something that could be iterated over multiple times.
The limit of a single iteration was a source of bugs in SPL classes
such as https://www.php.net/arrayobject prior to them being switched to IteratorAggregate.

(This is concerning whether functions such as *filter and *map should evaluate the result eagerly or lazily if they do get added.
It is possible for a LazyCachedIterable to be implemented that computes values on demand, but see below points.)

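// map(...) stands for a hypothetical iterable helper; its arguments are elided
$foo = map(...);
foreach ($foo as $i => $v1) {
    // a distinct key variable, so the inner loop doesn't overwrite the outer $i
    foreach ($foo as $j => $v2) {
        if (some_pair_predicate($v1, $v2)) {
            // do something
        }
    }
}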
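The nested-loop concern can be reproduced with a plain generator, which is single-use. In this sketch, numbers() is a hypothetical source and the `$v1 < $v2` check stands in for some_pair_predicate(); the second traversal of the same generator throws, while an eagerly collected list of [key, value] pairs can be looped over in nested foreach statements.

```php
<?php
// Hypothetical source: a generator, which can only be traversed once.
function numbers(): Generator
{
    yield 'a' => 1;
    yield 'b' => 2;
    yield 'c' => 3;
}

$gen = numbers();
foreach ($gen as $value) {
    // The first traversal runs the generator to completion.
}

$secondTraversalFailed = false;
try {
    foreach ($gen as $value) {
    }
} catch (Exception $e) {
    // PHP throws "Cannot traverse an already closed generator".
    $secondTraversalFailed = true;
}

// Eagerly copying the [key, value] pairs once makes nested loops safe.
$pairs = [];
foreach (numbers() as $key => $value) {
    $pairs[] = [$key, $value];
}

$orderedPairs = 0;
foreach ($pairs as [$k1, $v1]) {
    foreach ($pairs as [$k2, $v2]) {
        if ($v1 < $v2) { // stands in for some_pair_predicate()
            $orderedPairs++;
        }
    }
}
```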

Userland library/application authors that are interested in lazy generators could use or implement something
such as https://github.com/nikic/iter instead. My opinion is that the standard library should provide
something that is easy to understand, debug, serialize or represent, etc.
I expect the inner iterable may be hidden entirely in a LazyCachedIterable from var_dump as an implementation detail.
It would be harder to understand why SomeFrameworkException is thrown in code unrelated to that framework
when a lazy (instead of eager) iterable is passed to some function that accepts a generic iterable,
and harder to write correct exception handling for it if done in a lazy generation style.
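The difference in where exceptions surface can be sketched as follows. SomeFrameworkException is taken from the text, but its definition here, along with failingSource() and eagerSnapshot(), is hypothetical. With eager copying, the failure is raised where the copy is built; with a lazy source, it is raised in the middle of whatever loop consumes it, after some of that loop's code has already run.

```php
<?php
// Hypothetical stand-in for the SomeFrameworkException mentioned above.
final class SomeFrameworkException extends RuntimeException
{
}

// Hypothetical source that fails partway through.
function failingSource(): Generator
{
    yield 'first' => 1;
    throw new SomeFrameworkException('source failed');
}

// Eager: read everything up front, so a failure surfaces at construction.
function eagerSnapshot(iterable $source): array
{
    $pairs = [];
    foreach ($source as $key => $value) {
        $pairs[] = [$key, $value];
    }
    return $pairs;
}

$eagerFailedAt = null;
try {
    $snapshot = eagerSnapshot(failingSource());
} catch (SomeFrameworkException $e) {
    $eagerFailedAt = 'construction';
}

// Lazy: creating the generator runs none of its body.
$lazy = failingSource();
$lazyFailedAt = null;
$seen = [];
try {
    foreach ($lazy as $key => $value) {
        $seen[] = $key; // code here runs before the failure is discovered
    }
} catch (SomeFrameworkException $e) {
    $lazyFailedAt = 'iteration';
}
```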

Many RFCs have been rejected because they were perceived as likely to be misused in userland or
to make code harder to understand.
It is possible to implement a lazy alternative to CachedIterable that only loads values as needed.
However, I hadn't proposed it due to doubts that 2/3 of voters would consider it widely useful
enough to be included in PHP rather than as a userland or PECL library.

Additionally, CachedIterables are much more memory-efficient than existing options such as arrays:
https://wiki.php.net/rfc/cachediterable#cachediterables_are_memory-efficient
(The only thing more efficient in PHP's core modules is SplFixedArray,
and that only allows keys 0..n-1.)
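One way to hold arbitrary (including repeated) keys in a structure that is itself only indexed 0..n-1 is to interleave keys and values, in the same order as the [$k1, $v1, $k2, $v2, ...] __serialize() format. The sketch below does this on top of SplFixedArray; the class name FlatPairs is hypothetical, this is an assumption about one possible layout rather than the RFC's implementation, and a userland version still pays per-value overhead, so it should not be expected to reproduce the RFC's benchmark numbers.

```php
<?php
// Hypothetical layout: one SplFixedArray holding k0, v0, k1, v1, ...
final class FlatPairs implements Countable
{
    private SplFixedArray $slots;
    private int $count = 0;

    public function __construct(iterable $source)
    {
        $this->slots = new SplFixedArray(8);
        foreach ($source as $key => $value) {
            if (2 * $this->count + 2 > $this->slots->getSize()) {
                $this->slots->setSize(2 * $this->slots->getSize()); // grow geometrically
            }
            $this->slots[2 * $this->count] = $key;       // key at an even slot
            $this->slots[2 * $this->count + 1] = $value; // value at the following odd slot
            $this->count++;
        }
        $this->slots->setSize(2 * $this->count); // trim unused capacity
    }

    public function count(): int
    {
        return $this->count;
    }

    public function keyAt(int $offset): mixed
    {
        $this->checkOffset($offset);
        return $this->slots[2 * $offset];
    }

    public function valueAt(int $offset): mixed
    {
        $this->checkOffset($offset);
        return $this->slots[2 * $offset + 1];
    }

    private function checkOffset(int $offset): void
    {
        if ($offset < 0 || $offset >= $this->count) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
    }
}
```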

Regards,
Tyson

4 years ago by Levi Morrison via internals

I think you misunderstood my complaint because of the other
conversation on GitHub. CachedIterable should load from the underlying
datastore lazily -- there is hardly any visible impact to the user
if this happens, because for the most part it looks and behaves the
same as it does today. The only visible changes are around loading
data from the underlying iterable.

For example, if the user calls the count method on the CachedIterable,
it would then load the remainder of the underlying data-store (and
then drop its reference to it). If the user asks for valueAt($n) and
it's beyond what's already loaded and we haven't finished consuming
the underlying iterable, then it would load until $n is found or the
end of the store is reached.

I understand your concerns with map, filter, etc. CachedIterable
is different because it holds onto the data, can be iterated over more
than once, including the two nested loop cases, even if it loads data
from the underlying iterable on demand.
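The on-demand behavior described above can be sketched in userland. LazyCachedPairs is a hypothetical name, not part of the RFC, and the sketch assumes PHP 8.0+. It wraps any iterable in a generator that is not started until data is needed, loads pairs only up to the requested offset in valueAt(), loads the remainder in count() and then drops its reference to the source, and gives every foreach its own cursor over the shared cache so nested loops still work.

```php
<?php
// Hypothetical lazy variant: pairs are pulled from the source only when needed.
final class LazyCachedPairs implements IteratorAggregate, Countable
{
    private array $pairs = [];     // [[$key, $value], ...] loaded so far
    private ?Generator $source;    // null once the source is exhausted
    private bool $started = false;

    public function __construct(iterable $source)
    {
        // Wrapping in a generator defers all work: nothing is read here.
        $this->source = (static function () use ($source): Generator {
            yield from $source;
        })();
    }

    // Load pairs until $offset is available or the source ends.
    private function loadUntil(int $offset): bool
    {
        while ($this->source !== null && count($this->pairs) <= $offset) {
            if ($this->started) {
                $this->source->next();
            }
            $this->started = true;
            if (!$this->source->valid()) {
                $this->source = null; // drop the reference to the underlying iterable
                break;
            }
            $this->pairs[] = [$this->source->key(), $this->source->current()];
        }
        return $offset < count($this->pairs);
    }

    public function count(): int
    {
        $this->loadUntil(PHP_INT_MAX); // counting requires loading the remainder
        return count($this->pairs);
    }

    public function valueAt(int $offset): mixed
    {
        if ($offset < 0 || !$this->loadUntil($offset)) {
            throw new OutOfBoundsException("Offset $offset is out of range");
        }
        return $this->pairs[$offset][1];
    }

    public function getIterator(): Generator
    {
        // Each foreach gets its own index into the shared cache.
        for ($i = 0; $this->loadUntil($i); $i++) {
            [$key, $value] = $this->pairs[$i];
            yield $key => $value;
        }
    }
}
```

For example, calling valueAt(1) on a fresh instance reads only the first two entries of the source, and a later count() reads the rest.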

4 years ago by Peter Bowyer

Would participants please trim the emails they're quoting, it makes it
easier for readers to focus on what's being discussed in emails.

Thanks,
Peter

4 years ago by drealecs@gmail.com

Hi Tyson,

Thanks for addressing my concern 4 months ago.
I think I understand the main real impact of an eager iterable cache vs a
lazy iterable cache from a functional point of view:

- exceptions are thrown during construction rather than during the first iteration
- performance is predictable, including on the first iteration.

How did you determine that an eager implementation is more
valuable than a lazy one? I'm also curious how to assess this, as
technically it looks the other way around to me. Maybe mention that in
the RFC.
I was even thinking that CachedIterable should be lazy, and that an
EagerCachedIterable could be built upon it with more methods. Or both could
be in the same class, selected with a constructor parameter.

Also, from what I remember of past votes, the fact that a perfect and not very
complex userland implementation is possible, even with lower performance,
does not help an RFC get positive votes...

Regards,
Alex

4 years ago by tyson andre

Hi Alex,

One of the reasons was size/efficiency. Adding the functionality to support lazy evaluation would require extra properties to track internal state
and to point to the original iterable and the functions being applied to it, plus extra checks at runtime, so an application that creates lots of small/empty cached iterables would have higher memory usage.

Having a data structure that tries to do everything would do other things poorly
(potentially not support serialization, use more memory than necessary,
have unintuitive behaviors when attempting to var_export/var_dump it,
surprisingly throw when being iterated over, etc)

The userland polyfill included in the RFC is an incomplete implementation that only supports iteration.
It's meant to be as fast as possible at the cost of memory usage.
It's not even an IteratorAggregate, and doesn't support json_encode, createFromPairs, or many other functions.
Virtually all of the SPL iterables that don't deal with filesystems can be reimplemented in userland.
(https://en.wikipedia.org/wiki/Turing_completeness)

Even complicated extensions such as redis or memcached can be reimplemented in userland on top of sockets,
but with higher CPU usage than native extensions (https://github.com/predis/predis/blob/main/FAQ.md#predis-is-a-pure-php-implementation-it-can-not-be-fast-enough)

The benefit of having data structures internally is that developers who learn them can use them in any project without adding dependencies
(even in single-file scripts), and that applications using CachedIterable would have much better performance.

Also, you and Levi have pointed out that iterable/iterator functionality is traditionally on-demand
(https://en.wikipedia.org/wiki/Lazy_evaluation) (e.g. iterables such as CallbackFilterIterator, RecursiveArrayIterator, etc)

As a result, I'm thinking CachedIterable is really not a good name for the eagerly evaluated data structure I'm proposing here,
and that there was confusion about how the data structure behaved when the name CachedIterable was suggested.
If functionality like that described in https://externals.io/message/114805#114792 was added, it could use the name CachedIterable instead.

So I'm probably changing this to ImmutableTraversable as a short name for the functionality,
to make it clear arguments are eagerly evaluated when it is created.
(ImmutableSequence may be expected to only contain values, and would be confused with the ds PECL's https://www.php.net/manual/en/class.ds-sequence.php)

Thanks,
Tyson

4 years ago by Pierre

Hello,

And why not simply RewindableIterator? Isn't that its most prominent
feature?

Agreed, it's immutable, but a lot of Traversables could be as well.

Regards,

Pierre

4 years ago by Levi Morrison via internals

All iterators are "rewindable", though of course not in practice. I
would avoid such names because we may eventually add an interface
which works as a "tag" to say "yes, I actually do support rewinding."

The property of being rewindable comes from it being cached. Maybe
CachedAggregate? Aggregates are data structures from which an
external iterator can be obtained, so it makes a bit more sense if
it's eager.
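The point about aggregates can be illustrated by contrasting a class that implements Iterator directly (a single cursor stored inside the object) with an IteratorAggregate that hands out a new external iterator for every foreach. SharedCursor, Aggregate and countNestedPairs() are hypothetical names used only for this sketch, which assumes PHP 8.0+. With three items, nested loops over the Iterator stop after one outer pass, while the aggregate yields all nine combinations.

```php
<?php
// An Iterator keeps its single cursor inside the object itself.
final class SharedCursor implements Iterator
{
    private int $position = 0;

    public function __construct(private array $items) {}

    public function rewind(): void { $this->position = 0; }
    public function valid(): bool { return $this->position < count($this->items); }
    public function current(): mixed { return $this->items[$this->position]; }
    public function key(): mixed { return $this->position; }
    public function next(): void { $this->position++; }
}

// An IteratorAggregate hands out a new external iterator for every foreach.
final class Aggregate implements IteratorAggregate
{
    public function __construct(private array $items) {}

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->items);
    }
}

// Counts how many (outer, inner) combinations a nested foreach visits.
function countNestedPairs(iterable $it): int
{
    $pairs = 0;
    foreach ($it as $a) {
        foreach ($it as $b) {
            $pairs++;
        }
    }
    return $pairs;
}
```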

4 years ago by tyson andre

Hi internals,

I think CachedAggregate would have problems with an unclear meaning, similar to those that were raised previously in https://externals.io/message/114819#114798
(some developers would think it may refer to the act of lazily evaluating the iterable, i.e. caching it on demand to access later).

https://en.wikipedia.org/wiki/Aggregate on its own refers to a collection of objects/values
or in other contexts, functions such as count/sum/min/max https://en.wikipedia.org/wiki/Aggregate_function

In other contexts such as set theory, there might not be keys associated with the values
so aggregate on its own seems unclear.

ImmutableIteratorAggregate or just ImmutableIterable/ImmutableTraversable makes more sense than Cached* to me.
ImmutableKeyValueSequence is an even shorter name than ImmutableIteratorAggregate and describes what the data structure is.

Thanks,
Tyson