Proposal: Expanded iterable helper functions and aliasing iterator_to_array in `iterable\` namespace

2 years ago by tyson andre

Hi internals,

https://wiki.php.net/rfc/iterator_xyz_accept_array recently passed in php 8.2,
fixing a common inconvenience of those functions throwing a TypeError for arrays.

However, the iterator_ prefix (https://www.php.net/manual/en/class.iterator.php)
is likely to become a source of confusion when writing or reviewing code decades from now,
because the name suggests these functions only accept objects (Traversable, i.e. Iterator/IteratorAggregate).

I'm planning on creating an RFC adding the following functions to the iterable\ namespace as aliases of iterator_count/iterator_to_array.
Those accept iterables (https://www.php.net/manual/en/language.types.iterable.php), i.e. both Traversable objects and arrays.

Namespaces were chosen after feedback on my previous RFC,
and I believe iterable\ follows the guidance from https://wiki.php.net/rfc/namespaces_in_bundled_extensions and
https://wiki.php.net/rfc/namespaces_in_bundled_extensions#core_standard_spl

I plan to create an RFC with the following functionality in the iterable\ namespace, and wanted to see what the preference on naming was, or if there was other feedback.
(Not having enough functionality and wanting a better idea of the overall

- iterable\count(...) (alias of iterator_count)
- iterable\to_array(Traversable $iterator, bool $preserve_keys = true): array (alias of iterator_to_array, so that users can stop using a misleading name)
- iterable\any(iterable $input, ?callable $callback = null): bool - Determines whether any value of the iterable satisfies the predicate.
  and all() - Determines whether all values of the iterable satisfies the predicate.
  This is a different namespace from https://wiki.php.net/rfc/any_all_on_iterable
- iterable\none(iterable $input, ?callable $callback = null): bool
  returns the opposite of any()
- iterable\find(iterable $iterable, callable $callback, mixed $default = null): mixed
  Returns the first value for which $callback($value) is truthy. On failure, returns default
- iterable\fold(iterable $iterable, callable $callback, mixed $initial): mixed
  fold and requiring an initial value seems like better practice. See https://externals.io/message/112558#112834
  and https://stackoverflow.com/questions/25149359/difference-between-reduce-and-fold
- iterable\unique_values(iterable $iterable): array {}
  Returns true if this iterable includes a value identical to $value (===).
- iterable\includes_value(iterable $iterable, mixed $value): bool {}
  Returns a list of unique values of $iterable
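
For readers who want to experiment without installing the extension, the semantics listed above can be approximated in userland. This is only a sketch: the iter_ prefix, the numbers() generator, and the function bodies are assumptions made for illustration, not the teds implementation or the eventual RFC text.

```php
<?php
// Hypothetical userland stand-ins for the proposed functions (PHP 8.0+).

function iter_any(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        // Without a callback, fall back to the truthiness of the value itself.
        if ($callback !== null ? $callback($value) : $value) {
            return true; // short-circuit on the first match
        }
    }
    return false;
}

function iter_all(iterable $input, ?callable $callback = null): bool
{
    foreach ($input as $value) {
        if (!($callback !== null ? $callback($value) : $value)) {
            return false;
        }
    }
    return true; // vacuously true for an empty iterable
}

function iter_none(iterable $input, ?callable $callback = null): bool
{
    return !iter_any($input, $callback);
}

function iter_find(iterable $iterable, callable $callback, mixed $default = null): mixed
{
    foreach ($iterable as $value) {
        if ($callback($value)) {
            return $value;
        }
    }
    return $default;
}

function iter_fold(iterable $iterable, callable $callback, mixed $initial): mixed
{
    $carry = $initial;
    foreach ($iterable as $value) {
        $carry = $callback($carry, $value);
    }
    return $carry;
}

// Works for generators as well as arrays.
function numbers(): Generator
{
    yield 1;
    yield 2;
    yield 3;
}

$sum = iter_fold(numbers(), fn ($carry, $v) => $carry + $v, 0);
$firstEven = iter_find(numbers(), fn ($v) => $v % 2 === 0);
```

Because every function only uses foreach, arrays and Traversable objects are handled identically, which is the point of naming the namespace after iterable rather than iterator.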

There's other functionality that I was less certain about proposing, such as iterable\keys(iterable $iterable): array,
which would work similarly to array_keys but also work on Traversables (e.g. to be used with userland/internal collections, generators, etc.)
Or functions to get the iterable\first()/last() value in an iterable. Any thoughts on those?

I also wanted to know if more verbose names such as find_value(), fold_values(), any_values(), all_values() were generally preferred before proposing this,
since I only had feedback from a small number of names. My assumption was short names were generally preferred when possible.

See https://github.com/TysonAndre/pecl-teds/blob/main/teds.stub.php for documentation of the other functions mentioned here. The functionality can be tried out by installing https://pecl.php.net/package/teds

Background

In February 2021, I proposed expanded iterable functionality and brought it to a vote,
https://wiki.php.net/rfc/any_all_on_iterable , where feedback was mainly about being too small in scope and the choice of naming.

Later, after https://externals.io/message/112558#112780 , https://wiki.php.net/rfc/namespaces_in_bundled_extensions#proposal was created and brought to a vote in April 2021 that passed,
offering useful recommendations on how to standardize namespaces in future proposals of new categories of functionality
(e.g. iterable\any() and iterable\all())

Any comments?

Thanks,
Tyson

2 years ago by tim@bastelstu.be

> iterable\count(...) (alias of iterator_count)
>
> iterable\to_array(Traversable $iterator, bool $preserve_keys = true): array (alias of iterator_to_array, so that users can stop using a misleading name)

I wonder if this should be made an alias or if the introduction of the
namespace is a good opportunity to revisit the $preserve_keys = true default choice. See also this email in the voting discussion for a
previous iterable_to_array() RFC:
https://externals.io/message/102562#102616
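
To make the concern about the $preserve_keys = true default concrete, here is a small example that uses only the existing iterator_to_array(). The generator name letters() is made up for the illustration.

```php
<?php
// Each "yield from" on an array re-emits that array's own keys,
// so this generator produces the keys 0, 1, 0, 1.
function letters(): Generator
{
    yield from ['a', 'b'];
    yield from ['c', 'd'];
}

// Default behaviour: keys are preserved, so later values overwrite earlier ones.
$withKeys = iterator_to_array(letters(), true);

// Keys discarded: every value survives and the result is a list.
$asList = iterator_to_array(letters(), false);

var_dump($withKeys); // two elements: 'c' and 'd'
var_dump($asList);   // four elements: 'a', 'b', 'c', 'd'
```

With the current default, half of the values are silently lost, which is why the choice of default is worth revisiting if a new name is introduced anyway.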

> iterable\any(iterable $input, ?callable $callback = null): bool - Determines whether any value of the iterable satisfies the predicate.
> and all() - Determines whether all values of the iterable satisfies the predicate.

I would not make the callable optional. If one really wants to filter on
the falsyness of the items, any($i, fn ($v) => !!$v) is easy enough.

I'd likely also swap the order of parameters: having the callable first
is more natural with a possible pipe operator or partial application.

It also is arguably more readable when using nested function calls,
because the callback appears immediately beside the function name:

(Callback first)

    iterable\count(
        iterable\filter(
            fn ($v) => $v > 5,
            iterable\map(
                fn ($v) => strlen($v),
                $iterable
            )
        )
    )

(Callable last)

    iterable\count(
        iterable\filter(
            iterable\map(
                $iterable,
                fn ($v) => strlen($v)
            ),
            fn ($v) => $v > 5
        )
    )

> iterable\unique_values(iterable $iterable): array {}
>
> Returns true if this iterable includes a value identical to $value (===).
>
> iterable\includes_value(iterable $iterable, mixed $value): bool {}
> Returns a list of unique values of $iterable

It appears you mixed up the descriptions here.

> Any comments?

Your proposals look good to me. Especially any and all are something
I'm missing somewhat regularly.

Is there a reason there is no iterable\map() and iterable\filter()
(that I used in my example above)?

Best regards
Tim Düsterhus

2 years ago by Levi Morrison via internals

> Namespaces were chosen after feedback on my previous RFC,
> and I believe iterable\ follows the guidance from https://wiki.php.net/rfc/namespaces_in_bundled_extensions and
> https://wiki.php.net/rfc/namespaces_in_bundled_extensions#core_standard_spl

I support this general direction :thumbsup:

> I plan to create an RFC with the following functionality in the iterable\ namespace, and wanted to see what the preference on naming was, or if there was other feedback.
> (Not having enough functionality and wanting a better idea of the overall
>
> iterable\count(...) (alias of iterator_count)
>
> iterable\to_array(Traversable $iterator, bool $preserve_keys = true): array (alias of iterator_to_array, so that users can stop using a misleading name)

At the very least, I'd like to also see to_array_list which ensures
that the array_is_list invariant is upheld by only using values from
the iterable.

Variants for assoc arrays have to deal with the fact that iterables
have duplicate keys, and also may contain key types which don't work
as array keys.

I think later values overwriting previous values of the same key is
generally agreeable.
However, what to do with things that aren't key types I can see
having a lot of discussion. Some people probably just want them to
silently filter out, some people want a warning but to proceed with
the list, others may want a TypeError. I suppose we can make 3
functions, one for each case?
- to_assoc_array_silent which drops things which don't work as array keys.
- to_assoc_array_warn which drops things which don't work as
  array keys but also emits a warning.
- to_assoc_array_throw which throws a TypeError if a key doesn't
  work as an array key.

Not sure on these assoc things, but I'm pretty sure to_array_list
should be included in the very first version.
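
A rough userland sketch of two of the variants discussed above. The function bodies, the pairs() generator, and the exact rule for what counts as a usable key (only int and string are accepted here) are assumptions for illustration; a real implementation would have to decide how to treat floats, bools, and null, which PHP normally coerces when they are used as array keys.

```php
<?php
// Values only: the result always satisfies array_is_list() (PHP 8.1+).
function to_array_list(iterable $iterable): array
{
    $result = [];
    foreach ($iterable as $value) {
        $result[] = $value;
    }
    return $result;
}

// Keys preserved; anything that is not an int or string key is rejected.
function to_assoc_array_throw(iterable $iterable): array
{
    $result = [];
    foreach ($iterable as $key => $value) {
        if (!is_int($key) && !is_string($key)) {
            throw new TypeError(
                'Cannot use key of type ' . get_debug_type($key) . ' as an array key'
            );
        }
        $result[$key] = $value; // later values overwrite earlier ones
    }
    return $result;
}

// Generators may yield duplicate keys and keys of any type.
function pairs(): Generator
{
    yield 'a' => 1;
    yield 'a' => 2;   // duplicate key
    yield ['x'] => 3; // an array cannot be an array key
}

$list = to_array_list(pairs());

try {
    to_assoc_array_throw(pairs());
    $message = null;
} catch (TypeError $e) {
    $message = $e->getMessage();
}
```

The _silent and _warn variants would differ only in the branch that currently throws: one would continue, the other would call trigger_error() and then continue.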

> iterable\any(iterable $input, ?callable $callback = null): bool - Determines whether any value of the iterable satisfies the predicate.
> and all() - Determines whether all values of the iterable satisfies the predicate.

:thumbsup:

> iterable\none(iterable $input, ?callable $callback = null): bool
>
> returns the opposite of any()

:thumbsup:

> iterable\find(iterable $iterable, callable $callback, mixed $default = null): mixed

I would prefer to drop the default and have an Option return type
instead, but we don't have one today.

> iterable\fold(iterable $iterable, callable $callback, mixed $initial): mixed
>
> fold and requiring an initial value seems like better practice. See https://externals.io/message/112558#112834
> and https://stackoverflow.com/questions/25149359/difference-between-reduce-and-fold

:thumbsup:

If we figure out an optional/result type then I think
iterable\reduce which just uses the first item as the initial would
be very helpful, but we have to deal with the empty iterable case so
I'm happy to leave it out for now since we don't have Option.
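
To illustrate why the empty case is the sticking point, here is a sketch of a reduce that seeds itself from the first element. The name iter_reduce and the decision to throw a ValueError on an empty iterable are assumptions; throwing is just one of the possible answers and not something the thread has settled on.

```php
<?php
// Hypothetical reduce without an initial value (PHP 8.0+).
function iter_reduce(iterable $iterable, callable $callback): mixed
{
    $hasCarry = false;
    $carry = null;
    foreach ($iterable as $value) {
        if (!$hasCarry) {
            // The first element becomes the initial carry.
            $carry = $value;
            $hasCarry = true;
            continue;
        }
        $carry = $callback($carry, $value);
    }
    if (!$hasCarry) {
        // No element and no initial value: there is nothing sensible to return.
        throw new ValueError('iter_reduce(): cannot reduce an empty iterable');
    }
    return $carry;
}

$max = iter_reduce([3, 9, 4], fn ($carry, $v) => max($carry, $v));
```

With an Option return type the final branch could return "none" instead of throwing, which is the ergonomic gap described above.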

> iterable\unique_values(iterable $iterable): array {}
>
> Returns true if this iterable includes a value identical to $value (===).

I suppose this should take an optional callback that users can provide
for a custom definition of uniqueness?

> iterable\includes_value(iterable $iterable, mixed $value): bool {}
> Returns a list of unique values of $iterable

Not sure on this one, but the description doesn't match the signature :)
Anything which does comparison/uniqueness check should have a callback
(or optional callback) for specifying how to do that.

> There's other functionality that I was less certain about proposing, such as iterable\keys(iterable $iterable): array,
> which would work similarly to array_keys but also work on Traversables (e.g. to be used with userland/internal collections, generators, etc.)

I assume this doesn't care about key uniqueness -- it takes the keys
and makes them values, in the same order they were returned?

In any case, I don't think this should return an iterable, not array,
to allow for lazy operations in a chain, while also allowing for using
an array as an optimization.

> Or functions to get the iterable\first()/last() value in an iterable. Any thoughts on those?

What do they return on empty? I would prefer to delay these and try to
get an Option type. Otherwise, they require doing a lot of variants
like the following to be ergonomic:

first_or($iterable, mixed $default)
first_or_else($iterable, callable $callback)
last_or($iterable, mixed $default)
last_or_else($iterable, callable $callback)

But if we can return an option:

first($iterable): Option
last($iterable): Option

Then the other parts move to the Option API:

iterable\first($iterable)->unwrap_or($default)
iterable\first($iterable)->unwrap_or_else(fn () => $whatever)

This general feedback also applies to iterable\find().
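
PHP has no Option type today, so the following is only a class-based sketch of how the API above could look. The class name Option, its some()/none() factories, and iter_first() are hypothetical; only unwrap_or() and unwrap_or_else() are taken from the examples above.

```php
<?php
// Hypothetical Option type built from a plain class (PHP 8.0+).
final class Option
{
    private function __construct(
        private bool $hasValue,
        private mixed $value = null,
    ) {
    }

    public static function some(mixed $value): self
    {
        return new self(true, $value);
    }

    public static function none(): self
    {
        return new self(false);
    }

    public function unwrap_or(mixed $default): mixed
    {
        return $this->hasValue ? $this->value : $default;
    }

    public function unwrap_or_else(callable $callback): mixed
    {
        // The callback only runs when there is no value.
        return $this->hasValue ? $this->value : $callback();
    }
}

function iter_first(iterable $iterable): Option
{
    foreach ($iterable as $value) {
        return Option::some($value);
    }
    return Option::none();
}

$fromEmpty = iter_first([])->unwrap_or('default');
$fromNull  = iter_first([null, 1])->unwrap_or('default');
```

Unlike a nullable return, this distinguishes an empty iterable from one whose first element is null: $fromNull stays null while $fromEmpty becomes the default.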

> I also wanted to know if more verbose names such as find_value(), fold_values(), any_values(), all_values() were generally preferred before proposing this,
> since I only had feedback from a small number of names. My assumption was short names were generally preferred when possible.

I like the verbose names when there are variants, for instance fold
and fold_with_keys (where we provide both key and value). I
definitely do not like it when we change function signatures of
callbacks based on flags parameters like some PHP functions do
today.

Phew, I'm not sure the mailing list is the best way to discuss the tidbits of
the APIs! In any case, I strongly support the direction.

2 years ago by Levi Morrison via internals

>> There's other functionality that I was less certain about proposing, such as iterable\keys(iterable $iterable): array,
>> which would work similarly to array_keys but also work on Traversables (e.g. to be used with userland/internal collections, generators, etc.)
>
> I assume this doesn't care about key uniqueness -- it takes the keys
> and makes them values, in the same order they were returned?
>
> In any case, I don't think this should return an iterable, not array,
> to allow for lazy operations in a chain, while also allowing for using
> an array as an optimization.

Oops, I meant to say I think this should return an iterable, not an array.

2 years ago by Larry Garfield

Oh, a topic near and dear to me. :-) I'm going to try and respond to both the OP and some other responses together here.

First off, I am generally in favor of improving PHP's iterable story, so consider me on board on the concept.

Second, I have similar user-space utilities that were intended for pipe usage available in a library (since Levi mentioned pipe compatibility). I learned some very important things from that process. Details here:

https://github.com/Crell/fp/blob/master/src/composition.php
https://github.com/Crell/fp/blob/master/src/array.php

Of particular note:

- Because of PHP's inconsistent handling of excess arguments to functions, there MUST be separate versions of every function that takes a callback, one that passes the key and one that does not. It would be a fatal design flaw to do otherwise. Yes, this balloons the number of such functions, which sucks, but that's PHP for you.
- There are ample use cases for most operations to return an array or a lazy iterable. Both totally exist. I solved that by also having a separate version of each function, eg, amap() vs itmap(). The former returned an array, the latter returned a generator that generated the equivalent array. It would be a fatal design flaw to not account for this. Yes, this balloons the number of such functions, which sucks, but that's PHP for you.

So, eg, I have four map functions: amap(), itmap(), amapWithKeys(), itmapWithKeys(). Same for filter. Other operations only needed 2 variants, eg, first() and firstWithKeys(), any() and anyWithKeys(), etc.
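
A simplified sketch of why the variants cannot be merged. The bodies below are assumptions written for this illustration and take the iterable directly; they are not the actual signatures from the library linked above, which returns partially applied closures. The behaviour shown relies on PHP 8, where internal functions throw an ArgumentCountError on excess arguments while userland functions silently ignore them.

```php
<?php
// Eager, callback receives only the value.
function amap(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $k => $v) {
        $out[$k] = $fn($v);
    }
    return $out;
}

// Eager, callback receives the value and the key.
function amapWithKeys(iterable $it, callable $fn): array
{
    $out = [];
    foreach ($it as $k => $v) {
        $out[$k] = $fn($v, $k);
    }
    return $out;
}

// Lazy counterpart of amap(): nothing runs until the generator is iterated.
function itmap(iterable $it, callable $fn): Generator
{
    foreach ($it as $k => $v) {
        yield $k => $fn($v);
    }
}

$words = ['a' => 'foo', 'b' => 'quux'];

$lengths  = amap($words, 'strlen');
$labelled = amapWithKeys($words, fn ($v, $k) => "$k=$v");

// Passing the key to an internal single-argument function fails.
try {
    amapWithKeys($words, 'strlen');
    $error = null;
} catch (ArgumentCountError $e) {
    $error = $e->getMessage();
}
```

If a single map() always passed the key, every call with a built-in such as strlen would have to be wrapped in a closure, which is the design flaw described above.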

I do not claim that naming pattern to be ideal; in fact I don't particularly like *WithKeys(). We should think carefully on the naming. A possible alternative would be to always return a lazy iterable in all circumstances and assume someone can use to_array() or equivalent on the result if they want it as an array. (That's effectively what Python 3 does with comprehensions.) However, that could have non-trivial performance impact since generators are slower than plain arrays.

Feel free to borrow liberally, design-wise, from the above code. There's a few more methods in there that could be of use, too. Note, though, that all are designed to be used with a pipe(), so they mostly return a closure that has been manually partially applied with everything except the iterable, so you get a single-argument function, which is what a pipe() or compose() chain needs.

Third, speaking of pipe, I disagree with Tim that putting the callback first would be easier for pipe/partials. If we ever get partials similar to previously implemented, then the argument order won't matter. If we get pipes as I've previously proposed, then none of these functions are directly usable because they're multi-argument.

The alternative I've considered is somewhat inspired by Elixir (assuming I understand the little Elixir I've read), in which a function after a |> is automatically assumed to be partially applying everything but the first argument. So $list |> map($callable) translates to map($list, $callable). I've not decided yet if that's a good way to avoid needing full partial application or a good way to make horribly confusing code. But if that were to happen, it would only work if all of these functions took the iterable, the "object to be operated on", as their first argument.

The callable, if inlined, is almost always the longest argument. That means it is most readable when it is the last argument, so there is no need to look at the end of the closure to see if there's any other arguments. (This is a problem with array_map() currently.) So I would instead propose that all iterator functions follow the pattern:

name($iterable, other stuff, $callback_if_applicable);

That is easily learnable, most likely to result in clean-ish code, and most likely to be nice with any future pipe or partial implementations. At worst, it would make pipe-ifying all such functions a trivially identical operation for all of them, making my library little more than a series of boring one-liners. (Please make my library little more than a series of boring one-liners.)
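
As a sketch of what "boring one-liners" could mean in practice, the following uses iterable-first, callback-last functions with a small userland pipe(). The names pipe(), map_values(), and filter_values() are placeholders chosen for this example, not the proposed API, and the pipe() shown here is a minimal stand-in rather than the library's implementation.

```php
<?php
// Feeds $input through each stage in order; every stage takes one argument.
function pipe(mixed $input, callable ...$stages): mixed
{
    foreach ($stages as $stage) {
        $input = $stage($input);
    }
    return $input;
}

// Both helpers follow name($iterable, other stuff, $callback_if_applicable).
function map_values(iterable $iterable, callable $callback): Generator
{
    foreach ($iterable as $key => $value) {
        yield $key => $callback($value);
    }
}

function filter_values(iterable $iterable, callable $callback): Generator
{
    foreach ($iterable as $key => $value) {
        if ($callback($value)) {
            yield $key => $value;
        }
    }
}

// Same computation as the nested example earlier in the thread:
// count the strings longer than five characters.
$count = pipe(
    ['a', 'bb', 'cccccc', 'ddddddd'],
    fn (iterable $it) => map_values($it, fn ($v) => strlen($v)),
    fn (iterable $it) => filter_values($it, fn ($v) => $v > 5),
    fn (iterable $it) => iterator_count($it),
);
```

Each stage is the same mechanical wrapper around an iterable-first function, so the steps read top to bottom in execution order instead of inside out.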

That does also mean we cannot support variadics or optional arguments. I am OK with that. And if someone really needs a different order, well, we have named arguments now.

Tim noted nesting these functions and what would make that cleanest. What would make it cleanest is to not nest them and instead use proper chaining: my pipe() function, a native pipe operator, or similar. Expecting these functions to nest and not be ugly is a fool's errand, especially when there are vastly better options readily available.

Fourth, I agree with Levi that figuring out the edge case handling around empty lists is crucial. The more we can design the semantics such that they "fall out" naturally, the better. Eg, first() may return null for not found, which dovetails nicely with the ?? operator to provide a default. However, that means null cannot be used as a meaningful found-value. I'd argue that is the correct behavior, but I'm sure some would disagree.
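
A tiny sketch of that trade-off, assuming a hypothetical first_or_null() helper that returns null for an empty iterable:

```php
<?php
// Hypothetical helper: first value, or null when the iterable is empty.
function first_or_null(iterable $iterable): mixed
{
    foreach ($iterable as $value) {
        return $value;
    }
    return null;
}

// Empty input: ?? supplies the default.
$a = first_or_null([]) ?? 'fallback';

// Non-empty input: the first value wins.
$b = first_or_null(['x', 'y']) ?? 'fallback';

// The first value really is null, but the caller cannot tell this
// apart from "empty", so the default is used here as well.
$c = first_or_null([null, 'y']) ?? 'fallback';
```

The third case is the one people would disagree about: it is either a convenient default or a lost value, depending on whether null is meaningful in the data.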

An Option type would be nice, but to get that we really need to get ADTs first, and I don't have a timeline on that. Ilija is more interested in fixing core bugs right now than in adding new features, the silly man... :-P

(Technically an Option/Maybe object could be implemented with just classes as we have them now, especially if it's done in core, but it would be cleaner and more ergonomic if built on top of an Enum.)

Also, Monads are clunky in a language without first-class support for them. I've written extensively on this topic recently: https://peakd.com/hive-168588/@crell/much-ado-about-null

I'm not sure of the best way forward here, other than it should be addressed very carefully and explicitly.

Fifth, I would absolutely include map and filter in the included operations. They are critical parts of list handling. If we had pipe-compatible map and filter, that would basically give us a list comprehension tool for free. (That's exactly how many languages approach list comprehensions.) In my own list-operation-centric work, I've used map and filter a lot more than any of the other operations in the list above.

I'm on board with the direction, modulo implementation details.

--Larry Garfield

2 years ago by tim@bastelstu.be

> There are ample use cases for most operations to return an array or a lazy iterable. Both totally exist. I solved that by also having a separate version of each function, eg, amap() vs itmap(). The former returned an array, the latter returned a generator that generated the equivalent array. It would be a fatal design flaw to not account for this. Yes, this balloons the number of such functions, which sucks, but that's PHP for you.

Please not. Only provide functions that return an iterable without
giving any further guarantees, because …

> A possible alternative would be to always return a lazy iterable in all circumstances and assume someone can use to_array() or equivalent on the result if they want it as an array. (That's effectively what Python 3 does with comprehensions.) However, that could have non-trivial performance impact since generators are slower than plain arrays.

… this works. The users should not need to think about what return type
is more useful for their specific use case.

The heavy lifting / optimization should be left to the compiler /
engine, similarly how a fully qualified 'is_null()' is also optimized,
such that no actual function call happens [1].

Even if this kind of optimization does not happen in the first version,
this should not be a reason to clutter the API surface.

> Feel free to borrow liberally, design-wise, from the above code. There's a few more methods in there that could be of use, too. Note, though, that all are designed to be used with a pipe(), so they mostly return a closure that has been manually partially applied with everything except the iterable, so you get a single-argument function, which is what a pipe() or compose() chain needs.
>
> Third, speaking of pipe, I disagree with Tim that putting the callback first would be easier for pipe/partials. If we ever get partials similar to previously implemented, then the argument order won't matter. If we get pipes as I've previously proposed, then none of these functions are directly usable because they're multi-argument.
>
> The alternative I've considered is somewhat inspired by Elixir (assuming I understand the little Elixir I've read), in which a function after a |> is automatically assumed to be partially applying everything but the first argument. So $list |> map($callable) translates to map($list, $callable). I've not decided yet if that's a good way to avoid needing full partial application or a good way to make horribly confusing code. But if that were to happen, it would only work if all of these functions took the iterable, the "object to be operated on", as their first argument.

With Haskell, from which my functional programming experience comes,
it's the exact inverse: All functions are automatically partially
applied [2], thus in a pipe the "missing" parameter comes last, not
first as with Elixir.

So I guess there is no right or wrong and it depends on the programming
language which variant feels more natural. I still prefer having the
callback first for the reasons I've outlined, but in the end I'm happy
as long as it's consistent.

Best regards
Tim Düsterhus

[1]
https://github.com/php/php-src/blob/e00dadf43a17da3bb79aba360d07e29b359c12b3/Zend/zend_compile.c#L4372-L4373
[2] You can just do 'lengthOfAll = map length' and then 'lengthOfAll
["foo", "bar", "foobar"]'

2 years ago by Larry Garfield

Hi

>> There are ample use cases for most operations to return an array or a lazy iterable. Both totally exist. I solved that by also having a separate version of each function, eg, amap() vs itmap(). The former returned an array, the latter returned a generator that generated the equivalent array. It would be a fatal design flaw to not account for this. Yes, this balloons the number of such functions, which sucks, but that's PHP for you.
>
> Please not. Only provide functions that return an iterable without
> giving any further guarantees, because …
>
>> A possible alternative would be to always return a lazy iterable in all circumstances and assume someone can use to_array() or equivalent on the result if they want it as an array. (That's effectively what Python 3 does with comprehensions.) However, that could have non-trivial performance impact since generators are slower than plain arrays.
>
> … this works. The users should not need to think about what return type
> is more useful for their specific use case.
>
> The heavy lifting / optimization should be left to the compiler /
> engine, similarly how a fully qualified 'is_null()' is also optimized,
> such that no actual function call happens [1].
>
> Even if this kind of optimization does not happen in the first version,
> this should not be a reason to clutter the API surface.

I'm open to this approach, but we should be mindful of the ergonomics. Specifically, I don't think to_array() as an answer is workable unless we have some built-in chaining mechanism, pipes or similar. Otherwise, you're effectively requiring anyone who wants to ensure they have an array (which will be a lot of people) to write

to_array(some_stuff_here());

all the time, which is considerably less ergonomic than

some_stuff_here() |> to_array();

Especially if there's more than one operation involved.

Side note: Should there instead be to_list() (which implies dropping keys and reindexing) and to_assoc() (which implies keeping keys and merging on key if appropriate)? That seems much more self-documenting than to_array(true) or whatever.

>> Feel free to borrow liberally, design-wise, from the above code. There's a few more methods in there that could be of use, too. Note, though, that all are designed to be used with a pipe(), so they mostly return a closure that has been manually partially applied with everything except the iterable, so you get a single-argument function, which is what a pipe() or compose() chain needs.
>>
>> Third, speaking of pipe, I disagree with Tim that putting the callback first would be easier for pipe/partials. If we ever get partials similar to previously implemented, then the argument order won't matter. If we get pipes as I've previously proposed, then none of these functions are directly usable because they're multi-argument.
>>
>> The alternative I've considered is somewhat inspired by Elixir (assuming I understand the little Elixir I've read), in which a function after a |> is automatically assumed to be partially applying everything but the first argument. So $list |> map($callable) translates to map($list, $callable). I've not decided yet if that's a good way to avoid needing full partial application or a good way to make horribly confusing code. But if that were to happen, it would only work if all of these functions took the iterable, the "object to be operated on", as their first argument.
>
> With Haskell, from which my functional programming experience comes,
> it's the exact inverse: All functions are automatically partially
> applied [2], thus in a pipe the "missing" parameter comes last, not
> first as with Elixir.
>
> So I guess there is no right or wrong and it depends on the programming
> language which variant feels more natural. I still prefer having the
> callback first for the reasons I've outlined, but in the end I'm happy
> as long as it's consistent.

Mm, yes, the root question is what kind of auto-partialing we are going to have. Right now we have none, which is problematic for any sort of composition, which these iterable functions are going to want to have. Whether we partial from the left or from the right will determine how we order parameters for functions we expect to be partialed.

We should bear in mind that PHP has a couple of features that complicate matters; namely optional arguments, variadics, and named arguments. The previous RFC that Joe wrote handled all of those as gracefully as I think is possible, but the complexity was apparently too high for too many folks. He didn't seem confident that a less-complex approach was possible, though.

I know this is veering off topic, but one idea I had kicked around that Joe didn't like was a function call prefix to indicate that it was a partial rather than a direct call. Something like %foo($a, $b, ?), to indicate that we were partially applying the function rather than calling it. I don't know if that would actually work, especially if combined with some simplified set of capabilities for better auto-partialing.

These questions are not immediately part of the topic of this thread, but they're closely-related topics that should be considered when designing them.

--Larry Garfield