Hi
I have created an RFC to add the function array_find, which returns the
first element for which a predicate callback returns true. I have often
missed this function. Furthermore, this type of function exists in
other programming languages such as C++, JavaScript and Rust.
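For readers who want to see the described behaviour in code, here is a minimal userland sketch. The name sketch_array_find and the choice of null for "no match" are assumptions made for illustration; the RFC and the linked proof of concept remain the authoritative definition.

```php
<?php
declare(strict_types=1);

// Hypothetical userland stand-in; the name is not taken from the RFC.
function sketch_array_find(array $array, callable $callback): mixed
{
    foreach ($array as $key => $value) {
        // The predicate decides; the first element that passes is returned.
        if ($callback($value, $key)) {
            return $value;
        }
    }

    // Assumption: "nothing matched" is signalled with null.
    return null;
}

$firstLong = sketch_array_find(
    ['a', 'bbb', 'cccc'],
    fn (string $value): bool => strlen($value) > 2
);
var_dump($firstLong); // string(3) "bbb"
```

The loop stops at the first match, so later elements are never passed to the callback.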
You can find the RFC at:
https://wiki.php.net/rfc/array_find
Proof of concept implementation is in:
https://github.com/joshuaruesweg/php-src/commit/9f3fc252b92f534d498e5f1e6a463e15f45da208
I'm looking forward to your feedback.
Cheers
Joshua Rüsweg
I'm open to this, but two points that I'm sure someone will bring up:
- Should this work on arrays or iterables? This is a long standing limitation of PHP. The array operations don't work on iterables, even though we've had iterables for 20 years.
- Key handling. It's good that you have looked into this, because I was going to mention it. :-) However, I don't think a boolean is the right answer, since the question is binary, not true/false. (Those are not the same thing.) I think a small return-mode Enum would make more sense here.
--Larry Garfield
Hi
> - Should this work on arrays or iterables? This is a long standing limitation of PHP. The array operations don't work on iterables, even though we've had iterables for 20 years.
In the longer term, it definitely makes sense to create a separate API
here that can handle not only arrays, but iterables in general. I have
heard this suggestion in various places (including in the mailing list)
and had also looked into it in the process of this RFC, but did not
pursue it further after the initial idea, as it would be important for
me that such an API is planned accordingly and has an appropriate
repertoire right from the start (functions such as map, filter, find,
push, pop, …). In my opinion, a single function would be very out of
place, especially if such an API is tackled soon and then
possibly differs from the implementation in this RFC.
> - Key handling. It's good that you have looked into this, because I was going to mention it. :-) However, I don't think a boolean is the right answer, since the question is binary, not true/false. (Those are not the same thing.) I think a small return-mode Enum would make more sense here.
I like the idea, thank you! However, I am unsure whether an additional
enum for the function would not be too much overhead.
Cheers
Joshua Rüsweg
Hi
> > - Should this work on arrays or iterables? This is a long standing limitation of PHP. The array operations don't work on iterables, even though we've had iterables for 20 years.
> In the longer term, it definitely makes sense to create a separate API
> here that can handle not only arrays, but iterables in general. I have
> heard this suggestion in various places (including in the mailing list)
For reference: https://externals.io/message/118896#118896
> and had also looked into it in the process of this RFC, but did not
> pursue it further after the initial idea, as it would be important for
> me that such an API is planned accordingly and has an appropriate
> repertoire right from the start (functions such as map, filter, find,
> push, pop, …). In my opinion, a single function would be very out of
> place, especially if this API is then really soon tackled and then
> possibly differs from the implementation of the RFC.
It makes sense to me to not make array_find the “odd one out”, and
widening all the array_* functions probably would not work well, because
either the Iterator is converted into an array, nullifying the benefits of
using an Iterator, or the return type needs to be changed, making the
signature confusing without generics.
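To illustrate why converting an Iterator to an array first would nullify its benefits, here is a small sketch of a find that walks any iterable lazily. The helper sketch_iterable_find and the generator naturals are hypothetical names invented for this example; nothing like them is proposed in the RFC.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: find over any iterable without building an array first.
function sketch_iterable_find(iterable $items, callable $callback): mixed
{
    foreach ($items as $key => $value) {
        if ($callback($value, $key)) {
            return $value; // stops pulling from the iterable right here
        }
    }

    return null;
}

// An endless generator: iterator_to_array() on this would never finish.
function naturals(): Generator
{
    for ($i = 1; ; $i++) {
        yield $i;
    }
}

$firstSquareOver50 = sketch_iterable_find(
    naturals(),
    fn (int $n): bool => $n * $n > 50
);
var_dump($firstSquareOver50); // int(8)
```

Only the first eight values are ever generated, which is exactly the laziness that an eager array conversion would throw away.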
That said, adding a “find” function makes sense to me and the
implementation looks reasonable.
However I'm not sure if adding new array functions piecemeal is the
right choice at this point. array_any and array_every are conceptually
very similar to array_find and are missing as well. In fact
array_any($cb, $array) = array_find($cb, $array, true) !== null and
array_every($cb, $array) = !array_any($negatedCb, $array), but it would
make sense to have them explicitly for clarity of the reader of the code.
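The two equivalences above can be sketched in userland as follows. All function names are hypothetical, the array is passed first (as in the proof of concept) rather than the callback, and the key-returning variant stands in for the "third argument true" mode, assuming it returns null when nothing matches.

```php
<?php
declare(strict_types=1);

// Hypothetical key-returning find; null means "no match" in this sketch.
function sketch_find_key(array $array, callable $callback): int|string|null
{
    foreach ($array as $key => $value) {
        if ($callback($value, $key)) {
            return $key;
        }
    }

    return null;
}

function sketch_any(array $array, callable $callback): bool
{
    // Array keys are never null, so a non-null result means something matched.
    return sketch_find_key($array, $callback) !== null;
}

function sketch_all(array $array, callable $callback): bool
{
    // "All match" is the same as "no element fails the predicate".
    return !sketch_any(
        $array,
        fn ($value, $key): bool => !$callback($value, $key)
    );
}

var_dump(sketch_any([1, 2, 3], fn (int $n): bool => $n > 2)); // bool(true)
var_dump(sketch_all([1, 2, 3], fn (int $n): bool => $n > 2)); // bool(false)
var_dump(sketch_all([], fn (int $n): bool => $n > 2));        // bool(true)
```

Comparing the key rather than the value against null matters: a matching element whose value is null would otherwise look like "not found".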
> > - Key handling. It's good that you have looked into this, because I was going to mention it. :-) However, I don't think a boolean is the right answer, since the question is binary, not true/false. (Those are not the same thing.) I think a small return-mode Enum would make more sense here.
> I like the idea, thank you! However, I am unsure whether an additional
> enum for the function would not be too much overhead.
I feel the same. Adding an enum for each binary parameter that is
semantically true and false feels quite unwieldy with how class / enums
/ interfaces are currently organized in the namespace hierarchy.
Some of the array functions have a paired function with a _key suffix, but
looking at the docs it appears the difference usually is that they
operate on the keys, instead of returning the keys. So I'm not sure
whether adding an array_find_key companion would be confusing or not.
Best regards
Tim Düsterhus
> However I'm not sure if adding new array functions piecemeal is the
> right choice at this point. array_any and array_every are conceptually
> very similar to array_find and are missing as well. In fact
> array_any($cb, $array) = array_find($cb, $array, true) !== null and
> array_every($cb, $array) = !array_any($negatedCb, $array), but it would
> make sense to have them explicitly for clarity of the reader of the code.
We're in a major catch-22, unfortunately. We know that collections/iterables are long overdue for a rethink, which means small fixes are just making more work for the future. Intermediate concepts like the pipe operator have been rejected. However, a full rethink is a massive undertaking, and few people want to do that given the entirely unknown odds any RFC has. And a real rethink doesn't make sense to do without generics, and... yeah.
So I genuinely don't know what to do here, strategically.
> > > - Key handling. It's good that you have looked into this, because I was going to mention it. :-) However, I don't think a boolean is the right answer, since the question is binary, not true/false. (Those are not the same thing.) I think a small return-mode Enum would make more sense here.
> > I like the idea, thank you! However, I am unsure whether an additional
> > enum for the function would not be too much overhead.
> I feel the same. Adding an enum for each binary parameter that is
> semantically true and false feels quite unwieldy with how class / enums
> / interfaces are currently organized in the namespace hierarchy.
Point of order: This parameter is not semantically true and false. It is semantically either/or, and we kinda twist sideways to make it look like true/false if we squint. That's actually pretty common in the current stdlib, though it's not a good approach. Hence why I asked about an enum. I wouldn't expect it to be single-function, though, but to be applicable for multiple functions. (I did not go looking to see if such functions exist.)
> Some of the array functions have paired function with a _key suffix, but
> looking at the docs it appears the difference usually is that they
> operate on the keys, instead of returning the keys. So I'm not sure
> whether adding a array_find_key companion would be confusing or not.
Another alternative is to always return the key, because you can trivially get the value from the key, but not vice versa. Of course, the resulting syntax for that is frequently fugly.
$val = $array[array_find($db, $array)] ?? some-default;
I don't have a good small-scale solution.
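As a rough illustration of the "always return the key" alternative, the sketch below uses a hypothetical key-returning helper and then looks the value up explicitly. The helper name, the null-on-miss convention and the sample data are all assumptions.

```php
<?php
declare(strict_types=1);

// Hypothetical key-returning variant; null means "no match" in this sketch.
function sketch_find_key(array $array, callable $callback): int|string|null
{
    foreach ($array as $key => $value) {
        if ($callback($value, $key)) {
            return $key;
        }
    }

    return null;
}

$ages = ['alice' => 31, 'bob' => 17];

// The value is one array access away once the key is known ...
$key = sketch_find_key($ages, fn (int $age): bool => $age < 18);
$value = $key !== null ? $ages[$key] : 'some-default';
var_dump($key);   // string(3) "bob"
var_dump($value); // int(17)

// ... but every call site has to deal with the "no key" case itself.
$missing = sketch_find_key($ages, fn (int $age): bool => $age > 99);
$fallback = $missing !== null ? $ages[$missing] : 'some-default';
var_dump($fallback); // string(12) "some-default"
```

The helper variable and the explicit null check are the price of this design, which is the ergonomic cost mentioned above.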
--Larry Garfield
Hi
> > I feel the same. Adding an enum for each binary parameter that is
> > semantically true and false feels quite unwieldy with how class / enums
> > / interfaces are currently organized in the namespace hierarchy.
> Point of order: This parameter is not semantically true and false. It is semantically either/or, and we kinda twist sideways to make it look like true/false if we squint. That's actually pretty common in the current stdlib, though it's not a good approach. Hence why I asked about an enum. I wouldn't expect it to be single-function, though, but to be applicable for multiple functions. (I did not go looking to see if such functions exist.)
Whoops, I accidentally a word in that paragraph, thank you.
It should have read "each binary parameter that is NOT semantically true
and false".
Best regards
Tim Düsterhus
Hi
> Another alternative is to always return the key, because you can trivially get the value from the key, but not vice versa. Of course, the resulting syntax for that is frequently fugly.
> $val = $array[array_find($db, $array)] ?? some-default;
In roughly 95% of cases, I personally need the array value and not
the key. Just having the option to get the key is already a step forward
compared to the current state, but I would find it impractical (and
fugly, as you said) to use the function this way, especially if the
search callback is more complex and spans multiple lines.
Here is an example for this, without using a helper variable:
$value = $array[\array_find($array, function ($value, $key): bool {
    if ($key % 5) {
        return \strlen($value) < 100;
    }
    if ($key % 2) {
        return \strlen($value) < 40;
    }
    return false;
})];
So I think this is not an option.
Cheers
Josh
I think at this point I'm on team "let's just make it 2 separate functions, give them good names, and move on with life."
There's another issue, though: Will the callback always be given both $value and $key? If so, it's incompatible with internal single-parameter functions. If not, and it tries to auto-detect, we run into issues with functions with optional second arguments. (This is a pre-existing mess I've run into before.)
--Larry Garfield
Hi
> There's another issue, though: Will the callback always be given both $value and $key?
The current implementation always passes both, which, as I just learned,
is inconsistent with array_filter, which has the $mode parameter to
control what to pass to the callback.
Not that I claim that the $mode parameter is a good idea from the
typing perspective for static analysis tools; it probably isn't.
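For reference, this is how the existing $mode parameter of array_filter changes what the callback receives. The sample data is made up; the function and the two constants are part of PHP's standard library.

```php
<?php
declare(strict_types=1);

$data = ['a' => 1, 'b' => 2, 'c' => 3];

// Default mode: the callback receives only the value.
$byValue = array_filter($data, fn (int $value): bool => $value > 1);

// ARRAY_FILTER_USE_KEY: the callback receives only the key.
$byKey = array_filter(
    $data,
    fn (string $key): bool => $key !== 'a',
    ARRAY_FILTER_USE_KEY
);

// ARRAY_FILTER_USE_BOTH: the callback receives value first, then key.
$byBoth = array_filter(
    $data,
    fn (int $value, string $key): bool => $value > 1 && $key !== 'c',
    ARRAY_FILTER_USE_BOTH
);

var_dump($byValue); // ['b' => 2, 'c' => 3]
var_dump($byKey);   // ['b' => 2, 'c' => 3]
var_dump($byBoth);  // ['b' => 2]
```

The callback's parameter list depends on an integer argument, which is the part that is awkward to express for static analysis.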
Best regards
Tim Düsterhus
Hi
> > There's another issue, though: Will the callback always be given both
> > $value and $key?
> The current implementation always passes both, which, as I just learned,
> is inconsistent with array_filter, which has the $mode parameter to
> control what to pass to the callback.
> Not that I claim that the $mode parameter is a good idea from the
> typing perspective for static analysis tools, it probably isn't.
Yes, this function always passes both parameters. This has the advantage
that we do not need a third parameter that specifies which parameters
are to be passed to the callback. The array_filter function was
developed at a time when there were no closures and it was therefore not
trivially possible to change the parameter order of a function (or to
omit a parameter completely). With closures, this is now possible
without any problems.
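A short sketch of what adapting a callback with a closure can look like. The function sketch_array_find is a hypothetical stand-in that always passes value and key, and $keyFirst is an invented predicate with the "wrong" parameter order.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in that always calls the callback with (value, key).
function sketch_array_find(array $array, callable $callback): mixed
{
    foreach ($array as $key => $value) {
        if ($callback($value, $key)) {
            return $value;
        }
    }

    return null;
}

// Omit a parameter: the closure takes only the value and forwards it.
$firstString = sketch_array_find(
    [1, 'two', 3.0],
    fn ($value): bool => is_string($value)
);
var_dump($firstString); // string(3) "two"

// Reorder parameters: adapt a (key, value) predicate to the (value, key) call.
$keyFirst = fn (string $key, int $value): bool => $key !== 'a' && $value > 1;
$match = sketch_array_find(
    ['a' => 5, 'b' => 1, 'c' => 7],
    fn (int $value, string $key): bool => $keyFirst($key, $value)
);
var_dump($match); // int(7)
```

User-defined closures silently ignore surplus arguments, which is why the one-parameter wrapper works even though two arguments are passed.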
Converting the function so that it works like array_filter would mean
that we would have to introduce new constants (or, inconsistently, an
enum) that control this behaviour. Reusing the constants which are used
by array_filter is not possible, because the constant names contain the
function name array_filter.
Cheers
Josh
Hi
> However I'm not sure if adding new array functions piecemeal is the
> right choice at this point. array_any and array_every are conceptually
> very similar to array_find and are missing as well. In fact
> array_any($cb, $array) = array_find($cb, $array, true) !== null and
> array_every($cb, $array) = !array_any($negatedCb, $array), but it would
> make sense to have them explicitly for clarity of the reader of the code.
Thinking about this: I believe that it would make sense to bundle
array_any and array_every (or array_all, see below) within the same RFC
due to the similarity. It can be a separate vote for those two, but
having the option of getting all three would probably alleviate my
concerns of adding new array functions piecemeal.
The implementation should be trivial, because it effectively just
changes the return type. Nevertheless I'm happy to assist should any
issues arise.
As for the naming:
JavaScript: every + some
Haskell : all + any
Rust : all + any
C++ : all_of + any_of + none_of
Java : allMatch + anyMatch (in java.util.stream.Stream)
Swift : allSatisfy + contains(where: …)
It appears the commonly used choice is all + any.
Best regards
Tim Düsterhus
Hi
On 19.04.24 23:29, Tim Düsterhus wrote:
> Thinking about this: I believe that it would make sense to bundle
> array_any and array_every (or array_all, see below) within the same RFC
> due to the similarity. It can be a separate vote for those two, but
> having the option of getting all three would probably alleviate my
> concerns of adding new array functions piecemeal.
> The implementation should be trivial, because it effectively just
> changes the return type. Nevertheless I'm happy to assist should any
> issues arise.
> As for the naming:
> JavaScript: every + some
> Haskell : all + any
> Rust : all + any
> C++ : all_of + any_of + none_of
> Java : allMatch + anyMatch (in java.util.stream.Stream)
> Swift : allSatisfy + contains(where: …)
> It appears the commonly used choice is all + any.
Thanks for your suggestion! I have added the array_any and array_all
functions to the RFC accordingly.
Cheers
Josh
> > - Should this work on arrays or iterables? This is a long standing limitation of PHP. The array operations don't work on iterables, even though we've had iterables for 20 years.
> In the longer term, it definitely makes sense to create a separate API
> here that can handle not only arrays, but iterables in general. I have
> heard this suggestion in various places (including in the mailing list)
> and had also looked into it in the process of this RFC, but did not
> pursue it further after the initial idea, as it would be important for
> me that such an API is planned accordingly and has an appropriate
> repertoire right from the start (functions such as map, filter, find,
> push, pop, …). In my opinion, a single function would be very out of
> place, especially if this API is then really soon tackled and then
> possibly differs from the implementation of the RFC.
I think it's fine to have an array-specific variant in this case,
because arrays never have duplicate keys. Iterables can, and that may
factor into design. Additionally, arrays are not consumed by iterating
on them, but iterables may be, and this could be a gotcha. I think it's
fine to have an array-specific version (simpler, nicer).
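Both gotchas can be shown with a small generator. The generator pairs is invented for this illustration; the behaviour shown is that of PHP's built-in Generator class.

```php
<?php
declare(strict_types=1);

// A generator may yield the same key more than once; an array cannot hold both.
function pairs(): Generator
{
    yield 'a' => 1;
    yield 'a' => 2;
}

$seen = [];
foreach (pairs() as $key => $value) {
    $seen[] = [$key, $value];
}
var_dump(count($seen)); // int(2): both entries are visited

// Converting to an array keeps only the last value for the duplicate key.
$asArray = iterator_to_array(pairs());
var_dump($asArray); // ['a' => 2]

// A generator is consumed by iteration and cannot be traversed a second time.
$generator = pairs();
foreach ($generator as $ignored) {
    // first pass consumes the generator
}

$reusable = true;
try {
    foreach ($generator as $ignored) {
        // never reached
    }
} catch (Exception $e) {
    $reusable = false;
}
var_dump($reusable); // bool(false)
```

A find over iterables would therefore have to decide what "the key" means when keys repeat, and callers would have to remember that searching consumes the source.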
> I'm open to this, but two points that I'm sure someone will bring up:
> - Should this work on arrays or iterables? This is a long standing limitation of PHP. The array operations don't work on iterables, even though we've had iterables for 20 years.
> - Key handling. It's good that you have looked into this, because I was going to mention it. :-) However, I don't think a boolean is the right answer, since the question is binary, not true/false. (Those are not the same thing.) I think a small return-mode Enum would make more sense here.
IMO, it's better to separate it into two functions because its type is
stable without control flow. For instance:
    // Returns K if $b is true, V otherwise.
    function array_find(
        array<K, V> $array,
        callable(V, K): bool $callback,
        bool $b = false
    ) -> K|V;
This isn't stable and requires control-flow to understand the type.
These are simpler:
    function array_find(
        array<K, V> $array,
        callable(V, K): bool $callback
    ) -> V;
    function array_find_key(
        array<K, V> $array,
        callable(V, K): bool $callback
    ) -> K;
Naming bikeshedding aside, it's better to have types that are
inferrable without function-specific knowledge of control flow. It
doesn't matter if it's a bool or an enum, it still has problems.
Better to just separate them to different functions.
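In real PHP syntax, which has no generics, the difference might be sketched like this. All three function names are hypothetical, and null is assumed as the "not found" result.

```php
<?php
declare(strict_types=1);

// Flag-based shape: whether a key or a value comes back depends on a runtime
// argument, so the declared return type cannot say more than "mixed".
function sketch_find_flag(array $array, callable $callback, bool $returnKey = false): mixed
{
    foreach ($array as $key => $value) {
        if ($callback($value, $key)) {
            return $returnKey ? $key : $value;
        }
    }

    return null;
}

// Split shape: the key variant can declare exactly what an array key can be.
function sketch_find_key(array $array, callable $callback): int|string|null
{
    foreach ($array as $key => $value) {
        if ($callback($value, $key)) {
            return $key;
        }
    }

    return null;
}

// The value variant always yields an element of the array (or null).
function sketch_find(array $array, callable $callback): mixed
{
    $key = sketch_find_key($array, $callback);

    return $key === null ? null : $array[$key];
}

$stock = ['apple' => 3, 'pear' => 0, 'plum' => 12];
$isEmpty = fn (int $count): bool => $count === 0;

var_dump(sketch_find_flag($stock, $isEmpty, true)); // string(4) "pear"
var_dump(sketch_find_key($stock, $isEmpty));        // string(4) "pear"
var_dump(sketch_find($stock, $isEmpty));            // int(0)
```

With the split version a reader or an analysis tool knows the result kind from the function name alone, without tracing the value of a boolean argument.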
> IMO, it's better to separate it into two functions because its type is
> stable without control flow. For instance:
>     // Returns K if $b is true, V otherwise.
>     function array_find(
>         array<K, V> $array,
>         callable(V, K): bool $callback,
>         bool $b = false
>     ) -> K|V;
> This isn't stable and requires control-flow to understand the type.
> These are simpler:
>     function array_find(
>         array<K, V> $array,
>         callable(V, K): bool $callback
>     ) -> V;
>     function array_find_key(
>         array<K, V> $array,
>         callable(V, K): bool $callback
>     ) -> K;
> Naming bikeshedding aside, it's better to have types that are
> inferrable without function-specific knowledge of control flow. It
> doesn't matter if it's a bool or an enum, it still has problems.
> Better to just separate them to different functions.
I definitely see the advantage of having two separate functions and can
understand that it is easier for developers to follow the control flow
without evaluating the parameters.
I'm unsure whether that's really necessary, though, because it is
probably not essential to see directly what exactly the function
returns. Perhaps someone else will share an opinion on this in an email
in the next few days.
Cheers
Josh
Hi
> I definitely see the point where there is an advantage to having two
> separate methods and can definitely understand that it is easier for
> developers to understand the control flow without evaluating the
> parameters.
> I'm unsure if that's really necessary though, because basically it's
> probably not necessary to directly see what exactly the function
> returns. Perhaps there will be another opinion on this in an email in
> the next few days.
Now that I've thought about it for a few days, it's really better that
the whole thing is broken down into two functions. I have adjusted the RFC
accordingly. The RFC now contains two separate functions: array_find
and array_find_key.
Cheers
Josh
The RFC looks better to me. The "Unaffected PHP Functionality" section
looks like it needs updating, though:
> This RFC only adds two new functions and an enum to PHP and
> only affects previously defined functions which are named as
> the proposed function or enum.
I don't see an enum in the text or in the git diff.
Hi
> The RFC looks better to me. The "Unaffected PHP Functionality" section
> looks like it needs updating, though:
> > This RFC only adds two new functions and an enum to PHP and
> > only affects previously defined functions which are named as
> > the proposed function or enum.
> I don't see an enum in the text nor in the git diff.
I have removed the leftover.
Thank you!
This looks good to me, with one remaining exception, which isn't specific to this function but should still be discussed: Always passing the value and key to the callback is unsafe, for two reasons.
- If the callback is an internal function rather than a user-land one, and it has only one argument, it will error confusingly. That makes the current implementation incompatible with unary built-in functions. See, for instance, https://www.php.net/is_string (and friends)
- If the callback takes two arguments but the second is optional, it's highly unlikely that the key is the value expected as the second argument. This could lead to confusingly hilarious errors. See, for instance, https://www.php.net/intval.
These won't come up in the typical case of passing an inline closure (either short or long form), but are still hidden landmines for anyone using functions not tailor made for these functions.
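Both landmines can be reproduced with a userland stand-in that, like the proposal, always passes value and key. The name sketch_array_find is hypothetical, and the first result assumes PHP 8, where internal functions reject surplus arguments with an ArgumentCountError.

```php
<?php

// Hypothetical stand-in that always calls the callback with (value, key).
function sketch_array_find(array $array, callable $callback): mixed
{
    foreach ($array as $key => $value) {
        if ($callback($value, $key)) {
            return $value;
        }
    }

    return null;
}

// 1. A unary internal function rejects the extra key argument.
$arityError = false;
try {
    sketch_array_find(['x'], 'is_string');
} catch (ArgumentCountError $e) {
    $arityError = true;
}
var_dump($arityError); // bool(true)

// 2. intval() has an optional second parameter ($base), so a key passed in
//    second position is silently interpreted as the numeric base.
$withKey = [];
$valueOnly = [];
foreach ([2 => '101', 8 => '17'] as $key => $value) {
    $withKey[] = intval($value, $key); // what a (value, key) call amounts to
    $valueOnly[] = intval($value);     // what the caller most likely meant
}
var_dump($withKey);   // [5, 15]
var_dump($valueOnly); // [101, 17]
```

Wrapping the built-in in a one-parameter closure, for example fn ($value) => is_string($value), avoids both problems at the call site.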
I'm not sure of a good solution here, honestly, so I don't know what to recommend. In Crell/fp, I ended up just making two different versions of the function that pass one or two arguments. I don't think that's a good answer for this RFC, but I'm not sure what is. At the very least, it should be mentioned as a known limitation that gets documented, unless we can come up with something better.
--Larry Garfield
Hi
> This looks good to me, with one remaining exception, which isn't specific to this function but should still be discussed: Always passing the value and key to the callback is unsafe, for two reasons.
> - If the callback is an internal function rather than a user-land one, and it has only one argument, it will error confusingly. That makes the current implementation incompatible with unary built-in functions. See, for instance, https://www.php.net/is_string (and friends)
I think that this problem can easily be detected with static analysers.
Currently neither PHPStan [1] nor psalm [2] detects this issue, but
as the tools already validate the signature (e.g. str_contains is
rejected), this can probably be integrated and might even be considered a
bug.
The proper fix from PHP's side would be something like the proposal in
https://externals.io/message/122928 (RFC idea: using the void type to
control maximum arity of user-defined functions).
> - If the callback takes two arguments but the second is optional, it's highly unlikely that the key is the value expected as the second argument. This could lead to confusingly hilarious errors. See, for instance, https://www.php.net/intval.
> These won't come up in the typical case of passing an inline closure (either short or long form), but are still hidden landmines for anyone using functions not tailor made for these functions.
I see the problem, but I don't think that this is a problem we can
solve. As a note: such problems exist in JavaScript, too [3], and are
handled in the same way.
> I'm not sure of a good solution here, honestly, so I don't know what to recommend. In Crell/fp, I ended up just making two different versions of the function that pass one or two arguments. I don't think that's a good answer for this RFC, but I'm not sure what is. At the very least, it should be mentioned as a known-limitation that gets documented., unless we can come up with something better.
I have added this problem in the section "Open Issues" in the RFC to
document this behavior.
[1] https://phpstan.org/r/bd0866cd-6a76-4c18-8eb3-0f3848de7f4a
[2] https://psalm.dev/r/1baa3f8e0d
[3] https://wirfs-brock.com/allen/posts/166