Hi,
After programming in PHP for two decades, my goal for 2023 is to try to contribute to the language.
The plan is to start small and, if successful, work my way up to more complex proposals.
This topic has been chosen for starters, because IMO it strikes a good balance between simplicity and usefulness.
I should be able to implement the RFC myself, unless some deep OPcache/JIT nuances pop up.
Let me give you a brief overview of the problem and the proposed solution.
Function array_is_list() added in PHP 8.1 introduces the concept of a "list": an array having 0..count-1 indexes.
The function is awesome and array "lists" are completely compatible with all array_* functions!
However, function array_filter() exhibits a nuanced behavior when filtering lists.
For instance, it preserves array keys which may (or may not) create gaps in sequential indexes.
These gaps mean that a filtered list is not a list anymore as validated by array_is_list().
For example:
$originalList = ['first', '', 'last'];
$filteredList = array_filter($originalList);
var_export($filteredList); // array(0 => 'first', 2 => 'last')
var_export(array_is_list($originalList)); // true
var_export(array_is_list($filteredList)); // false
The behavior is counterintuitive and can lead to subtle bugs, such as encoding issues:
echo json_encode($originalList); // ["first","","last"]
echo json_encode($filteredList); // {"0":"first","2":"last"}
The workaround is to post-process the filtered array with array_values() to reset the indexes.
The proposal is to introduce a function array_filter_list() that would work solely on lists.
It will have the same signature as array_filter() and will always return a valid list.
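For illustration only, the proposed behavior can be approximated in userland today. This is a minimal sketch, not the RFC's implementation: the name userland_array_filter_list() is hypothetical, and the parameters are assumed to mirror array_filter() as the proposal describes.

```php
<?php
declare(strict_types=1);

// Hypothetical userland stand-in for the proposed function (assumed name).
// Parameters mirror array_filter(): array, optional callback, optional mode.
function userland_array_filter_list(array $list, ?callable $callback = null, int $mode = 0): array
{
    // array_filter() preserves keys; array_values() reindexes from 0.
    return array_values(array_filter($list, $callback, $mode));
}

$filteredList = userland_array_filter_list(['first', '', 'last']);

var_dump(array_is_list($filteredList));   // bool(true)
echo json_encode($filteredList), PHP_EOL; // ["first","last"]
```

With a null callback, falsy entries are dropped exactly as array_filter() does by default; only the keys differ.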
See a draft RFC with more details here:
https://dev.to/sshymko/php-rfc-arrayfilterlist-function-35mb-temp-slug-7074000?preview=21d6760126a02464b0511498bbb95749150afb17a7ff6377c458ee54a8f57cfe00d4e258aa06bad3232c0dd9e73a2d62138fc990048987e9e2339a3d
I just registered a wiki account "sshymko" with the intention of submitting the RFC.
Could someone please approve the account and give it some karma?
Looking forward to collaborating with the internals team! :)
Regards,
Sergii Shymko
Hey Sergii,
I don't want to shoot this down too early, but:
- why in the language, when a simple userland function suffices?
- what's wrong with writing array_values(array_filter(...))?
Marco Pivetta
Hi Marco,
From: Marco Pivetta ocramius@gmail.com
Sent: Wednesday, February 1, 2023 9:25 AM
To: Sergii Shymko sergey@shymko.net
Cc: internals@lists.php.net internals@lists.php.net
Subject: Re: [PHP-DEV] RFC proposal: function array_filter_list() to avoid subtle bugs/workarounds
I do understand the aversion to adding more functions to PHP core :)
I think that's rather simple from the discussions I've seen here in the
past years:
C code is harder to write, maintain and understand while PHP code is
simpler and is also not bound to php versioning so it can evolve faster.
Of course, that assumes the performance is similar.
When a native implementation performs much better, adding it to core
makes more sense, as was the case with array_is_list(), where the
information almost always exists internally as a flag.
I would suggest you check
https://github.com/azjezz/psl/blob/next/src/Psl/Vec/filter.php#L34
and maybe use the library for other features as well.
Or you can actually do the implementation and show also some numbers
regarding the performance to have a better chance of having the rfc pass.
In my observation, that helps.
Regards,
Alex
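As a starting point for such numbers, a userland baseline could be measured along these lines. This is a rough sketch under stated assumptions: the input size and the $isEven predicate are arbitrary choices, a single run is not a rigorous benchmark, and a native implementation would still have to be measured separately.

```php
<?php
declare(strict_types=1);

// Arbitrary sample input and predicate, chosen only for illustration.
$list = range(1, 100000);
$isEven = static fn (int $n): bool => $n % 2 === 0;

// Variant 1: the composed built-ins.
$start = hrtime(true);
$composed = array_values(array_filter($list, $isEven));
$composedMs = (hrtime(true) - $start) / 1e6;

// Variant 2: a plain loop that appends matches, so keys stay sequential.
$start = hrtime(true);
$looped = [];
foreach ($list as $value) {
    if ($isEven($value)) {
        $looped[] = $value;
    }
}
$loopedMs = (hrtime(true) - $start) / 1e6;

printf("array_values(array_filter()): %.2f ms\n", $composedMs);
printf("foreach loop:                 %.2f ms\n", $loopedMs);
var_dump($composed === $looped); // bool(true)
```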
Let me try to articulate the answers to your questions:
- A userland function is enough; it's almost always possible to create a userland function.
However, pretty much every single PHP project would need it and use it pretty often.
It would be great to standardize the function name, especially now that it complements array_is_list().
The same userland argument can be made for array_is_list(), yet people appreciate the function.
The reasoning for adding array_filter_list() to the core is similar to array_is_list().
- The array_values(array_filter()) workaround is usable and is indeed used on the projects I'm involved with.
It's a relatively short one-liner, so there's little need to wrap it in a userland function.
One of the disadvantages is having to always mentally translate it to "filter list".
The worst thing is that 9/10 times developers (me included) would overlook this nuanced behavior.
Logically, people assume that a subset of a list is a list, which unfortunately isn't always the case.
They'd forget to apply array_values() and cause bugs that are not immediately detectable.
At least 3-4 projects in my recent memory experienced this issue, and it got overlooked at code review.
Even automated tests did not catch it, as the issue remains hidden when filtering out the tail end of a list.
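A small sketch of why tests can miss this (the sample values are made up for illustration): when only trailing elements are removed, the remaining keys are still 0..count-1, so the result happens to be a list.

```php
<?php
declare(strict_types=1);

// Only the last element is filtered out: keys 0 and 1 remain, no gap.
$tailFiltered = array_filter(['first', 'last', '']);
var_dump(array_is_list($tailFiltered));   // bool(true)

// A middle element is filtered out: keys 0 and 2 remain, leaving a gap.
$middleFiltered = array_filter(['first', '', 'last']);
var_dump(array_is_list($middleFiltered)); // bool(false)
```

A test fixture that only ever drops trailing entries would therefore pass with or without array_values().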
After brainstorming with developers, we concluded formalizing the concept
of a "list" more would help.
Having functions array_is_list() and array_filter_list() would make you
think in terms of lists more.
One would use array_filter_list() instead of array_filter() by default unless they work with assoc arrays.
Maybe there could be an even more advanced approach: formalizing the list data type.
But it would be overkill to propose a major change like this at this time.
Regards,
Sergii
From: Alexandru Pătrănescu drealecs@gmail.com
Sent: Wednesday, February 1, 2023 10:53 AM
To: Sergii Shymko sergey@shymko.net
Cc: Marco Pivetta ocramius@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] RFC proposal: function array_filter_list() to avoid subtle bugs/workarounds
Hi Sergii,
First of all, let me say that I am glad more people are willing to give
back to the PHP language. I'm happy you decided to join the ML.
However, the change you are proposing is unlikely to be well-received. PHP
already has too many functions. It is known as a kitchen sink.
I don't think we need list-variants of the array functions. As you noted
yourself, it can be simply implemented with array_values(). If we do this
one then next someone will say what about array_diff() with a list?
Most of the functionality can be implemented in userland and it's usually
the preferred way. If we are going to add a new function to PHP, it must
have a very good reason to be implemented in C. array_is_list() had a very
good explanation: performance.
Ideally, we would have a separate type for List, but... it can easily be
implemented in userland so we are back to square one. While PHP's arrays
are something of a Frankenstein's monster, they work well as an internal
implementation detail of other data structures.
Adding more array_* functions is something that many of us would want to
avoid.
Regards,
Kamil
From: Kamil Tekiela tekiela246@gmail.com
Sent: Wednesday, February 1, 2023 11:55 AM
To: Sergii Shymko sergey@shymko.net; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] RFC proposal: function array_filter_list() to avoid subtle bugs/workarounds
Hi Kamil,
Thanks for your thoughtful feedback!
The argument about list-flavored versions of other array_* functions is a very strong argument.
I'm glad you're bringing it up, because that aspect worries me a lot as well.
Array functions producing a subset of a list are all logically expected to return a list.
I think this assumption is only applicable to array_filter(), array_[u]diff(), and array_[u]intersect().
These are basically all array functions that already have an array_*_assoc() version.
Since we already distinguish assoc arrays, doing the same for lists seems pretty reasonable.
You're right, we can totally anticipate people requesting array_diff_list() and array_intersect_list().
I've got mixed feelings about that, also not a fan of multiplying the number of array functions.
In practice, I've only encountered the need for array_filter_list(), but not others.
Filtering is different from other functions as it operates on a single array instead of multiple arrays.
Maybe it's a sufficient reason to treat it separately from other array functions, maybe not.
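To make the comparison concrete, here is a short sketch (sample values are arbitrary) showing that array_diff() and array_intersect() preserve keys in the same way, so their results on lists can have gaps too:

```php
<?php
declare(strict_types=1);

$list = ['a', 'b', 'c'];

// Both functions keep the keys of the first array.
$diff = array_diff($list, ['b']);             // [0 => 'a', 2 => 'c']
$common = array_intersect($list, ['a', 'c']); // [0 => 'a', 2 => 'c']

var_dump(array_is_list($diff));               // bool(false)
var_dump(array_is_list($common));             // bool(false)

// The same array_values() workaround applies.
var_dump(array_is_list(array_values($diff))); // bool(true)
```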
Hi
I agree with the general premise that "array_filter turns lists into
non-lists" is not great and I've encountered the issue myself more than
I would like. However I also agree with the other replies that
array_filter_list() is likely not the correct solution.
I believe that defaults matter and as a developer when I have the choice
between 'array_filter' and 'array_filter_list', the former is likely
going to be the default, because it's shorter. So instead of remembering
to wrap the array_filter() into array_values(), I have to remember which
of the two functions I need.
I also believe that non-array iterables should not be a second class
citizen to arrays and would like to point to my email in this mailing
list thread for the "[List/Assoc\unique]" RFC and thus a pretty similar
topic, because it also applies here:
https://externals.io/message/119070#119072
Perhaps the hypothetical "iterable\filter" function could be smart with
regard to the handling of keys? If the input appears to be consecutively
indexed, it is treated as a list and reindexed. Or it could default to
dropping the keys, which I consider the better default, because dropping
keys is immediately obvious when testing, whereas having holes in a
list is not.
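The "drop the keys by default" variant might look roughly like the following. This is only a sketch of the idea: filter_to_list() is a hypothetical name, no "iterable\filter" function exists in PHP, and the truthiness default is an assumption borrowed from array_filter().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: filters any iterable and always discards the keys,
 * so the result is a list regardless of the input's keys.
 */
function filter_to_list(iterable $items, ?callable $predicate = null): array
{
    $result = [];
    foreach ($items as $item) {
        // Without a predicate, keep truthy values (assumed default, as in array_filter()).
        $keep = $predicate === null ? (bool) $item : (bool) $predicate($item);
        if ($keep) {
            $result[] = $item; // appending keeps indexes sequential
        }
    }
    return $result;
}

// Works on arrays ...
$fromArray = filter_to_list(['first', '', 'last']);

// ... and on non-array iterables such as generators with string keys.
$numbers = (function () {
    yield 'x' => 1;
    yield 'y' => 0;
    yield 'z' => 2;
})();
$fromGenerator = filter_to_list($numbers);

var_dump($fromArray === ['first', 'last']); // bool(true)
var_dump($fromGenerator === [1, 2]);        // bool(true)
```

Because string keys are discarded as well, the loss of keys shows up in the very first test that relies on them, which is the trade-off argued for above.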
Best regards
Tim Düsterhus