Option for array_column() to preserve keys.

3 years ago by Andreas Hennings

Hello internals,

The function array_column() would be much more useful if there was an
option to preserve the original array keys.
I can create an RFC, but I think it is better to first discuss the options.

This is requested in different places on the web, e.g.
https://stackoverflow.com/questions/27204590/php-array-column-how-to-keep-the-keys/39298759

A workaround is proposed here and elsewhere, using array_keys() and
array_combine() to restore the keys.
However, this workaround not only adds complexity, but also breaks down
if some items don't have the value key. See https://3v4l.org/im2gZ.

A more robust workaround would be array_map(), but this is more
complex and probably slower than array_column(), for the given
purpose.
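To make the problem concrete, here is a minimal sketch with made-up sample data ($rows and its keys are assumptions for illustration, not taken from the 3v4l link). It shows why the array_keys()/array_combine() workaround stops working once a row lacks the value key, and uses a plain foreach loop as a stand-in for the more robust workaround.

```php
<?php
// Hypothetical sample data: row 'b' has no 'name' key.
$rows = [
    'a' => ['id' => 1, 'name' => 'Alice'],
    'b' => ['id' => 2],
    'c' => ['id' => 3, 'name' => 'Carol'],
];

// array_column() drops the original keys and silently skips row 'b'.
$names = array_column($rows, 'name');

// The array_combine() workaround needs as many keys as values.
// Here there are 3 keys but only 2 values, so array_combine() would
// fail (a ValueError on PHP 8, false with a warning before that).
$counts_match = count(array_keys($rows)) === count($names);

// Loop-based workaround: keeps the keys and skips incomplete rows.
$names_by_key = [];
foreach ($rows as $key => $row) {
    if (is_array($row) && array_key_exists('name', $row)) {
        $names_by_key[$key] = $row['name'];
    }
}
```

The loop gives the key-preserving result, but it is noticeably more code than a single array_column() call.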

Some links for your convenience:
The function was introduced in this RFC, https://wiki.php.net/rfc/array_column
It is now documented here,
https://www.php.net/manual/en/function.array-column.php

Some ideas for how this could be fixed:

1. Allow a magic value (e.g. TRUE) for the $index_key parameter, that
   would cause the assoc behavior. To fully avoid BC break, this must be
   a value that previously was completely forbidden. The value TRUE is
   currently only forbidden with strict_types=1. A value of e.g. new
   \stdClass is fully forbidden, but would be weird. A constant could be
   introduced, but this would not prevent the BC concern.
2. Make the function preserve keys if $index_key === NULL. This would
   be a full BC break.
3. Add an additional parameter with a boolean option or with integer
   flags. This would be weird, because it would make the $index_key
   parameter useless.
4. Add a new function.

Personally I would prefer option 1, with value TRUE (I can't think of
something better).

If I could change history, I would prefer option 2. The current
behavior could still be achieved with array_values(array_column(..)).

Regards,
Andreas

3 years ago by Marco Pivetta

Heyo,

> Hello internals,
>
> The function array_column() would be much more useful if there was an
> option to preserve the original array keys.
> I can create an RFC, but I think it is better to first discuss the options.

New function, please 🙏

3 years ago by Andreas Hennings

Thanks for the feedback so far!

> Heyo,
>
> > Hello internals,
> >
> > The function array_column() would be much more useful if there was an
> > option to preserve the original array keys.
> > I can create an RFC, but I think it is better to first discuss the options.
>
> New function, please 🙏

I am not opposed. But I am also curious what others think.
What I don't like so much is how the situation with two different
functions will have a "historically grown wtf" smell about it.
But this is perhaps preferable to BC breaks or overly "magic"
parameters or overly crowded signatures.

If we go for a new function:
A name could be array_column_assoc().

array_column_assoc(array $array, string $value_key)

This would behave the same as array_column($array, $value_key), but
preserve original keys.
Items which are not arrays or which lack the key will be omitted.
A $value_key === NULL would be useless, because this would simply
return the original array.
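A minimal userland sketch of the behavior described above, assuming the proposed name and signature. Note that array_column_assoc() is only a proposal in this thread and not a PHP built-in; the sample data is made up.

```php
<?php
// Hypothetical userland version of the proposed function.
function array_column_assoc(array $array, string $value_key): array
{
    $result = [];
    foreach ($array as $key => $item) {
        // Items which are not arrays or which lack the key are omitted.
        if (is_array($item) && array_key_exists($value_key, $item)) {
            $result[$key] = $item[$value_key];
        }
    }
    return $result;
}

// Made-up sample data for illustration.
$rows = [
    'a' => ['name' => 'Alice'],
    'b' => 'not an array',
    'c' => ['id' => 3],
    'd' => ['name' => 'Dora'],
];
$names = array_column_assoc($rows, 'name');
```

Here $names keeps the original keys 'a' and 'd', while 'b' and 'c' are dropped.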

The question is, should it do anything beyond the most obvious?
Or should we leave it minimal for now, with the potential for
additional parameters in the future?

Limitations:
If some items are omitted, it will be awkward to restore the missing
items while preserving the order of the array.

Possible ideas for additional functionality:

- Replicate a lot of the behavior of array_column(), e.g. with an
  optional $index_key parameter. This would be mostly redundant.
- Additional functionality for nested arrays?
- Fallback value for entries that don't have the key? Or perhaps even
  a fallback callback like with array_map()?
- Option to capture missing entries e.g. in a by-reference variable?
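To illustrate two of these ideas, here is a rough sketch of a variant with a fallback value and a by-reference list of missing keys. The function name, the $fallback and $missing_keys parameters, and the sample data are all assumptions for illustration; nothing like this is proposed concretely in the thread.

```php
<?php
// Hypothetical variant: fills in a fallback instead of omitting items,
// and reports which keys had no value.
function array_column_assoc_fallback(
    array $array,
    string $value_key,
    $fallback = null,
    ?array &$missing_keys = null
): array {
    $missing_keys = [];
    $result = [];
    foreach ($array as $key => $item) {
        if (is_array($item) && array_key_exists($value_key, $item)) {
            $result[$key] = $item[$value_key];
        } else {
            // Keep the position in the array, but record the gap.
            $result[$key] = $fallback;
            $missing_keys[] = $key;
        }
    }
    return $result;
}

// Made-up sample data for illustration.
$rows = [
    'a' => ['name' => 'Alice'],
    'b' => ['id' => 2],
    'c' => 'not an array',
];
$names = array_column_assoc_fallback($rows, 'name', 'n/a', $missing);
```

Because every item keeps its slot, the original order is preserved, which sidesteps the limitation mentioned above at the cost of a wider signature.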

A benefit of keeping the limited functionality would be that
programming errors are revealed more easily due to the strict
signature.

A question is how we would look at this long term:
Do we want both functions to co-exist long-term, or do we want to
deprecate one of them at some point?
If array_column() is going to stay, then array_column_assoc() only
needs to cover the few use cases that are missing.

-- Andreas

3 years ago by Andreas Hennings

If we want to support nested array structures, it could work like this:

NOTE: We actually don't need to squeeze this into array_column_assoc().
We could easily introduce a 3rd function instead, e.g.
array_column_recursive(), if/when we want to have this in the future.
I am only posting this so that we get an idea about the surrounding
design space.

$source['a']['b']['x']['c']['y'] = 5;
$expected['a']['b']['c'] = 5;
assert($expected === array_column_assoc($source, [null, null, 'x', null, 'y']));

Note the first NULL, which only exists to make the system feel more "complete".
This could be useful if the array is coming from a function call.
The following examples show this:

unset($source, $expected); // (reset vars)
$source['a']['x']['b'] = 5;
$expected['a']['b'] = 5;
assert($expected === array_column_assoc($source, [null, 'x']));
assert($expected === array_column_assoc($source, 'x'));

unset($source, $expected); // (reset vars)
$source['x']['a'] = 5;
$expected['a'] = 5;
assert($expected === array_column_assoc($source, ['x']));
assert($expected === ($source['x'] ?? []));
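The examples so far can be reproduced with a small recursive sketch. It uses a separate, hypothetical function name (array_column_path) to keep it apart from the proposal, and it only models two kinds of path steps: NULL (keep this level and its keys) and a string or integer key (select that key and drop the level). Details that are still open in this thread, such as whether emptied branches get pruned, are not modeled.

```php
<?php
// Hypothetical sketch; steps in $path must be null, string or int.
function array_column_path(array $source, array $path)
{
    $missing = new stdClass(); // sentinel for "no value at this path"

    $walk = function ($value, array $path) use (&$walk, $missing) {
        if ($path === []) {
            return $value; // path fully consumed: this is the value
        }
        if (!is_array($value)) {
            return $missing; // cannot descend into a non-array
        }
        $step = array_shift($path);
        if ($step === null) {
            // Keep this level: visit every child, preserve its key.
            $result = [];
            foreach ($value as $key => $child) {
                $sub = $walk($child, $path);
                if ($sub !== $missing) {
                    $result[$key] = $sub;
                }
            }
            return $result;
        }
        // Select one key and drop this level.
        if (!array_key_exists($step, $value)) {
            return $missing;
        }
        return $walk($value[$step], $path);
    };

    $result = $walk($source, $path);
    return $result === $missing ? [] : $result;
}

$nested = ['a' => ['b' => ['x' => ['c' => ['y' => 5]]]]];
$picked = array_column_path($nested, [null, null, 'x', null, 'y']);
```

With this sketch, the plain string form array_column_assoc($source, 'x') would correspond to the path [null, 'x'].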

Trailing NULLs do almost nothing, except to ensure that non-arrays are
removed from the tree.
I would have to think more about the details, but I think it would
work like this:

unset($source, $expected); // (reset vars)
$source['a0']['b'] = 5;
$source['a1'] = 5;
$expected = $source;
assert($expected === array_column_assoc($source, []));
assert($expected === array_column_assoc($source, [null]));
unset($expected['a1']);
assert($expected === array_column_assoc($source, [null, null]));
unset($expected['a0']);
assert($expected === array_column_assoc($source, [null, null, null]));

Another idea could be to "collapse" array levels, using a magic value
other than NULL, that does not work as an array key.

unset($source, $expected); // (reset vars)
$source['a0']['b0'] = 5;
$source['a1']['b1'] = 5;
$expected['b0'] = 5;
$expected['b1'] = 5;
assert($expected === array_column_assoc($source, [false]));
unset($expected);
$expected['a0'] = 5;
$expected['a1'] = 5;
assert($expected === array_column_assoc($source, [null, false]));

-- Andreas

3 years ago by Andreas Hennings

On Wed, 8 Sept 2021 at 16:48, Andreas Hennings <andreas@dqxtech.net> wrote:

> If we want to support nested array structures, it could work like this:
> [...]

Another option to support nested arrays, but simpler.
Some of the functionality I proposed earlier now needs multiple calls,
but I think this is fine.

New signature:
function array_column_assoc(array $source, string $value_key, int $level = 1);

unset($source, $expected, $expected2); // (reset vars)
$source['a']['b']['x']['c']['y'] = 5;
$expected['a']['b']['c']['y'] = 5;
assert($expected === array_column_assoc($source, 'x', 2));
$expected2['a']['b']['c'] = 5;
assert($expected2 === array_column_assoc($expected, 'y', 3));
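A possible userland sketch of the $level variant, assuming that $level counts how many array levels sit above the one that holds $value_key (so $level = 1 is the flat case). The function name below is hypothetical and differs from the proposal only to keep the sketch separate from it.

```php
<?php
// Hypothetical sketch of the signature with a $level parameter.
function array_column_assoc_level(array $source, string $value_key, int $level = 1): array
{
    if ($level < 1) {
        throw new InvalidArgumentException('$level must be 1 or greater.');
    }
    $result = [];
    foreach ($source as $key => $item) {
        if (!is_array($item)) {
            continue; // non-arrays are omitted
        }
        if ($level === 1) {
            // Flat case: pick the value and keep the outer key.
            if (array_key_exists($value_key, $item)) {
                $result[$key] = $item[$value_key];
            }
        } else {
            // Descend one level, keeping this level's key.
            $result[$key] = array_column_assoc_level($item, $value_key, $level - 1);
        }
    }
    return $result;
}

$nested = ['a' => ['b' => ['x' => ['c' => ['y' => 5]]]]];
$step1 = array_column_assoc_level($nested, 'x', 2);
$step2 = array_column_assoc_level($step1, 'y', 3);
```

The two calls mirror the example above: the first removes the 'x' level, the second removes the 'y' level.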

To collapse array levels, we could introduce a separate function.
Similar to array_merge(), but preserving all keys.

unset($source, $expected); // (reset vars)
$source['a0']['b01'] = '0.01';
$source['a0']['b02'] = '0.02';
$source['a1']['b1'] = '1.1';
$expected['b01'] = '0.01';
$expected['b02'] = '0.02';
$expected['b1'] = '1.1';
assert($expected === array_collapse($source, 0));
unset($expected);
$expected['a0'] = '0.01';
$expected['a1'] = '1.1';
assert($expected === array_collapse($source, 1));
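For the simplest case (collapsing the top level, i.e. level 0), a sketch could look like the following. The function name is hypothetical, the $level parameter is not modeled, and letting the first value win on duplicate keys is an assumption that the thread does not settle.

```php
<?php
// Hypothetical sketch: collapse only the top level of the array.
function array_collapse_top(array $source): array
{
    $result = [];
    foreach ($source as $child) {
        if (is_array($child)) {
            // Array union keeps all keys; on duplicates the first wins.
            $result += $child;
        }
    }
    return $result;
}

$nested = [
    'a0' => ['b01' => '0.01', 'b02' => '0.02'],
    'a1' => ['b1' => '1.1'],
];
$flat = array_collapse_top($nested);
```

Unlike array_merge(), the union operator does not renumber integer keys, which matches the "preserving all keys" goal.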

3 years ago by Guilliam Xavier

Yes please! This has been requested multiple times, for instance:

Regards,

--
Guilliam Xavier

3 years ago by Ben Ramsey

Andreas Hennings wrote on 9/7/21 19:19:

> The function array_column() would be much more useful if there was an
> option to preserve the original array keys.
> [...]

We originally had a patch for this while PHP 5.5 was still in beta, but
we decided against merging it, and I can't remember why. :-)

https://github.com/php/php-src/pull/331

Cheers,
Ben

3 years ago by Ben Ramsey

Ben Ramsey wrote on 9/8/21 16:31:

> We originally had a patch for this while PHP 5.5 was still in beta, but
> we decided against merging it, and I can't remember why. :-)

This looks like part of the thread. I'm not sure where the rest of it
is: https://externals.io/message/67113

Cheers,
Ben