On arrays starting with a negative index

8 years ago by mail@pmmaga.net — view source

unread

Hi internals,

I've prepared a PR (https://github.com/php/php-src/pull/2383) that would
change the current behavior of arrays when the first index is a negative
integer.

The main goal is to make the result of array_fill more in line with what
you would expect if the start_index is a negative integer. However, the
current (and historical) result is properly documented on the array_fill
page.

I would like to hear your feedback about this change and if you feel this
warrants an RFC.

Best regards,
Pedro Magalhães

8 years ago by mail@pmmaga.net — view source

unread

Hi internals,

I've prepared a PR (https://github.com/php/php-src/pull/2383) that would
change the current behavior of arrays when the first index is a negative
integer.

The main goal is to make the result of array_fill more in line with what
you would expect if the start_index is a negative integer. However, the
current (and historical) result is properly documented on the array_fill
page.

I would like to hear your feedback about this change and if you feel this
warrants an RFC.

Best regards,
Pedro Magalhães

Bump on this topic as I would like to hear some feedback.

As a clarification, the current implementation of the PR affects arrays in
general, not only array_fill. Any array that starts with a negative index
would continue from that index instead of 0. Meaning that [-2 => true,
true, true] would now return [-2 => true, -1 => true, 0 => true] instead of
[-2 => true, 0 => true, 1 => true].

Regards,
Pedro

8 years ago by Rowan Collins — view source

unread

As a clarification, the current implementation of the PR affects arrays in
general, not only array_fill. Any array that starts with a negative index
would continue from that index instead of 0. Meaning that [-2 => true,
true, true] would now return [-2 => true, -1 => true, 0 => true] instead of
[-2 => true, 0 => true, 1 => true].

Hi Pedro,

Would other behaviour also be affected?

For instance:

$foo = [ -2 => true ];
$foo[] = true;
$foo[] = true;
var_dump($foo);

If so, this is a much wider BC break; if not, why not?

Currently, this is completely consistent: https://3v4l.org/hXAsf Indeed,
I would say this is the missing explanation of why array_fill works the
way it does.

Presumably, this is why there is a note on the manual page:

See also theArrays http://php.net/manual/en/language.types.array.php section of
manual for a detailed explanation of negative keys.

Although as pointed out by Andrew Nester, this "detailed explanation"
doesn't actually seem to exist. This seems to be the closest we get:

As mentioned above, if no key is specified, the maximum of the existing integer indices is taken, and the new key will be that
maximum value plus 1 (but at least 0). If no integer indices exist yet,
the key will be 0 (zero).

Hidden in that first rambling sentence is the key phrase "but at least
0"; and the second should really say "If no positive integer indices..."
[It shouldn't say "exist", either, because as the next paragraph
explains, it can actually be affected by keys that no longer exist;
which I'd never actually realised before!]

This deserves to be a full section rather than a note, and much more
clearly articulated. It might even make sense to split that page into
sub-pages in some way, but I'm drifting into a different discussion, for
a different list.

The point right now is that array_fill's behaviour is actually
consistent with the rest of the language, and while the value of that
behaviour is questionable, changing it now would be a major decision.

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by mail@pmmaga.net — view source

unread

Hi Rowan,

On Thu, Mar 2, 2017 at 11:39 PM, Rowan Collins rowan.collins@gmail.com
wrote:

Would other behaviour also be affected?

For instance:

$foo = [ -2 => true ];
$foo[] = true;
$foo[] = true;
var_dump($foo);

If so, this is a much wider BC break; if not, why not?

It would indeed. Internally, nNextFreeElement is initialized with 0 and
updates on larger values. In the PR (
https://github.com/php/php-src/pull/2383/files#diff-fd78a0a3f78ea28c6907f907f25b908eR798
) I'm allowing it to be set to a negative value if it is the first element.

Currently, this is completely consistent: https://3v4l.org/hXAsf Indeed,
I would say this is the missing explanation of why array_fill works the way
it does.

With this PR all of your examples would behave consistently but with the
negative indexes.

As mentioned above, if no key is specified, the maximum of the existing

integer indices is taken, and the new key will be that

maximum value plus 1 (but at least 0). If no integer indices exist yet,
the key will be 0 (zero).

The "but at least 0" is exactly what this PR would change.

The point right now is that array_fill's behaviour is actually consistent
with the rest of the language, and while the value of that behaviour is
questionable, changing it now would be a major decision.

I completely agree on the impact of this change. But I also believe that
this behaviour would be preferable.

Thank you for the feedback.

Regards,
Pedro

8 years ago by Rowan Collins — view source

unread

Hi Pedro,

Would other behaviour also be affected?

For instance:

$foo = [ -2 => true ];
$foo[] = true;
$foo[] = true;
var_dump($foo);

If so, this is a much wider BC break; if not, why not?
It would indeed. Internally, nNextFreeElement is initialized with 0
and updates on larger values. In the PR
(https://github.com/php/php-src/pull/2383/files#diff-fd78a0a3f78ea28c6907f907f25b908eR798) I'm
allowing it to be set to a negative value if it is the first element.

I haven't built with the patch to check, but have you checked it behaves
correctly with other combinations? For instance [-10 => true, -5 =>
true, true] or ['string_key' => true, -10 => true, true]

I completely agree on the impact of this change. But I also believe
that this behaviour would be preferable.

I broadly agree that the proposed behaviour would be preferable in
itself. Unfortunately, in a language with 20 years of history, and
millions of lines of deployed code, we don't have the luxury of looking
at behaviour in isolation, we have to think about the consequences of
change as well as the end result.

Would you be willing to draft an RFC for this change? If you're not
familiar with the process, have a look at this:
https://blogs.oracle.com/opal/entry/the_mysterious_php_rfc_process

You'll need to make the case for the change, and try to consider all the
angles. For instance:

Exactly what behaviour will change, and how? We already have three
different things that are affected, and there may well be others.
What code might rely on the current behaviour? What kind of bugs might
such code exhibit if it's not updated after the change? How can users
write code that will work the same before and after the change?
On the other side, what are the benefits of the change? What is the
harm caused by the current behaviour, e.g. potential for bugs or mistakes?

This may seem like a lot of work for such a simple patch, but the more
we discuss it, the clearer it becomes that this would be a significant
language change, so we need to make sure it's considered properly.

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by mail@pmmaga.net — view source

unread

Hi Rowan,

On Sat, Mar 4, 2017 at 5:09 PM, Rowan Collins rowan.collins@gmail.com
wrote:

I haven't built with the patch to check, but have you checked it behaves
correctly with other combinations? For instance [-10 => true, -5 => true,
true] or ['string_key' => true, -10 => true, true]

The first example behaves how you would expect with this patch (third
element gets -4).
In the second example however, it would act the "old" way. Thanks for the
hint! I'm working on making it consistent in that case as well.

Would you be willing to draft an RFC for this change? If you're not

familiar with the process, have a look at this:
https://blogs.oracle.com/opal/entry/the_mysterious_php_rfc_process

Yes, I'm already familiar with it. It clearly seems to be the way forward
so I'll create a draft once I'm done with the implementation fixes/further
discussion.

Exactly what behaviour will change, and how? We already have three
different things that are affected, and there may well be others.

If the first numeric index - n - of an array is negative (however that
array is created), the next element (however it is inserted) will not
default to 0, but will have the index n + 1.

What code might rely on the current behaviour? What kind of bugs might
such code exhibit if it's not updated after the change? How can users write
code that will work the same before and after the change?

I have a hard time coming up with a use case for it but basically, any code
relying on the defaulting to 0. Basically, the only safe way I see is that
if you don't set explicit keys, you probably shouldn't try to access array
elements by an explicit key.

On the other side, what are the benefits of the change? What is the harm
caused by the current behaviour, e.g. potential for bugs or mistakes?

IMHO, the benefit is that you naturally expect a subsequent element to get
the n+1 key regardless of n being positive or negative. The current
implementation gives you an effect that you only discover once you hit it
or read the manual page about array_fill or the note you mentioned before
on the page about arrays.

Regards,
Pedro