Hello internals,
While analysing the impact of the changes proposed by the RFC amending
the behaviour of the increment and decrement operators
(https://wiki.php.net/rfc/saner-inc-dec-operators), I discovered that the
range() function has some rather lax behaviour that is very unintuitive.
I therefore propose the "Define proper semantics for range() function" RFC
to address the unintuitive behaviour that sees no usage and/or hides bugs:
https://wiki.php.net/rfc/proper-range-semantics
The change proposes to throw TypeErrors and ValueErrors for cases where I
couldn't find occurrences in the wild and that hide bugs, and to emit
E_WARNINGs for cases that are hard to detect via static analysis.
Best regards
George P. Banyard
On 28.03.2023 at 00:36, G. P. B. george.banyard@gmail.com wrote:
I therefore propose the "Define proper semantics for range() function" RFC
to address the unintuitive behaviour that sees no usage and/or hide bugs:
https://wiki.php.net/rfc/proper-range-semantics
The change propose to throw TypeErrors and ValueErrors for case where I
couldn't find occurrences in the wild and hide bugs, and emit some
E_WARNINGs for cases that are hard to detect via static analysis.
I think it makes sense to clean up the range() function, thanks!
There are two cases I would handle differently:
- I'm not sure why a negative step for $start > $end is considered wrong, I consider range(10, 0, -2) at least as logical/readable as using a positive decrement of 2. Not requiring a sign for steps seems weirder to me but that's something we cannot change. BUT if it is the result of a calculation it seems wrong to require an abs() around it. I do see the reason for a warning/error when $start < $end and $step < 0.
- Values of '' or null in integer context (e.g. range(null, 10, 2)) should IMHO emit a warning first, not directly be changed to a TypeError. The usual BC / migration concern :-)
Regards,
- Chris
On Tue, 28 Mar 2023 at 08:19, Christian Schneider cschneid@cschneid.com wrote:
- I'm not sure why a negative step for $start > $end is considered wrong,
I consider range(10, 0, -2) at least as logical/readable as using a
positive decrement of 2. Not requiring a sign for steps seems weirder to me
but that's something we cannot change. BUT if it is the result of a
calculation it seems wrong to require an abs() around it. I do see the
reason for a warning/error when $start < $end and $step < 0.
The only other programming language I know of that has a range() function
accepting a step argument is Python, and its behaviour is IMHO worse.
For increasing ranges it requires a positive step, and otherwise just
generates an empty range.
For decreasing ranges it requires a negative step, and otherwise just
generates an empty range (this applies even when using the default step
value of 1, which is bonkers).
Making it a requirement to pass a negative step is definitely out of the
question.
Making it okay to use negative steps only for decreasing ranges could be
sensible, but we check for the step parameter way before we look into the
boundary values because those are different for int, float and string
boundaries.
Moreover, I personally find it weirder to require a sign for negative steps,
as for me a step is something that must be positive, and at least Kotlin
seems to somewhat agree with me, judging by
https://kotlinlang.org/docs/ranges.html#progression and from playing with
the source code.
Namely:
for (i in 4 downTo 1 step 2) print(i)
42
for (i in 4 downTo 1 step -2) print(i)
Exception in thread "main" java.lang.IllegalArgumentException: Step must
be positive, was: -2.
- Values of '' or null in integer context (e.g. range(null, 10, 2)) should
IMHO emit a warning first, not directly be changed to a TypeError. The
usual BC / migration concern :-)
When null is used, no TypeError is emitted, just the "usual" null to scalar
deprecation that happens since
https://wiki.php.net/rfc/deprecate_null_to_scalar_internal_arg got accepted.
But I'll add an example to the RFC and a test case in the PR.
Trying to figure out if an empty string was used with another string
boundary is tedious, as this information needs to somehow get carried
around.
A previous iteration of the PR used to convert empty strings to 0 with a
warning, but considering the analysis I decided to just make this a
ValueError, as it doesn't seem that empty strings are actually used in
practice.
But this is an easy revert, and I'm not really bound to this decision.
Best regards,
George P. Banyard
On 28.03.2023 at 14:42, G. P. B. george.banyard@gmail.com wrote:
There are two cases I would handle differently:
- I'm not sure why a negative step for $start > $end is considered wrong, I consider range(10, 0, -2) at least as logical/readable as using a positive decrement of 2. Not requiring a sign for steps seems weirder to me but that's something we cannot change. BUT if it is the result of a calculation it seems wrong to require an abs() around it. I do see the reason for a warning/error when $start < $end and $step < 0.
Considering the only other programming language that I know of that has a range() function that accepts a step argument is Python, and its behaviour is IMHO worse.
For increasing ranges it requires a positive step, and if not just generates an empty range. For decreasing ranges it requires a negative step, and if not just generates an empty range (this applies even if using the default step value of 1 which is bonkers).
Making it a requirement to pass a negative step is definitely out of the question.
Making it okay to use negative steps only for decreasing ranges could be sensible, but we check for the step parameter way before we look into the boundary values because those are different for int, float and string boundaries.
Moreover, I personally find it weirder to require a sign for negative steps as for me a step is something that must be positive
I quickly checked our own codebase and there is indeed one instance of range($last, 0, -1) which was not written by me, so there is at least one more person who found this logical ;-)
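To make the point about existing code concrete: a decreasing range is expected to come out the same whether the step is written with or without a sign, since range() has so far ignored the sign of $step. A minimal check, reusing the values from the earlier example (a sketch, not taken from the RFC's test suite):

```php
<?php
// A decreasing range written with a positive and with a negative step.
$positive = range(10, 0, 2);
$negative = range(10, 0, -2);

// Both spellings are expected to yield the same array.
var_dump($positive === $negative);
print_r($negative);
```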
- Values of '' or null in integer context (e.g. range(null, 10, 2)) should IMHO emit a warning first, not directly be changed to a TypeError. The usual BC / migration concern :-)
Trying to figure out if an empty string was used with another string boundary is tedious, as this information needs to somehow get carried around.
A previous iteration of the PR used to convert empty strings to 0 with a warning, but considering the analysis I decide to just make this a ValueError as it doesn't seem that empty strings are actually used in practice.
But this is an easy revert, and I'm not really bound to this decision.
Even though this might be a bit cumbersome at the moment I think this would be an important transition step, as currently range('', 10, 2); returns the (somewhat) expected result of 0, 2, 4, 6, 8, 10, so I'd be in favor of first giving a warning and then changing it to a ValueError.
- Chris
Hi
I like the RFC in general, just this negative $step parameter warning stood out to me.
While I agree that requiring a negative $step for a decreasing range is a bit silly,
I've always found it intuitive that a negative $step should be used for a decreasing range.
I'm not saying that it should be required, I'm just concerned about the BC break of emitting E_WARNING
and breaking people's intuition.
The reasoning behind this is that range($start, $end, $step) (for me at least) has always been symmetric and consistent with a for loop:
$array = [];
for ($i = $start; $i < $end (or >); $i += $step) { $array[] = $i; }
So by allowing a negative step the behaviour is consistent with how for loops behave, and avoids a BC break.
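The for-loop analogy can be sketched as a small userland helper. This is only an illustration: loopRange() is a hypothetical name, it handles integers only, and it uses inclusive bounds (<= and >=) on the assumption that the result should contain $end the way range() does. It is not how range() is implemented.

```php
<?php
// Hypothetical helper mirroring a plain for loop with a signed step.
function loopRange(int $start, int $end, int $step): array
{
    if ($step === 0) {
        throw new ValueError('Step cannot be 0');
    }

    $array = [];
    if ($step > 0) {
        // Counting up: stop once $i passes $end.
        for ($i = $start; $i <= $end; $i += $step) {
            $array[] = $i;
        }
    } else {
        // Counting down: the negative step moves $i towards $end.
        for ($i = $start; $i >= $end; $i += $step) {
            $array[] = $i;
        }
    }

    return $array;
}

print_r(loopRange(10, 0, -2));
```

With this loop-based reading, a negative step on an increasing range simply produces an empty array, which is the Python-style outcome criticised earlier in the thread.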
That Kotlin doesn't require it makes sense for me because the words "down to" combined with a negative step indeed don't make sense.
The negativity is in a sense already embedded by explicitly writing "down to". It's less explicit for PHP's range function.
I looked at the source code of some of my projects and I do see some occurrences of range($high, $low, -1 (or another negative value)).
Just wanted to chime in quickly to state my thoughts on this.
Thanks.
Kind regards
Niels
Hi
The "ASCII code point range" example is confusing, because it is a
decreasing range. However decreasing ranges are only introduced in the
next example.
Best regards
Tim Düsterhus
I've reordered the examples and added a more descriptive ASCII code point
range example first!
I've also added an example which highlights the behaviour with null and how
it would emit deprecation notices.
Best regards,
George P. Banyard
Unlike your changes to the increment operator, I'd love to see this
rationalisation put in place, though like many here I don't see problems
with using a negative step with decreasing ranges, but would consider it
strange for increasing ranges. And I do want to see some
case-consistency when working with string ranges.
I'd love to see it taken a stage (or two) further; returning an iterable
rather than an array (although that would be a bc break); and working
with strings (ASCII only) in the same way that the increment operator
does, so that range('A', 'IV') would be valid, and return Z then AA,
AZ then BA, etc.
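A rough userland sketch of that idea, built on PHP's string increment. alphaRange() is a hypothetical helper, restricted here to uppercase ASCII letters to keep the ordering check simple; nothing like it is part of the RFC.

```php
<?php
// Hypothetical helper: spreadsheet-style column range using string increment.
function alphaRange(string $start, string $end): array
{
    foreach ([$start, $end] as $bound) {
        if (!preg_match('/\A[A-Z]+\z/', $bound)) {
            throw new ValueError('Bounds must be uppercase ASCII letters');
        }
    }

    // Shorter strings come first, equal lengths compare alphabetically.
    $startAfterEnd = strlen($start) > strlen($end)
        || (strlen($start) === strlen($end) && strcmp($start, $end) > 0);
    if ($startAfterEnd) {
        throw new ValueError('Start must not come after end');
    }

    $result = [];
    for ($current = $start; ; $current++) {
        // "Z"++ carries over to "AA", "AZ"++ to "BA", and so on.
        $result[] = $current;
        if ($current === $end) {
            break;
        }
    }

    return $result;
}

print_r(alphaRange('Y', 'AC'));
```

On PHP 8.3 and later, str_increment() could replace the ++ operator in the loop.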
I am slightly surprised that you make no mention of the odd behaviour of
mixed alphameric strings, e.g. var_dump(range('A1', 'C5')) which returns
a purely alpha array 'A' to 'C'; or var_dump(range('3c', '5e')) which
returns numeric (3, 4, 5); or var_dump(range('1', '1e2')) which treats
1e2 as scientific and returns 1..100.
--
Mark Baker
Unlike your changes to the increment operator, I'd love to see this
rationalisation put in place, though like many here I don't see problems
with using a negative step with decreasing ranges, but would consider it
strange for increasing ranges.
I still find it somewhat odd, but this is not a hill I'm going to die on.
I've changed the behaviour to throw a ValueError if a negative step is
provided with an increasing range and to accept negative steps for
decreasing ranges.
Furthermore, I've also made passing an empty string an E_WARNING with a
cast to 0, same as the current behaviour.
See new version:
https://wiki.php.net/rfc/proper-range-semantics
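The revised rule for the sign of the step can be sketched in userland as follows. checkedStep() is a hypothetical helper and the error message is made up; the real check lives in the C implementation and may differ in detail.

```php
<?php
// Hypothetical helper illustrating the revised step rule for numeric bounds.
function checkedStep(int|float $start, int|float $end, int|float $step): int|float
{
    // A negative step only makes sense when counting down.
    if ($step < 0 && $start < $end) {
        throw new ValueError('A negative step is not allowed for an increasing range');
    }

    // For decreasing ranges the sign is accepted and then dropped.
    return abs($step);
}

var_dump(checkedStep(10, 0, -2));

try {
    checkedStep(0, 10, -2);
} catch (ValueError $e) {
    echo $e->getMessage(), "\n";
}
```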
And I do want to see some case-consistency when working with string ranges.
I'd love to see it taken a stage (or two) further; returning an iterable
rather than an array (although that would be a bc break); and working
with strings (ASCII only) in the same way that the increment operator
does, so that range('A', 'IV') would be valid, and return Z then AA,
AZ then BA, etc.
Frankly I was also surprised that the behaviour with strings was to do an
ASCII code point increment.
I would agree that range("Y", "AC") returning ["Y", "Z", "AA", "AB",
"AC"] would have been more intuitive than silently discarding
everything past the first byte.
However, I don't think there is much point in breaking BC to return a
possible generator or to fix the unfortunate string behaviour.
I would rather that PHP create dedicated syntax to create ranges (e.g.
$s..$e seems to be what most other programming languages settle on,
although it might be slightly confused with concatenation) à la Ruby, which
allows objects that implement certain methods to also be used to generate
ranges.
This is IMHO way more powerful as it would allow the creation of Date
ranges or other custom ranges.
And part of this proposal could be to support the aforementioned
alphabetical string ranges natively without needing to break BC on range()
and let this function just fade away into obscurity.
There is also this C++ talk from over a decade ago that argues that ranges
are better than iterators, so this might be an additional motivation as to
why we would want this:
https://accu.org/conf-docs/PDFs_2009/AndreiAlexandrescu_iterators-must-go.pdf
I am slightly surprised that you make no mention of the odd behaviour of
mixed alphameric strings, e.g. var_dump(range('A1', 'C5')) which returns
a purely alpha array 'A' to 'C'; or var_dump(range('3c', '5e')) which
returns numeric (3, 4, 5); or var_dump(range('1', '1e2')) which treats
1e2 as scientific and returns 1..100.
Because I didn't think of this; it is just the usual numeric string
behaviour, or the non-numeric string behaviour that truncates the string.
But that range('3c', '5e') is the only way to get an array of digits as
strings makes me want to shout into the abyss.
I'm not sure it is really worth mentioning those cases, but I can add
examples of this to the RFC after crying about the even more insane
behaviour range() currently has.
Best regards,
George P. Banyard
I've changed the behaviour to throw a ValueError if a negative step is
provided with increasing range and accept negative steps for decreasing
ranges.
I am not sure this is better. This would introduce a BC break because now
it's not as easy to avoid the error, as you can't just wrap the variable in
an abs() call.
I am not sure why we are even treating this as an error. It looks to me
like PHP already copes with this whether it's a positive or negative
number. If we just want to inform the user that the sign has no meaning, we
can raise a Notice and leave it at that. It's not wrong that the step is
negative, it's just a pointless sign.
| If $step is a float but is compatible with int interpret it as an
| integer.
I guess you mean with that "the fraction is '.0'" and "-2^52 < number <
2^52" ? What is also going to be playing up is the range of $start and
$end itself.
| Introduce and use a proper ZPP check for int|float|string $start and
| $end parameters, this will cause TypeErrors to be thrown when passing
| objects, resources, and arrays to range().
I am not sure whether it is wise to disallow resources. These are often
file descriptors and having a range on those could make sense? Not
fussed much either way though.
| Throw a ValueError when passing a negative $step for increasing
| ranges.
I think that's a BC break that is not worthy of making.
| Emit an E_WARNING when $start or $end has more than one byte.
Surely only if the string doesn't represent an int or float?
| var_dump(range(null, 2));
I think that safely can be interpreted as range(0, 2) — I wouldn't throw
a deprecation warning for that... but then of course, we do already have
a similar (deprecation) warning for arguments.
| Target Version: PHP 8.3
In my opinion that's not a good thing. There are a fair amount of BC
breaks (beyond the one that I oppose, with negative steps throwing a
ValueError), without first having deprecations. IMO, the actual BC
breaks here should become deprecations in 8.X and BC can only be really
broken in 9.0.
cheers,
Derick
| If $step is a float but is compatible with int interpret it as an
| integer.
I guess you mean with that "the fraction is '.0'" and "-2^52 < number <
2^52" ? What is also going to be playing up is the range of $start and
$end itself.
Yes, this is what I mean by "compatible with int".
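As a rough illustration of that definition, the check could look like this in userland. isIntCompatible() is a hypothetical name, and the 2^52 bound is taken from the wording quoted above rather than from the actual implementation, which may use a different limit.

```php
<?php
// Hypothetical helper: is this float "compatible with int" in the sense above?
function isIntCompatible(float $value): bool
{
    // NAN and INF can never be represented as an integer.
    if (!is_finite($value)) {
        return false;
    }

    // The fractional part must be exactly zero.
    if (floor($value) !== $value) {
        return false;
    }

    // Stay within the magnitude mentioned in the thread.
    return abs($value) < 2 ** 52;
}

var_dump(isIntCompatible(2.0));
var_dump(isIntCompatible(2.5));
```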
| Introduce and use a proper ZPP check for int|float|string $start and
| $end parameters, this will cause TypeErrors to be thrown when passing
| objects, resources, and arrays to range().
I am not sure whether it is wise to disallow resources. These are often
file descriptors and having a range on those could make sense? Not
fussed much either way though.
This is not how file/stream resources work. The resource number is not
correlated to the file descriptor.
The only cases where this is the case are the predefined streams STDIN,
STDOUT, and STDERR.
This is part of the reason for needing a new function like the
file_descriptor() function I'm proposing. [1]
| Throw a ValueError when passing a negative $step for increasing
| ranges.
I think that's a BC break that is not worthy of making.
There is no impact shown, and it doesn't make sense to pass a negative
$step.
This can be changed to a deprecation but considering there is no impact I
don't see why we should wait.
| Emit an E_WARNING when $start or $end has more than one byte.
Surely only if the string doesn't represent an int or float?
Yes, this was implied. As we are only concerned about the "pure string"
case here.
But I have clarified this.
| var_dump(range(null, 2));
I think that safely can be interpreted as range(0, 2) — I wouldn't throw
a deprecation warning for that... but then of course, we do already have
a similar (deprecation) warning for arguments.
The deprecation is just a consequence of finally using ZPP, thus invoking
the previously accepted RFC.
| Target Version: PHP 8.3
In my opinion that's not a good thing. There are a fair amount of BC
breaks (beyond the one that I oppose, with negative steps throwing a
ValueError), without first having deprecations. IMO, the actual BC
breaks here should become deprecations in 8.X and BC can only be really
broken in 9.0.
Disagree, the implementation is already complicated, and trying to support
the current absurd behaviour is impractical, especially as no impact has
been found.
Best regards,
George P. Banyard
Hello Internals,
I plan to put the RFC to a vote tomorrow in its current state, which has
not been changed since the 30th of March:
https://wiki.php.net/rfc/proper-range-semantics
Any final comments or complaints should be raised now.
Best regards,
George P. Banyard
Hello!
I have one concern about the part:
Emit an E_WARNING when $start or $end is cast to an integer because the
other boundary input is a number or numeric string. (e.g. range('5', 'z');
or range(5, 'z');)
Doesn't it limit the functionality of the function for digits used as
characters? Currently when we call range('/','z') we get the full range of
characters. https://onlinephp.io/c/9cb12
But when we change the $start argument to the next character, which is
zero ('0'), then we get an array with only one element.
https://onlinephp.io/c/a0cda
Casting numeric strings in this function may be confusing.
Sorry for making a fuss just before voting, but I didn't see this topic before
Kind regards,
Jorg
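For comparison, a purely byte-wise range over single characters, where digits are treated like any other character, can be sketched in userland with ord() and chr(). asciiRange() is a hypothetical helper and not how range() itself decides between numeric and character ranges.

```php
<?php
// Hypothetical helper: range over single bytes by ASCII code point.
function asciiRange(string $start, string $end): array
{
    if (strlen($start) !== 1 || strlen($end) !== 1) {
        throw new ValueError('Both bounds must be exactly one byte long');
    }

    // Build the integer code point range, then map each value back to a byte.
    return array_map('chr', range(ord($start), ord($end)));
}

// '/' is code point 47 and '0' is 48, so the second range is one element shorter.
var_dump(count(asciiRange('/', 'z')));
var_dump(count(asciiRange('0', 'z')));
```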
No worries, this is the point of giving a heads-up.
Someone else brought this to my attention again as well.
And the concern makes sense, I've updated the implementation and RFC to
adjust the behaviour with string digits:
https://wiki.php.net/rfc/proper-range-semantics
Please let me know if this addresses the issue and is also clear.
Best regards,
George P. Banyard
Thank you. That makes sense. I have one last question about the case with an
integer and a string digit, i.e. range('5', 10) or range('1', 9). What would
be the expected output in this case? I couldn't find any test cases covering
this example.
Kind regards,
Jorg
I've added test cases, but the RFC already mentions that those numeric
strings would be interpreted as integers and thus generate a list of ints.
Best regards,
George P. Banyard
PS: Just an etiquette reminder to bottom post instead of top posting to not
carry around the previous bits of emails (see
https://github.com/Danack/RfcCodex/blob/master/etiquette/mailing_list.md#why-should-i-place-my-response-below-the-quoted-text
)
Hello Internals,
Round 2: I'm planning to open voting tomorrow, the only change being the
handling of string digits, which now behave like an ASCII string range:
https://wiki.php.net/rfc/proper-range-semantics
Any final comments or complaints should be raised now.
Best regards,
George P. Banyard