Hi,
As my previous message seems to have been unnoticed, I send it again.
I need your thoughts on a PR (https://github.com/php/php-src/pull/1431) I am
working on. The subject is to change the way negative string offsets (and
lengths, where applicable) are handled, to make the behavior consistent with
the way they're handled in substr()
and substr_replace()
. If your return is
positive, I'll write the corresponding RFC.
Implemented so far :
-
Support negative string offsets in read mode. Example: $a = $string{-2};
-
Support negative string offsets in assignment mode. Example: $string{-2} =
'z'; -
strpos()
: Accept negative values for the '$offset' argument -
stripos()
: Accept negative values for the '$offset' argument -
substr_count()
: Accept negative values for the '$offset' and '$length'
parameters (same behavior as insubstr()
).
The next step is to work on substr_compare()
. This one brings a new issue :
as there's no default value for the '$length' param, it is impossible to
perform a case-insensitive comparison without setting an explicit length. If
you want to use the default value, you must compute it (strlen($main_str) -
$offset). My first idea was to use 0 as default value, as this is a quite
useless value. But in order to maximize BC, I will propose to use NULL
as
default value for the $length parameter in string functions. This will mean
'up to the end of the string'.
The documentation also needs to be unified because the quality is sometimes
very different between pages for quite similar functions (as an example, see
the difference between http://php.net/manual/en/function.strspn.php and
http://php.net/manual/en/function.strcspn.php).
I also have some questions :
-
Should we keep returning
NULL
on parameter parsing failure, while
documentation only states that false is returned on failure ? -
Would you support unifying error types and messages when offset and/or
length are out-of-bound ? -
In
substr_compare()
, an offset before the start of the string is silently
converted to 0, while the same condition generates a warning in every other
functions. Which behavior do you prefer ? Silently considering negative
string positions as 0, or issue a warning ?
All of this is creating BC breaks. Most of these changes are additions, so
they just authorize values that would have caused errors in previous
versions. Others give a meaning to useless values (like NULL
length). So,
these BC breaks can be considered as limited but they exist. So, what are
your thoughts about including these changes in 7.1 ? (I'd love including
them in 7.0 but I'm afraid it's too late now).
Regards
François
François Laupretre wrote:
I need your thoughts on a PR (https://github.com/php/php-src/pull/1431) I am
working on. The subject is to change the way negative string offsets (and
lengths, where applicable) are handled, to make the behavior consistent with
the way they're handled insubstr()
andsubstr_replace()
. If your return is
positive, I'll write the corresponding RFC.Implemented so far :
Support negative string offsets in read mode. Example: $a = $string{-2};
Support negative string offsets in assignment mode. Example: $string{-2} =
'z';
Generally, I like the idea of being able to specify negative string
offsets. However, I'm usually working with UTF-8 strings, and as such
I'm using string offsets rather seldom.
strpos()
: Accept negative values for the '$offset' argument
stripos()
: Accept negative values for the '$offset' argument
substr_count()
: Accept negative values for the '$offset' and '$length'
parameters (same behavior as insubstr()
).
That may be useful as well, but what about the respective mbstring and
grapheme functions? At least these should be improved as well, but I'm
not sure if that can be done in an efficient manner.
I also have some questions :
- Should we keep returning
NULL
on parameter parsing failure, while
documentation only states that false is returned on failure ?
It is documented[1] that:
| If the parameters given to a function are not what it expects, such
| as passing an array where a string is expected, the return value of
| the function is undefined. In this case it will likely return NULL
| but this is just a convention, and cannot be relied upon.
IMHO it's best to stick with the current behavior, and to consider to
raise exceptions sometime in the future.
[1] http://php.net/manual/en/functions.internal.php
--
Christoph M. Becker