Greetings internals,
I'm here to get internals's opinion about allowing a string offset to be
replaced with arbitrary strings.
Since forever strings with more then one byte have been truncated silently
to one byte. The case with empty string kinda was existent but had a
buggy behaviour (see bug 71572 [1]) and as such has been turned into a
warning in PHP 7.1.0, and is meant to be promoted to an Error Exception per
the Reclassifying engine warnings RFC. [2]
An illustration of both these cases is available on 3v4l.org [3]
https://3v4l.org/O0nEM
I've got an implementation ready as a pull request on GitHub [4] (which
still needs some polishing for it to make CI happy and fix the various
leaks).
However, the question is if this behaviour is desirable. Moreover, the
silent truncation of strings with more than one byte should be changed to
the same severity as the empty string case; i.e. an Error Exception. This
seems reasonable due to the possible loss of data.
What ever the decision is, a BC break is to be expected. As code which
inadvertently tries to assign multiple bytes to an offset will know have
all those bytes in the string whereas before only the first one was used to
replace the byte at the designated offset. On the other hand the
introduction of an Error Exception is obviously a BC break but as it points
out to some kind of logic error.
I would assume the impact to be minimal for both of these case.
Any opinions?
Best regards
George Peter Banyard
[1] https://bugs.php.net/bug.php?id=71572
[2] https://wiki.php.net/rfc/engine_warnings
[3] https://3v4l.org/O0nEM
[4] https://github.com/php/php-src/pull/5013
Le 16 déc. 2019 à 10:58, George Peter Banyard girgias@php.net a écrit :
Greetings internals,
I'm here to get internals's opinion about allowing a string offset to be
replaced with arbitrary strings.Since forever strings with more then one byte have been truncated silently
to one byte. The case with empty string kinda was existent but had a
buggy behaviour (see bug 71572 [1]) and as such has been turned into a
warning in PHP 7.1.0, and is meant to be promoted to an Error Exception per
the Reclassifying engine warnings RFC. [2]
An illustration of both these cases is available on 3v4l.org [3]
https://3v4l.org/O0nEMI've got an implementation ready as a pull request on GitHub [4] (which
still needs some polishing for it to make CI happy and fix the various
leaks).However, the question is if this behaviour is desirable. Moreover, the
silent truncation of strings with more than one byte should be changed to
the same severity as the empty string case; i.e. an Error Exception. This
seems reasonable due to the possible loss of data.What ever the decision is, a BC break is to be expected. As code which
inadvertently tries to assign multiple bytes to an offset will know have
all those bytes in the string whereas before only the first one was used to
replace the byte at the designated offset. On the other hand the
introduction of an Error Exception is obviously a BC break but as it points
out to some kind of logic error.
I would assume the impact to be minimal for both of these case.Any opinions?
Best regards
George Peter Banyard
[1] https://bugs.php.net/bug.php?id=71572
[2] https://wiki.php.net/rfc/engine_warnings
[3] https://3v4l.org/O0nEM
[4] https://github.com/php/php-src/pull/5013
Being able to replace one byte with an arbitrary string does not seem very useful to me, unless you add the functionality of replacing an arbitrary substring with an arbitrary string (an equivalent of array_splice()
for strings). Your implementation gives the following potentially surprising result:
<?php
$str = "Hello world";
var_dump(strlen($str)); // 11
$str[0] = "あ"; // replace the first byte with three bytes (assuming source code is UTF-8-encoded)
$str[0] = "H"; // replace the first byte with one byte
var_dump(strlen($str)); // 13
?>
IMO, the only reasonable thing to do at this point is just emitting a Warning (or an Error).
—Claude
On Mon, Dec 16, 2019 at 12:21 PM Claude Pache claude.pache@gmail.com
wrote:
Le 16 déc. 2019 à 10:58, George Peter Banyard girgias@php.net a écrit
:Greetings internals,
I'm here to get internals's opinion about allowing a string offset to be
replaced with arbitrary strings.Since forever strings with more then one byte have been truncated
silently
to one byte. The case with empty string kinda was existent but had a
buggy behaviour (see bug 71572 [1]) and as such has been turned into a
warning in PHP 7.1.0, and is meant to be promoted to an Error Exception
per
the Reclassifying engine warnings RFC. [2]
An illustration of both these cases is available on 3v4l.org [3]
https://3v4l.org/O0nEMI've got an implementation ready as a pull request on GitHub [4] (which
still needs some polishing for it to make CI happy and fix the various
leaks).However, the question is if this behaviour is desirable. Moreover, the
silent truncation of strings with more than one byte should be changed to
the same severity as the empty string case; i.e. an Error Exception. This
seems reasonable due to the possible loss of data.What ever the decision is, a BC break is to be expected. As code which
inadvertently tries to assign multiple bytes to an offset will know have
all those bytes in the string whereas before only the first one was used
to
replace the byte at the designated offset. On the other hand the
introduction of an Error Exception is obviously a BC break but as it
points
out to some kind of logic error.
I would assume the impact to be minimal for both of these case.Any opinions?
Best regards
George Peter Banyard
[1] https://bugs.php.net/bug.php?id=71572
[2] https://wiki.php.net/rfc/engine_warnings
[3] https://3v4l.org/O0nEM
[4] https://github.com/php/php-src/pull/5013Being able to replace one byte with an arbitrary string does not seem very
useful to me, unless you add the functionality of replacing an arbitrary
substring with an arbitrary string (an equivalent ofarray_splice()
for
strings). Your implementation gives the following potentially surprising
result:<?php
$str = "Hello world";
var_dump(strlen($str)); // 11
$str[0] = "あ"; // replace the first byte with three bytes (assuming source
code is UTF-8-encoded)
$str[0] = "H"; // replace the first byte with one byte
var_dump(strlen($str)); // 13
?>IMO, the only reasonable thing to do at this point is just emitting a
Warning (or an Error).
I agree. As it's currently silently accepted, I'd recommend starting with a
warning.
Nikita
pon., 16 gru 2019 o 10:58 George Peter Banyard girgias@php.net napisał(a):
Greetings internals,
I'm here to get internals's opinion about allowing a string offset to be
replaced with arbitrary strings.Since forever strings with more then one byte have been truncated silently
to one byte. The case with empty string kinda was existent but had a
buggy behaviour (see bug 71572 [1]) and as such has been turned into a
warning in PHP 7.1.0, and is meant to be promoted to an Error Exception per
the Reclassifying engine warnings RFC. [2]
An illustration of both these cases is available on 3v4l.org [3]
https://3v4l.org/O0nEMI've got an implementation ready as a pull request on GitHub [4] (which
still needs some polishing for it to make CI happy and fix the various
leaks).However, the question is if this behaviour is desirable. Moreover, the
silent truncation of strings with more than one byte should be changed to
the same severity as the empty string case; i.e. an Error Exception. This
seems reasonable due to the possible loss of data.What ever the decision is, a BC break is to be expected. As code which
inadvertently tries to assign multiple bytes to an offset will know have
all those bytes in the string whereas before only the first one was used to
replace the byte at the designated offset. On the other hand the
introduction of an Error Exception is obviously a BC break but as it points
out to some kind of logic error.
I would assume the impact to be minimal for both of these case.Any opinions?
Best regards
George Peter Banyard
[1] https://bugs.php.net/bug.php?id=71572
[2] https://wiki.php.net/rfc/engine_warnings
[3] https://3v4l.org/O0nEM
[4] https://github.com/php/php-src/pull/5013
Allowing to unset string offset with empty string assignment in your patch
will not trigger a Warning anymore,
but using unset with string offset currently emits "Fatal error: Uncaught
Error: Cannot unset string offset".
An expected result is to be the same but an effect radically different as
you can see [1].
IMHO allowing string offset to be unset using unset construct could be
allowed to ensure consistency then.
Any thoughts on this?
BR,
Michał Brzuchalski