Hi,
I recently discovered that an array was automatically casting
numeric-string keys to int if it was possible. For instance, the following
array:
$a = ['01' => '01', '10' => '10'];
Is not an array with the key '01' and '10' but instead consider the second
key as an int.
This has some implications like the fact that
- array_flip(array_flip($a)) !== $a
- array_search('10', $a) is an int when array_search('01', $a) is still a
string. Someone using strict types and passing the result to a function
expecting a string could end with an unexpected crash.
I've created an issue about this https://github.com/php/php-src/issues/7845
but it was recommended to me to send a mail to this mailing list instead.
I don't think this behavior should be considered as "normal" and would like
to propose to change this for PHP 9, as it's a BC-break. To me it can and
be considered as a follow-up of
https://wiki.php.net/rfc/string_to_number_comparison and
ttps://wiki.php.net/rfc/saner-numeric-strings
https://wiki.php.net/rfc/saner-numeric-strings. This is still a "concept"
since I never code with C and know nothing about the PHP implementation and
if this change would be possible. Any help is welcome then.
Thanks
On Wed, Dec 29, 2021 at 4:42 PM Vincent Langlet misterdeviling@gmail.com
wrote:
Hi,
I recently discovered that an array was automatically casting
numeric-string keys to int if it was possible. For instance, the following
array:$a = ['01' => '01', '10' => '10'];
Is not an array with the key '01' and '10' but instead consider the second
key as an int.This has some implications like the fact that
- array_flip(array_flip($a)) !== $a
- array_search('10', $a) is an int when array_search('01', $a) is still a
string. Someone using strict types and passing the result to a function
expecting a string could end with an unexpected crash.I've created an issue about this
https://github.com/php/php-src/issues/7845
but it was recommended to me to send a mail to this mailing list instead.I don't think this behavior should be considered as "normal" and would like
to propose to change this for PHP 9, as it's a BC-break. To me it can and
be considered as a follow-up of
https://wiki.php.net/rfc/string_to_number_comparison and
ttps://wiki.php.net/rfc/saner-numeric-strings
https://wiki.php.net/rfc/saner-numeric-strings. This is still a
"concept"
since I never code with C and know nothing about the PHP implementation and
if this change would be possible. Any help is welcome then.Thanks
Hi,
While I'd love for this inconsistency to go away, I also know that this is
most likely such a big change that it causes months of work and broken code
because it relies on this behavior. It wouldn't surprise me if this
singular change would cause more work than all previous BC breaks combined.
This behavior is documented: PHP: Arrays - Manual
https://www.php.net/manual/en/language.types.array.php
The key can either be an int or a string. The value can be of any type.
Additionally the following key casts will occur:
- Strings containing valid decimal ints, unless the number is preceded
by a + sign, will be cast to the int type. E.g. the key "8" will actually
be stored under 8. On the other hand "08" will not be cast, as it isn't a
valid decimal integer.- Floats are also cast to ints, which means that the fractional part
will be truncated. E.g. the key 8.7 will actually be stored under 8.- Bools are cast to ints, too, i.e. the key true will actually be
stored under 1 and the key false under 0.- Null will be cast to the empty string, i.e. the key null will
actually be stored under "".- Arrays and objects can not be used as keys. Doing so will result in
a warning: Illegal offset type.
While I'd love for this inconsistency to go away, I also know that this is
most likely such a big change that it causes months of work and broken code
because it relies on this behavior. It wouldn't surprise me if this
singular change would cause more work than all previous BC breaks combined.
I was about to say the same thing - this is one of those ingrained
behaviours that is very hard to analyse use of without running the code,
so only someone with an extraordinarily good test suite would detect
that their code was affected before it went live.
Like a lot of PHP's type juggling features, it is surprising in a pure
data crunching context, but extremely reasonable in PHP's original and
still very popular context of processing input from web forms. For
instance, a list of products fetched from a database will likely have
indexes which are integers, but a value from $_GET or $_POST will always
be a string. It's therefore quite helpful that $products[42] and
$products['42'] refer to the same thing, so that you can write
$products[$_GET['id']] rather than $products[(int)$_GET['id']].
Would this feature be included in a language designed with knowledge of
how PHP is used today? Possibly not, but we have no time machine to
change our minds now.
One vaguely plausible (but not easy to implement) way forward is to have
new array-like types, where keys and values can be constrained in
different ways. Then it could obey the local scalar type passing setting
("strict_types") to allow combinations like this:
declare(strict_types=0);
$stringKeyedArray = new dict<string,mixed>;
$stringKeyedArray['42'] = true; // key kept as '42'
$stringKeyedArray[42] = true; // actually sets key '42'
$intKeyedArray = new dict<int,mixed>;
$intKeyedArray[42] = true; // key kept as 42
$intKeyedArray['42'] = true; // actually sets key 42
declare(strict_types=1);
$stringKeyedArray = new dict<string,mixed>;
$stringKeyedArray['42'] = true; // key kept as '42'
$stringKeyedArray[42] = true; // TypeError
$intKeyedArray = new dict<int,mixed>;
$intKeyedArray[42] = true; // key kept as 42
$intKeyedArray['42'] = true; // TypeError
Regards,
--
Rowan Tommins
[IMSoP]
While I'd love for this inconsistency to go away, I also know that this
is
most likely such a big change that it causes months of work and broken
code
because it relies on this behavior. It wouldn't surprise me if this
singular change would cause more work than all previous BC breaks
combined.I was about to say the same thing - this is one of those ingrained
behaviours that is very hard to analyse use of without running the code,
so only someone with an extraordinarily good test suite would detect
that their code was affected before it went live.
I encounter this issue with an array of zipcode as key, and let's say some
delay as int:
['00001' => 2, '00002' => 5, ..., '10000' => 4, ... ];
If I want all the zipcode with a delay superior to 3, combining array_keys
and array_filter will give me an array of string and int, instead of an
array of only string ; like I would have expected.
Maybe a way to reduce the BC-break would be to say that
- The type of the key won't change anymore i.e. array_search(0, ['10' =>
0]) will be '10' and not 10. - $array[10] will return $array[10] ?? $array['10']
- $array['10'] will return $array['10'] ?? $array[10]
This would avoid the issue with $products[$_GET['id']] VS
$products[(int)$_GET['id']] and still be a first step in the improvement of
this situation.
Hi!
I don't think this behavior should be considered as "normal" and would like
to propose to change this for PHP 9, as it's a BC-break. To me it can and
It'd not be just a BC break, it'd be an absolutely massive BC break that
has a potential to break a ton of code and would be extremely hard to
detect. This is not something that should be changed in an advanced
version of the language.
Stas Malyshev
smalyshev@gmail.com
It'd not be just a BC break, it'd be an absolutely massive BC break that
has a potential to break a ton of code and would be extremely hard to
detect. This is not something that should be changed in an advanced
version of the language.
Would you still consider this as a massive BC break if
- The type of the key don't change anymore i.e. array_search(0, ['10' =>
0]) will be '10' and not 10. - $array[10] still return $array[10] ?? $array['10']
- $array['10'] still return $array['10'] ?? $array[10]
so it still cast the value when accessing to an array key, but when using
array_keys/array_search/array_flip/... it won't be cast anymore.
On Thu, Dec 30, 2021 at 5:54 PM Vincent Langlet <
mr.vincent.langlet@gmail.com> wrote:
Would you still consider this as a massive BC break if
- The type of the key don't change anymore i.e. array_search(0, ['10' =>
0]) will be '10' and not 10.- $array[10] still return $array[10] ?? $array['10']
- $array['10'] still return $array['10'] ?? $array[10]
so it still cast the value when accessing to an array key, but when using
array_keys/array_search/array_flip/... it won't be cast anymore.
"Would you still consider this as a massive BC break"
Yes, because '10'
from an array flip is not 10
anymore, which means
that using any strict comparisons on those values afterwards will fail if
they assumed it was an int.
Среда, 29 декабря 2021, 18:42 +03:00 от Vincent Langlet misterdeviling@gmail.com:
Hi,I recently discovered that an array was automatically casting
numeric-string keys to int if it was possible. For instance, the following
array:$a = ['01' => '01', '10' => '10'];
Is not an array with the key '01' and '10' but instead consider the second
key as an int.This has some implications like the fact that
- array_flip(array_flip($a)) !== $a
- array_search('10', $a) is an int when array_search('01', $a) is still a
string. Someone using strict types and passing the result to a function
expecting a string could end with an unexpected crash.I've created an issue about this https://github.com/php/php-src/issues/7845
but it was recommended to me to send a mail to this mailing list instead.I don't think this behavior should be considered as "normal" and would like
to propose to change this for PHP 9, as it's a BC-break. To me it can and
be considered as a follow-up of
https://wiki.php.net/rfc/string_to_number_comparison and
ttps://wiki.php.net/rfc/saner-numeric-strings
< https://wiki.php.net/rfc/saner-numeric-strings >. This is still a "concept"
since I never code with C and know nothing about the PHP implementation and
if this change would be possible. Any help is welcome then.Thanks
I support this behavior fix because in its current form, due to a similar problem (almost?), all PSR-7 implementations contain bugs that violate RFC7230 (section 3.2: https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus, physically, by the standard, all headers can have the name "0" (like «0: value»), but when stored inside implementations, it is converted to a string and a problem arises ($message->getHeaders() // returns array<int|string, string> instead of array<string, string>).
To solve this problem without BC, it seems to me that Nikita's idea with «declares» is most suitable, like:
<?php
declare(strict_types=1, strict_array_keys=1);
$array["0"] = 42; // array(1) { string(1) "0" => int(42) }
Kirill Nesmeyanov
Am 31.12.2021 um 03:21 schrieb Kirill Nesmeyanov nesk@xakep.ru:
I support this behavior fix because in its current form, due to a similar problem (almost?), all PSR-7 implementations contain bugs that violate RFC7230 (section 3.2: https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus, physically, by the standard, all headers can have the name "0" (like «0: value»), but when stored inside implementations, it is converted to a string and a problem arises ($message->getHeaders() // returns array<int|string, string> instead of array<string, string>).
To solve this problem without BC, it seems to me that Nikita's idea with «declares» is most suitable, like:
<?php declare(strict_types=1, strict_array_keys=1); $array["0"] = 42; // array(1) { string(1) "0" => int(42) }
I probably sound like a broken record but I'd strongly advise against it as it creates language dialects which means a line like
$a["0"] = 42;
behaves differently depending on context, e.g. copying code from Stackoverflow might lead to subtle bugs and reminds me way too much of the ini-settings magic_quotes_gpc.
My take is that changing behavior is ok if it can go through a deprecation/warning phase but that is IMHO not the case here.
If people want to have different behavior I'd rather have a new type like
$a = new Dict;
$a["0"] = 42;
possibly with a short-hand for initialization.
Regards,
- Chris
Пятница, 31 декабря 2021, 10:20 +03:00 от Christian Schneider cschneid@cschneid.com:
Am 31.12.2021 um 03:21 schrieb Kirill Nesmeyanov < nesk@xakep.ru >:I support this behavior fix because in its current form, due to a similar problem (almost?), all PSR-7 implementations contain bugs that violate RFC7230 (section 3.2: https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus, physically, by the standard, all headers can have the name "0" (like «0: value»), but when stored inside implementations, it is converted to a string and a problem arises ($message->getHeaders() // returns array<int|string, string> instead of array<string, string>).
To solve this problem without BC, it seems to me that Nikita's idea with «declares» is most suitable, like:
<?php declare(strict_types=1, strict_array_keys=1); $array["0"] = 42; // array(1) { string(1) "0" => int(42) }
I probably sound like a broken record but I'd strongly advise against it as it creates language dialects which means a line like
$a["0"] = 42;
behaves differently depending on context, e.g. copying code from Stackoverflow might lead to subtle bugs and reminds me way too much of the ini-settings magic_quotes_gpc.My take is that changing behavior is ok if it can go through a deprecation/warning phase but that is IMHO not the case here.
If people want to have different behavior I'd rather have a new type like
$a = new Dict;
$a["0"] = 42;
possibly with a short-hand for initialization.Regards,
- Chris
--
To unsubscribe, visit: https://www.php.net/unsub.php
Yep =(
In any case, I don't see any other solutions to the problem yet... Well, perhaps adding the Map and Set objects like in ur example.
On the other hand, probably we can forget about the BC and still fix it in major release. I don't think that anyone really uses such functionality and this is part of the real business logic. This is what I personally would prefer. Then this will automatically fix all PSR-7 implementations, and I hope there is a lot of code where such a case with automatic type conversion is not taken into account.
--
Kirill Nesmeyanov
I support this behavior fix because in its current form, due to a similar problem (almost?), all PSR-7 implementations contain bugs that violate RFC7230 (section 3.2:https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus, physically, by the standard, all headers can have the name "0" (like «0: value»), but when stored inside implementations, it is converted to a string and a problem arises ($message->getHeaders() // returns array<int|string, string> instead of array<string, string>).
You appear to be technically correct - the RFC defines a header name
only as "token", which implies the following would all be valid HTTP
headers:
42: The Answer
!: Bang
^_^: Surprised
In practice, it would be a bad idea to use any of these.
Every single one of the field names registered with IANA [1] starts with
a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
exception is "*", listed there as "reserved" to specifically prevent its
use conflicting with the wild-card value in "Vary" lists.]
I'm actually surprised this definition hasn't been updated with
interoperability advice in recent revisions of the standard. I did find
this general advice for internet message headers in RFC 3864 [2]:
Thus, for maximum flexibility, header field names SHOULD further be
restricted to just letters, digits, hyphen ('-') and underscore ('_')
characters, with the first character being a letter or underscore.
The additional restriction on underscore ('_') in HTTP arises from CGI,
which maps headers to environment variables. For instance, Apache httpd
silently drops headers with anything other than letters, digits, and
hyphen [3] to avoid security issues caused by environment manipulation.
If I was developing a PSR-7 or similar library, I would be inclined to
drop any header composed only of digits, and issue a diagnostic warning,
so that it wouldn't escalate to a type error later. It certainly doesn't
seem reasonable to change the entire language to work around that
inconvenience.
[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
[3] https://httpd.apache.org/docs/trunk/env.html#setting
Regards,
--
Rowan Tommins
[IMSoP]
Суббота, 1 января 2022, 17:41 +03:00 от Rowan Tommins rowan.collins@gmail.com:
I support this behavior fix because in its current form, due to a similar problem (almost?), all PSR-7 implementations contain bugs that violate RFC7230 (section 3.2: https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus, physically, by the standard, all headers can have the name "0" (like «0: value»), but when stored inside implementations, it is converted to a string and a problem arises ($message->getHeaders() // returns array<int|string, string> instead of array<string, string>).
You appear to be technically correct - the RFC defines a header name
only as "token", which implies the following would all be valid HTTP
headers:42: The Answer
!: Bang
^_^: SurprisedIn practice, it would be a bad idea to use any of these.
Every single one of the field names registered with IANA [1] starts with
a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
exception is "*", listed there as "reserved" to specifically prevent its
use conflicting with the wild-card value in "Vary" lists.]I'm actually surprised this definition hasn't been updated with
interoperability advice in recent revisions of the standard. I did find
this general advice for internet message headers in RFC 3864 [2]:> Thus, for maximum flexibility, header field names SHOULD further be
> restricted to just letters, digits, hyphen ('-') and underscore ('_')
> characters, with the first character being a letter or underscore.The additional restriction on underscore ('_') in HTTP arises from CGI,
which maps headers to environment variables. For instance, Apache httpd
silently drops headers with anything other than letters, digits, and
hyphen [3] to avoid security issues caused by environment manipulation.If I was developing a PSR-7 or similar library, I would be inclined to
drop any header composed only of digits, and issue a diagnostic warning,
so that it wouldn't escalate to a type error later. It certainly doesn't
seem reasonable to change the entire language to work around that
inconvenience.[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
[3] https://httpd.apache.org/docs/trunk/env.html#settingRegards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
I just gave an example of what at the moment can cause an exception in any application that is based on the PSR. It is enough to send the header "0: Farewell to the server". In some cases (for example, as is the case with RoadRunner) - this can cause a physical stop and restart of the server.
Just in case, I will repeat my thesis: I cannot imagine that anyone is using this functionality consciously and that it is part of the real logic of the application. And fixing this behavior, I believe, will automatically fix many libraries (not necessarily PSR) that do not take this behavior into account.
Kirill Nesmeyanov
Суббота, 1 января 2022, 17:41 +03:00 от Rowan Tommins <
rowan.collins@gmail.com>:I support this behavior fix because in its current form, due to a
similar problem (almost?), all PSR-7 implementations contain bugs that
violate RFC7230 (section 3.2:
https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus,
physically, by the standard, all headers can have the name "0" (like «0:
value»), but when stored inside implementations, it is converted to a
string and a problem arises ($message->getHeaders() //
returns array<int|string, string> instead of array<string, string>).You appear to be technically correct - the RFC defines a header name
only as "token", which implies the following would all be valid HTTP
headers:42: The Answer
!: Bang
^_^: SurprisedIn practice, it would be a bad idea to use any of these.
Every single one of the field names registered with IANA [1] starts with
a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
exception is "*", listed there as "reserved" to specifically prevent its
use conflicting with the wild-card value in "Vary" lists.]I'm actually surprised this definition hasn't been updated with
interoperability advice in recent revisions of the standard. I did find
this general advice for internet message headers in RFC 3864 [2]:Thus, for maximum flexibility, header field names SHOULD further be
restricted to just letters, digits, hyphen ('-') and underscore ('_')
characters, with the first character being a letter or underscore.The additional restriction on underscore ('_') in HTTP arises from CGI,
which maps headers to environment variables. For instance, Apache httpd
silently drops headers with anything other than letters, digits, and
hyphen [3] to avoid security issues caused by environment manipulation.If I was developing a PSR-7 or similar library, I would be inclined to
drop any header composed only of digits, and issue a diagnostic warning,
so that it wouldn't escalate to a type error later. It certainly doesn't
seem reasonable to change the entire language to work around that
inconvenience.[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
[3] https://httpd.apache.org/docs/trunk/env.html#settingRegards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
I just gave an example of what at the moment can cause an exception in any
application that is based on the PSR. It is enough to send the header "0:
Farewell to the server". In some cases (for example, as is the case with
RoadRunner) - this can cause a physical stop and restart of the server.Just in case, I will repeat my thesis: I cannot imagine that anyone is
using this functionality consciously and that it is part of the real logic
of the application.
You don't have a lot of experience with legacy code then. PHP, particularly
old PHP (like 4, 5.1 era) was used by a lot of idiots.
I was one of those idiots (Perhaps I still am an idiot - jury is
deliberating on that but I digress).
Snark aside though, PHP has more than its fair share of self taught
programmers (again, not trying to be insulting as I am one myself), and
they do things with the code that veterans and formally trained programmers
would never think to try, let alone implement.
I guarantee fixing how key handling is done will break something - either
in the form of code exploiting the weird behavior, or code that is guarding
against the weird behavior; not to mention any tests that might be written
- though amateurs rarely write test code (again, speaking from past
experience I've grown beyond).
And fixing this behavior, I believe, will automatically fix many libraries
(not necessarily PSR) that do not take this behavior into account.
And blow up who knows how many old code bases - many of which don't have
unit test suites to discover if there is a break ahead of time. This is
the sort of BC break that would cause a cliff of users unable to migrate to
the major version that implements it. A Python 2 vs. 3 style of break.
Even with that all said it may indeed be worth fixing - but this will
require the same sort of kid gloves approach removing register globals had
(for the newer folks, there was a time when $_REQUEST["var"] would auto
populate $var with lovely security snarls). IIRC PHP 3 had register
globals always on, 4 created a config toggle to turn them off, and PHP 5.0
turned that toggle off by default, finally PHP 5.3 (6 without unicode more
or less) removed support for register globals entirely (My memory could be
off - it's in the changelogs for the curious).
I leave the decision making to the maintainers and contribs who do the
actual work. Hell, I personally don't even use PHP that much these days
having gotten a job where I focus on writing Cucumber tests in JavaScript
that run on node.js. I keep up with PHP and this list though cause one
never knows what the next job will entail. I just dropped out of lurk mode
to underscore along with others up thread the massive ramifications of what
is being proposed. As someone who wrote stupid code I can see this
breaking, tread lightly. And hell, I don't even know how much of that code
is still in use since I've changed employers many times since it was
written. This situation is not unique and can create huge headaches for
companies running projects on legacy code bases.
I leave the decision making to the maintainers and contribs who do the
actual work. Hell, I personally don't even use PHP that much these days
having gotten a job where I focus on writing Cucumber tests in JavaScript
that run on node.js. I keep up with PHP and this list though cause one
never knows what the next job will entail. I just dropped out of lurk mode
to underscore along with others up thread the massive ramifications of what
is being proposed. As someone who wrote stupid code I can see this
breaking, tread lightly. And hell, I don't even know how much of that code
is still in use since I've changed employers many times since it was
written. This situation is not unique and can create huge headaches for
companies running projects on legacy code bases.
I think a question worth asking is how much of that code is likely to
actually run on PHP X.X if this were done. I don't have an answer to that,
but the kind of code that you're talking about is likely to encounter other
more fundamental problems. For instance, prior to 8.0, doing $var = $obj + 2
would result in $var = (int)3;
In PHP 7 this also produced a notice.
In PHP 8, the same line results in a fatal error.
I don't think that the impact would be limited to code that is written
poorly however. There may be code which almost accidentally depends on this
behavior that is in fact well maintained, since it is such a nuanced
problem. To me, this seems like something worth doing, but it might be
something that should wait for a major version.
Jordan
Воскресенье, 2 января 2022, 8:20 +03:00 от Michael Morris tendoaki@gmail.com:
Суббота, 1 января 2022, 17:41 +03:00 от Rowan Tommins <
rowan.collins@gmail.com >:I support this behavior fix because in its current form, due to a
similar problem (almost?), all PSR-7 implementations contain bugs that
violate RFC7230 (section 3.2:
https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus,
physically, by the standard, all headers can have the name "0" (like «0:
value»), but when stored inside implementations, it is converted to a
string and a problem arises ($message->getHeaders() //
returns array<int|string, string> instead of array<string, string>).You appear to be technically correct - the RFC defines a header name
only as "token", which implies the following would all be valid HTTP
headers:42: The Answer
!: Bang
^_^: SurprisedIn practice, it would be a bad idea to use any of these.
Every single one of the field names registered with IANA [1] starts with
a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
exception is "*", listed there as "reserved" to specifically prevent its
use conflicting with the wild-card value in "Vary" lists.]I'm actually surprised this definition hasn't been updated with
interoperability advice in recent revisions of the standard. I did find
this general advice for internet message headers in RFC 3864 [2]:Thus, for maximum flexibility, header field names SHOULD further be
restricted to just letters, digits, hyphen ('-') and underscore ('_')
characters, with the first character being a letter or underscore.The additional restriction on underscore ('_') in HTTP arises from CGI,
which maps headers to environment variables. For instance, Apache httpd
silently drops headers with anything other than letters, digits, and
hyphen [3] to avoid security issues caused by environment manipulation.If I was developing a PSR-7 or similar library, I would be inclined to
drop any header composed only of digits, and issue a diagnostic warning,
so that it wouldn't escalate to a type error later. It certainly doesn't
seem reasonable to change the entire language to work around that
inconvenience.[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
[3] https://httpd.apache.org/docs/trunk/env.html#settingRegards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
I just gave an example of what at the moment can cause an exception in any
application that is based on the PSR. It is enough to send the header "0:
Farewell to the server". In some cases (for example, as is the case with
RoadRunner) - this can cause a physical stop and restart of the server.Just in case, I will repeat my thesis: I cannot imagine that anyone is
using this functionality consciously and that it is part of the real logic
of the application.You don't have a lot of experience with legacy code then. PHP, particularly
old PHP (like 4, 5.1 era) was used by a lot of idiots.I was one of those idiots (Perhaps I still am an idiot - jury is
deliberating on that but I digress).Snark aside though, PHP has more than its fair share of self taught
programmers (again, not trying to be insulting as I am one myself), and
they do things with the code that veterans and formally trained programmers
would never think to try, let alone implement.I guarantee fixing how key handling is done will break something - either
in the form of code exploiting the weird behavior, or code that is guarding
against the weird behavior; not to mention any tests that might be written
- though amateurs rarely write test code (again, speaking from past
experience I've grown beyond).And fixing this behavior, I believe, will automatically fix many libraries
(not necessarily PSR) that do not take this behavior into account.And blow up who knows how many old code bases - many of which don't have
unit test suites to discover if there is a break ahead of time. This is
the sort of BC break that would cause a cliff of users unable to migrate to
the major version that implements it. A Python 2 vs. 3 style of break.Even with that all said it may indeed be worth fixing - but this will
require the same sort of kid gloves approach removing register globals had
(for the newer folks, there was a time when $_REQUEST["var"] would auto
populate $var with lovely security snarls). IIRC PHP 3 had register
globals always on, 4 created a config toggle to turn them off, and PHP 5.0
turned that toggle off by default, finally PHP 5.3 (6 without unicode more
or less) removed support for register globals entirely (My memory could be
off - it's in the changelogs for the curious).I leave the decision making to the maintainers and contribs who do the
actual work. Hell, I personally don't even use PHP that much these days
having gotten a job where I focus on writing Cucumber tests in JavaScript
that run on node.js. I keep up with PHP and this list though cause one
never knows what the next job will entail. I just dropped out of lurk mode
to underscore along with others up thread the massive ramifications of what
is being proposed. As someone who wrote stupid code I can see this
breaking, tread lightly. And hell, I don't even know how much of that code
is still in use since I've changed employers many times since it was
written. This situation is not unique and can create huge headaches for
companies running projects on legacy code bases.
I absolutely agree with all your arguments. Moreover, I will add one more thing to them: This fix changes the logic of the code, and it is impossible to process it correctly, causing, for example, "depressation issue", which aggravates the situation.
However, the major release (versions of PHP9) implies a breakdown of BC. And this is the very moment when problems in the language can and should be fixed.
Since the mailing list serves to discuss the problem, I can suggest a variant with the temporary introduction of the ini setting "strict_arrays=1", which will be deleted in PHP10, and the value "strict_arrays=0" will cause a deprecation notice like "Please note that you have auto-conversion of array keys is enabled, the behavior of which will be changed in PHP10" in places where such conversion is performed. What do you think about it?
Kirill Nesmeyanov
Суббота, 1 января 2022, 17:41 +03:00 от Rowan Tommins <
rowan.collins@gmail.com>:I support this behavior fix because in its current form, due to a
similar problem (almost?), all PSR-7 implementations contain bugs that
violate RFC7230 (section 3.2:
https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus,
physically, by the standard, all headers can have the name "0" (like «0:
value»), but when stored inside implementations, it is converted to a
string and a problem arises ($message->getHeaders() //
returns array<int|string, string> instead of array<string, string>).
The solution is to cast the keys back to string when reading from the
array, IF the type matters.
foreach ($headers as $k => $values) {
$name = (string) $k;
}
We could introduce an alternative to array_keys()
that would do this
automatically, e.g. "array_keys_str()".
You appear to be technically correct - the RFC defines a header name
only as "token", which implies the following would all be valid HTTP
headers:42: The Answer
!: Bang
^_^: SurprisedIn practice, it would be a bad idea to use any of these.
Every single one of the field names registered with IANA [1] starts with
a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
exception is "*", listed there as "reserved" to specifically prevent its
use conflicting with the wild-card value in "Vary" lists.]I'm actually surprised this definition hasn't been updated with
interoperability advice in recent revisions of the standard. I did find
this general advice for internet message headers in RFC 3864 [2]:Thus, for maximum flexibility, header field names SHOULD further be
restricted to just letters, digits, hyphen ('-') and underscore ('_')
characters, with the first character being a letter or underscore.The additional restriction on underscore ('_') in HTTP arises from CGI,
which maps headers to environment variables. For instance, Apache httpd
silently drops headers with anything other than letters, digits, and
hyphen [3] to avoid security issues caused by environment manipulation.If I was developing a PSR-7 or similar library, I would be inclined to
drop any header composed only of digits, and issue a diagnostic warning,
so that it wouldn't escalate to a type error later. It certainly doesn't
seem reasonable to change the entire language to work around that
inconvenience.[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
[3] https://httpd.apache.org/docs/trunk/env.html#settingRegards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
I just gave an example of what at the moment can cause an exception in any
application that is based on the PSR. It is enough to send the header "0:
Farewell to the server". In some cases (for example, as is the case with
RoadRunner) - this can cause a physical stop and restart of the server.Just in case, I will repeat my thesis: I cannot imagine that anyone is
using this functionality consciously and that it is part of the real logic
of the application.
It is not really relevant weather this is used consciously.
You don't have a lot of experience with legacy code then. PHP, particularly
old PHP (like 4, 5.1 era) was used by a lot of idiots.I was one of those idiots (Perhaps I still am an idiot - jury is
deliberating on that but I digress).
We don't need to assume incompetence.
Any code that deals with arrays must consider this behavior, unless
the array keys are known to be only integers, or only non-integer-like
strings.
One obvious BC break:
What would be the value of $a in the following snippet?
$a = [];
$a['5] = 's';
$a[5] = 'n';
$a[7] = 'n';
$a['7'] = 's';
Currently it would be [5 => 'n', 7 => 's'].
With the "new" behavior, we'd have to decide what happens.
Can keys '5' and 5 coexist? ['5' => 's', 5 => 'n', 7 => 'n', '7' => 's']?
Or would assignment change the key type? [5 => 'n', '7' => 's']?
Or does the initial key type remain, and only the value changes? ['5'
=> 'n', 7 => 's']?
I would argue that the current behavior might still be the best we can
get for a general-purpose structure that can act as a vector or a map
or a mix of both.
The perceived awkwardness is just a result of trying to do everything at once.
Possible solutions:
- Dedicated array-reading methods that cast all keys to string on read.
- New structures, alternative to array, that either allow separate
entries for 5 and '5', or that are restricted to one key type.
--- Andreas
Snark aside though, PHP has more than its fair share of self taught
programmers (again, not trying to be insulting as I am one myself), and
they do things with the code that veterans and formally trained programmers
would never think to try, let alone implement.I guarantee fixing how key handling is done will break something - either
in the form of code exploiting the weird behavior, or code that is guarding
against the weird behavior; not to mention any tests that might be written
- though amateurs rarely write test code (again, speaking from past
experience I've grown beyond).And fixing this behavior, I believe, will automatically fix many libraries
(not necessarily PSR) that do not take this behavior into account.And blow up who knows how many old code bases - many of which don't have
unit test suites to discover if there is a break ahead of time. This is
the sort of BC break that would cause a cliff of users unable to migrate to
the major version that implements it. A Python 2 vs. 3 style of break.Even with that all said it may indeed be worth fixing - but this will
require the same sort of kid gloves approach removing register globals had
(for the newer folks, there was a time when $_REQUEST["var"] would auto
populate $var with lovely security snarls). IIRC PHP 3 had register
globals always on, 4 created a config toggle to turn them off, and PHP 5.0
turned that toggle off by default, finally PHP 5.3 (6 without unicode more
or less) removed support for register globals entirely (My memory could be
off - it's in the changelogs for the curious).I leave the decision making to the maintainers and contribs who do the
actual work. Hell, I personally don't even use PHP that much these days
having gotten a job where I focus on writing Cucumber tests in JavaScript
that run on node.js. I keep up with PHP and this list though cause one
never knows what the next job will entail. I just dropped out of lurk mode
to underscore along with others up thread the massive ramifications of what
is being proposed. As someone who wrote stupid code I can see this
breaking, tread lightly. And hell, I don't even know how much of that code
is still in use since I've changed employers many times since it was
written. This situation is not unique and can create huge headaches for
companies running projects on legacy code bases.
I just gave an example of what at the moment can cause an exception in any application that is based on the PSR. It is enough to send the header "0: Farewell to the server". In some cases (for example, as is the case with RoadRunner) - this can cause a physical stop and restart of the server.
Any library where a crafted HTTP request can cause a server shutdown has a bug which needs addressing right now - possibly more than one, actually, as it implies error handling is leaking across request boundaries. A change to the language applied in the next major version would fix this some time around 2025, once people start adopting it. A workaround in the library itself can be applied within weeks.
I already gave a simple solution that such libraries can apply right now, with very little chance of negative impact: sanitise headers more aggressively than the HTTP standard requires, as Apache httpd does, in this case discarding any header containing only digits. This is likely to be about three lines of code inside a loop preprocessing raw headers:
if ( ctype_digit($rawHeaderName) ) {
trigger_error("Numeric HTTP header '$rawHeaderName' has been discarded.", E_USER_WARNING);
continue;
}
If I was the maintainer of such a library, I might consider even stricter validation, considering what seems like an accidentally broad definition in the HTTP spec, and the possibility of an application receiving even more exotic characters if processing raw TCP traffic.
The idea of an array_keys variant or option that forces everything back to string seems like it might be useful (and easy to polyfill for old versions). Changing such a fundamental language behaviour in the hope that it will fix more code than it breaks is just not worth it.
Regards,
--
Rowan Tommins
[IMSoP]
On Sun, Jan 2, 2022 at 7:05 AM Rowan Tommins rowan.collins@gmail.com
wrote:
I just gave an example of what at the moment can cause an exception in
any application that is based on the PSR. It is enough to send the header
"0: Farewell to the server". In some cases (for example, as is the case
with RoadRunner) - this can cause a physical stop and restart of the server.Any library where a crafted HTTP request can cause a server shutdown has a
bug which needs addressing right now - possibly more than one, actually, as
it implies error handling is leaking across request boundaries. A change to
the language applied in the next major version would fix this some time
around 2025, once people start adopting it. A workaround in the library
itself can be applied within weeks.I already gave a simple solution that such libraries can apply right now,
with very little chance of negative impact: sanitise headers more
aggressively than the HTTP standard requires, as Apache httpd does, in this
case discarding any header containing only digits. This is likely to be
about three lines of code inside a loop preprocessing raw headers:if ( ctype_digit($rawHeaderName) ) {
trigger_error("Numeric HTTP header '$rawHeaderName' has been
discarded.", E_USER_WARNING);
continue;
}If I was the maintainer of such a library, I might consider even stricter
validation, considering what seems like an accidentally broad definition in
the HTTP spec, and the possibility of an application receiving even more
exotic characters if processing raw TCP traffic.The idea of an array_keys variant or option that forces everything back to
string seems like it might be useful (and easy to polyfill for old
versions). Changing such a fundamental language behaviour in the hope that
it will fix more code than it breaks is just not worth it.
But "001" casted to 1 will then get casted back to "1" not "001".
Regards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
--
Chase Peeler
chasepeeler@gmail.com
But "001" casted to 1 will then get casted back to "1" not "001".
"001" would not be cast to an integer in this context: https://3v4l.org/gGFHJ
Regards,
--
Rowan Tommins
[IMSoP]
On Mon, Jan 3, 2022 at 10:48 AM Rowan Tommins rowan.collins@gmail.com
wrote:
On 3 January 2022 15:41:48 GMT, Chase Peeler chasepeeler@gmail.com
wrote:But "001" casted to 1 will then get casted back to "1" not "001".
"001" would not be cast to an integer in this context:
https://3v4l.org/gGFHJ
My mistake, I thought it did get casted. Still, I think you add a lot of
complexity because you'll need to keep track of whether or not a key was a
string or not. There is no guarantee that all numeric keys were originally
a string:
[ "1" => "foo", 2 => "bar"]
And, for the record, I am not in favor of the original proposal either. I
think it's way too big of a BC break like others have said.
Regards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
--
Chase Peeler
chasepeeler@gmail.com
My mistake, I thought it did get casted. Still, I think you add a lot of
complexity because you'll need to keep track of whether or not a key was a
string or not. There is no guarantee that all numeric keys were originally
a string:
[ "1" => "foo", 2 => "bar"]
I don't think anybody was proposing any such tracking, simply a shorthand for this:
array_map(strval(...), array_keys($array));
In other words, take the keys of any array, which might be a mixture of integers and strings, and convert them all to strings, regardless of how they were originally specified. The result being an array where the values can all be safely passed to a function expecting strings, with strict_types=1 in force.
Regards,
--
Rowan Tommins
[IMSoP]
I've sent a few emails to unsubscribe. Please remove me from this list. I'm
tired of these fucking emails about a dead language
I support this behavior fix because in its current form, due to a
similar problem (almost?), all PSR-7 implementations contain bugs that
violate RFC7230 (section 3.2:
https://datatracker.ietf.org/doc/html/rfc7230#section-3.2 ). Thus,
physically, by the standard, all headers can have the name "0" (like «0:
value»), but when stored inside implementations, it is converted to a
string and a problem arises ($message->getHeaders() //
returns array<int|string, string> instead of array<string, string>).You appear to be technically correct - the RFC defines a header name
only as "token", which implies the following would all be valid HTTP
headers:42: The Answer
!: Bang
^_^: SurprisedIn practice, it would be a bad idea to use any of these.
Every single one of the field names registered with IANA [1] starts with
a letter, and proceeds with only letters, digits, and hyphen ('-'). [The
exception is "*", listed there as "reserved" to specifically prevent its
use conflicting with the wild-card value in "Vary" lists.]I'm actually surprised this definition hasn't been updated with
interoperability advice in recent revisions of the standard. I did find
this general advice for internet message headers in RFC 3864 [2]:Thus, for maximum flexibility, header field names SHOULD further be
restricted to just letters, digits, hyphen ('-') and underscore ('_')
characters, with the first character being a letter or underscore.The additional restriction on underscore ('_') in HTTP arises from CGI,
which maps headers to environment variables. For instance, Apache httpd
silently drops headers with anything other than letters, digits, and
hyphen [3] to avoid security issues caused by environment manipulation.If I was developing a PSR-7 or similar library, I would be inclined to
drop any header composed only of digits, and issue a diagnostic warning,
so that it wouldn't escalate to a type error later. It certainly doesn't
seem reasonable to change the entire language to work around that
inconvenience.[1] https://www.iana.org/assignments/http-fields/http-fields.xhtml
[2] https://datatracker.ietf.org/doc/html/rfc3864#section-4.1
[3] https://httpd.apache.org/docs/trunk/env.html#settingRegards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
I've sent a few emails to unsubscribe. Please remove me from this list. I'm
tired of these fucking emails about a dead language
Please send mail to internals-unsubscribe@lists.php.net to trigger the
unsubscription process. See
http://untroubled.org/ezmlm/ezman/ezman1.html for details.
Regards,
Christoph
On Wed, 29 Dec 2021, 15:42 Vincent Langlet, misterdeviling@gmail.com
wrote:
I recently discovered that an array was automatically casting
numeric-string keys to int if it was possible.
Having had to track down bugs numerous times over the years, I completely
agree this is unexpected behaviour, and would be in favour of it being
amended. It is quite a BC break indeed, and both Orklah (Psalm) and Ondrej
Mirtes (PHP Stan) agree that the issue is that it's a difficult issue to
identify with static analysis (see
https://twitter.com/OndrejMirtes/status/1478971968636567552 for some
thoughts).
As always with type juggling related discussions in PHP, I expect any RFC
to be contentious and heavily discussed ;)
I expect the main detractors would be the BC breaks, and people might not
think the BC break is worth the pain.
Just thinking out loud, being able to identify an array type might be a way
forward here (using "syntax" already in use by static analysis tools in
docblocks, for familiarity), e.g.
private array<string, mixed> $property;
This might "turn off" the automatic type juggling behaviour, for example.
If the existing strict_types=1 mode is on, this might fail:
$this->property[3] = "hi";
If strict_types=0 or not declared, the key would be cast to a string.
That said, I know there has been contentious discussion on this and the
related topic of generics multiple times, so don't really want to stir up a
hornet's nest there.
Great idea, and I'd love to see this fixed, given the countless hours I've
wasted trying to track down bugs that are a result of this. But I fear that
this would be a difficult one to push through the RFC process, sadly. I
hope I'm proven wrong though!
Thanks
James
Just thinking out loud, being able to identify an array type might be a way
forward here (using "syntax" already in use by static analysis tools in
docblocks, for familiarity), e.g.private array<string, mixed> $property;
This might "turn off" the automatic type juggling behaviour, for example.
If the existing strict_types=1 mode is on, this might fail:$this->property[3] = "hi";
If strict_types=0 or not declared, the key would be cast to a string.
Yes, something similar occurred to me earlier, and I honestly think that some new types like this are the only way this can practically be solved.
That said, I know there has been contentious discussion on this and the
related topic of generics multiple times, so don't really want to stir up a
hornet's nest there.
It's not so much that generics are contentious, but that they're really, really hard to implement within the context of PHP's other features. It gets complicated really quickly when you need to verify parameter and return types, e.g. to decide if a value of type SortedList<ChildClass> meets a parameter constraint List<ParentClass>
Leaving that aside, though, an array type that has certain constraints on assignment might be workable. And a simple pair that only constrain their keys could just be individually named rather than introducing generics.
As for the original suggestion, I think some people are underestimating just how disruptive changing this could be. The really big problem is that there is nothing to deprecate, very little to detect with static analysis tools, and very confusing code if you want to support both versions.
The following would be perfectly valid PHP both before and after the behaviour change:
$a = [];
$a[42] = true;
$a['42'] = false;
$a[ (string)43 ] = true;
$a[ (int)'43' ] = false;
None of those lines is "wrong", in and of itself, so there's nothing to mark deprecated or warn about; but the result in current PHP is [42 => false, 43 => false]; and with the proposed change, the result would presumably be [42 => true, '42' => false, '43' => true, 43 => false]
I can't think of any way one could prepare for this change, other than putting a string cast in front of every array index throughout an entire code base, and on every use of array_keys, foreach, etc that reads array keys. Unless someone can present a fully working migration plan, this would get a "no" vote from me.
Regards,
--
Rowan Tommins
[IMSoP]
Just thinking out loud, being able to identify an array type might be a
way
forward here (using "syntax" already in use by static analysis tools in
docblocks, for familiarity), e.g.private array<string, mixed> $property;
This might "turn off" the automatic type juggling behaviour, for example.
If the existing strict_types=1 mode is on, this might fail:$this->property[3] = "hi";
If strict_types=0 or not declared, the key would be cast to a string.
Yes, something similar occurred to me earlier, and I honestly think that
some new types like this are the only way this can practically be solved.
So a solution, which doesn't require generics, would be to introduce
something new like a dict
as a way to create an array<string, mixed>
.
{}
could be used instead of []
.
$dict = {'1' => '1', 'key' => 'value'};
\is_array($dict); // true
On Wed, 29 Dec 2021 at 17:42, Vincent Langlet misterdeviling@gmail.com
wrote:
Hi,
I recently discovered that an array was automatically casting
numeric-string keys to int if it was possible. For instance, the following
array:$a = ['01' => '01', '10' => '10'];
Is not an array with the key '01' and '10' but instead consider the second
key as an int.This has some implications like the fact that
- array_flip(array_flip($a)) !== $a
- array_search('10', $a) is an int when array_search('01', $a) is still a
string. Someone using strict types and passing the result to a function
expecting a string could end with an unexpected crash.I've created an issue about this
https://github.com/php/php-src/issues/7845
but it was recommended to me to send a mail to this mailing list instead.I don't think this behavior should be considered as "normal" and would like
to propose to change this for PHP 9, as it's a BC-break. To me it can and
be considered as a follow-up of
https://wiki.php.net/rfc/string_to_number_comparison and
ttps://wiki.php.net/rfc/saner-numeric-strings
https://wiki.php.net/rfc/saner-numeric-strings. This is still a
"concept"
since I never code with C and know nothing about the PHP implementation and
if this change would be possible. Any help is welcome then.Thanks
Hello everyone,
This is indeed a pretty sizeable BC break that probably is going to cause
quite a few headaches for people, but it's also one of those things where I
think it's absolutely worth tackling and fixing it.
I never got hit by this personally, but when I use string keys, I expect
them to be kept as strings and not do magic conversions to an integer.
Can't vote, but you definitely have my full support to fix this one.
Probably would need a depreciation period and some warnings for a version
or two.
--
Arvīds Godjuks
+371 26 851 664
arvids.godjuks@gmail.com
Telegram: @psihius https://t.me/psihius