Hi Kamil Tekiela,
I read your RFC and I understand the intent in improving the current
parse_str function behaviour by introducing a new function to
avoid possible breakage,
However I feel that we are missing a chance to also improve how parse_str
algorithm is currently used, we could or should (?) use this opportunity
to fix some long shortcomings around parse_str.
In no particular order:
parse_str mangled data while it was acceptable before when it could
directly inject PHP variables into the current scope, this
feature is no longer needed and prevents interoperability with other query
parsing algorithms used in other languages. Also since this mangled data
is not well known it comes as a surprise to average PHP developer as the
behaviour feels unexpected see https://3v4l.org/KIJ9V
parse_str assumes that the query separator is always a "&" which reduces
its usage to only parsing query using that particular character. Again this
might be seen as an edge case but no RFC prevents using any other
character. If you where to use another character you are bound to use some
userland code workaround to then feed the "normalized" string to parse_str
parse_str parse the query string using PHP algorithm while there are now
some established parsing algorithm that are languages independant like
https://url.spec.whatwg.org/#interface-urlsearchparams that defines how a
query string should be parsed.
If we were to introduce your function as is, I feel that we will have to
submit yet another RFC and have a language with at least two ways to parse
a string. So maybe instead of a new function what we need is an object with
a better public API ? thoughts
On Fri, 6 Aug 2021 at 08:18, ignace nyamagana butera nyamsprod@gmail.com
wrote:
I read your RFC and I understand the intent in improving the current
parse_str function behaviour by introducing a new function to
avoid possible breakage,
However I feel that we are missing a chance to also improve how parse_str
algorithm is currently used, we could or should (?) use this opportunity
to fix some long shortcomings around parse_str.In no particular order:
I agree that the first 2 shortcomings ought to be addressed in a
HTTP-focused parse_str alternative. The first has been an annoyance, as
some querystrings use '.' to denote an array, in the way PHP chooses []. So
foo.bar would mean foo[bar] in PHP-speak.
The third I won't comment on as I don't know the Url algorithm.
Peter
btw why isn't foo.bar=123 decoded to array("foo.bar"=>123); ?
this looks pretty bad to me https://3v4l.org/6Wa23
On Fri, 6 Aug 2021 at 08:18, ignace nyamagana butera nyamsprod@gmail.com
wrote:I read your RFC and I understand the intent in improving the current
parse_str function behaviour by introducing a new function to
avoid possible breakage,
However I feel that we are missing a chance to also improve how parse_str
algorithm is currently used, we could or should (?) use this opportunity
to fix some long shortcomings around parse_str.In no particular order:
I agree that the first 2 shortcomings ought to be addressed in a
HTTP-focused parse_str alternative. The first has been an annoyance, as
some querystrings use '.' to denote an array, in the way PHP chooses []. So
foo.bar would mean foo[bar] in PHP-speak.The third I won't comment on as I don't know the Url algorithm.
Peter
btw why isn't foo.bar=123 decoded to array("foo.bar"=>123); ?
this looks pretty bad to me https://3v4l.org/6Wa23
Hi Hans,
This is because variables in PHP that contain the concatenation operator or
space are much more difficult to access. See https://3v4l.org/vUBWK
As the primary purpose of parse_str was to register globals from a query
string, it follows the same logic as your normal POST/GET/COOKIE parsing.
See https://stackoverflow.com/a/68742/1839439
Perhaps, instead of adjusting this behaviour only for the new function, we
could remove this behaviour as a whole, given that it is a remainder of the
long-forgotten register globals? I don't see any use for it anymore, as all
parsed input is only available in the form of an associative array. People
who extract $_POST can suffer the consequences of their own actions...
Regards,
Kamil
Perhaps, instead of adjusting this behaviour only for the new function, we
could remove this behaviour as a whole, given that it is a remainder of the
long-forgotten register globals? I don't see any use for it anymore
Changing behaviour like this is tricky, because even if it's not that
useful, it is consistent, so code to process a query string of
"?foo.bar=42" will be looking at $_GET['foo_bar'] not $_GET['foo.bar'].
If we "fix" the name mangling, that code will break, and not necessarily
in a very obvious way.
Note that there are other parts of the request which get mangled in
similar ways, such as when HTTP headers are translated into CGI
environment variables. The solution in that case is to ignore the CGI
vars and use a function like getallheaders()
, so having a non-mangling
http_parse_query() doesn't feel out of place.
Having the function mirror http_build_query()
as closely as possible
seems like a good idea, e.g. http_parse_query('foo_0=a;foo_1=b', 'foo_',
';') should return [0=>'a', 1=>'b'].
Regards,
--
Rowan Tommins
[IMSoP]
Perhaps, instead of adjusting this behaviour only for the new
function, we
could remove this behaviour as a whole, given that it is a remainder
of the
long-forgotten register globals? I don't see any use for it anymoreChanging behaviour like this is tricky, because even if it's not that
useful, it is consistent, so code to process a query string of
"?foo.bar=42" will be looking at $_GET['foo_bar'] not $_GET['foo.bar'].
If we "fix" the name mangling, that code will break, and not necessarily
in a very obvious way.Note that there are other parts of the request which get mangled in
similar ways, such as when HTTP headers are translated into CGI
environment variables. The solution in that case is to ignore the CGI
vars and use a function likegetallheaders()
, so having a non-mangling
http_parse_query() doesn't feel out of place.
Right. If we want to change this name mangling, we should change it
everywhere, and that certainly needs an own RFC and ideally some kind of
deprecation/change notice.
--
Christoph M. Becker