Hi internals
A while ago I encountered a limitation of how RFC1867 requests are
handled in PHP. PHP populates the $_POST and $_FILES superglobals when
the Content-Type is multipart/form-data or
application/x-www-form-urlencoded, but only when the method is POST.
For application/x-www-form-urlencoded PUT requests this is not a
problem because the format is simple, usually limited in size and PHP
offers functions to parse it, namely parse_str and parse_url. For
RFC1867 it's a different story.
The code handling the request will need to use streams because RFC1867
is often used with files, the format is much more complicated, files
should be cleaned up when the request ends if unused, etc. Handling
this manually is non-trivial. This has been reported many years ago,
and evidently caused a bit of frustration.
https://bugs.php.net/bug.php?id=55815
This is not limited to PUT either, multipart/form-data bodies are
valid with other requests. Here's the approach I believe is best.
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals. The function works for any non-POST requests. It assumes
that none of the input stream has been consumed, and that the
Content-Type is set accordingly. A nice side-effect of this approach
is that it may be used with the enable_post_data_reading ini setting
to decide whether to parse the RFC1867 bodies dynamically. For
example, a specific endpoint may accept bigger requests. The function
may be implemented in a more generic way 1. by returning the
data/files arrays instead of populating the superglobals and 2. by
providing an input stream manually. I don't know if there's such a
use-case and thus if this is worthwhile, as it would require bigger
changes in the RFC1867 handling.
Here's the proof-of-concept implementation:
https://github.com/php/php-src/pull/11472
For completeness, here are other options I considered.
- Create a new $_PUT superglobal that is always populated. Two
issues: The obvious one is that this is limited to PUT requests. While
we could also introduce $_PATCH, this seems like a poor solution.
While discouraged, other methods can also contain bodies. Another
issue is that the code for processing RFC1867 consumes the input
stream. This constitutes a BC break. Buffering the input is not
feasible for large requests that would be expected here. - The same as option 1, but populate the existing $_POST global. This
comes with the same BC break. - The same as options 1 or 2 with an additional ini setting to opt
into the behavior. The issue with this approach is that both the old
and new behavior might be desired in different parts of the same
application. The ini option can't be changed at runtime because the
populating of the superglobals happens before user code is being
executed.
Let me know what your thoughts are. If there is consensus in the
feedback I'll update the implementation accordingly and post an update
to the list. If there is no consensus, I will create an RFC.
Ilija
how are errors handled, like if the format of php://input is
unrecognized, not valid multipart/form-data and not valid
application/x-www-form-urlencoded?
errors? exceptions? nothing?
Hi internals
A while ago I encountered a limitation of how RFC1867 requests are
handled in PHP. PHP populates the $_POST and $_FILES superglobals when
the Content-Type is multipart/form-data or
application/x-www-form-urlencoded, but only when the method is POST.
For application/x-www-form-urlencoded PUT requests this is not a
problem because the format is simple, usually limited in size and PHP
offers functions to parse it, namely parse_str and parse_url. For
RFC1867 it's a different story.The code handling the request will need to use streams because RFC1867
is often used with files, the format is much more complicated, files
should be cleaned up when the request ends if unused, etc. Handling
this manually is non-trivial. This has been reported many years ago,
and evidently caused a bit of frustration.
https://bugs.php.net/bug.php?id=55815This is not limited to PUT either, multipart/form-data bodies are
valid with other requests. Here's the approach I believe is best.Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals. The function works for any non-POST requests. It assumes
that none of the input stream has been consumed, and that the
Content-Type is set accordingly. A nice side-effect of this approach
is that it may be used with the enable_post_data_reading ini setting
to decide whether to parse the RFC1867 bodies dynamically. For
example, a specific endpoint may accept bigger requests. The function
may be implemented in a more generic way 1. by returning the
data/files arrays instead of populating the superglobals and 2. by
providing an input stream manually. I don't know if there's such a
use-case and thus if this is worthwhile, as it would require bigger
changes in the RFC1867 handling.Here's the proof-of-concept implementation:
https://github.com/php/php-src/pull/11472For completeness, here are other options I considered.
- Create a new $_PUT superglobal that is always populated. Two
issues: The obvious one is that this is limited to PUT requests. While
we could also introduce $_PATCH, this seems like a poor solution.
While discouraged, other methods can also contain bodies. Another
issue is that the code for processing RFC1867 consumes the input
stream. This constitutes a BC break. Buffering the input is not
feasible for large requests that would be expected here.- The same as option 1, but populate the existing $_POST global. This
comes with the same BC break.- The same as options 1 or 2 with an additional ini setting to opt
into the behavior. The issue with this approach is that both the old
and new behavior might be desired in different parts of the same
application. The ini option can't be changed at runtime because the
populating of the superglobals happens before user code is being
executed.Let me know what your thoughts are. If there is consensus in the
feedback I'll update the implementation accordingly and post an update
to the list. If there is no consensus, I will create an RFC.Ilija
--
To unsubscribe, visit: https://www.php.net/unsub.php
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.
My initial instinct was to discuss how this could be made more flexible in
terms of input and output; but looking at how simple the implementation is,
this seems like a really quick win, and anything else can be future scope.
To bikeshed the name a bit:
- We should probably avoid the word "post"; although it populates $_POST,
that's really misnamed anyway, what we're really talking about is "form
data" - Even if there isn't a current category prefix where it belongs, it should
probably follow the "category_specific" naming scheme - We might in future add some related functions, such as returning the data
rather than populating globals
How about "request_form_populate_globals"?
Regards,
Rowan Tommins
[IMSoP]
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.My initial instinct was to discuss how this could be made more flexible in
terms of input and output; but looking at how simple the implementation is,
this seems like a really quick win, and anything else can be future scope.To bikeshed the name a bit:
- We should probably avoid the word "post"; although it populates $_POST,
that's really misnamed anyway, what we're really talking about is "form
data"- Even if there isn't a current category prefix where it belongs, it should
probably follow the "category_specific" naming scheme- We might in future add some related functions, such as returning the data
rather than populating globalsHow about "request_form_populate_globals"?
Another option for the name: populate_multipart_form_data()
.
Cheers,
Ben
Hi Ben, Hi Rowan
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.How about "request_form_populate_globals"?
The word "form" seems a bit out of place (even though it appears in
both multipart/form-data and application/x-www-form-urlencoded),
because this function is mainly targeted at PUT/PATCH requests for
REST APIs. Maybe request_body_populate_globals?
Another option for the name:
populate_multipart_form_data()
.
I avoided the term "multipart" because the function technically also
works for application/x-www-form-urlencoded requests. It's less
necessary for the reasons outlined in my previous email, but it would
allow for consistent handling of such requests for all HTTP methods.
Some people on GitHub voiced that they would prefer an INI setting.
Therefore I will create an RFC accordingly.
Ilija
Hi Ben, Hi Rowan
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.How about "request_form_populate_globals"?
The word "form" seems a bit out of place (even though it appears in
both multipart/form-data and application/x-www-form-urlencoded),
because this function is mainly targeted at PUT/PATCH requests for
REST APIs. Maybe request_body_populate_globals?Another option for the name:
populate_multipart_form_data()
.I avoided the term "multipart" because the function technically also
works for application/x-www-form-urlencoded requests. It's less
necessary for the reasons outlined in my previous email, but it would
allow for consistent handling of such requests for all HTTP methods.Some people on GitHub voiced that they would prefer an INI setting.
Therefore I will create an RFC accordingly.
I know this issue comes up enough because it’s confusing to developers that there’s not a $_PUT
, etc., but I’m not a fan of introducing something new that populates globals.
While I’ve never used application/x-www-form-urlencoded
data for PUT
requests (because I don’t think it’s a proper content-type for the semantics of PUT
), I do see how this content-type could be used with PATCH
, and I also don’t want to rule-out use cases of it for PUT
or any other HTTP method.
In the past, I’ve used something like the following to solve this:
parse_str(file_get_contents('php://input'), $data);
I haven’t looked up how any of the frameworks solve this, but I would be willing to bet they also do something similar.
Rather than implementing functionality to populate globals, would you be interested in introducing some new HTTP request functions. Something like:
http_request_body(): string
http_parse_query(string $queryString): array
http_request_body()
would return the raw body and would be the equivalent of calling file_get_contents('php://input')
. Of special note is that it should always return the raw body, even if $_POST
is populated, for the sake of consistency and reducing confusion.
http_parse_query()
would be the opposite of http_build_query()
and would return a value instead of requiring a reference parameter, like parse_str()
.
While these don’t address the confusion users face by not having a $_PUT
superglobal, I’d prefer not to overload the superglobals. Maybe we can update the documentation to encourage use of these new functions instead of the superglobals?
We also might want to introduce something like http_query_string(): string
that always returns the raw query string, instead of requiring use of the superglobal $_SERVER['QUERY_STRING']
.
Cheers,
Ben
Hi Ben, Hi Rowan
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.How about "request_form_populate_globals"?
The word "form" seems a bit out of place (even though it appears in
both multipart/form-data and application/x-www-form-urlencoded),
because this function is mainly targeted at PUT/PATCH requests for
REST APIs. Maybe request_body_populate_globals?Another option for the name:
populate_multipart_form_data()
.I avoided the term "multipart" because the function technically also
works for application/x-www-form-urlencoded requests. It's less
necessary for the reasons outlined in my previous email, but it would
allow for consistent handling of such requests for all HTTP methods.Some people on GitHub voiced that they would prefer an INI setting.
Therefore I will create an RFC accordingly.I know this issue comes up enough because it’s confusing to developers that there’s not a
$_PUT
, etc., but I’m not a fan of introducing something new that populates globals.While I’ve never used
application/x-www-form-urlencoded
data forPUT
requests (because I don’t think it’s a proper content-type for the semantics ofPUT
), I do see how this content-type could be used withPATCH
, and I also don’t want to rule-out use cases of it forPUT
or any other HTTP method.In the past, I’ve used something like the following to solve this:
parse_str(file_get_contents('php://input'), $data);
I haven’t looked up how any of the frameworks solve this, but I would be willing to bet they also do something similar.
Rather than implementing functionality to populate globals, would you be interested in introducing some new HTTP request functions. Something like:
http_request_body(): string
http_parse_query(string $queryString): array
http_request_body()
would return the raw body and would be the equivalent of callingfile_get_contents('php://input')
. Of special note is that it should always return the raw body, even if$_POST
is populated, for the sake of consistency and reducing confusion.
http_parse_query()
would be the opposite ofhttp_build_query()
and would return a value instead of requiring a reference parameter, likeparse_str()
.While these don’t address the confusion users face by not having a
$_PUT
superglobal, I’d prefer not to overload the superglobals. Maybe we can update the documentation to encourage use of these new functions instead of the superglobals?We also might want to introduce something like
http_query_string(): string
that always returns the raw query string, instead of requiring use of the superglobal$_SERVER['QUERY_STRING']
.Cheers,
Ben
As a userland/library developer I much prefer this approach, but please also include http_uploads()
(name/signature TBD), returning an array of file uploads using a structure akin to what $_POST (or http_parse_query) gives, based on the field name, rather than the current scenario where the presence of square brackets radically changes the structure of the $_FILES array.
My initial thought was that perhaps the leaf entities in said array could be a subclass of SplFileInfo referencing the uploaded temporary file, with additional read-only properties for the extra upload-related attributes - but really the important thing is just having a sane, consistent structure. Having an object rather than an array with constant keys would be a nice addition, but is much less important.
Cheers
Stephen
Hi Ben, Hi Rowan
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.How about "request_form_populate_globals"?
The word "form" seems a bit out of place (even though it appears in
both multipart/form-data and application/x-www-form-urlencoded),
because this function is mainly targeted at PUT/PATCH requests for
REST APIs. Maybe request_body_populate_globals?Another option for the name:
populate_multipart_form_data()
.I avoided the term "multipart" because the function technically also
works for application/x-www-form-urlencoded requests. It's less
necessary for the reasons outlined in my previous email, but it would
allow for consistent handling of such requests for all HTTP methods.Some people on GitHub voiced that they would prefer an INI setting.
Therefore I will create an RFC accordingly.I know this issue comes up enough because it’s confusing to developers that there’s not a
$_PUT
, etc., but I’m not a fan of introducing something new that populates globals.While I’ve never used
application/x-www-form-urlencoded
data forPUT
requests (because I don’t think it’s a proper content-type for the semantics ofPUT
), I do see how this content-type could be used withPATCH
, and I also don’t want to rule-out use cases of it forPUT
or any other HTTP method.In the past, I’ve used something like the following to solve this:
parse_str(file_get_contents('php://input'), $data);
I haven’t looked up how any of the frameworks solve this, but I would be willing to bet they also do something similar.
Rather than implementing functionality to populate globals, would you be interested in introducing some new HTTP request functions. Something like:
http_request_body(): string
http_parse_query(string $queryString): array
http_request_body()
would return the raw body and would be the equivalent of callingfile_get_contents('php://input')
. Of special note is that it should always return the raw body, even if$_POST
is populated, for the sake of consistency and reducing confusion.
http_parse_query()
would be the opposite ofhttp_build_query()
and would return a value instead of requiring a reference parameter, likeparse_str()
.While these don’t address the confusion users face by not having a
$_PUT
superglobal, I’d prefer not to overload the superglobals. Maybe we can update the documentation to encourage use of these new functions instead of the superglobals?We also might want to introduce something like
http_query_string(): string
that always returns the raw query string, instead of requiring use of the superglobal$_SERVER['QUERY_STRING']
.Cheers,
BenAs a userland/library developer I much prefer this approach, but
please also includehttp_uploads()
(name/signature TBD), returning
an array of file uploads using a structure akin to what $_POST (or
http_parse_query) gives, based on the field name, rather than the
current scenario where the presence of square brackets radically
changes the structure of the $_FILES array.My initial thought was that perhaps the leaf entities in said array
could be a subclass of SplFileInfo referencing the uploaded temporary
file, with additional read-only properties for the extra upload-related
attributes - but really the important thing is just having a sane,
consistent structure. Having an object rather than an array with
constant keys would be a nice addition, but is much less important.Cheers
Stephen
I also like Ben's suggestion. It avoids adding more magic globals, but also is still simple and straightforward to use in an easily composable fashion. Based on the Content-Type header of the request, it would allow for:
$formData = http_parse_query(http_request_body());
$jsonData = json_decode(http_request_body());
... Whatever other body parsers.
While you should probably not call it exactly like that, it means "take body string, pass to mime-appropriate body parser" is an easy pattern, and one that can also be easily mocked for testing.
--Larry Garfield
Hey All
Hi Ben, Hi Rowan
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.How about "request_form_populate_globals"?
The word "form" seems a bit out of place (even though it appears in
both multipart/form-data and application/x-www-form-urlencoded),
because this function is mainly targeted at PUT/PATCH requests for
REST APIs. Maybe request_body_populate_globals?Another option for the name:
populate_multipart_form_data()
.I avoided the term "multipart" because the function technically also
works for application/x-www-form-urlencoded requests. It's less
necessary for the reasons outlined in my previous email, but it would
allow for consistent handling of such requests for all HTTP methods.Some people on GitHub voiced that they would prefer an INI setting.
Therefore I will create an RFC accordingly.I know this issue comes up enough because it’s confusing to developers that there’s not a
$_PUT
, etc., but I’m not a fan of introducing something new that populates globals.While I’ve never used
application/x-www-form-urlencoded
data forPUT
requests (because I don’t think it’s a proper content-type for the semantics ofPUT
), I do see how this content-type could be used withPATCH
, and I also don’t want to rule-out use cases of it forPUT
or any other HTTP method.In the past, I’ve used something like the following to solve this:
parse_str(file_get_contents('php://input'), $data);
I haven’t looked up how any of the frameworks solve this, but I would be willing to bet they also do something similar.
Rather than implementing functionality to populate globals, would you be interested in introducing some new HTTP request functions. Something like:
http_request_body(): string http_parse_query(string $queryString): array
http_request_body()
would return the raw body and would be the equivalent of callingfile_get_contents('php://input')
. Of special note is that it should always return the raw body, even if$_POST
is populated, for the sake of consistency and reducing confusion.
http_parse_query()
would be the opposite ofhttp_build_query()
and would return a value instead of requiring a reference parameter, likeparse_str()
.While these don’t address the confusion users face by not having a
$_PUT
superglobal, I’d prefer not to overload the superglobals. Maybe we can update the documentation to encourage use of these new functions instead of the superglobals?We also might want to introduce something like
http_query_string(): string
that always returns the raw query string, instead of requiring use of the superglobal$_SERVER['QUERY_STRING']
.Cheers,
BenAs a userland/library developer I much prefer this approach, but
please also includehttp_uploads()
(name/signature TBD), returning
an array of file uploads using a structure akin to what $_POST (or
http_parse_query) gives, based on the field name, rather than the
current scenario where the presence of square brackets radically
changes the structure of the $_FILES array.My initial thought was that perhaps the leaf entities in said array
could be a subclass of SplFileInfo referencing the uploaded temporary
file, with additional read-only properties for the extra upload-related
attributes - but really the important thing is just having a sane,
consistent structure. Having an object rather than an array with
constant keys would be a nice addition, but is much less important.Cheers
Stephen
I also like Ben's suggestion. It avoids adding more magic globals, but also is still simple and straightforward to use in an easily composable fashion. Based on the Content-Type header of the request, it would allow for:
$formData = http_parse_query(http_request_body());
$jsonData = json_decode(http_request_body());
... Whatever other body parsers.While you should probably not call it exactly like that, it means "take body string, pass to mime-appropriate body parser" is an easy pattern, and one that can also be easily mocked for testing.
While I like not adding more Superglobals, it seems like we are adding
more and more functions to retrieve the different parts of a
Request-Object...
So when we are at it: Why don't we introduce exactly that? A
Request-Object that has all the methods. And which is immutable.
And one method request(): \Request
I deliberately didn't call it getRequest
(or get_request
) to not
confuse people why there isn't also a post_request
or put_request
or
... you get the picture)
One additional function in global namespace and then we can use one of
the request-objects that are already out in the wild. I don't think
there's a need to invent the wheel again.
The advantage might be that no matter how many calls to request()
you
will always get the same result. The Request as it came in.
Just my 0.02€
Cheers
Andreas
,,,
(o o)
+---------------------------------------------------------ooO-(_)-Ooo-+
| Andreas Heigl |
| mailto:andreas@heigl.org N 50°22'59.5" E 08°23'58" |
| https://andreas.heigl.org |
+---------------------------------------------------------------------+
| https://hei.gl/appointmentwithandreas |
+---------------------------------------------------------------------+
| GPG-Key: https://hei.gl/keyandreasheiglorg |
+---------------------------------------------------------------------+
Hi Andreas,
śr., 28 cze 2023 o 07:55 Andreas Heigl andreas@heigl.org napisał(a):
...
While I like not adding more Superglobals, it seems like we are adding
more and more functions to retrieve the different parts of a
Request-Object...So when we are at it: Why don't we introduce exactly that? A
Request-Object that has all the methods. And which is immutable.And one method
request(): \Request
I agree with the above, this would allow us to clean up the global
namespace in the future.
My personal use cases for PHP are mostly queue workers/event stream
consumers,
so the usual request/response model SAPI is used rarely. But I understand
that it's not the most used case.
Adding a couple of additional functions to a set of already ones just adds
more symbols not always used.
Ideally, I'd see an HTTP module in the future enabled in
request/response-oriented SAPI but that's a different story.
I deliberately didn't call it
getRequest
(orget_request
) to not
confuse people why there isn't also apost_request
orput_request
or
... you get the picture)One additional function in global namespace and then we can use one of
the request-objects that are already out in the wild. I don't think
there's a need to invent the wheel again.The advantage might be that no matter how many calls to
request()
you
will always get the same result. The Request as it came in.
That sounds like a use for a const?!
Just my .50€
Cheers,
Michał Marcin Brzuchalski
Hi Ben
Hi Ben, Hi Rowan
Introduce a new function (currently named populate_post_data()) to
read the input stream and populate the $_POST and $_FILES
superglobals.In the past, I’ve used something like the following to solve this:
parse_str(file_get_contents('php://input'), $data);
I haven’t looked up how any of the frameworks solve this, but I would be willing to bet they also do something similar.
Rather than implementing functionality to populate globals, would you be interested in introducing some new HTTP request functions. Something like:
http_request_body(): string http_parse_query(string $queryString): array
http_request_body()
would return the raw body and would be the equivalent of callingfile_get_contents('php://input')
. Of special note is that it should always return the raw body, even if$_POST
is populated, for the sake of consistency and reducing confusion.
http_parse_query()
would be the opposite ofhttp_build_query()
and would return a value instead of requiring a reference parameter, likeparse_str()
.
The problem is that the content stream for multipart/form-data is
expected to be big, as in possibly multiple gigabytes big. We can't
use http_request_body() to return the entire content as a string at
once. The current RFC1867 implementation reads and operates in chunks,
i.e. appends it to a file or to a string, depending on the content
part. It never has to hold on to the entire content in memory.
http_request_body() also can't return the content of the request again
after it has been consumed, because that's not how the HTTP protocol
works. We would need to buffer the content somewhere when reading it
for the first time, which again we can't do because it may be very
big.
It may be possible to pass the fopen('php://input', 'r') stream to
this function and let it consume it. However, as mentioned in my
original e-mail this requires some changes to how RFC1867 requests are
handled. Currently, it calls sapi_module.read_post() which directly
reads from the TCP socket. Instead, we'd need to read from the stream,
possibly in addition so that the general case is not degraded in terms
of performance. I'll verify if this is an option, and whether the
changes are (too) big. However, I don't suspect there to be a lot of
use-cases for this as RFC1867 is primarily used for requests and not
for responses, so you wouldn't usually need to parse this type of
content from some other source.
As for returning the parsed values as non-globals, that's entirely
possible. However, it's inconsistent with how requests are currently
handled. The values will need to be passed around manually and kept
alive, but the function still modifies global state (i.e. the input
stream, whether that's sapi_module.read_post() or php://input). I
don't believe it will be common to call this function more than once
per request, and thus decoupling the state is not really necessary.
Ilija