Hi Everyone,
I'd like to introduce my latest RFC that I've been working on for a while
now: https://wiki.php.net/rfc/uri_followup.
It proposes 6 followup improvements for ext/uri in the following areas:
- URI Building
- Query Parameter Manipulation
- Accessing Path Segments as an Array
- Host Type Detection
- URI Type Detection
- Percent-Encoding and Decoding Support
I did my best to write an RFC that was at least as extensive as
https://wiki.php.net/rfc/url_parsing_api had become by the end. Despite my
efforts,
there are still a couple things which need a final decision, or which
need to be polished/improved. Some examples:
- How to support array/object values for constructing query strings?
  (https://wiki.php.net/rfc/uri_followup#type_support)
- How to make the UriQueryParams and UrlQueryParams classes more
  interoperable with the query string component (mainly with respect to
  percent-encoding)?
  (https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding)
- Exactly how the advanced percent-decoding capabilities should work? Does
  it make sense to support all the possible modes (UriPercentEncodingMode)
  for percent-decoding as well?
  (https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding_support)
- etc.
Regards,
Máté
Hello!
For the builder methods, why not use the same wither method names? That would make switching to them really easy, over the current implementation.
— Rob
Hi Máté,
Once again, thanks for this follow-up RFC. While there is a lot to digest, I
wanted to address your reservation about implementing the
IteratorAggregate interface for query manipulation:
> The UriQueryParams and UrlQueryParams classes could implement the
> IteratorAggregate interface in theory. However, it's not possible to do so
> due to query components that share the same name, e.g.:
> param=foo&param=bar&param=baz. In this case, the same key (param) would be
> repeated 3 times - and it's actually not possible to support with iterators.
When building the Query component in league URI, I was able to implement the
Countable and IteratorAggregate interfaces by using a different representation
of the query pairs; see
https://uri.thephpleague.com/components/7.0/query/#countable-and-iteratoraggregate
TL;DR instead of using your proposed structure
[['name' => 'value'],...]
I used the following
[['name', 'value'], ...]
While both formats can IMHO allow implementing the IteratorAggregate
interface, the latter allows for a more predictable API. With your proposed
structure, iteration looks like this:
    $uri = new Uri('https://example.com?param=foo&param=bar&param=baz');
    foreach ($uri->getQueryParams() as $key => $pair) {
        // first iteration:  $pair['param'] = 'foo'
        // second iteration: $pair['param'] = 'bar'
        // third iteration:  $pair['param'] = 'baz'
    }
The user needs to know the name of the pair beforehand, which is
counterintuitive if you do not know which pair sits at which position.
In contrast, using the league URI query syntax you will have
the following:
    $uri = new Uri('https://example.com?param=foo&param=bar&param=baz');
    foreach (Query::fromUri($uri) as $key => $pair) {
        // first iteration:  $pair[0] = 'param'; $pair[1] = 'foo'
        // second iteration: $pair[0] = 'param'; $pair[1] = 'bar'
        // third iteration:  $pair[0] = 'param'; $pair[1] = 'baz'
    }
The user will always get the parameter name using $pair[0] and the value
using $pair[1], regardless of their content.
What do you think? This would IMHO solve your issue, but it is indeed a
stronger departure from how query strings are currently parsed in PHP.
Best regards.
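For illustration, the pair-list idea could be sketched roughly as follows. This is only a minimal sketch: PairListSketch is a hypothetical class (neither the RFC's UriQueryParams nor league URI's Query), and the parsing is a naive split on "&" and "=" with rawurldecode(), which is an assumption rather than either specification's algorithm.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration only: stores the query as a list of
 * [name, value] pairs so that repeated names survive iteration.
 */
final class PairListSketch implements IteratorAggregate, Countable
{
    /** @var list<array{0: string, 1: string|null}> */
    private array $pairs = [];

    public static function parse(string $query): self
    {
        $list = new self();
        if ($query === '') {
            return $list;
        }
        foreach (explode('&', $query) as $part) {
            // Split on the first "=" only; a missing "=" yields a null value.
            $kv = explode('=', $part, 2);
            $list->pairs[] = [
                rawurldecode($kv[0]),
                isset($kv[1]) ? rawurldecode($kv[1]) : null,
            ];
        }
        return $list;
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->pairs);
    }

    public function count(): int
    {
        return count($this->pairs);
    }
}

$params = PairListSketch::parse('param=foo&param=bar&param=baz');
foreach ($params as [$name, $value]) {
    echo $name, '=', $value, "\n";
}
```

Because the iterator keys are plain list offsets, the three "param" entries stay distinct, and count() falls out of the same storage.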
Thanks, Máté.
Notes as I read through:
- I really, really hate the "set" prefix on all the methods. It's a builder object, surely the "set" is implied?
  $builder->scheme('https')->host('example.com')->path('/foo/bar')->build();
  That's nice and easy to read.
- It really feels like there's an interface to extract here from the Url/UriBuilder classes. There's literally only one type-specific method (build()).
- UriQueryParams::hasWithValue(), could that be just hasValue()? You still need to specify the key anyway, and that's self-evident from the signature.
- There's a count() method, so shouldn't Ur{i|l}QueryParams implement Countable?
- As above, there really is an interface lurking in UriQueryParams...
- Why both Uri getRawQueryParams() and getQueryParams()? It looks like they would return the same value, no? (If not, that should be explained.)
- The sort() method... should it take an optional user callback, or do we lock people in to lexical ordering?
- It would be quite convenient if set() and append() returned $this, allowing them to be chained.
- The fromArray() logic is... totally weird and unexpected and I hate it. :-) Why can't you support repeated query parameters using nested arrays rather than gumming up all calls with a wonky format?
- It's not clear how one would start a new query from scratch, with the private constructor. There doesn't seem to be a justification for the private. I can't see why new UriQueryParams()->set('foo', 'bar') is a bad thing.
- Type support: Looks reasonable to me.
- The HostType logic seems reasonable to me.
- Url::isSpecial() Could we come up with a better name here? "Special" could mean anything unless you know the RFC; it feels like "real escape string" all over again.
Some parts of this are over my head as I've not read the relevant RFCs, but overall I do like the direction.
--Larry Garfield
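To make a few of the notes above concrete (public constructor, chainable set()/append(), Countable, hasValue()), here is a rough sketch. QueryParamsSketch is a hypothetical stand-in, not the RFC's UriQueryParams, and its set() is simplified: it drops existing pairs with the same name and appends the new one at the end.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in; names and behavior are assumptions for illustration. */
final class QueryParamsSketch implements Countable
{
    /** @var list<array{0: string, 1: string}> */
    private array $pairs = [];

    /** Simplified: removes every pair named $name, then appends the new pair. */
    public function set(string $name, string $value): static
    {
        $this->pairs = array_values(array_filter(
            $this->pairs,
            static fn (array $pair): bool => $pair[0] !== $name
        ));
        $this->pairs[] = [$name, $value];
        return $this; // returning $this is what enables chaining
    }

    public function append(string $name, string $value): static
    {
        $this->pairs[] = [$name, $value];
        return $this;
    }

    public function hasValue(string $name, string $value): bool
    {
        return in_array([$name, $value], $this->pairs, true);
    }

    public function count(): int
    {
        return count($this->pairs);
    }
}

// No temporary variable is needed to start from scratch and add pairs.
$params = (new QueryParamsSketch())
    ->set('foo', 'bar')
    ->append('foo', 'baz');

var_dump(count($params));                  // int(2)
var_dump($params->hasValue('foo', 'baz')); // bool(true)
```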
Hi Larry,
> - Url::isSpecial() Could we come up with a better name here? "Special"
> could mean anything unless you know the RFC; it feels like "real escape
> string" all over again.
This comes from the WHATWG specification; isSpecial is how it is named
there.
> It really feels like there's an interface to extract here from the
> Url/UriBuilder classes. There's literally only one type-specific method
> (build()).
Yes, but the return type is not always the same object (Uri and Url are
different), so I would be inclined not to add a useless interface. They are
similar yet different.
> There's a count() method, so shouldn't Ur{i|l}QueryParams implement
> Countable?
I tend to agree: Countable and IteratorAggregate should be implemented IMHO.
> - The fromArray() logic is... totally weird and unexpected and I hate it.
> :-) Why can't you support repeated query parameters using nested arrays
> rather than gumming up all calls with a wonky format?
I also do not like fromArray(). There are 2 ways to represent query
parameters: either you use the WHATWG spec, in which case they are pairs, or
you use PHP's own algorithm (with its own caveats) and get something that
resembles the result of parse_str(), which is destructive by essence. I would
prefer the named constructor to reflect that.
> - The sort() method... should it take an optional user callback, or do we
> lock people in to lexical ordering?
This is also derived from the WHATWG spec. Adding a callback might be
useful, but it then really depends on how you represent each query
parameter/pair.
> - Why both Uri getRawQueryParams() and getQueryParams()? It looks like
> they would return the same value, no? (If not, that should be explained.)
Because Uri\Rfc3986\Uri already exposes Uri::getQuery and Uri::getRawQuery.
> - It would be quite convenient if set() and append() returned $this,
> allowing them to be chained.
+1
I believe that the structure for the query string is the thing that will
need more explaining. Once it is correctly settled, the rest can easily
be derived from it.
my 2 cents.
On Mon, Dec 1, 2025 at 11:22 PM Larry Garfield larry@garfieldtech.com
wrote:
> Hi Larry,
>> - Url::isSpecial() Could we come up with a better name here? "Special" could mean anything unless you know the RFC; it feels like "real escape string" all over again.
> This comes from the WHATWG specification; isSpecial is how it is named there.
I realize that, but the vast majority of PHP devs won't have read the official spec so don't know what "special" means. Special in what way? The scheme, the path, the encoding? It's completely non-obvious unless you're versed in the specification, which, again, most people won't be.
>> It really feels like there's an interface to extract here from the Url/UriBuilder classes. There's literally only one type-specific method (build()).
> Yes, but the return type is not always the same object (Uri and Url are
> different), so I would be inclined not to add a useless interface. They are
> similar yet different.
So the interface doesn't cover build(). Problem solved.
>> - Why both Uri getRawQueryParams() and getQueryParams()? It looks like they would return the same value, no? (If not, that should be explained.)
> Because Uri\Rfc3986\Uri already exposes Uri::getQuery and Uri::getRawQuery.
Then the RFC needs to explain why we need to have both for the query params, and how they differ. Right now, there's no explanation of how they differ, and the example suggests that they'd return identical values.
--Larry Garfield
> - Url::isSpecial() Could we come up with a better name here?
I would go with Url::isStandardBrowserUrl() or something similar. "Special"
in the WHATWG spec refers to a list of specific URI schemes. It
currently contains http(s), ws(s), ftp and file. But since it is a living
standard, the list can change: if I remember correctly, the gopher scheme
was at some point also listed there, whereas the data and blob URI
schemes have never been listed. So finding the right name, one that does
not become outdated depending on what it represents, is ... special.
> - So the interface doesn't cover build(). Problem solved.
It would not cover the (set)userInfo, (set)username and (set)password
modifiers either, and depending on whether validation is taken into account
you may also have to remove (set)host from your interface.
> - Then the RFC needs to explain why we need to have both for the query
> params, and how they differ.
The RFC 3986 URI exposes the raw, non-normalized URI component as well as
the normalized one. The only distinction from the query component's
perspective in RFC 3986 is that percent-encoded characters need to be
uppercased. So if you want to work with the raw input untouched, you will need
getRawQuery() and thus getRawQueryParams(). As I said, this is already
covered in the previous RFC. Where I do agree with you
is that if the result is identical after creating the UriQueryParams, then we
may not need the getRawQueryParams() method at all.
Best regards,
Ignace
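As a rough illustration of the raw versus normalized distinction described above, the sketch below uppercases the hex digits of percent-encoded triplets in a query string. uppercasePercentTriplets() is a hypothetical helper written for this example; it is not part of ext/uri and covers only the uppercasing step mentioned in the message.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: uppercases the hex digits of every percent-encoded
 * triplet and leaves all other characters untouched.
 */
function uppercasePercentTriplets(string $component): string
{
    return preg_replace_callback(
        '/%[0-9a-fA-F]{2}/',
        static fn (array $m): string => strtoupper($m[0]),
        $component
    ) ?? $component; // fall back to the input if the regex engine fails
}

$raw = 'q=caf%c3%a9&tag=a%2fb';
$normalized = uppercasePercentTriplets($raw);

echo $raw, "\n";        // q=caf%c3%a9&tag=a%2fb
echo $normalized, "\n"; // q=caf%C3%A9&tag=a%2Fb
```

If the parameters are percent-decoded after parsing, both inputs yield the same names and values, which is the case where a separate raw accessor adds little.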
Hi Larry,
> - I really, really hate the "set" prefix on all the methods. It's a
> builder object, surely the "set" is implied?
I like that the "set" prefix groups all the setters together. For
example, should we have to add some extra methods (like authority()),
then without the "set" prefix the build() method would appear between
authority() and fragment() in IDE autocomplete lists.
> - It really feels like there's an interface to extract here from the
> Url/UriBuilder classes. There's literally only one type-specific method
> (build()).
It's the very same thing that we discussed for a very long time last
time... What would be the purpose of the interface? To make the two builders
interchangeable? But they produce fundamentally different URIs. Even if we
don't include build(), we still have some differences between
the two implementations:
- components are different: even though most components match, RFC 3986 and
  WHATWG URL still differ; notably, the userinfo
  component is only acknowledged by RFC 3986, while the username
  and password components are only modifiable in case of WHATWG URL
- validation rules are different: each implementation has its own
  validation rules for each component. I'll need to clarify this in the RFC,
  but some purely syntax-based validations should be performed during the
  setter/wither calls (e.g. scheme cannot contain "%"), while the ones which
  rely on the "global state" (e.g. the host is required when the userinfo is
  set) should be performed by the build() method in order to avoid the
  temporal coupling I mentioned in the RFC.
> - UriQueryParams::hasWithValue(), could that be just hasValue()? You
> still need to specify the key anyway, and that's self-evident from the
> signature.
Yes, I was already considering updating this name, but I was sure that
someone (you, with 99% confidence) would point this out and suggest a
better one.
I agree that hasValue() is probably the right choice, although
hasNameAndValue() would be the most technically correct name...
> - There's a count() method, so shouldn't Ur{i|l}QueryParams implement
> Countable?
Yes, it can. I thought that implementing Countable on its own (without
IteratorAggregate) was less useful, so I omitted it. Ignace suggested
another approach that
would allow implementing IteratorAggregate: if that happens, then I'm totally
fine with also implementing Countable.
> - As above, there really is an interface lurking in UriQueryParams...
I have the same comment as for the builder with one small caveat: as far as
I know the implementations, the biggest differences between
UriQueryParams and UrlQueryParams are how they parse the input, and how
they percent-encode it during recomposition. The rest is fairly similar,
for now at least.
I even had a brief moment when I thought that merging the two
implementations into one was a good idea, but I came to the conclusion that
it isn't, so that the two
classes have the possibility to evolve separately, if needed. So I'd follow
the path of the original URI/URL debate and would not try to make the two
implementations
interoperable. They are only interoperable on the surface. :)
> - Why both Uri getRawQueryParams() and getQueryParams()? It looks like
> they would return the same value, no? (If not, that should be explained.)
This is actually already briefly explained in the RFC (but thanks to Ignace,
who also described this part):
> The difference between Uri\Rfc3986\Uri::getRawQueryParams() and
> Uri\Rfc3986\Uri::getQueryParams() is that the former one passes the "raw"
> (non-normalized)
> query string as an input when instantiating Uri\Rfc3986\Uri\UriQueryParams.
> - The sort() method... should it take an optional user callback, or do we
> lock people in to lexical ordering?
Only WHATWG URL specifies its behavior, and it uses basic alphanumeric
sorting. Even though there's nothing that could stop us from implementing
fancier
sorting ways, I think it's already fine as-is. sort() can be used to
guarantee that the query components are in deterministic order, and I think
that's all that we need.
> - It would be quite convenient if set() and append() returned $this,
> allowing them to be chained.
That's fine for me. WHATWG URL specifies their return type as void, so I
went with this, but there's nothing wrong with returning $this.
> - The fromArray() logic is... totally weird and unexpected and I hate it.
> :-) Why can't you support repeated query parameters using nested arrays
> rather than
> gumming up all calls with a wonky format?
Do you mean something like ["foo" => [0, 1, 2, 3]]? I think it is indeed
possible to implement what you suggested. Whenever the basic structure of
the proposal settles a little bit,
I'll update the implementation, and I'll try to find a sensible
behavior for arrays/objects.
> - It's not clear how one would start a new query from scratch, with the
> private constructor. There doesn't seem to be a justification for the
> private.
> I can't see why new UriQueryParams()->set('foo', 'bar') is a bad thing.
Yes, starting from scratch is only possible by using
UriQueryParams::fromArray([]) or UriQueryParams::parse(""). But I don't
have any fundamental issue with
adding support for the empty constructor variant.
> - Url::isSpecial() Could we come up with a better name here? "Special"
> could mean anything unless you know the RFC; it feels like "real escape
> string" all over again.
The "special URL" is indeed the terminus technicus that WHATWG URL uses.
The RFC explains the concept briefly:
> The WHATWG URL specification defines some special schemes (http, https,
> ftp, file, ws, wss), which have distinct parsing and serialization rules.
I don't have any issues with the current name, but the only alternative I
could imagine is Uri\WhatWg\Url::isSpecialScheme().
Regards,
Máté
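The split between setter-time and build-time validation described above could look roughly like the sketch below. UriBuilderSketch is hypothetical: the method names, the exception types and the choice to return a plain string from build() (instead of a Uri\Rfc3986\Uri instance) are all assumptions made so the example runs on its own.

```php
<?php
declare(strict_types=1);

/** Hypothetical builder; names and rules are assumptions for illustration. */
final class UriBuilderSketch
{
    private ?string $scheme = null;
    private ?string $userInfo = null;
    private ?string $host = null;
    private string $path = '';

    public function setScheme(?string $scheme): static
    {
        // Purely syntactic check, safe to run immediately:
        // RFC 3986 scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
        if ($scheme !== null && !preg_match('/^[A-Za-z][A-Za-z0-9+.-]*$/', $scheme)) {
            throw new InvalidArgumentException("Invalid scheme: $scheme");
        }
        $this->scheme = $scheme;
        return $this;
    }

    public function setUserInfo(?string $userInfo): static
    {
        $this->userInfo = $userInfo;
        return $this;
    }

    public function setHost(?string $host): static
    {
        $this->host = $host;
        return $this;
    }

    public function setPath(string $path): static
    {
        $this->path = $path;
        return $this;
    }

    public function build(): string
    {
        // Cross-component check, deferred so setters can be called in any order.
        if ($this->userInfo !== null && $this->host === null) {
            throw new LogicException('The host is required when the userinfo is set');
        }
        $uri = $this->scheme !== null ? $this->scheme . ':' : '';
        if ($this->host !== null) {
            $uri .= '//' . ($this->userInfo !== null ? $this->userInfo . '@' : '') . $this->host;
        }
        return $uri . $this->path;
    }
}

// The userinfo is set before the host; this is fine because the dependency
// between the two is only checked in build().
echo (new UriBuilderSketch())
    ->setUserInfo('user')
    ->setScheme('https')
    ->setHost('example.com')
    ->setPath('/foo/bar')
    ->build(), "\n"; // https://user@example.com/foo/bar
```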
> Hi Larry,
>> - I really, really hate the "set" prefix on all the methods. It's a builder object, surely the "set" is implied?
> I like that the "set" prefix makes all setters grouped together. For
> example, should we have to add some extra methods (like authority()),
> the build() method would appear between authority() and fragment() in
> IDE autocomplete lists without the set() prefix.
That seems like a very minor point. Code will be read hundreds of times more than it is auto-completed...
>> - It really feels like there's an interface to extract here from the Url/UriBuilder classes. There's literally only one type-specific method (build()).
> It's the very same thing that we discussed for a very long time last
> time... What would be the purpose of the interface? To make the two
> builders
> interchangeable? But they produce fundamentally different URIs. Even if
> we don't include build(), then we still have some differences between
> the two implementations:
I guess what's bugging me here is that in the typical case, URLs and URIs are interchangeable. If you're just using a standard ASCII domain and path, which is the vast majority of URLs, then either one gets you the same result. So it feels grating to have to deal with two different versions of that logic; say, if I just want to pull the path off of one, or set the path on one.
I 1000% realize that's not your fault, nor PHP's fault, it's the fault of the two competing standards that don't talk to each other. The theoretical Venn diagram overlap of URLs and URIs is relatively small. The practical overlap is quite large. So that makes ignoring the overlap very grating. Hence why I keep looking for places to codify the safe overlap.
>> - UriQueryParams::hasWithValue(), could that be just hasValue()? You still need to specify the key anyway, and that's self-evident from the signature.
> Yes, I was already considering updating this name, but I was sure that
> someone (with 99% confidence of you) will point this out and suggest a
> better one.
Nice to know I'm predictable? :-)
> I agree that hasValue() is probably the right choice, although
> hasNameAndValue() would be the most technically correct name...
>> - There's a count() method, so shouldn't Ur{i|l}QueryParams implement Countable?
> Yes, it can. I thought that implementing Countable on its own (without
> IteratorAggregate) was less useful, so I omitted it. Ignace suggested
> another approach that
> would allow implementing IteratorAggregate: if it happens then I'm
> totally fine with also implementing Countable.
I don't feel strongly about IteratorAggregate. I'm not entirely sure I see the value for that. But Countable seems like a no-brainer to include.
>> - Why both Uri getRawQueryParams() and getQueryParams()? It looks like they would return the same value, no? (If not, that should be explained.)
> This is actually already briefly explained in the RFC (but thanks to
> Ignace who also described this part):
>> The difference between Uri\Rfc3986\Uri::getRawQueryParams() and Uri\Rfc3986\Uri::getQueryParams() is that the former one passes the "raw" (non-normalized) query string as an input when instantiating Uri\Rfc3986\Uri\UriQueryParams.
I have read that sentence 3 times and it's still not making sense in my head. Can you clarify with an example (in the RFC)?
>> - The sort() method... should it take an optional user callback, or do we lock people in to lexical ordering?
> Only WHATWG URL specifies its behavior, and it uses basic alphanumeric
> sorting. Even though there's nothing that could stop us from
> implementing fancier
> sorting ways, I think it's already fine as-is. sort() can be used to
> guarantee that the query components are in deterministic order, and I
> think that's all that we need.
Fair enough.
>> - It would be quite convenient if set() and append() returned $this, allowing them to be chained.
> That's fine for me. WHATWG URL specifies their return type as void, so
> I went with this, but there's nothing wrong with returning $this.
It would certainly make for cleaner code. Possibly sort() should also return $this.
My thinking is that half the time we'll just be inlining a builder call somewhere and directly passing it to the appropriate UR object, so the more we can avoid "ugh, I must have a temp variable here for some damned reason", the better.
>> - The fromArray() logic is... totally weird and unexpected and I hate it. :-) Why can't you support repeated query parameters using nested arrays rather than gumming up all calls with a wonky format?
> Do you mean something like ["foo" => [0, 1, 2, 3]]? I think it is
> indeed possible to implement what you suggested. Whenever the basic
> structure of the proposal settles a little bit,
> I'll update the implementation, and I'll try to find out a sensible
> behavior for arrays/objects.
Yes, that's more what I was thinking. Thanks.
>> - It's not clear how one would start a new query from scratch, with the private constructor. There doesn't seem to be a justification for the private. I can't see why new UriQueryParams()->set('foo', 'bar') is a bad thing.
> Yes, starting from scratch is only possible by using
> UriQueryParams::fromArray([]) or UriQueryParams::parse(""). But I don't
> have any fundamental issue with
> adding support for the empty constructor variant.
An empty constructor would help make it more ergonomic, yes. Or, heck, UriQueryParams::new() would also work, and avoid the edge cases of constructor calls. :-) Just some more natural way to "start from scratch."
>> - Url::isSpecial() Could we come up with a better name here? "Special" could mean anything unless you know the RFC; it feels like "real escape string" all over again.
> The "special URL" is indeed the terminus technicus that WHATWG URL
> uses. The RFC explains the concept briefly:
>> The WHATWG URL specification defines some special schemes (http, https, ftp, file, ws, wss), which have distinct parsing and serialization rules.
> I don't have any issues with the current name, but the only alternative
> I could imagine is Uri\WhatWg\Url::isSpecialScheme().
That would be imperfect, but still a major improvement from just isSpecial(), as it specifies that it's the scheme that's special, not the whole URL. What "special" means is still unclear, but at least the scope is reduced. IOW, yes please.
--Larry Garfield
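For reference, the nested-array input agreed on above (["foo" => [0, 1, 2, 3]]) could be expanded into repeated pairs along these lines. Both functions are hypothetical helpers, not the RFC's fromArray(); the string casts and the use of rawurlencode() for serialization are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: expands ['foo' => ['a', 'b'], 'bar' => 'c'] into a flat
 * list of [name, value] pairs, repeating the name for each list element.
 *
 * @param array<array-key, scalar|list<scalar>> $params
 * @return list<array{0: string, 1: string}>
 */
function pairsFromArray(array $params): array
{
    $pairs = [];
    foreach ($params as $name => $value) {
        // A scalar is treated like a one-element list.
        foreach (is_array($value) ? $value : [$value] as $item) {
            $pairs[] = [(string) $name, (string) $item];
        }
    }
    return $pairs;
}

/** Serializes pairs as name=value joined by "&", encoding with rawurlencode(). */
function pairsToQuery(array $pairs): string
{
    return implode('&', array_map(
        static fn (array $pair): string => rawurlencode($pair[0]) . '=' . rawurlencode($pair[1]),
        $pairs
    ));
}

echo pairsToQuery(pairsFromArray(['foo' => [0, 1, 2, 3], 'bar' => 'a b'])), "\n";
// foo=0&foo=1&foo=2&foo=3&bar=a%20b
```

This differs from http_build_query(), which adds bracketed indexes to the names of nested values instead of repeating the bare name.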
Hi Máté,
After quickly checking the proposed API for Percent-Encoding and Decoding
Support, I wonder if the following would not
be more appropriate:
    namespace Uri\Rfc3986 {
        enum UriPercentEncoding
        {
            case UserInfo;
            case Host;
            case RelativeReferencePath;
            case RelativeReferenceFirstPathSegment;
            case Path;
            case PathSegment;
            case Query;
            case FormQuery;
            case Fragment;
            case AllReservedCharacters;
            case All;
            public function encode(string $input): string {}
            public function decode(string $input): string {}
        }
    }
The same logic would be applied in the Uri\Whatwg namespace. This
would make for a better encapsulated feature, so we can
have a clear distinction between the Value Object, its builder and the
encoding mechanism. What do you think?
Best regards,
Ignace
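A runnable miniature of the enum-with-methods idea might look like this. PercentEncoderSketch is hypothetical and deliberately reduced to three cases; the per-case character sets follow the pchar and query productions of RFC 3986, and decode() simply calls rawurldecode() for every case, since mode-aware decoding is still an open question in the thread.

```php
<?php
declare(strict_types=1);

/** Hypothetical, reduced version of the proposed enum; not the RFC's API. */
enum PercentEncoderSketch
{
    case PathSegment;
    case Query;
    case All;

    /** Characters left unencoded in addition to the RFC 3986 unreserved set. */
    private function allowed(): string
    {
        return match ($this) {
            self::PathSegment => '!$&\'()*+,;=:@',   // sub-delims, ":" and "@"
            self::Query       => '!$&\'()*+,;=:@/?', // pchar plus "/" and "?"
            self::All         => '',
        };
    }

    public function encode(string $input): string
    {
        // rawurlencode() encodes everything except A-Z a-z 0-9 - _ . ~
        $encoded = rawurlencode($input);
        $allowed = $this->allowed();
        for ($i = 0, $n = strlen($allowed); $i < $n; $i++) {
            // Undo the encoding of the characters this case lets through.
            $encoded = str_replace(rawurlencode($allowed[$i]), $allowed[$i], $encoded);
        }
        return $encoded;
    }

    public function decode(string $input): string
    {
        // Simplification: every case decodes all percent-encoded triplets.
        return rawurldecode($input);
    }
}

echo PercentEncoderSketch::PathSegment->encode('a b/c:d'), "\n"; // a%20b%2Fc:d
echo PercentEncoderSketch::Query->encode('a b/c:d'), "\n";       // a%20b/c:d
echo PercentEncoderSketch::All->encode('a b/c:d'), "\n";         // a%20b%2Fc%3Ad
```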
Hi Ignace,
After quickly checking the proposed API for Percent-Encoding and Decoding
Support I wonder if the following would not
be more appropriate ?namespace Uri\Rfc3986 { enum UriPercentEncoding { case UserInfo; case Host; case RelativeReferencePath; case RelativeReferenceFirstPathSegment; case Path; case PathSegment; case Query; case FormQuery; case Fragment; case AllReservedCharacters; case All; public function encode(string $input): string {} public function decode(string $input): string {} } }With the same logic being applied in the
Uri\Whatwgnamespace. This would make for a better encapsulated feature. So we canhave a clear distinction between the Value Object, its builder and the Encoding mechanism ? What do you think?
Yes, I was also wondering whether the URI/URL classes are really the best
places for the percentEncode() and percentDecode() methods, because
they are not only relevant for URIs/URLs but e.g. also for QueryParams. So
overall, I'm also fine with moving the percent-encoding/decoding
capabilities to a separate place. Honestly, the enums themselves didn't
come to my mind... I think it's a good candidate. So I'll definitely
consider it.
Probably my only concern is the name, specifically, the "ing" suffix: it
suggests that it can only keep data, and cannot do any operation. The
latter would
rather have an "er" suffix (e.g. UriPercentEncoder). But I'm happy to get
feedback/suggestions about the options.
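To make the enum idea a bit more concrete, here is a minimal userland sketch of an enum that carries its own encode()/decode() methods. The name PercentEncodingSketch, the reduced set of two cases, and the character sets are assumptions made for illustration only; this is not the API proposed in the RFC, and it naively re-encodes any "%" already present in the input.

```php
<?php
// Hypothetical userland sketch; PercentEncodingSketch is not part of ext/uri.
enum PercentEncodingSketch
{
    case PathSegment;
    case Query;

    // Characters left unencoded, written as the body of a PCRE character class.
    private function allowed(): string
    {
        // RFC 3986 "pchar" without pct-encoded: unreserved, sub-delims, ":" and "@".
        $pchar = 'A-Za-z0-9\-._~!$&\'()*+,;=:@';

        return match ($this) {
            self::PathSegment => $pchar,
            // The query component additionally allows "/" and "?".
            self::Query => $pchar . '\/?',
        };
    }

    // Percent-encodes every byte outside the allowed set (including "%").
    public function encode(string $input): string
    {
        return preg_replace_callback(
            '/[^' . $this->allowed() . ']/',
            static fn (array $m): string => rawurlencode($m[0]),
            $input
        );
    }

    // Decodes all percent-encoded octets, regardless of the case.
    public function decode(string $input): string
    {
        return rawurldecode($input);
    }
}
```

With such a shape, PercentEncodingSketch::PathSegment->encode('a/b c') escapes both the slash and the space, while the Query case leaves the slash alone.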
Máté
Hi Máté,
I read the Accessing Path Segments as an Array sub RFC and I have a couple
of remarks and suggestions.
In the RFC text it is said that:
The getter methods return null if the path is empty (https://example.com),
an empty array when the path
consists of a single slash (https://example.com/), and a non-empty array
otherwise.
This is suboptimal to me because it means that the signature for the getter
methods is array|null, which would lead developers to always add a check
whenever they use the method, just to find out whether the path is
absolute or not.
Instead, I would rather always get a single type, the array, as return
value. The issue you are facing is that you want to convey via your
return type if the path is absolute or not. But we already have access to
this information via the UriType Enum, at least in the case of the
Uri\Rfc3986\Uri class. For Uri\WhatWg\Url the information is less crucial,
as the validation and normalization rules of the WHATWG specifications
will autocorrect the path if needed. This leads me to propose the
following alternative:
For Uri\Rfc3986\Uri:
/** @return list<string> */
Uri::getPathSegments(): array {}
/** @return list<string> */
Uri::getRawPathSegments(): array {}
#[\NoDiscard(message: "as Uri\Rfc3986\Uri::withPathSegments() does not modify the object itself")]
Uri::withPathSegments(array $segments, Uri\Rfc3986\UriType $uriType = Uri\Rfc3986\UriType::RelativePathReference): static {}
(the default value for the $uriType parameter is TBD).
For Uri\WhatWg\Url:
/** @return list<string> */
Url::getPathSegments(): array {}
#[\NoDiscard(message: "as Uri\WhatWg\Url::withPathSegments() does not modify the object itself")]
/** @param list<UrlValidationError> $errors */
Url::withPathSegments(array $segments): static {}
with the following behaviour:
The getter methods return an empty array if the path is empty
(https://example.com) or a single slash (https://example.com/), and a
non-empty array otherwise. To distinguish between an absolute path and a
relative path you can refer to the Uri\Rfc3986\Uri::getUriType() method
in the case of an RFC 3986 URI; the information does not matter otherwise
(i.e. for a WHATWG URL).
During an update of an RFC 3986 URI, the additional $uriType argument
would tell whether a / should be prepended to the generated path string.
For the WHATWG URL, no soft errors are emitted, which shows that the
starting slash does not really matter.
Best regards,
Ignace
Hi Ignace,
The getter methods return null if the path is empty (https://example.com),
an empty array when the path
consists of a single slash (https://example.com/), and a non-empty
array otherwise.
Yes, that's correct!
Instead, I would rather always get a single type, the array as return
value. The issue you are facing is that
you want to convey via your return type if the path is absolute or not.
But, we already have access to this
information via the UriType Enum, at least in the case of the
Uri\Rfc3986\Uri class.
The UriType enum in its current form is not really suitable, because it can
only distinguish relative and absolute path references ("foo" vs "/foo"),
but not absolute URIs ("https://example.com" vs "https://example.com/").
"https://example.com" and "https://example.com/" are both absolute URIs,
and the former one has an empty path.
In order to find out the correct behavior, I think we should first try to
dig deeper into the definition of path segments.
Also, in order to have some inspiration, I checked how similar
functionality works in other languages, C# notably:
https://learn.microsoft.com/en-us/dotnet/api/system.uri.segments?view=net-10.0#system-uri-segments
Making the leading "/" its own segment feels a little bit off at first
sight (not to mention that the "/" characters are part of the segments),
because RFC 3986 specifies that path segments start after the leading "/"
due to the following ABNF rule:
path-abempty = *( "/" segment )
That is, for URIs containing an authority component, the path is either
empty, or contains a "/" followed by a segment
one or multiple times. Then segments have the following syntax:
segment = *pchar
That is, segments are composed of zero or multiple characters in the
"pchar" charset (the exact values don't matter
in this case). So let's see some basic examples with absolute URIs:
"https://example.com" -> no path segments: []
"https://example.com/foo" -> one path segment "foo": ["foo"]
Consequently:
"https://example.com/" -> one path segment which is empty: [""]
"https://example.com/foo/" -> two path segments: ["foo", ""]
Then the behavior of C# starts to make some sense - at least when the path
only consists of a "/" character (IMO it
doesn't make sense for other cases like "/foo").
Now let's see what to do with path references:
"" (empty string) -> no path segments: []
"/foo" -> one path segment "foo": ["foo"]
"foo" -> one path segment "foo": ["foo"]
"foo/" -> two path segments: ["foo", ""]
"/" -> one path segment which is empty: [""]
Unfortunately, this is not all, there are a few other special cases for
absolute URIs:
"https://" -> means that there's an authority, but it's empty, therefore
the path is also empty, therefore no path segments -> []
"https:/" -> means that there's no authority, and the path is "/",
therefore one path segment which is empty -> [""]
"https:" -> means that there's no authority, and the path is "", therefore
no path segments -> []
As far as I can see, this behavior is completely logical and satisfies the
definitions of RFC 3986. However, one case
may possibly need disambiguation in relation to the withPathSegments()
method: "/foo" vs "foo". (P.S. the uriparser library
had to use a special field for tracking exactly these cases.)
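The segment rules listed above can be sketched in a few lines of plain PHP. This is only an illustration of the mapping from a path string to its segments; the function name pathSegmentsSketch() is made up for this email and it is not how ext/uri or uriparser implement it.

```php
<?php
/**
 * Hypothetical helper: splits an RFC 3986 path string into segments
 * following the examples above.
 *
 * @return list<string>
 */
function pathSegmentsSketch(string $path): array
{
    // An empty path has no segments at all.
    if ($path === '') {
        return [];
    }

    // The leading "/" only introduces the first segment, it is not part of it.
    if ($path[0] === '/') {
        $path = substr($path, 1);
    }

    // Every remaining "/" separates two (possibly empty) segments.
    return explode('/', $path);
}
```

Note that "/foo" and "foo" both yield ["foo"] here, which is exactly the case that needs disambiguation when going back from segments to a path.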
That being said, I agree with you that the currently suggested signatures
should be changed. However, accepting an additional UriType parameter in
the withPathSegments() method wouldn't be correct, because I've just
demonstrated that the behavior doesn't depend on whether a URI is absolute
or relative, but on whether the authority component is defined or not.
So my alternative idea for disambiguating the above mentioned case is the
following: adding a 2nd parameter
$addLeadingSlashForNonEmptyRelativeUri to the withPathSegments() method (I
know this param name is insanely long,
so I'm happy to get recommendations), and then a leading slash would be
added to the path if and only if all the 3
criteria are satisfied:
- the $addLeadingSlashForNonEmptyRelativeUri boolean parameter is true
- the first item in the $pathSegments array parameter is non-empty
- the target URI is relative
This means that calling $uri->withPathSegments(["", "foo"], false) and
$uri->withPathSegments(["foo"], true) would result
in the same path reference ("/foo") when $uri doesn't have an authority.
I'm fine with bikeshedding/fine-tuning these rules,
but I do think we should go with something along the lines of this.
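As a rough sketch of the three criteria above, assuming the path is otherwise built by joining the segments with "/": the function name pathFromSegmentsSketch() and the explicit $isRelativeUri flag are hypothetical, since in the real method that information would come from the URI object itself.

```php
<?php
/**
 * Hypothetical helper illustrating the leading-slash rule.
 *
 * @param list<string> $segments
 */
function pathFromSegmentsSketch(array $segments, bool $addLeadingSlash, bool $isRelativeUri): string
{
    // Assumption: segments are simply joined by "/".
    $path = implode('/', $segments);

    // The slash is prepended only when all three criteria hold.
    $firstIsNonEmpty = ($segments[0] ?? '') !== '';
    if ($addLeadingSlash && $firstIsNonEmpty && $isRelativeUri) {
        $path = '/' . $path;
    }

    return $path;
}
```

Under these assumptions, (["", "foo"], false) and (["foo"], true) both produce "/foo" for a relative URI, while (["foo"], false) stays "foo".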
For the Uri\WhatWg\Url the information is less crucial as the validation
and normalization rules of the WHATWG
specifications will autocorrect the path if needed.
Yes, true.
Máté
Hey,
I love to see this RFC. I’ve got a few notes which likely come from me not understanding the specifications. But I think most users will not have read them and will not know or care about the differences, right?
- Query setters and types of their parameter. Quote from the builder example:
->setQuery("a=1&b=2")
->setQueryParams(["a" => 1, "b" => 2]) // Has the same effect as the setQuery() call above
I’d expect the setQueryParams to set the particular params that I specified instead of overriding all the query (which I’d expect setQuery to do).
Maybe it would be possible for setQuery to accept string, array and instances of *QueryParams? And setQueryParams could either be left out or override just the selected params? As far as I see, there is no equivalent withQueryParams so there’s no precedent on how such a method should work, right?
Btw I wasn’t able to find docs on the existing Uri classes (had to look it up in the prev RFC). Is the search broken or are the docs just not there yet?
- Regarding the interface extraction, is there any difference between the internal states of Uri\WhatWg\UrlQueryParams and Uri\Rfc3986\UriQueryParams objects? I would naively expect not only a common interface, but a common class. A QueryParams that could be supplied to any of the withers/setters and that would be able to ->toRfc3986QueryString() or ->toWhatWgQueryString() on use, not on instantiation.
Similarly with the builders, I fail to see why I need to decide on Uri\Rfc3986\UriBuilder and Uri\WhatWg\UrlBuilder at the start instead of having Uri\Builder that could be consumed via ->buildRfc3986Url() or ->buildWhatWgUri(). Is that UserInfo thing the whole dealbreaker? To me all the other parts like specifying host, port and so on seem spec-agnostic until serialization.
- I agree with Larry that I'd expect ['key1' => 'value1', 'key2' => 'value2'] to "just work". I understand the issue about having multiple entries with the same keys, but I don't think I should have to understand that to successfully build queries.
Maybe another method like bestEffortFromArray() could be added that would support various forms of nested arrays and do whatever is possible to make them into a query string, similar to how http_build_query() does? I know it's not perfect, but it would be great to have the new tooling as easy to work with as the previous, instead of having to reshape your arrays to fit a particular syntax.
BR,
Juris
Hi Máté,
After thinking about it here's my take on the current proposal
regarding the Query Parameter Manipulation RFC. Sorry for the wall of
text, but I tried to summarize my thoughts.
First of all, I tried to put myself in the shoes of a regular PHP
developer who has little to no knowledge about the different URI
specifications but has a general grasp of PHP. From that point of view
the developer knows that:
- PHP already gives access to the URI query parameters via the $_GET
superglobal
- to parse the query string in PHP, the developer can rely on parse_str
- to build a query string, the developer should use the http_build_query
function.
What we do know is that:
the $_GET values are also the result of using parse_str, and its logic is:
- not documented
- PHP centric
- mangles the data
- truncates query string
Its original goal was to allow direct conversion of query string into
PHP variables usable in scripts. But this behaviour has been removed
for security reasons from PHP.
http_build_query allows creating a query string in a more predictable
way but still exposes PHP-centric behaviour:
- It uses get_object_vars on objects, which is counter-intuitive:
  - All iterable structures do not give the same result.
  - Depending on the object implementation the result varies between
    PHP versions (i.e. DateTimeImmutable used to be rendered before
    PHP 7.4; since then it fails silently, resulting in an empty string
    being generated).
- It adds "[", "]" and indices around arrays. This is PHP-centric
(other languages would just repeat the array name).
- It always adds the array indices even when the array is a list, which
again can lead to unexpected behaviour, even within the PHP ecosystem.
On the other hand:
- Other modern languages and APIs, like Java HttpServletRequest or the
WHATWG URLSearchParams, have a completely different take: they view the
query string as a collection of tuples (key/value pairs) that can be
repeated; there is no notion of brackets. The data is preserved even
though, as you mention, the round-trip between encoding and decoding is
never guaranteed.
- We have the new HTTP QUERY method which may or may not fall into the
"Should this also be managed by a putative Query class" question.
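To illustrate the difference between the two views, here is a small sketch that contrasts parse_str with a naive tuple-based parser. The helper parseQueryTuplesSketch() is hypothetical and deliberately simplistic: it decodes with rawurldecode() only, so for instance "+" is not turned into a space as the form-data rules would require.

```php
<?php
/**
 * Hypothetical helper: parses a query string into a list of
 * [name, value] tuples, keeping repeated names and unmangled keys.
 * A name without "=" gets a null value.
 *
 * @return list<array{0: string, 1: ?string}>
 */
function parseQueryTuplesSketch(string $query): array
{
    $tuples = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty pairs such as in "a=1&&b=2"
        }
        $parts = explode('=', $pair, 2);
        $tuples[] = [
            rawurldecode($parts[0]),
            isset($parts[1]) ? rawurldecode($parts[1]) : null,
        ];
    }

    return $tuples;
}

// parse_str() keeps only the last "a" and rewrites "foo.bar" to "foo_bar".
parse_str('foo.bar=baz&a=1&a=2', $variables);

// The tuple view keeps every pair exactly as it was sent.
$tuples = parseQueryTuplesSketch('foo.bar=baz&a=1&a=2');
```

Here $variables ends up with the keys "foo_bar" and "a", whereas $tuples still contains three pairs with the original "foo.bar" name.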
Currently, in your proposal you have 2 Query objects. This will give
the developer a lot of work to understand where, when and which
object to choose and why. Is that complexity really needed? IMHO we
may end up with a correct API ... that no-one will use.
With all that in mind I believe a single Uri\Query should be used.
Its goal should be:
- to be immutable
- to store the query in its decoded form.
- to manipulate and change the data in a consistent way.
Decoding/encoding should happen at the object boundaries but
everything inside the object should be done on decoded data. Since no
algorithm guarantees preserving the encoding during a decode/encode
round-trip, there is no need to try hard to do so.
This also means:
- having multiple string representations
- not having a Uri::withQueryParams or a Url::withQueryParams method.
It should be left to the developer to understand which string version he needs.
On a bonus side, it would be nice to have a mechanism in PHP that
allows the application to switch
from the current parse_str usage to the new improved parsing
provided by the new class when
populating the $_GET array. (So that deprecating parse_str can be
initiated in some distant future.)
This last observation/remark is not mandatory but nice to have.
So I would propose the following methods:
namespace Uri {
//takes no arguments returns an empty object
Query::__construct();
// named constructor to allow
// returning a new instance from
// PHP variables (same syntax as http_build_query)
Query::fromVariables(array $variable): static
// named constructor to allow
// returning a new instance from
// a list of tuples see the returns
// value of Query::toTuples()
Query::fromTuples(array $params): static
// named constructor to allow
// returning a new instance from
// query string this is where
// decoding takes place
Query::parseRfc1738String(string $query): ?static
Query::parseRfc3986String(string $query): ?static
Query::parseFormDataString(string $query): ?static
Query::parseWhatWgString(string $query): ?static
//String representation query
//this is where encoding should happen
//internal decoded data
//should only be encoded here
Query::toRfc3986String();
Query::toRfc1738String();
Query::toFormDataString();
Query::toWhatWgString();
// Tuple related methods
// like the one defined by the WHATWG specifications
// method names are changed or update to highlight
// the immutable state for modifying methods
Query::toTuples(): array<string, null|string|array<null|string>>
Query::count(): int;
Query::has(string $name): bool;
Query::hasValue(string $name, null|string $value): bool;
Query::getFirst(string $name): null|string;
Query::getLast(string $name): null|string;
Query::getAll(string $name): array<null|string>;
// Tuple modifying methods
Query::sort(): static;
Query::withValue(string $name, null|string|array<null|string> $value): static;
Query::append(string $name, null|string|array<null|string> $value): static;
Query::delete(string $name): static;
Query::deleteValue(string $name, null|string $value): static;
// PHP variables related methods
// the parse_str replacement API
Query::toVariables(): array; // returns the same array as parse_str (without mangled data)
Query::countVariables(): int; // returns the number of variables found
Query::hasVariable(string $variableName): bool; // tells whether the variable exists
Query::getVariable(string $variableName): null|string|array; // returns the variable value
Query::mergeVariable(array $variables): static // the same syntax returned by the `Query::toVariables` method
Query::replaceVariable(string $variableName, null|string|int|float|array $value): static
Query::deleteVariable(string $variableName): static
}
With the following changes:
- in respect to parse_str, no mangled data should occur on parsing:
parse_str("foo.bar=baz", $params);
echo $params['foo_bar']; // returns "baz"
array_key_exists('foo.bar', $params); // returns false
$query = \Uri\Query::parseRfc1738String("foo.bar=baz");
$query->getVariable("foo.bar"); //returns "baz"
$query->hasVariable("foo_bar"); //returns false
- in respect to http_build_query:
  - Only accept scalar values, null, and array. If an object or a
    resource is detected, a ValueError should be thrown.
echo http_build_query(['a' => tmpfile()]); // returns ''
\Uri\Query::fromVariables(['a' => tmpfile()]); // throws ValueError
  - Remove the addition of indices if the array is a list.
echo http_build_query(['a' => [3, 5, 7]]); // returns a%5B0%5D=3&a%5B1%5D=5&a%5B2%5D=7
echo \Uri\Query::fromVariables(['a' => [3, 5, 7]])->toRfc1738String(); // returns a%5B%5D=3&a%5B%5D=5&a%5B%5D=7
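The two http_build_query changes can be approximated in userland as follows. This is only a sketch under my own assumptions: buildQuerySketch() is a made-up name, booleans are rendered as "1"/"0", and a null value is emitted as a bare name without "=". None of this is specified by the proposal above.

```php
<?php
/**
 * Hypothetical helper: builds a query string from PHP variables,
 * omitting indices for lists and rejecting objects and resources.
 */
function buildQuerySketch(array $variables): string
{
    $pairs = [];

    $walk = function (string $name, mixed $value) use (&$walk, &$pairs): void {
        if (is_array($value)) {
            // Lists get "[]", maps keep their keys as "[key]".
            $isList = array_is_list($value);
            foreach ($value as $key => $item) {
                $walk($name . ($isList ? '[]' : '[' . $key . ']'), $item);
            }
            return;
        }
        if ($value !== null && !is_scalar($value)) {
            throw new ValueError('Only scalar values, null, and arrays are supported');
        }
        if ($value === null) {
            $pairs[] = rawurlencode($name); // assumption: name only, no "="
            return;
        }
        if (is_bool($value)) {
            $value = $value ? '1' : '0'; // assumption: same as http_build_query
        }
        $pairs[] = rawurlencode($name) . '=' . rawurlencode((string) $value);
    };

    foreach ($variables as $name => $value) {
        $walk((string) $name, $value);
    }

    return implode('&', $pairs);
}
```

For ['a' => [3, 5, 7]] this yields a%5B%5D=3&a%5B%5D=5&a%5B%5D=7, while http_build_query keeps producing the indexed form.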
Best regards,
Ignace
I'd like to introduce my latest RFC that I've been working on for a
while now: https://wiki.php.net/rfc/uri_followup.
It proposes 5 followup improvements for ext/uri in the following areas:
- URI Building
Would it make sense to have an interface for the set*() methods? Besides
build(), they all seem to have the same API.
- Query Parameter Manipulation
I see this adds NoDiscard:
#[\NoDiscard(message: "as Uri\Rfc3986\Uri::withQueryParams() does not modify the object itself")]
But the original methods on the classes don't have these NoDiscards, and
it doesn't seem that this RFC is suggesting to add them. It should at
least be consistent.
- Accessing Path Segments as an Array
Compare:
"especially considering the fact that Uri\Rfc3986\Uri internally stores
the path as a list of segments."
And:
Uri\Rfc3986\Uri::withPathSegments() … internally concatenate the input
segments separated by a / character, and then trigger
Uri\Rfc3986\Uri::withPath() …
Why does Rfc3986\Uri need to do this concatenation and then call
withPath()?
cheers,
Derick
--
https://derickrethans.nl | https://xdebug.org | https://dram.io
Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support
mastodon: @derickr@phpc.social @xdebug@phpc.social