[RFC] [Discussion] Followup Improvements for ext/uri

2 months ago by Rob Landers

Hi Everyone,

I'd like to introduce my latest RFC that I've been working on for a while now: https://wiki.php.net/rfc/uri_followup.

It proposes 5 followup improvements for ext/uri in the following areas:

- URI Building
- Query Parameter Manipulation
- Accessing Path Segments as an Array
- Host Type Detection
- URI Type Detection
- Percent-Encoding and Decoding Support

I did my best to write an RFC that was at least as extensive as https://wiki.php.net/rfc/url_parsing_api had become by the end. Despite my efforts, there are still a couple things which need a final decision, or which need to be polished/improved. Some examples:

- How to support array/object values for constructing query strings? (https://wiki.php.net/rfc/uri_followup#type_support)
- How to make the UriQueryParams and UrlQueryParams classes more interoperable with the query string component (mainly with respect to percent-encoding)? (https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding)
- Exactly how the advanced percent-decoding capabilities should work? Does it make sense to support all the possible modes (UriPercentEncodingMode) for percent-decoding as well (https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding_support)
- etc.

Regards,
Máté

Hello!

For the builder methods, why not use the same wither method names? That would make switching to them really easy, over the current implementation.

— Rob

2 months ago by ignace nyamagana butera

Hi Máté,

Once again, thanks for this follow-up RFC. While there is a lot to digest, I wanted to address your reservation about implementing the IteratorAggregate interface for query manipulation:

> The UriQueryParams and UrlQueryParams classes could implement the IteratorAggregate interface in theory. However, it's not possible to do so due to query components that share the same name, e.g.: param=foo&param=bar&param=baz. In this case, the same key (param) would be repeated 3 times - and it's actually not possible to support with iterators.

When building the Query component in league URI, I was able to implement both the Countable and IteratorAggregate interfaces by using a different representation of the query pairs; see https://uri.thephpleague.com/components/7.0/query/#countable-and-iteratoraggregate

TL;DR: instead of using your proposed structure

[['name' => 'value'], ...]

I used the following:

[['name', 'value'], ...]

While IMHO both formats allow implementing the IteratorAggregate interface, the latter allows for a more predictable API. With the first structure, iteration looks like this:

$uri = new Uri('https://example.com?param=foo&param=bar&param=baz');
foreach ($uri->getQueryParams() as $key => $pair) {
    //first iteration $pair['param'] = 'foo'
    //second iteration $pair['param'] = 'bar'
    //third iteration $pair['param'] = 'baz'
}

The user needs to know the name of each pair beforehand, which is counterintuitive if you do not know which pair sits at which position. In contrast, using the league URI query syntax you will have the following:

$uri = new Uri('https://example.com?param=foo&param=bar&param=baz');
foreach (Query::fromUri($uri) as $key => $pair) {
    //first iteration $pair[0] = 'param'; $pair[1] = 'foo'
    //second iteration $pair[0] = 'param'; $pair[1] = 'bar'
    //third iteration $pair[0] = 'param'; $pair[1] = 'baz'
}


The user will always get the parameter name using $pair[0] and the value
using $pair[1] regardless of their content and value.
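
To make the pair-based representation concrete, here is a minimal userland sketch. The class name `PairQuery` and its `parse()` method are hypothetical (they are neither the RFC's API nor league/uri's), and decoding with `rawurldecode()` is an assumption made only to keep the example short.

```php
<?php
/**
 * Hypothetical pair-based query collection, for illustration only.
 *
 * @implements IteratorAggregate<int, array{0: string, 1: string}>
 */
final class PairQuery implements IteratorAggregate, Countable
{
    /** @param list<array{0: string, 1: string}> $pairs */
    private function __construct(private array $pairs) {}

    public static function parse(string $query): self
    {
        $pairs = [];
        foreach (explode('&', $query) as $chunk) {
            if ($chunk === '') {
                continue; // skip empty chunks, e.g. the one in "a=1&&b=2"
            }
            // A chunk without "=" gets an empty string as its value.
            [$name, $value] = explode('=', $chunk, 2) + [1 => ''];
            $pairs[] = [rawurldecode($name), rawurldecode($value)];
        }

        return new self($pairs);
    }

    public function getIterator(): ArrayIterator
    {
        // Keys are plain positions, so repeated names never collide.
        return new ArrayIterator($this->pairs);
    }

    public function count(): int
    {
        return count($this->pairs);
    }
}

$query = PairQuery::parse('param=foo&param=bar&param=baz');
foreach ($query as [$name, $value]) {
    echo $name, '=', $value, PHP_EOL;
}
```

Because the iterator keys are positions rather than parameter names, the repeated `param` entries are all visited in their original order.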

What do you think? IMHO this would solve your issue, but it is indeed a bigger departure from how query strings are currently parsed in PHP.

Best regards.

2 months ago by Larry Garfield

Thanks, Máté.

Notes as I read through:

- I really, really hate the "set" prefix on all the methods. It's a builder object, surely the "set" is implied?

      $builder->scheme('https')->host('example.com')->path('/foo/bar')->build();

  That's nice and easy to read.
- It really feels like there's an interface to extract here from the Url/UriBuilder classes. There's literally only one type-specific method (build()).
- UriQueryParams::hasWithValue(), could that be just hasValue()? You still need to specify the key anyway, and that's self-evident from the signature.
- There's a count() method, so shouldn't Ur{i|l}QueryParams implement Countable?
- As above, there really is an interface lurking in UriQueryParams...
- Why both Uri getRawQueryParams() and getQueryParams()? It looks like they would return the same value, no? (If not, that should be explained.)
- The sort() method... should it take an optional user callback, or do we lock people in to lexical ordering?
- It would be quite convenient if set() and append() returned $this, allowing them to be chained.
- The fromArray() logic is... totally weird and unexpected and I hate it. :-) Why can't you support repeated query parameters using nested arrays rather than gumming up all calls with a wonky format?
- It's not clear how one would start a new query from scratch, with the private constructor. There doesn't seem to be a justification for the private. I can't see why new UriQueryParams()->set('foo', 'bar') is a bad thing.
- Type support: Looks reasonable to me.
- The HostType logic seems reasonable to me.
- Url::isSpecial() Could we come up with a better name here? "Special" could mean anything unless you know the RFC; it feels like "real escape string" all over again.

Some parts of this are over my head as I've not read the relevant RFCs, but overall I do like the direction.

--Larry Garfield

2 months ago by ignace nyamagana butera

Hi Larry,

> Url::isSpecial() Could we come up with a better name here? "Special" could mean anything unless you know the RFC; it feels like "real escape string" all over again.

This comes from the WHATWG specification; isSpecial is how it is named there.

> It really feels like there's an interface to extract here from the Url/UriBuilder classes. There's literally only one type-specific method (build()).

Yes, but the return type is not always the same object. Uri and Url are different, similar yet distinct, so I would be inclined not to add a useless interface.

> There's a count() method, so shouldn't Ur{i|l}QueryParams implement Countable?

I tend to agree: Countable and IteratorAggregate should be implemented IMHO.

> The fromArray() logic is... totally weird and unexpected and I hate it. :-) Why can't you support repeated query parameters using nested arrays rather than gumming up all calls with a wonky format?

I also do not like fromArray(). There are two ways to represent query parameters: either you follow the WHATWG spec, in which case they are pairs, or you use PHP's own algorithm (with its own caveats) and get something resembling the result of parse_str(), which is destructive by nature. I would prefer the named constructors to reflect that.

> The sort() method... should it take an optional user callback, or do we lock people in to lexical ordering?

This is also derived from the WHATWG spec. Adding a callback might be useful, but it really depends on how you represent each query parameter/pair.

> Why both Uri getRawQueryParams() and getQueryParams()? It looks like they would return the same value, no? (If not, that should be explained.)

Because Uri\Rfc3986\Uri already exposes Uri::getQuery and Uri::getRawQuery.

> It would be quite convenient if set() and append() returned $this, allowing them to be chained.

+1

I believe that the structure of the query string is the thing that will need the most explaining. Once it is correctly settled, the rest can easily be derived from it.

My 2 cents.

On Mon, Dec 1, 2025 at 11:22 PM Larry Garfield larry@garfieldtech.com
wrote:


2 months ago by Larry Garfield

> Hi Larry,
>
> > Url::isSpecial() Could we come up with a better name here? "Special" could mean anything unless you know the RFC; it feels like "real escape string" all over again.
>
> This comes from the WHATWG specification; isSpecial is how it is named there.

I realize that, but the vast majority of PHP devs won't have read the official spec so don't know what "special" means. Special in what way? The scheme, the path, the encoding? It's completely non-obvious unless you're versed in the specification, which, again, most people won't be.

> > It really feels like there's an interface to extract here from the Url/UriBuilder classes. There's literally only one type-specific method (build()).
>
> Yes, but the return type is not always the same object. Uri and Url are different, similar yet distinct, so I would be inclined not to add a useless interface.

So the interface doesn't cover build(). Problem solved.

> > Why both Uri getRawQueryParams() and getQueryParams()? It looks like they would return the same value, no? (If not, that should be explained.)
>
> Because Uri\Rfc3986\Uri already exposes Uri::getQuery and Uri::getRawQuery.

Then the RFC needs to explain why we need to have both for the query params, and how they differ. Right now, there's no explanation of how they differ, and the example suggests that they'd return identical values.

--Larry Garfield

2 months ago by ignace nyamagana butera

> Url::isSpecial() Could we come up with a better name here?

I would go with Url::isStandardBrowserUrl() or something similar. "Special" in the WHATWG spec refers to a list of specific URI schemes. It currently contains http(s), ws(s), ftp and file. But since it is a living standard, if I remember correctly, at some point the gopher scheme, for instance, was also listed there, whereas the data and blob URI schemes have never been listed. So finding the right name, one which does not become outdated as what it represents changes, is ... special.

> So the interface doesn't cover build(). Problem solved.

And it does not cover the (set)userInfo, (set)username and (set)password modifiers either, and depending on whether validation is taken into account, you may also have to remove (set)host from your interface.

> Then the RFC needs to explain why we need to have both for the query params, and how they differ.

RFC 3986 URIs expose the raw, non-normalized URI component as well as the normalized one. From the query component's perspective, the only distinction in RFC 3986 is that percent-encoded characters need to be uppercased. So if you want to work with the raw input untouched, you will need getRawQuery and thus getRawQueryParams. As I said, this is already covered in the previous RFC. Where I do agree with you is that, if the result is identical after creating the UriQueryParams, then we may not need the getRawQueryParams method at all.
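
As a rough illustration of the raw versus normalized distinction described above, the following sketch uppercases the hex digits of percent-encoded triplets and leaves everything else untouched. The helper name `uppercasePercentEncoding()` is hypothetical, and ext/uri's real normalization may do more than this; the sketch covers only the case normalization mentioned in this message.

```php
<?php
// Hypothetical helper: uppercases the hex digits of percent-encoded triplets only.
function uppercasePercentEncoding(string $component): string
{
    return preg_replace_callback(
        '/%[0-9a-fA-F]{2}/',
        static fn (array $m): string => strtoupper($m[0]),
        $component
    ) ?? $component; // fall back to the input if PCRE reports an error
}

$raw = 'q=caf%c3%a9&path=%2ffoo';           // what a "raw" getter would hand back
$normalized = uppercasePercentEncoding($raw); // 'q=caf%C3%A9&path=%2Ffoo'
```

When the raw query contains no lowercase hex digits in its triplets, both strings are identical, which is the situation in which a separate raw getter adds nothing.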

Best regards,
Ignace

2 months ago by kocsismate90@gmail.com

Hi Larry,

> I really, really hate the "set" prefix on all the methods. It's a builder object, surely the "set" is implied?

I like that the "set" prefix keeps all setters grouped together. For example, should we have to add some extra methods (like authority()), then without the "set" prefix the build() method would appear between authority() and fragment() in IDE autocomplete lists.

> It really feels like there's an interface to extract here from the Url/UriBuilder classes. There's literally only one type-specific method (build()).

It's the very same thing that we discussed for a very long time last time... What would be the purpose of the interface? To make the two builders interchangeable? But they produce fundamentally different URIs. Even if we don't include build(), we still have some differences between the two implementations:

- components are different: even though most components match, RFC 3986 and WHATWG URL still differ; notably, the userinfo component is only acknowledged by RFC 3986, while the username and password components are only modifiable in the case of WHATWG URL
- validation rules are different: each implementation has its own validation rules for each component. I'll need to clarify this in the RFC, but some purely syntax-based validations should be performed during the setter/wither calls (e.g. scheme cannot contain "%"), while the ones which rely on the "global state" (e.g. the host is required when the userinfo is set) should be performed by the build() method in order to avoid the temporal coupling I mentioned in the RFC.
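
The split between per-setter syntax checks and build-time "global state" checks could be sketched as follows. `SketchUriBuilder` and its rules are hypothetical and heavily simplified (only scheme, userinfo and host are modeled); the sketch is meant to show where each kind of validation runs, not how the RFC's builders behave.

```php
<?php
// Hypothetical builder; names and rules are illustrative, not the RFC's API.
final class SketchUriBuilder
{
    private ?string $scheme = null;
    private ?string $userInfo = null;
    private ?string $host = null;

    public function setScheme(?string $scheme): static
    {
        // Purely syntactic check: it does not depend on any other component,
        // so it can run immediately (a "%" is rejected here).
        if ($scheme !== null && !preg_match('/^[A-Za-z][A-Za-z0-9+.\-]*$/', $scheme)) {
            throw new InvalidArgumentException("Invalid scheme: $scheme");
        }
        $this->scheme = $scheme;

        return $this;
    }

    public function setUserInfo(?string $userInfo): static
    {
        $this->userInfo = $userInfo;

        return $this;
    }

    public function setHost(?string $host): static
    {
        $this->host = $host;

        return $this;
    }

    public function build(): string
    {
        // Check that relies on the "global state": deferred until every
        // component is known, so setters can be called in any order.
        if ($this->userInfo !== null && $this->host === null) {
            throw new LogicException('A host is required when the userinfo is set');
        }

        $uri = $this->scheme !== null ? $this->scheme . ':' : '';
        if ($this->host !== null) {
            $uri .= '//' . ($this->userInfo !== null ? $this->userInfo . '@' : '') . $this->host;
        }

        return $uri;
    }
}

// The userinfo is set before the host, yet nothing fails until build() runs.
$uri = (new SketchUriBuilder())
    ->setUserInfo('user')
    ->setHost('example.com')
    ->setScheme('https')
    ->build();
```

Deferring the cross-component check is what removes the temporal coupling: no setter has to be called before another one.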

> UriQueryParams::hasWithValue(), could that be just hasValue()? You still need to specify the key anyway, and that's self-evident from the signature.

Yes, I was already considering updating this name, but I was sure that someone (with 99% confidence, you) would point this out and suggest a better one. I agree that hasValue() is probably the right choice, although hasNameAndValue() would be the most technically correct name...

> There's a count() method, so shouldn't Ur{i|l}QueryParams implement Countable?

Yes, it can. I thought that implementing Countable on its own (without IteratorAggregate) was less useful, so I omitted it. Ignace suggested another approach that would allow implementing IteratorAggregate: if that happens, then I'm totally fine with also implementing Countable.

> As above, there really is an interface lurking in UriQueryParams...

I have the same comment as for the builder, with one small caveat: as far as I know the implementations, the biggest differences between UriQueryParams and UrlQueryParams are how they parse the input and how they percent-encode it during recomposition. The rest is fairly similar, for now at least.

I even had a brief moment when I thought that merging the two implementations into one was a good idea, but I came to the conclusion that it isn't, so that the two classes have the possibility to evolve separately if needed. So I'd follow the path of the original URI/URL debate and would not try to make the two implementations interoperable. They are only interoperable on the surface. :)

> Why both Uri getRawQueryParams() and getQueryParams()? It looks like they would return the same value, no? (If not, that should be explained.)

This is actually already briefly explained in the RFC (but thanks to Ignace, who also described this part):

> The difference between Uri\Rfc3986\Uri::getRawQueryParams() and Uri\Rfc3986\Uri::getQueryParams() is that the former one passes the “raw” (non-normalized) query string as an input when instantiating Uri\Rfc3986\Uri\UriQueryParams.

> The sort() method... should it take an optional user callback, or do we lock people in to lexical ordering?

Only WHATWG URL specifies its behavior, and it uses basic alphanumeric sorting. Even though there's nothing that could stop us from implementing fancier sorting, I think it's already fine as-is. sort() can be used to guarantee that the query components are in a deterministic order, and I think that's all that we need.

> It would be quite convenient if set() and append() returned $this, allowing them to be chained.

That's fine for me. WHATWG URL specifies their return type as void, so I went with this, but there's nothing wrong with returning $this.
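
For illustration, here is what chaining might look like if the mutators returned `$this`. `ChainableParams` is a hypothetical stand-in, and its `set()` is a simplification: it drops all existing pairs with the given name and appends the new one at the end, whereas the WHATWG algorithm updates the first matching pair in place.

```php
<?php
// Hypothetical mutable query-parameter list; it only illustrates returning $this.
final class ChainableParams
{
    /** @var list<array{0: string, 1: string}> */
    private array $pairs = [];

    public function append(string $name, string $value): static
    {
        $this->pairs[] = [$name, $value];

        return $this;
    }

    public function set(string $name, string $value): static
    {
        // Simplification: remove every pair with this name, then append the new one.
        $this->pairs = array_values(array_filter(
            $this->pairs,
            static fn (array $pair): bool => $pair[0] !== $name
        ));

        return $this->append($name, $value);
    }

    public function toString(): string
    {
        return implode('&', array_map(
            static fn (array $pair): string => rawurlencode($pair[0]) . '=' . rawurlencode($pair[1]),
            $this->pairs
        ));
    }
}

// No temporary variable is needed between the calls.
$query = (new ChainableParams())
    ->append('tag', 'php')
    ->append('tag', 'uri')
    ->set('page', '2')
    ->toString();
```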

> The fromArray() logic is... totally weird and unexpected and I hate it. :-) Why can't you support repeated query parameters using nested arrays rather than gumming up all calls with a wonky format?

Do you mean something like ["foo" => [0, 1, 2, 3]]? I think it is indeed possible to implement what you suggested. Whenever the basic structure of the proposal settles a little bit, I'll update the implementation, and I'll try to find a sensible behavior for arrays/objects.
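
One possible reading of the nested-array idea is that a list value repeats its name once per element. The function `pairsFromArray()` below is hypothetical and handles only scalars and flat lists of scalars; how objects or deeper nesting should behave is exactly the open question in the RFC.

```php
<?php
/**
 * Hypothetical expansion of a nested-array input into name/value pairs.
 *
 * @param array<string, scalar|list<scalar>> $params
 * @return list<array{0: string, 1: string}>
 */
function pairsFromArray(array $params): array
{
    $pairs = [];
    foreach ($params as $name => $value) {
        // A list value means "repeat this name once per element".
        foreach (is_array($value) ? $value : [$value] as $item) {
            $pairs[] = [(string) $name, (string) $item];
        }
    }

    return $pairs;
}

$pairs = pairsFromArray(['foo' => [0, 1, 2, 3], 'bar' => 'baz']);
// Corresponds to the query string foo=0&foo=1&foo=2&foo=3&bar=baz
```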

> It's not clear how one would start a new query from scratch, with the private constructor. There doesn't seem to be a justification for the private. I can't see why new UriQueryParams()->set('foo', 'bar') is a bad thing.

Yes, starting from scratch is only possible by using UriQueryParams::fromArray([]) or UriQueryParams::parse(""). But I don't have any fundamental issue with adding support for the empty constructor variant.

> Url::isSpecial() Could we come up with a better name here? "Special" could mean anything unless you know the RFC; it feels like "real escape string" all over again.

The "special URL" is indeed the terminus technicus that WHATWG URL uses. The RFC explains the concept briefly:

> The WHATWG URL specification defines some special schemes (http, https, ftp, file, ws, wss), which have distinct parsing and serialization rules.

I don't have any issues with the current name, but the only alternative I could imagine is Uri\WhatWg\Url::isSpecialScheme().
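
To show what a scheme-based check boils down to, here is a small sketch. The scheme list is copied from the RFC excerpt quoted above; the function `hasSpecialScheme()` is hypothetical, and it uses PHP's long-standing `parse_url()` instead of ext/uri only so that the example stays self-contained.

```php
<?php
// Scheme list taken from the RFC excerpt quoted above; the function is hypothetical.
const SPECIAL_SCHEMES = ['http', 'https', 'ftp', 'file', 'ws', 'wss'];

function hasSpecialScheme(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // parse_url() yields null (or false) when no scheme can be extracted.
    return is_string($scheme) && in_array(strtolower($scheme), SPECIAL_SCHEMES, true);
}

var_dump(hasSpecialScheme('https://example.com/'));    // bool(true)
var_dump(hasSpecialScheme('mailto:user@example.com')); // bool(false)
```

The check looks only at the scheme, which is the argument for a name that mentions the scheme explicitly.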

Regards,
Máté

2 months ago by Larry Garfield

> Hi Larry,
>
> > I really, really hate the "set" prefix on all the methods. It's a builder object, surely the "set" is implied?
>
> I like that the "set" prefix makes all setters grouped together. For example, should we have to add some extra methods (like authority()), the build() method would appear between authority() and fragment() in IDE autocomplete lists without the set() prefix.

That seems like a very minor point. Code will be read hundreds of times more than it is auto-completed...

> > It really feels like there's an interface to extract here from the Url/UriBuilder classes. There's literally only one type-specific method (build()).
>
> It's the very same thing that we discussed for a very long time last time... What would be the purpose of the interface? To make the two builders interchangeable? But they produce fundamentally different URIs. Even if we don't include build(), then we still have some differences between the two implementations:

I guess what's bugging me here is that in the typical case, URLs and URIs are interchangeable. If you're just using a standard ASCII domain and path, which is the vast majority of URLs, then either one gets you the same result. So it feels grating to have to deal with two different versions of that logic; say, if I just want to pull the path off of one, or set the path on one.

I 1000% realize that's not your fault, nor PHP's fault, it's the fault of the two competing standards that don't talk to each other. The theoretical Venn diagram overlap of URLs and URIs is relatively small. The practical overlap is quite large. So that makes ignoring the overlap very grating. Hence why I keep looking for places to codify the safe overlap.

> > UriQueryParams::hasWithValue(), could that be just hasValue()? You still need to specify the key anyway, and that's self-evident from the signature.
>
> Yes, I was already considering updating this name, but I was sure that someone (with 99% confidence, you) would point this out and suggest a better one.

Nice to know I'm predictable? :-)

> I agree that hasValue() is probably the right choice, although hasNameAndValue() would be the most technically correct name...
>
> > There's a count() method, so shouldn't Ur{i|l}QueryParams implement Countable?
>
> Yes, it can. I thought that implementing Countable on its own (without IteratorAggregate) was less useful, so I omitted it. Ignace suggested another approach that would allow implementing IteratorAggregate: if that happens, then I'm totally fine with also implementing Countable.

I don't feel strongly about IteratorAggregate. I'm not entirely sure I see the value for that. But Countable seems like a no-brainer to include.

> > Why both Uri getRawQueryParams() and getQueryParams()? It looks like they would return the same value, no? (If not, that should be explained.)
>
> This is actually already briefly explained in the RFC (but thanks to Ignace, who also described this part):
>
> > The difference between Uri\Rfc3986\Uri::getRawQueryParams() and Uri\Rfc3986\Uri::getQueryParams() is that the former one passes the “raw” (non-normalized) query string as an input when instantiating Uri\Rfc3986\Uri\UriQueryParams.

I have read that sentence 3 times and it's still not making sense in my head. Can you clarify with an example (in the RFC)?

> > The sort() method... should it take an optional user callback, or do we lock people in to lexical ordering?
>
> Only WHATWG URL specifies its behavior, and it uses basic alphanumeric sorting. Even though there's nothing that could stop us from implementing fancier sorting, I think it's already fine as-is. sort() can be used to guarantee that the query components are in a deterministic order, and I think that's all that we need.

Fair enough.

> > It would be quite convenient if set() and append() returned $this, allowing them to be chained.
>
> That's fine for me. WHATWG URL specifies their return type as void, so I went with this, but there's nothing wrong with returning $this.

It would certainly make for cleaner code. Possibly sort() should also return $this.

My thinking is that half the time we'll just be inlining a builder call somewhere and directly passing it to the appropriate UR object, so the more we can avoid "ugh, I must have a temp variable here for some damned reason", the better.

> > The fromArray() logic is... totally weird and unexpected and I hate it. :-) Why can't you support repeated query parameters using nested arrays rather than gumming up all calls with a wonky format?
>
> Do you mean something like ["foo" => [0, 1, 2, 3]]? I think it is indeed possible to implement what you suggested. Whenever the basic structure of the proposal settles a little bit, I'll update the implementation, and I'll try to find a sensible behavior for arrays/objects.

Yes, that's more what I was thinking. Thanks.

> > It's not clear how one would start a new query from scratch, with the private constructor. There doesn't seem to be a justification for the private. I can't see why new UriQueryParams()->set('foo', 'bar') is a bad thing.
>
> Yes, starting from scratch is only possible by using UriQueryParams::fromArray([]) or UriQueryParams::parse(""). But I don't have any fundamental issue with adding support for the empty constructor variant.

An empty constructor would help make it more ergonomic, yes. Or, heck, UriQueryParams::new() would also work, and avoid the edge cases of constructor calls. :-) Just some more natural way to "start from scratch."

> > Url::isSpecial() Could we come up with a better name here? "Special" could mean anything unless you know the RFC; it feels like "real escape string" all over again.
>
> The "special URL" is indeed the terminus technicus that WHATWG URL uses. The RFC explains the concept briefly:
>
> > The WHATWG URL specification defines some special schemes (http, https, ftp, file, ws, wss), which have distinct parsing and serialization rules.
>
> I don't have any issues with the current name, but the only alternative I could imagine is Uri\WhatWg\Url::isSpecialScheme().

That would be imperfect, but still a major improvement from just isSpecial(), as it specifies that it's the scheme that's special, not the whole URL. What "special" means is still unclear, but at least the scope is reduced. IOW, yes please.

--Larry Garfield

2 months ago by ignace nyamagana butera

Hi Máté,

After quickly checking the proposed API for Percent-Encoding and Decoding Support, I wonder whether the following would not be more appropriate:

namespace Uri\Rfc3986 {
    enum UriPercentEncoding
    {
        case UserInfo;
        case Host;
        case RelativeReferencePath;
        case RelativeReferenceFirstPathSegment;
        case Path;
        case PathSegment;
        case Query;
        case FormQuery;
        case Fragment;
        case AllReservedCharacters;
        case All;

        public function encode(string $input): string {}
        public function decode(string $input): string {}
    }
}

The same logic would be applied in the Uri\WhatWg namespace. This would make for a better-encapsulated feature, so that we have a clear distinction between the value object, its builder and the encoding mechanism. What do you think?
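
As a rough feel for how an enum carrying `encode()`/`decode()` methods would be used, here is a two-case sketch. `SketchPercentEncoding` is hypothetical, and mapping its cases onto PHP's existing `rawurlencode()`/`urlencode()` functions is an assumption made for brevity; the proposed enum would presumably apply per-component character sets instead.

```php
<?php
// Hypothetical two-case enum; the mapping to PHP's built-in functions is an assumption.
enum SketchPercentEncoding
{
    case PathSegment;
    case FormQuery;

    public function encode(string $input): string
    {
        return match ($this) {
            self::PathSegment => rawurlencode($input), // space becomes %20
            self::FormQuery => urlencode($input),      // space becomes +
        };
    }

    public function decode(string $input): string
    {
        return match ($this) {
            self::PathSegment => rawurldecode($input),
            self::FormQuery => urldecode($input),
        };
    }
}

$segment = SketchPercentEncoding::PathSegment->encode('a b/c'); // 'a%20b%2Fc'
$form = SketchPercentEncoding::FormQuery->encode('a b&c');      // 'a+b%26c'
```

The call site names the encoding context explicitly, and neither the URI value object nor its builder needs to know about it.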

Best regards,

Ignace


2 months ago by kocsismate90@gmail.com

Hi Ignace,

> After quickly checking the proposed API for Percent-Encoding and Decoding Support I wonder if the following would not be more appropriate ?
>
>     namespace Uri\Rfc3986 {
>         enum UriPercentEncoding
>         {
>             case UserInfo;
>             case Host;
>             case RelativeReferencePath;
>             case RelativeReferenceFirstPathSegment;
>             case Path;
>             case PathSegment;
>             case Query;
>             case FormQuery;
>             case Fragment;
>             case AllReservedCharacters;
>             case All;
>
>             public function encode(string $input): string {}
>             public function decode(string $input): string {}
>         }
>     }
>
> With the same logic being applied in the Uri\Whatwg namespace. This would make for a better encapsulated feature. So we can have a clear distinction between the Value Object, its builder and the Encoding mechanism ? What do you think?

Yes, I was also wondering whether the URI/URL classes are really the best
places for the percentEncode() and percentDecode() methods, because
they are not only relevant for URIs/URLs but also, e.g., for QueryParams. So
overall, I'm also fine with moving the percent-encoding/decoding
capabilities to a separate place. Honestly, the enums themselves didn't
come to my mind... I think they are a good candidate, so I'll definitely
consider it.

Probably my only concern is the name, specifically the "ing" suffix: it
suggests that it can only keep data, and cannot do any operation. The
latter would rather have an "er" suffix (e.g. UriPercentEncoder). But I'm
happy to get feedback/suggestions about the options.
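
To make the idea of an enum that carries its own encode()/decode() behaviour more concrete, here is a minimal userland sketch. It is not the API proposed in the RFC: the name PercentEncodingSketch, the reduced set of cases, and the choice to build on rawurlencode() are all assumptions made for illustration only.

```php
<?php

// Hypothetical userland sketch; names and case list are illustrative only.
enum PercentEncodingSketch
{
    case PathSegment;
    case Query;

    /** Characters kept as-is in addition to the RFC 3986 "unreserved" set. */
    private function allowedExtra(): string
    {
        return match ($this) {
            // pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
            self::PathSegment => '!$&\'()*+,;=:@',
            // query = *( pchar / "/" / "?" )
            self::Query => '!$&\'()*+,;=:@/?',
        };
    }

    public function encode(string $input): string
    {
        // rawurlencode() escapes everything except unreserved characters.
        $encoded = rawurlencode($input);

        // Restore the characters that this component allows unescaped.
        foreach (str_split($this->allowedExtra()) as $char) {
            $encoded = str_replace(rawurlencode($char), $char, $encoded);
        }

        return $encoded;
    }

    public function decode(string $input): string
    {
        return rawurldecode($input);
    }
}

echo PercentEncodingSketch::PathSegment->encode('a b/c:d'), "\n";
echo PercentEncodingSketch::Query->encode('k=v w/x?y'), "\n";
```

With this shape, the same enum could be used by URI/URL classes and by query param classes alike, which is the encapsulation benefit discussed above.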

Máté

2 months ago by ignace nyamagana butera — view source — reply

unread

Hi Máté,

I read the Accessing Path Segments as an Array sub-RFC and I have a couple
of remarks and suggestions.
In the RFC text it is said that:

The getter methods return null if the path is empty (https://example.com),
an empty array when the path
consists of a single slash (https://example.com/), and a non-empty array
otherwise.

This is suboptimal to me because it means that the return type of the getter
methods is array|null, which would force developers to add a check whenever
they use the method in order to tell whether the path is absolute or not.
Instead, I would rather always get a single return type: an array. The issue
you are facing is that you want to convey via the return type whether the
path is absolute or not. But we already have access to this information via
the UriType enum, at least in the case of the Uri\Rfc3986\Uri class. For the
Uri\WhatWg\Url class the information is less crucial, as the validation and
normalization rules of the WHATWG specification will autocorrect the path if
needed. This leads me to propose the following alternative:

For Uri\Rfc3986\Uri:

/** @return list<string> */
Uri::getPathSegments(): array {}
/** @return list<string> */
Uri::getRawPathSegments(): array {}
#[\NoDiscard(message: "as Uri\Rfc3986\Uri::withPathSegments() does not modify the object itself")]
Uri::withPathSegments(array $segments, Uri\Rfc3986\UriType $uriType = Uri\Rfc3986\UriType::RelativePathReference): static {}

(the default value for the $uriType parameter is TBD).

For Uri\WhatWg\Url:

/** @return list<string> */
Url::getPathSegments(): array {}
#[\NoDiscard(message: "as Uri\WhatWg\Url::withPathSegments() does not modify the object itself")]
/**  @param list<UrlValidationError> $errors */
Url::withPathSegments(array $segments): static {}

with the following behaviour

The getter methods return an empty array if the path is empty
(https://example.com) or a single slash (https://example.com/), and a
non-empty array otherwise. To distinguish between an absolute path and a
relative path you can refer to the Uri\Rfc3986\Uri::getUriType() method in
the case of an RFC 3986 URI; the information does not matter otherwise
(i.e. for a WHATWG URL).

During an update of an RFC 3986 URI, the additional $uriType argument would
tell whether a / should be prepended to the generated path string. For the
WHATWG URL, no soft errors are emitted, which shows that the starting slash
does not really matter.

Best regards,
Ignace

2 months ago by kocsismate90@gmail.com — view source — reply

unread

Hi Ignace,

The getter methods return null if the path is empty (https://example.com),
an empty array when the path
consists of a single slash (https://example.com/), and a non-empty
array otherwise.

Yes, that's correct!

Instead, I would rather always get a single type, the array as return
value. The issue you are facing is that
you want to convey via your return type if the path is absolute or not.
But, we already have access to this
information via the UriType Enum, at least in the case of the
Uri\Rfc3986\Uri class.

The UriType enum in its current form is not really suitable, because it can
only distinguish relative and absolute
path references ("foo" vs "/foo"), but not absolute URIs
("https://example.com" vs "https://example.com/").
"https://example.com" and "https://example.com/" are both absolute URIs,
and the former one has an empty path.

In order to find out the correct behavior, I think we should first try to
dig deeper into the definition of path segments.

Also, in order to have some inspiration, I checked how similar
functionality works in other languages, C# notably:
https://learn.microsoft.com/en-us/dotnet/api/system.uri.segments?view=net-10.0#system-uri-segments
Making the leading "/" its own segment feels a little bit off at first
sight (not to mention that the "/" characters
are part of the segments), because RFC 3986 specifies that path segments
start after the leading "/" due to the
following ABNF rule:

path-abempty = *( "/" segment )

That is, for URIs containing an authority component, the path is either
empty, or contains a "/" followed by a segment
one or multiple times. Then segments have the following syntax:

segment = *pchar

That is, segments are composed of zero or multiple characters in the
"pchar" charset (the exact values don't matter
in this case). So let's see some basic examples with absolute URIs:

"https://example.com" -> no path segments: []
"https://example.com/foo" -> one path segment "foo": ["foo"]

Consequently:

"https://example.com/" -> one path segment which is empty: [""]
"https://example.com/foo/" -> two path segments: ["foo", ""]

Then the behavior of C# starts to make some sense - at least when the path
only consists of a "/" character (IMO it
doesn't make sense for other cases like "/foo").

Now let's see what to do with path references:

"" (empty string) -> no path segments: []
"/foo" -> one path segment "foo": ["foo"]
"foo" -> one path segment "foo": ["foo"]
"foo/" -> two path segments: ["foo", ""]
"/" -> one path segment which is empty: [""]

Unfortunately, this is not all, there are a few other special cases for
absolute URIs:

"https://" -> means that there's an authority, but it's empty, therefore
the path is also empty, therefore no path segments -> []
"https:/" -> means that there's no authority, and the path is "/",
therefore one path segment which is empty -> [""]
"https:" -> means that there's no authority, and the path is "", therefore
no path segments -> []
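
The path-level mapping described above can be sketched in a few lines of userland PHP. This is only an illustration of the listed cases, not the implementation in ext/uri; the function name pathSegmentsSketch() is hypothetical, and the function works on an already extracted path string rather than on a full URI.

```php
<?php

/**
 * Hypothetical helper: splits a path into segments following the
 * rule "the path is empty, or a '/' followed by a segment, repeated".
 *
 * @return list<string>
 */
function pathSegmentsSketch(string $path): array
{
    // An empty path has no segments at all.
    if ($path === '') {
        return [];
    }

    // The leading "/" is a delimiter, not part of the first segment.
    if ($path[0] === '/') {
        $path = substr($path, 1);
    }

    // Every remaining "/" separates two (possibly empty) segments.
    return explode('/', $path);
}

var_dump(pathSegmentsSketch(''));      // no segments
var_dump(pathSegmentsSketch('/'));     // one empty segment
var_dump(pathSegmentsSketch('/foo'));  // one segment "foo"
var_dump(pathSegmentsSketch('foo/'));  // two segments: "foo" and ""
```

Note that "/foo" and "foo" produce the same list here, which is exactly the case that needs extra information when going back from segments to a path.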

As far as I can see, this behavior is completely logical and satisfies the
definitions of RFC 3986. However, one case
may possibly need disambiguation in relation to the withPathSegments()
method: "/foo" vs "foo". (P.S. the uriparser library
had to use a special field for tracking exactly these cases.)

That being said, I agree with you that the currently suggested signatures
should be changed. However, accepting an
additional UriType parameter by the withPathSegments() method wouldn't be
correct, because I've just demonstrated
that the behavior doesn't depend on whether a URI is absolute or relative,
but whether the authority component is
defined or not.

So my alternative idea for disambiguating the above mentioned case is the
following: adding a 2nd parameter
$addLeadingSlashForNonEmptyRelativeUri to the withPathSegments() method (I
know this param name is insanely long,
so I'm happy to get recommendations), and then a leading slash would be
added to the path if and only if all the 3
criteria are satisfied:

the $addLeadingSlashForNonEmptyRelativeUri boolean parameter is true
the first item in the $pathSegments array parameter is non-empty
the target URI is relative

This means that calling $uri->withPathSegments(["", "foo"], false) and
$uri->withPathSegments(["foo"], true) would result
in the same path reference ("/foo") when $uri doesn't have an authority.
I'm fine with bikeshedding/fine-tuning these rules,
but I do think we should go with something along the lines of this.
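
The three criteria can be expressed as a small userland function. This is a sketch under assumptions: the function name joinPathSegmentsSketch() and the explicit $targetIsRelative flag are hypothetical stand-ins for state that the real method would read from the URI object itself.

```php
<?php

/**
 * Hypothetical helper mirroring the three criteria for adding a leading slash.
 *
 * @param list<string> $pathSegments
 */
function joinPathSegmentsSketch(
    array $pathSegments,
    bool $addLeadingSlashForNonEmptyRelativeUri,
    bool $targetIsRelative
): string {
    $path = implode('/', $pathSegments);

    // Criterion 2: the first segment exists and is non-empty.
    $firstIsNonEmpty = $pathSegments !== []
        && $pathSegments[array_key_first($pathSegments)] !== '';

    // The slash is added only if all three criteria hold.
    if ($addLeadingSlashForNonEmptyRelativeUri && $firstIsNonEmpty && $targetIsRelative) {
        return '/' . $path;
    }

    return $path;
}

echo joinPathSegmentsSketch(['', 'foo'], false, true), "\n";
echo joinPathSegmentsSketch(['foo'], true, true), "\n";
echo joinPathSegmentsSketch(['foo'], false, true), "\n";
```

The first two calls yield the same path, while the third keeps the path relative.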

For the Uri\WhatWg\Url the information is less crucial as the validation
and normalization rules of the WHATWG
specifications will autocorrect the path if needed.

Yes, true.

Máté

2 months ago by Juris Evertovskis — view source — reply

unread

Hey,

I love to see this RFC. I’ve got a few notes which likely come from me not understanding the specifications. But I think most users will not have read them and will not know or care about the differences, right?

Query setters and types of their parameter. Quote from the builder example:


    ->setQuery("a=1&b=2")

    ->setQueryParams(["a" => 1, "b" => 2]) // Has the same effect as the setQuery() call above

I’d expect the setQueryParams to set the particular params that I specified instead of overriding all the query (which I’d expect setQuery to do).

Maybe it would be possible for setQuery to accept string, array and instances of *QueryParams? And setQueryParams could either be left out or override just the selected params? As far as I see, there is no equivalent withQueryParams so there’s no precedent on how such a method should work, right?

Btw I wasn’t able to find docs on the existing Uri classes (had to look it up in the prev RFC). Is the search broken or are the docs just not there yet?

Regarding the interface extraction, is there any difference between the internal states of Uri\WhatWg\UrlQueryParams and Uri\Rfc3986\UriQueryParams objects? I would naively expect not only a common interface, but a common class. A QueryParams that could be supplied to any of the withers/setters and that would be able to ->toRfc3986QueryString() or ->toWhatWgQueryString() on use not on instantiation.

Similarly with the builders I fail to see why I need to decide on Uri\Rfc3986\UriBuilder and Uri\WhatWg\UrlBuilder at the start instead of having Uri\Builder that could be consumed via ->buildRfc3986Url() or ->buildWhatWgUri(). Is that UserInfo thing the whole dealbreaker? To me all the other parts like specifying host, port and so on seem spec-agnostic until serialization.

I agree with Larry that I’d expect ['key1' => 'value1', 'key2' => 'value2'] to "just work". I understand the issue about having multiple entries with the same keys, but I don’t think I should have to understand that to successfully build queries.

Maybe another method like bestEffortFromArray() could be added that would support various forms of nested arrays and do whatever is possible to make them into a query string, similar to how http_build_query() does? I know it’s not perfect, but it would be great to have the new tooling as easy to work with as the previous, instead of having to reshape your arrays to fit a particular syntax.

BR,

Juris

2 months ago by kocsismate90@gmail.com — view source — reply

unread

Hi,

Query setters and types of their parameter. Quote from the builder
example:

    ->setQuery("a=1&b=2")
    ->setQueryParams(["a" => 1, "b" => 2]) // Has the same effect as the setQuery() call above

I’d expect the setQueryParams to set the particular params that I
specified instead of overriding all the query (which I’d expect setQuery
to do).

I'm happy that you raised this concern because I wouldn't ever have thought
the behavior was unclear. My expectation with setters is that they
completely override the related property.
If they were intended to add/append the query params then something like
addQueryParams() should be used instead. And I think this question has just
highlighted why
it's a bad idea to remove the set prefix from the Builder methods: because
it would also remove some additional context about what the method exactly
does (which is
apparently not even entirely clear with the set prefix included, but
without the set prefix, one would have zero clue if it's an append or set
operation).
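
To illustrate the difference between the two semantics, here is a tiny userland sketch. The class QueryParamsBuilderSketch is hypothetical and much simpler than the builders in the RFC; addQueryParams() is only the kind of method name mentioned above, not something the RFC defines.

```php
<?php

// Hypothetical sketch: "set" replaces the whole list, "add" appends to it.
final class QueryParamsBuilderSketch
{
    /** @var list<array{string, string}> */
    private array $pairs = [];

    public function setQueryParams(array $params): static
    {
        // A setter completely overrides the previously stored params.
        $this->pairs = [];

        return $this->addQueryParams($params);
    }

    public function addQueryParams(array $params): static
    {
        // An "add" method keeps existing params and appends new ones.
        foreach ($params as $name => $value) {
            $this->pairs[] = [(string) $name, (string) $value];
        }

        return $this;
    }

    public function build(): string
    {
        return implode('&', array_map(
            static fn (array $pair): string => rawurlencode($pair[0]) . '=' . rawurlencode($pair[1]),
            $this->pairs
        ));
    }
}

echo (new QueryParamsBuilderSketch())->setQueryParams(['a' => 1])->setQueryParams(['b' => 2])->build(), "\n";
echo (new QueryParamsBuilderSketch())->setQueryParams(['a' => 1])->addQueryParams(['b' => 2])->build(), "\n";
```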

Maybe it would be possible for setQuery to accept string, array and
instances of *QueryParams? And setQueryParams could either be left out
or override just the selected params? As far as I see, there is no
equivalent withQueryParams so there’s no precedent on how such a method
should work, right?

Good call, I've already updated the RFC so that setQueryParams() accepts a
UriQueryParams/UrlQueryParams instance. It was just an oversight on my part.
But I don't really like to use union types (mainly because the method
behavior needs extra explanation), that's why I chose to add two
distinct methods for the query handling.

Btw I wasn’t able to find docs on the existing Uri classes (had to look it
up in the prev RFC). Is the search broken or are the docs just not there
yet?

Yes, unfortunately, the docs are not there yet :( I usually prioritize my
work in favor of php-src, and since I apparently chose a massive
undertaking yet again, I have very limited time
for anything else. But I'll try to find time to add the missing stuff to
the documentation.

Regarding the interface extraction, is there any difference between the
internal states of Uri\WhatWg\UrlQueryParams and
Uri\Rfc3986\UriQueryParams objects? I would naively expect not only a
common interface, but a common class. A QueryParams that could be
supplied to any of the withers/setters and that would be able to
->toRfc3986QueryString() or ->toWhatWgQueryString() on use not on
instantiation.

I have already been thinking a lot about this question for a while, and I
agree that the two QueryParams implementations are very similar: the only
difference between them is how
they parse the query string into query params and how they recompose the
query params to a query string. So I would be happy to be able to unify the
two implementations, that would be a huge
simplification. However, there are two reasons why I'm very hesitant to do
so:

We should be absolutely sure that a unified implementation cannot cause
parsing confusion vulnerabilities, e.g. when the query params are parsed
according to one specification, and then
they are recomposed according to the other one. For example, one difference
(this is also mentioned in the RFC) is that WHATWG URL removes the leading
"?" character during parsing,
while RFC 3986 leaves it as-is. These differences must be considered very
carefully.

If we have two dedicated classes, then they can evolve separately with
specification-specific behavior. If it turns out that we want to add
support for some specification-specific feature, then
a specification-specific class is better suited for the purpose. I can't
really come up with many examples, but maybe additional percent-encoding
capabilities could be needed (e.g. getFirstPercentEncoded()).

Similarly with the builders I fail to see why I need to decide on
Uri\Rfc3986\UriBuilder and Uri\WhatWg\UrlBuilder at the start instead
of having Uri\Builder that could be consumed via ->buildRfc3986Url() or
->buildWhatWgUri(). Is that UserInfo thing the whole dealbreaker? To me
all the other parts like specifying host, port and so on seem spec-agnostic
until serialization.

Yes, the userinfo is just one of the deal breakers. Not too long ago, I
modified the RFC text, and added more info about how validation exactly
works. According to my plans,
the individual setters would also make some basic validation (formatting of
the component), while the build() methods would make sure that some global
rules are also satisfied
(there's a bit more info in the RFC about this). And this is only possible
to implement if there are two different builders.

Máté

2 months ago by ignace nyamagana butera — view source — reply

unread

Hi Máté,

After thinking about it here's my take on the current proposal
regarding the Query Parameter Manipulation RFC. Sorry for the wall of
text, but I tried to summarize my thoughts.

First of all, I tried to put myself in the shoes of a regular PHP
developer who has little to no knowledge about the different URI
specifications but has a general grasp of PHP. From that point of view
the developer knows that:

- PHP already gives access to the URI query parameters via the $_GET
  superglobal;
- to parse a query string in PHP, they can rely on parse_str;
- to build a query string, they should use the http_build_query function.

What we do know is that:

the $_GET values are also produced by the parse_str logic, which:

- is not documented
- is PHP-centric
- mangles the data
- truncates the query string

Its original goal was to allow direct conversion of query string into
PHP variables usable in scripts. But this behaviour has been removed
for security reasons from PHP.

http_build_query allows creating a query string in a more predictable
way but still exposes PHP-centric behaviour:

- It uses get_object_vars on objects, which is counter-intuitive:
  - Not all iterable structures give the same result.
  - Depending on the object implementation, the result varies between
    PHP versions (i.e. DateTimeImmutable used to be rendered before PHP 7.4;
    since then it fails silently, resulting in an empty string being
    generated).
- It adds "[", "]" and indices around arrays. This is PHP-centric
  (other languages would just repeat the array name).
- It always adds the array indices even when the array is a list, which
  again can lead to unexpected behaviour, even within the PHP ecosystem.

On the other hand:

- Other modern platforms, like Java's HttpServletRequest or the WHATWG
  URLSearchParams, have a completely different take: they view the query
  string as a collection of tuples (key/value pairs) that can be repeated;
  there is no notion of brackets. The data is preserved, even though, as
  you mention, the round-trip between encoding and decoding is never
  guaranteed.
- We have the new HTTP QUERY method, which may or may not fall into the
  "Should this also be managed by a putative Query class" question.

Currently, in your proposal you have 2 Query objects. This will give
the developer a lot of work to understand where, when and which
object to choose and why. Is that complexity really needed? IMHO we
may end up with a correct API ... that no-one will use.

With all that in mind I believe a single Uri\Query should be used.
Its goal should be:

to be immutable
to store the query in its decoded form.
to manipulate and change the data in a consistent way.

Decoding/encoding should happen at the object boundaries but
everything inside the object should
be done on decoded data. Since no algorithm guarantees preserving
encoding during a decode/encode round-trip,
there is no need to try hard to do so.
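
As a rough illustration of "decode at the boundary, keep decoded tuples inside", here is a userland sketch of a tuple parser. The function name parseQueryTuplesSketch() is hypothetical, and two choices are assumptions: a valueless pair such as "flag" is stored with a null value, and decoding uses rawurldecode(), so "+" is not turned into a space.

```php
<?php

/**
 * Hypothetical helper: parses a query string into a list of decoded
 * [name, value] tuples, keeping repeated names and their order.
 *
 * @return list<array{string, ?string}>
 */
function parseQueryTuplesSketch(string $query): array
{
    $tuples = [];

    foreach (explode('&', $query) as $part) {
        // Skip empty parts produced by "&&" or a leading/trailing "&".
        if ($part === '') {
            continue;
        }

        // Split on the first "=" only; later "=" belong to the value.
        $pieces = explode('=', $part, 2);

        // Decoding happens here, once, at the boundary. No name mangling.
        $name = rawurldecode($pieces[0]);
        $value = isset($pieces[1]) ? rawurldecode($pieces[1]) : null;

        $tuples[] = [$name, $value];
    }

    return $tuples;
}

var_dump(parseQueryTuplesSketch('param=foo&param=bar&flag&foo.bar=baz'));
```

Unlike parse_str(), this keeps both "param" entries and leaves "foo.bar" untouched.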

This also means:

having multiple string representations
not having a Uri::withQueryParams or a Url::withQueryParams method.

It should be left to the developer to understand which string version he needs.

On a bonus side, it would be nice to have a mechanism in PHP that
allows the application to switch
from the current parse_str usage to the new improved parsing
provided by the new class when
populating the _GET array. (So that deprecating parse_str can be
initiated in some distant future.)
This last observation/remark is not mandatory but nice to have.

So I would propose the following methods:


namespace Uri {
    //takes no arguments returns an empty object
    Query::__construct();

    // named constructor to allow
    // returning a new instance from
    // PHP variables (same syntax as http_build_query)
    Query::fromVariables(array $variable): static

    // named constructor to allow
    // returning a new instance from
    // a list of tuples see the returns
    // value of Query::toTuples()
    Query::fromTuples(array $params): static

    // named constructor to allow
    // returning a new instance from
    // query string this is where
    // decoding takes place

    Query::parseRfc1738String(): ?static
    Query::parseRfc3986String(): ?static
    Query::parseFormDataString(): ?static
    Query::parseWhatWgString(): ?static

    //String representation query
    //this is where encoding should happen
    //internal decoded data
    //should only be encoded here

    Query::toRfc3986String();
    Query::toRfc1738String();
    Query::toFormDataString();
    Query::toWhatWgString();

    // Tuple related methods
    // like the one defined by the WHATWG specifications
    // method names are changed or update to highlight
    // the immutable state for modifying methods

    Query::toTuples(): array<string, null|string|array<null|string>>
    Query::count(): int;
    Query::has(string $name): bool;
    Query::hasValue(string $name, null|string $value): bool;
    Query::getFirst(string $name): null|string;
    Query::getLast(string $name): null|string;
    Query::getAll(string $name): array<null|string>;

    // Tuple modifying methods

    Query::sort(): static;
    Query::withValue(string $name, null|string|array<null|string> $value): static;
    Query::append(string $name, null|string|array<null|string> $value): static;
    Query::delete(string $name): static;
    Query::deleteValue(string $name, null|string $value): static;

    // PHP variables related methods
    // the parse_str replacement API

    Query::toVariables(): array;  // returns the same array as parse_str (without mangled data)
    Query::countVariables(): int; // returns the number of variables found
    Query::hasVariable(string $variableName): bool; // tells whether the variable exists
    Query::getVariable(string $variableName): null|string|array; // returns the variable value
    Query::mergeVariable(array $variables): static; // the same syntax returned by the `Query::toVariables` method
    Query::replaceVariable(string $variableName, null|string|int|float|array $value): static;
    Query::deleteVariable(string $variableName): static;
}

With the following changes:

in respect to parse_str, no mangled data should occur on parsing:

parse_str("foo.bar=baz", $params);
echo $params['foo_bar'];             // returns "baz"
array_key_exists('foo.bar', $params); // returns false

$query = \Uri\Query::parseRfc1738String("foo.bar=baz");
$query->getVariable("foo.bar"); //returns "baz"
$query->hasVariable("foo_bar"); //returns false

in respect to http_build_query.
Only accept scalar values, null, and array. If an object or a
resource is detected a ValueError error
should be thrown.

echo http_build_query(['a' => tmpfile()]); // returns ''
\Uri\Query::fromVariables(['a' => tmpfile()]); // throws a ValueError

Remove the addition of indices if the array is a list.

echo http_build_query(['a' => [3, 5, 7]]);
// returns a%5B0%5D=3&a%5B1%5D=5&a%5B2%5D=7
echo \Uri\Query::fromVariables(['a' => [3, 5, 7]])->toRfc1738String();
// returns a%5B%5D=3&a%5B%5D=5&a%5B%5D=7
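
Since \Uri\Query does not exist yet, the two suggested changes can be tried out with a userland sketch. The function buildQuerySketch() is hypothetical; it requires PHP 8.1+ (array_is_list()), and rendering a null value as a bare name without "=" is an assumption, not something this message specifies.

```php
<?php

/**
 * Hypothetical helper: builds a query string like http_build_query(),
 * but omits indices for lists and rejects objects and resources.
 */
function buildQuerySketch(array $variables): string
{
    $pairs = [];

    $walk = function (string $name, mixed $value) use (&$walk, &$pairs): void {
        if (is_object($value) || is_resource($value)) {
            throw new ValueError('Objects and resources cannot be converted to a query string');
        }

        if (is_array($value)) {
            // Lists get empty brackets; other arrays keep their keys.
            $isList = array_is_list($value);
            foreach ($value as $key => $item) {
                $walk($name . '[' . ($isList ? '' : $key) . ']', $item);
            }

            return;
        }

        if ($value === null) {
            // Assumption: null means "name present, no value".
            $pairs[] = urlencode($name);

            return;
        }

        if (is_bool($value)) {
            $value = (int) $value;
        }

        $pairs[] = urlencode($name) . '=' . urlencode((string) $value);
    };

    foreach ($variables as $name => $value) {
        $walk((string) $name, $value);
    }

    return implode('&', $pairs);
}

echo buildQuerySketch(['a' => [3, 5, 7]]), "\n";
```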

Best regards,

Ignace

1 month ago by kocsismate90@gmail.com — view source — reply

unread

Hi Ignace,

Currently, in your proposal you have 2 Query objects. This will give
the developer a lot of work to understand where, when and which
object to choose and why. Is that complexity really needed? IMHO we
may end up with a correct API ... that no-one will use.

Just to reiterate what I wrote to Juris a few days ago: I'm open to
unifying the two classes, but I'm just hesitant because of security and
evolvability reasons (but the main one is security).

With all that in mind I believe a single Uri\Query should be used.
Its goal should be:

to be immutable

to store the query in its decoded form.

to manipulate and change the data in a consistent way.

So far, I imagined the two QueryParams classes to be mutable because one of
their main goals is to be able to build (~ mutate) query param list...
But otherwise an immutable implementation would be useful for sure.

Decoding/encoding should happen at the object boundaries but
everything inside the object should
be done on decoded data.

Yes, that's what I also had to find out based on my experience with
implementing the POC, so I completely agree here.

On a bonus side, it would be nice to have a mechanism in PHP that
allows the application to switch
from the current parse_str usage to the new improved parsing provided by the new class when
populating the _GET array. (So that deprecating parse_str can be initiated in some distant future.)
This last observation/remark is not mandatory but nice to have.

This is a very interesting remark, and I have not thought about this
possibility yet. Generally, I agree with
the idea, but my long-term goal (or wish) is to move away from using $_GET
and $_POST to access request
data in favor of using objects... So I most probably won't deal with trying
to implement this idea. However,
I'm willing to add a UriQueryParams::fromCurrentQueryString(), maybe even a
UriQueryParams::fromCurrentBody()
or similar factory methods if people like it.

in respect to parse_str, no mangled data should occur on parsing:

Uh, I completely forgot about this behavior of parse_str(), and I
definitely agree that mangling shouldn't happen.

Only accept scalar values, null, and array. If an object or a
resource is detected a ValueError error
should be thrown.

I wasn't sure what to do with objects, but I'm happy to skip their
support, especially if they would cause issues.
The rest of the suggestions align with my initial plans (maybe with the
exception of throwing ValueError -- I wrote
TypeError in the related section).

Remove the addition of indices if the array is a list.

Yes, this also aligns with my initial plans.

Best regards,
Máté

2 months ago by Derick Rethans — view source — reply

unread

I'd like to introduce my latest RFC that I've been working on for a
while now: https://wiki.php.net/rfc/uri_followup.

It proposes 5 followup improvements for ext/uri in the following areas:

URI Building

Would it make sense to have an interface for the set*() methods? Besides
build(), they all seem to have the same API.

Query Parameter Manipulation

I see this adds NoDiscard:
#[\NoDiscard(message: "as Uri\Rfc3986\Uri::withQueryParams() does not modify the object itself")]

But the original methods on the classes don't have these NoDiscards, and
it doesn't seem that this RFC is suggesting to add them. It should at
least be consistent.

Accessing Path Segments as an Array

Compare:

"especially considering the fact that Uri\Rfc3986\Uri internally stores
the path as a list of segments."

And:

Uri\Rfc3986\Uri::withPathSegments() … internally concatenate the input
segments separated by a / character, and then trigger
Uri\Rfc3986\Uri::withPath() …

Why does it need to do this concatenation, and then call withPath() for
Rfc3986\Uri then?

cheers,
Derick

--
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support

mastodon: @derickr@phpc.social @xdebug@phpc.social

2 months ago by kocsismate90@gmail.com — view source — reply

unread

Hi Derick,

Would it make sense to have an interface for the set*() methods? Besides
build(), they all seem to have the same API.

I've just answered this to Juris, but TLDR: IMO it is not possible if we go
with the currently suggested behavior.
The set*() method pairs would have completely different validation rules
based on the specification they implement.

Query Parameter Manipulation

I see this adds NoDiscard:
#[\NoDiscard(message: "as Uri\Rfc3986\Uri::withQueryParams() does not
modify the object itself")]

But the original methods on the classes don't have these NoDiscards, and
it doesn't seem that this RFC is suggesting to add them. It should at
least be consistent.

Thanks for noticing these; I wanted to add #[NoDiscard] to Uri/Url wither
methods just before the final PHP 8.5 release,
but it was a bit controversial, so I didn't do it in the end. Then I thought
that at least the newly proposed methods could
have it, that's why I included these attributes into this RFC. But now I
realized that I'm not ready to fight for them,
so I've just removed them all from the proposal.

Accessing Path Segments as an Array

Compare:

"especially considering the fact that Uri\Rfc3986\Uri internally stores
the path as a list of segments."

And:

Uri\Rfc3986\Uri::withPathSegments() … internally concatenate the input
segments separated by a / character, and then trigger
Uri\Rfc3986\Uri::withPath() …

Why does it need to do this concatenation, and then call withPath() for
Rfc3986\Uri then?

The path setter of the uriparser library (uriSetPathMmA) first takes a
string, then reparses it as a URI (setters for the other components work in
a much simpler way), and finally does a few other checks/modifications. If
everything went well, the path is indeed stored as a list of segments. That's
why concatenating the segments is needed; otherwise validation wouldn't work
properly.

Cheers,
Máté

1 month ago by ignace nyamagana butera — view source — reply

unread

Hi Máté,

After thinking about it and trying some different logics I believe the
following rules should be applied during conversion from array to query
string.

object: invalid mapping, throws a TypeError
resource: invalid mapping, throws a TypeError
array: if a resource or an object is present in the array it should
throw a ValueError
null: should be allowed (['a' => null] should return "&a", with no =
sign added). Currently entries with null values are skipped.

Currently in PHP, when using parse_str, &a is converted to ['a' => '']. I
presume this behaviour was chosen so that the variable $a would still be
accessible in a PHP script, with an empty string as its value instead of
null. Since we will be dealing with arrays, the following rules
could be updated when parsing the string using PHP behaviour:

no mangled data should occur on parsing
"&a" should be converted to ['a' => null]
indices should be removed if the array is a list (this would
be nice to have, to reduce the query string length, but is not mandatory)

Best regards,
Ignace

1 month ago by Juris Evertovskis — view source — reply

unread

Since we will be dealing with arrays the following rules could be
updated when parsing the string using PHP behaviour:

"&a" should be converted to ['a' => null]

Hey Ignace,

In practice valueless arguments like ?debug are most often "flags" or
"booleans" and their presence implies truthiness.

Do you think it would be wrong or confusing to have it converted to
['debug' => true]?

I'm worried that ['a' => null] would not be that handy, since both
$params['a'] and isset($params['a']) would be falsy, which would
likely be the opposite of the intended value.

BR,
Juris

1 month ago by ignace nyamagana butera

Hi Juris,

Do you think it would be wrong or confusing to have it converted to
['debug' => true]?

Yes, IMHO it would be wrong, because flag parameters or booleans are
converted to ['debug' => 1].
['debug' => null] expresses the presence of the name in the pair and the
absence of a value associated with it.
Let's see how it is currently done:

The WHATWG URL living standard does the following:

let url = new URL('https://example.com?debug&foo=bar&debug=');
console.log(
    url.searchParams.toString(), // returns 'debug=&foo=bar&debug='
);

The pair gets converted to ['debug' => '']. The roundtrip does not preserve
the query string as is, but all key/value pairs (tuples) are present.

In PHP you have currently the following behaviour:

example 1

parse_str('debug&foo=bar&debug=', $params);
var_dump($params, http_build_query($params));
//$params ['debug' => '', 'foo' => 'bar']
//after roundtrip you get 'debug=&foo=bar'

example 2

parse_str('debug&foo=bar&debug=1', $params);
var_dump($params, http_build_query($params));
//$params ['debug' => '1', 'foo' => 'bar']
//after roundtrip you get 'debug=1&foo=bar'

So you lose data and the query data can be reordered:
parse_str converts the first debug into ['debug' => '']
parse_str then overwrites that value with the later one (this may be a
security concern if you need to hash/validate your query string)

Since IMHO interoperability and security are important, you should prefer an
algorithm that preserves the original query.
The proposed solution is already in use, for instance in League/Uri or in
Guzzle:

echo Uri::withQueryValues(Utils::uriFor('https://example.com'), [
    'debug' => null,
    'foo' => 'bar',
    'baz' => '',
]), PHP_EOL;
// https://example.com?debug&foo=bar&baz=

Because Guzzle uses an associative array, the debug variable can only
appear once, but there is a difference between using null and the empty string.
This improves interoperability with other languages, and you no longer have
data loss or random query rearrangement.

Last but not least, the Query objects proposed by Máté all:

expose a has method which always tells whether the key is present regardless
of its value, an equivalent to array_key_exists
provide a way to have the same parameter appear multiple times in the
query string

So IMHO it is an improvement to also allow the distinction between null and
the empty string, so that we can finally write in PHP:

echo (new Uri\Rfc3986\Query())
    ->append('debug', null)
    ->append('foo', 'bar')
    ->append('debug', '')
    ->toRfc3986String();
// debug&foo=bar&debug=

Best regards,
Ignace

1 month ago by kocsismate90@gmail.com

Hi,

The WHATWG URL living standard does the following:

let url = new URL('https://example.com?debug&foo=bar&debug=');
console.log(
    url.searchParams.toString(), // returns 'debug=&foo=bar&debug='
);

The pair gets converted to ['debug' => '']. The roundtrip does not
preserve the query string as is, but all key/value pairs (tuples) are present.

Yes, confirmed. Unfortunately, WHATWG URL only supports string values, so
there's no way to support query parameters without a value (e.g. ?debug) in
the RFC implementation either. :(

On the other hand, the RFC 3986 implementation supports this notion; even
uriparser calls this out in its documentation:
https://uriparser.github.io/doc/api/latest/index.html#querystrings.

However, there are a few other problems which came up when I was updating
my implementation.

1.) Yesterday, I wrote that name mangling of the query params shouldn't
happen. However, as I realized, it is still needed for non-list arrays,
because the [..] suffix must be added to their name:

$params = new Uri\Rfc3986\UriQueryParams()
    ->append("foo", [2 => "bar", 4 => "baz"]);

echo $params->toRfc3986String(); // foo%5B2%5D=bar&foo%5B4%5D=baz

var_dump($params->getFirst("foo")); // NULL

Even though I appended params with the name "foo", no items can be returned
when calling getFirst(), because of name mangling.

2.) I'm not really sure how empty arrays should be represented. PHP doesn't
retain them; they are simply skipped. But should we do the same thing? I
can't really come up with any other sensible behavior.

$params = new Uri\Rfc3986\UriQueryParams()
    ->append("foo", []);

echo $params->toRfc3986String(); // ???

3.) We wrote earlier that objects shouldn't be supported when creating
the query string from variables. But what about backed enums? Support for
them in http_build_query() was added not long
ago: https://github.com/php/php-src/pull/15650
We should support them, right?

Regards,
Máté

1 month ago by ignace nyamagana butera

Hi Máté,

1.) Yesterday, I wrote that name mangling of the query params shouldn't
happen. However, as I realized,
it is still needed for non-list arrays, because the [..] suffix must be
added to their name:

When I talk about data mangling, I am talking about this:

parse_str('foo.bar=baz', $params);
var_dump($params); // returns ['foo_bar' => 'baz']

The bracket notation is a PHP specificity and I would not change it now;
otherwise you introduce a huge BC break in the ecosystem
for no particular gain IMHO.

2.) I'm not really sure how empty arrays should be represented? PHP doesn't
retain them, and they are
simply skipped. But should we do the same thing? I can't really come up
with any other sensible behavior.

$params = new Uri\Rfc3986\UriQueryParams()
    ->append("foo", []);

To me, this should yield the same result as ->append('foo', null), as the
array construct is only indicative of a repeating parameter name:
if there is no repeat, then no data is attached to the name.

3.) We wrote earlier that objects shouldn't be supported when creating
the query string from variables. But what about
backed enums? Support for them in http_build_query() was added not long
ago: https://github.com/php/php-src/pull/15650
Should we support them, right?

This gave me a WTF? moment (sorry for the language). Why was this change
added to PHP without a proper discussion on internals,
or even an RFC? It sets a lot of precedent and even changes how
http_build_query is supposed to work with regard to objects.
If this had landed on the internals list, I would have objected to it on the
grounds that it breaks expectations about how the function handles types in PHP.
Do I agree with everything the function does? No, but introducing
inconsistencies in the function is not a good thing. Now http_build_query
is aware of specific objects. Stringable or Traversable objects are not
detected, but Enums are? A Pure Enum emits a TypeError, but a resource
does not? Backed Enums are not converted to int or to string by PDO, so why
would http_build_query do it differently? The same reasoning applies as to
why Backed Enums do not have a Stringable interface.

Yes, the output was "weird" in PHP 8.1 to PHP 8.3, but it was expected. Should
something be done for DateInterval too, because the
output using http_build_query is atrocious?

I still stand by my opinion that objects and resources should NEVER be
converted. In an ideal world only scalars + null and their repeated values
encapsulated in an array should be allowed; everything else should be left
to the developer to decide. So yes, in your implementation I do think
that Backed Enums should not be treated differently from other objects and
should throw.

PS: I would even revert this change or deprecate it for removal in PHP 9
(in a separate RFC).

Best regards,
Ignace

1 month ago by ignace nyamagana butera

On Fri, Dec 19, 2025 at 8:59 AM ignace nyamagana butera nyamsprod@gmail.com
wrote:

On Thu, Dec 18, 2025 at 10:46 PM Máté Kocsis kocsismate90@gmail.com
wrote:

3.) We wrote earlier that objects shouldn't be supported when creating
the query string from variables. But what about
backed enums? Support for them in http_build_query() was added not long
ago: https://github.com/php/php-src/pull/15650
Should we support them, right?

Hi Máté,

After further checking and researching, here's my view on Enum support. It
is based on the PHP 8.4 behaviour of Enum with json_encode, since that is the
basis used to add its support in http_build_query.

So in your implementation it would mean:

allowed types: null, int, float, string, boolean, and Backed Enum (to mimic
json_encode and PHP 8.4+ behaviour)
arrays whose values are of an allowed type or are arrays themselves are also
supported, to allow complex type support

Any other type (object, resource, Pure Enum) is disallowed and should
throw a TypeError.
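
The type rules above could be checked along these lines. This is only a sketch: normalizeQueryValueSketch() and the two demo enums are hypothetical names introduced for illustration, not part of the RFC.

```php
<?php

// Demo enums, declared only for this example
enum SuitSketch: string
{
    case Hearts = 'H';
}

enum FlagSketch
{
    case On;
}

/**
 * Hypothetical check (not an ext/uri API): returns the value in a form that
 * can be serialized, or throws a TypeError for disallowed types.
 */
function normalizeQueryValueSketch(mixed $value): mixed
{
    if ($value === null || is_scalar($value)) {
        return $value; // null, int, float, string, bool pass through
    }
    if ($value instanceof BackedEnum) {
        return $value->value; // backed enums collapse to their backing value
    }
    if (is_array($value)) {
        // Arrays are allowed as long as every element is allowed
        return array_map(normalizeQueryValueSketch(...), $value);
    }

    // Objects (including pure enums) and resources end up here
    throw new TypeError('Unsupported query value of type ' . get_debug_type($value));
}

var_dump(normalizeQueryValueSketch(['suit' => SuitSketch::Hearts, 'n' => 1]));
// ['suit' => 'H', 'n' => 1]
```

Passing FlagSketch::On (a pure enum) or any other object throws a TypeError, matching the last rule in the list.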

Maybe in the future scope of this RFC, or in this RFC itself depending on how
you scope it, you may introduce an interface which will allow serializing
objects using a representation that
follows the rules described above, similar to what the
JsonSerializable interface is for json_encode.

If such an interface lands, then Pure Enum serialization will be allowed via
the interface, and the behaviour of BackedEnum would also be affected, just
like what happens with json_encode (i.e. the interface takes
precedence over the class instance's default behaviour).
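
To make the precedence idea concrete, here is a sketch of what such an opt-in interface might look like. Everything in it is an assumption: the interface name QuerySerializableSketch, its method toQueryValue(), the demo enums, and the resolver function are all invented for illustration and modelled loosely on JsonSerializable.

```php
<?php

// Hypothetical interface; name and method are assumptions for illustration
interface QuerySerializableSketch
{
    public function toQueryValue(): string|int|float|bool|array|null;
}

// A backed enum that overrides its default (backing value) representation
enum StatusSketch: string implements QuerySerializableSketch
{
    case Active = 'active';

    public function toQueryValue(): string
    {
        return strtoupper($this->value);
    }
}

// A pure enum that becomes serializable only through the interface
enum ModeSketch implements QuerySerializableSketch
{
    case Debug;

    public function toQueryValue(): string
    {
        return 'debug';
    }
}

function resolveQueryValueSketch(mixed $value): mixed
{
    if ($value instanceof QuerySerializableSketch) {
        return $value->toQueryValue(); // the interface takes precedence
    }
    if ($value instanceof BackedEnum) {
        return $value->value; // default behaviour for backed enums
    }
    if ($value === null || is_scalar($value) || is_array($value)) {
        return $value;
    }

    throw new TypeError('Unsupported query value of type ' . get_debug_type($value));
}

var_dump(resolveQueryValueSketch(StatusSketch::Active)); // string(6) "ACTIVE"
var_dump(resolveQueryValueSketch(ModeSketch::Debug));    // string(5) "debug"
```

Checking the interface before BackedEnum is what gives it precedence over the default handling.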

Last but not least, all this SHOULD NOT affect how http_build_query works.
The function should never have been modified IMHO, so it should be left
untouched by all this, unless we allow it
to opt in to the behaviour once the interface is approved and added to PHP.

What do you think ?
Best regards,
Ignace

1 month ago by kocsismate90@gmail.com

Hi Ignace,

When I talk about data mangling I am talking about this

parse_str('foo.bar=baz', $params);
var_dump($params); //returns ['foo_bar' => 'baz']

Sure! I just wanted to point it out that name mangling will still be
present due to arrays, and this will have some disadvantages,
namely that adding params and retrieving them won't be symmetric:

$params = new Uri\Rfc3986\UriQueryParams()
->append("foo", [2 => "bar", 4 => "baz"]);

var_dump($params->getFirst("foo")); // NULL

One cannot be sure if a parameter that was added can really be retrieved
later via a get*() method.

Another edge case:

$params = new Uri\Rfc3986\UriQueryParams()
    ->append("foo", ["bar", "baz"]) // Value is a list, so "foo" is added without brackets
    ->append("foo", [2 => "qux", 4 => "quux"]); // Value is a non-list array, so "foo" is added with brackets

var_dump($params->toRfc3986String()); // foo=bar&foo=baz&foo%5B2%5D=qux&foo%5B4%5D=quux

var_dump($params->getLast("foo")); // Should it be "baz" or "quux"?
var_dump($params->getAll("foo")); // Should it only include the params with name "foo", or also "foo[]"?

And of course this behavior also makes the implementation incompatible with
the WHATWG URL specification, although I do think this part of
the specification is way too underspecified and vague, so URLSearchParams
doesn't seem very usable in practice...

So an idea that I'm now pondering is to keep the append() and set()
methods compatible with WHATWG: they would only support
scalar values, so the param name would not be mangled. Extra
appendArray() and setArray() methods could be added that would possibly
mangle the param names, and they would only support passing arrays. This
solution would hopefully result in slightly less surprising
behavior: one could immediately know if a previously added parameter is
really retrievable (when append() or set() was used), or whether extra
checks may be needed (when using appendArray() or setArray()).
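
The split between the two method families can be illustrated with a toy class. QueryParamsSketch is a hypothetical stand-in, not the proposed UriQueryParams; it assumes flat arrays only and that booleans are stored as "1"/"0".

```php
<?php

// Toy stand-in for illustration; the real API is still under discussion
final class QueryParamsSketch
{
    /** @var list<array{string, ?string}> ordered name/value pairs */
    private array $pairs = [];

    // Scalar-only: the name is stored exactly as given, so it stays retrievable
    public function append(string $name, string|int|float|bool|null $value): static
    {
        $this->pairs[] = [$name, $value === null ? null : self::stringify($value)];

        return $this;
    }

    // Array-only: each element is stored under a bracketed (mangled) name
    public function appendArray(string $name, array $values): static
    {
        foreach ($values as $key => $value) {
            $this->append($name . '[' . $key . ']', $value);
        }

        return $this;
    }

    public function getAll(string $name): array
    {
        $found = [];
        foreach ($this->pairs as [$pairName, $value]) {
            if ($pairName === $name) {
                $found[] = $value;
            }
        }

        return $found;
    }

    private static function stringify(string|int|float|bool $value): string
    {
        return is_bool($value) ? ($value ? '1' : '0') : (string) $value;
    }
}

$params = (new QueryParamsSketch())
    ->append('foo', 'bar')
    ->appendArray('foo', [2 => 'qux', 4 => 'quux']);

var_dump($params->getAll('foo'));    // ['bar']
var_dump($params->getAll('foo[2]')); // ['qux']
```

With this split, anything added through append() can be read back under the same name, while values added through appendArray() live under their bracketed names.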

This to me should yield the same result as ->append('foo', null); as the
array construct is only indicative of a repeating parameter name
if there is no repeat then it means no data is attached to the name.

Alright, that seems the least problematic solution indeed, and
http_build_query() also represents empty arrays just like null values
(omitting them).

So in your implementation it would mean:

allowed types: null, int, float, string, boolean, and Backed Enum (to mimic
json_encode and PHP 8.4+ behaviour)
arrays whose values are of an allowed type or are arrays themselves are also
supported, to allow complex type support

Any other type (object, resource, Pure Enum) is disallowed and should
throw a TypeError.

+1

Maybe in the future scope of this RFC or in this RFC depending on how you
scope the RFC you may introduce an Interface which will allow serializing
objects using a representation that
follows the described rules above. Similar to what the
JsonSerializable interface is for json_encode.

Hm, good idea! I'm not particularly interested in this feature, but I agree
it's a good way to add support for objects.

Last but not least, all this SHOULD NOT affect how http_build_query works.
The function should never have been modified IMHO, so it should be left
untouched by all this, unless we allow it
to opt in to the behaviour once the interface is approved and added to PHP.

+1

Regards,
Máté

1 month ago by ignace nyamagana butera

Hi Máté,

And an extra appendArray() and setArray() method could be added that would
possibly mangle the param names, and they would only support passing arrays.
This solution would hopefully result in a slightly less surprising
behavior: one could immediately know if a previously added parameter is
really retrievable (when append() or set() was used), or extra
checks may be needed (when using appendArray() or setArray()).

I believe adding appendArray and setArray is the way forward, as the
bracket addition (and thus the mangling) is really a PHP specificity that we
MUST keep to avoid a hard BC break.
I would even go a step further and add getArray and hasArray methods,
which would lead to the following API:

$params = (new Uri\Rfc3986\UriQueryParams())
    ->append("foo", ["bar", "baz"]) // Value is a list, so "foo" is added without brackets
    ->appendArray("foo", ["qux", "quux"]); // Value is a list, using PHP serialization "foo" is added with brackets

var_dump($params->toRfc3986String()); // foo=bar&foo=baz&foo%5B0%5D=qux&foo%5B1%5D=quux

$params->hasArray('foo'); // returns true
$params->getArray("foo"); // returns ["qux", "quux"]

$params->has('foo'); // returns true
$params->getFirst("foo"); // returns "bar"
$params->getLast("foo"); // returns "baz"
$params->getAll('foo'); // returns ["bar", "baz"]

Hope this makes sense

Regards,
Ignace

1 month ago by ignace nyamagana butera

On Sun, Dec 21, 2025 at 4:51 PM ignace nyamagana butera nyamsprod@gmail.com
wrote:

On Sun, Dec 21, 2025 at 1:13 PM Máté Kocsis kocsismate90@gmail.com
wrote:

Hi Máté,

I have been playing around with your Query Param API and I have a couple of
questions:

Question 1) I am not a proponent of adding
getQueryParams to both classes, even though I know the method exists in
the WHATWG URL spec. What I find strange is that the method may return null.
To me this makes for an awkward API where the user will always
have to add conditional checks before using the method's return value.
Why can't the following be true?

$url = Uri\Rfc3986\Uri::parse('https://www.example.com/path/to/whatever');
$url->getQueryParams();
// should return an empty UriQueryParams instance

$url = Uri\Rfc3986\Uri::parse('https://www.example.com/path/to/whatever?');
$url->getQueryParams();
// should return a UriQueryParams instance with a single pair,
// represented like this ['' => null] or like this ['', null]

This IMHO should also be the case for the UrlQueryParams instance.

Question 1-bis)

I would prefer having some extra named constructors on the UrlQueryParams
instead of having a getter on the Uri/Url classes. This fully
decouples the Ur(i|l)QueryParams from the Uri/Url classes and lets the user
opt in to the new API if needed. In case of errors, bugs, etc., only the
QueryParams container bags would be affected, not the Url/Uri classes.

Question 2)

I see you have

UriQueryParams::fromArray,
UriQueryParams::list,

If I read it correctly, there are 2 array representations of the query?

My question is: shouldn't we have a fromList named constructor
and/or a toArray method, so that both distinct forms are covered?

As it stands, this might confuse developers, who will have a hard time
understanding which form is which, when to use each, and which
one can be used in which context to instantiate a new instance.

Question 3)

I wanted to know how the following code will be processed:

$query = 'a[]=foo&a[]=bar&a=qux';
parse_str($query, $result);
$result['a']; // returns "qux"

As seen in the example, with parse_str the full array notation is
overwritten and cannot be used/accessed.

Will the getArray API still be able to access the array data, or will
it act like parse_str and skip the array notation?

Best regards,
Ignace