Newsgroups: php.internals
From: nyamsprod@gmail.com (ignace nyamagana butera)
Date: Mon, 8 Dec 2025 12:02:32 +0100
Subject: Re: [PHP-DEV] [RFC] [Discussion] Followup Improvements for ext/uri
To: Máté Kocsis, PHP Internals List, Larry Garfield, Tim Düsterhus

On Mon, Dec 1, 2025 at 9:53 PM Máté Kocsis <kocsismate90@gmail.com> wrote:

> Hi Everyone,
>
> I'd like to introduce my latest RFC that I've been working on for a while now: https://wiki.php.net/rfc/uri_followup.
>
> It proposes 5 followup improvements for ext/uri in the following areas:
> - URI Building
> - Query Parameter Manipulation
> - Accessing Path Segments as an Array
> - Host Type Detection
> - URI Type Detection
> - Percent-Encoding and Decoding Support
>
> I did my best to write an RFC that was at least as extensive as https://wiki.php.net/rfc/url_parsing_api had become by the end. Despite my efforts,
> there are still a couple things which need a final decision, or which need to be polished/improved. Some examples:
>
> - How to support array/object values for constructing query strings? (https://wiki.php.net/rfc/uri_followup#type_support)
> - How to make the UriQueryParams and UrlQueryParams classes more interoperable with the query string component (mainly with respect to percent-encoding)? (https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding)
> - Exactly how the advanced percent-decoding capabilities should work? Does it make sense to support all the possible modes (UriPercentEncodingMode) for percent-decoding as well (https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding_support)
> - etc.
>
> Regards,
> Máté

Hi Máté,

After thinking about it, here's my take on the current proposal regarding the Query Parameter Manipulation part of the RFC. Sorry for the wall of text, but I tried to summarize my thoughts.

First of all, I tried to put myself in the shoes of a regular PHP developer who has little to no knowledge of the different URI specifications but has a general grasp of PHP. From that point of view, the developer knows that:

- PHP already gives access to the URI query parameters via the `$_GET` superglobal;
- to parse a query string in PHP, they can rely on `parse_str`;
- to build a query string, they should use the `http_build_query` function.

What we do know is that the `$_GET` values are also the result of the `parse_str` logic, and that this logic:

- is not documented
- is PHP-centric
- mangles the data
- truncates the query string

Its original goal was to allow direct conversion of a query string into PHP variables usable in scripts, but this behaviour has been removed from PHP for security reasons.
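
To make the "mangles the data" point concrete, here is a small sketch of what `parse_str` does today with a repeated key and with a dot in a key. The input strings are made up for illustration only.

```php
<?php

// Repeated keys: only the last value survives.
parse_str('a=1&a=2', $repeated);
var_dump($repeated); // ['a' => '2']

// A dot in the key is rewritten to an underscore.
parse_str('foo.bar=baz', $dotted);
var_dump(array_keys($dotted)); // ['foo_bar']
```

In both cases the original query string cannot be reconstructed from the resulting array.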

`http_build_query` allows creating a query string in a more predictable way but still exposes PHP-centric behaviour:

- It uses `get_object_vars` on objects, which is counter-intuitive:
  - Different `iterable` structures do not give the same result.
  - Depending on the object implementation, the result varies between PHP versions (e.g. `DateTimeImmutable` used to be rendered before PHP 7.4; since then it fails silently, resulting in an empty string being generated).
- It adds "[", "]" and indices around arrays. This is PHP-centric (other languages would just repeat the array name).
- It always adds the array indices even when the array is a list, which again can lead to unexpected behaviour, even within the PHP ecosystem.
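
As a rough illustration of these points, the sketch below feeds `http_build_query` an object and a list. The `Filter` class is hypothetical and exists only for this example.

```php
<?php

// Hypothetical value object, used only for illustration.
final class Filter
{
    public string $name = 'ada';
    private string $token = 'secret';
}

// Only the public property is serialized; the private one is silently dropped.
$fromObject = http_build_query(new Filter());

// A list always gets explicit numeric indices.
$fromList = http_build_query(['id' => [3, 5]]);

echo $fromObject, PHP_EOL; // name=ada
echo $fromList, PHP_EOL;   // id%5B0%5D=3&id%5B1%5D=5
```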

On the other hand:

- Other modern platforms, such as Java's HttpServletRequest or the WHATWG URLSearchParams, take a completely different approach: they view the query string as a collection of tuples (key/value pairs) in which a key can be repeated, and there is no notion of brackets. The data is preserved even though, as you mention, the round-trip between encoding and decoding is never guaranteed.
- We have the new HTTP QUERY method, which may or may not fall into the "Should this also be managed by a putative Query class" question.
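
The tuple view can be sketched in a few lines of PHP. `queryToTuples` is a hypothetical helper, not something from the RFC, and it is deliberately simplified: it uses `rawurldecode`, so a `+` is kept as a literal plus sign, whereas the WHATWG form parser would turn it into a space.

```php
<?php

/**
 * Hypothetical helper (not part of the RFC): split a query string into an
 * ordered list of [name, value] tuples, keeping repeated names and leaving
 * the names untouched.
 *
 * @return list<array{0: string, 1: ?string}>
 */
function queryToTuples(string $query): array
{
    $tuples = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as "a=1&&b=2"
        }
        // Split on the first "=" only; a missing "=" yields a null value.
        $parts = explode('=', $pair, 2);
        $tuples[] = [
            rawurldecode($parts[0]),
            isset($parts[1]) ? rawurldecode($parts[1]) : null,
        ];
    }

    return $tuples;
}

var_dump(queryToTuples('a=1&a=2&foo.bar=baz&flag'));
// [['a', '1'], ['a', '2'], ['foo.bar', 'baz'], ['flag', null]]
```

Both values of `a` are kept, and `foo.bar` keeps its dot.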

Currently, in your proposal you have 2 Query objects. This will give the developer a lot of work to understand where, when and which object to choose, and why. Is that complexity really needed? IMHO we may end up with a correct API ... that no one will use.

With all that in mind, I believe a single `Uri\Query` class should be used. Its goals should be:

- to be immutable
- to store the query in its decoded form
- to manipulate and change the data in a consistent way

Decoding/encoding should happen at the object boundaries, but everything inside the object should be done on decoded data. Since no algorithm guarantees preserving the encoding during a decode/encode round-trip, there is no need to try hard to do so. This also means:

- having multiple string representations
- not having a `Uri::withQueryParams` or a `Url::withQueryParams` method.

It should be left to the developer to understand which string version they need.
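
A minimal sketch of that idea, assuming PHP 8.1+: the hypothetical `QuerySketch` class below is not the proposed `Uri\Query`, only an illustration of "decoded inside, encoded at the boundary" with two string representations. It maps the RFC 3986 form to `rawurlencode` and the RFC 1738 form to `urlencode`, which is a simplification.

```php
<?php

// Hypothetical sketch, not the proposed Uri\Query: data is stored decoded,
// and encoding happens only in the to*String() methods.
final class QuerySketch
{
    /** @param list<array{0: string, 1: ?string}> $tuples decoded name/value pairs */
    private function __construct(private readonly array $tuples)
    {
    }

    public static function fromTuples(array $tuples): self
    {
        return new self(array_values($tuples));
    }

    // Returns a new instance; the current one is never modified.
    public function append(string $name, ?string $value): self
    {
        return new self([...$this->tuples, [$name, $value]]);
    }

    public function toRfc3986String(): string
    {
        return $this->encodeWith('rawurlencode'); // space becomes %20
    }

    public function toRfc1738String(): string
    {
        return $this->encodeWith('urlencode'); // space becomes +
    }

    private function encodeWith(callable $encode): string
    {
        $pairs = [];
        foreach ($this->tuples as [$name, $value]) {
            $pairs[] = $value === null
                ? $encode($name)
                : $encode($name) . '=' . $encode($value);
        }

        return implode('&', $pairs);
    }
}

$query = QuerySketch::fromTuples([['q', 'a b']]);
$extended = $query->append('tag', 'x');

echo $query->toRfc3986String(), PHP_EOL;    // q=a%20b
echo $extended->toRfc1738String(), PHP_EOL; // q=a+b&tag=x
```

The same decoded data yields both strings, and `$query` is unchanged after `append()`.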

As a bonus, it would be nice to have a mechanism in PHP that allows the application to switch from the current `parse_str` usage to the new, improved parsing provided by the new class when populating the `$_GET` array (so that deprecating `parse_str` can be initiated in some distant future). This last remark is not mandatory but nice to have.

So I would propose the following methods:

```php
namespace Uri {
    // takes no arguments, returns an empty object
    Query::__construct();

    // named constructor to allow
    // returning a new instance from
    // PHP variables (same syntax as http_build_query)
    Query::fromVariables(array $variable): static;

    // named constructor to allow
    // returning a new instance from
    // a list of tuples, see the return
    // value of Query::toTuples()
    Query::fromTuples(array $params): static;

    // named constructors to allow
    // returning a new instance from
    // a query string; this is where
    // decoding takes place
    Query::parseRfc1738String(): ?static;
    Query::parseRfc3986String(): ?static;
    Query::parseFormDataString(): ?static;
    Query::parseWhatWgString(): ?static;

    // String representations of the query
    // this is where encoding should happen
    // internal decoded data
    // should only be encoded here
    Query::toRfc3986String();
    Query::toRfc1738String();
    Query::toFormDataString();
    Query::toWhatWgString();

    // Tuple related methods
    // like the ones defined by the WHATWG specification
    // method names are changed or updated to highlight
    // the immutable state for modifying methods
    Query::toTuples(): array<string, null|string|array<null|string>>;
    Query::count(): int;
    Query::has(string $name): bool;
    Query::hasValue(string $name, null|string $value): bool;
    Query::getFirst(string $name): null|string;
    Query::getLast(string $name): null|string;
    Query::getAll(string $name): array<null|string>;

    // Tuple modifying methods
    Query::sort(): static;
    Query::withValue(string $name, null|string|array<null|string> $value): static;
    Query::append(string $name, null|string|array<null|string> $value): static;
    Query::delete(string $name): static;
    Query::deleteValue(string $name, null|string $value): static;

    // PHP variables related methods
    // the parse_str replacement API
    Query::toVariables(): array; // returns the same array as parse_str (without mangled data)
    Query::countVariables(): int; // returns the number of variables found
    Query::hasVariable(string $variableName): bool; // tells whether the variable exists
    Query::getVariable(string $variableName): null|string|array; // returns the variable value
    Query::mergeVariable(array $variables): static; // the same syntax returned by the `Query::toVariables` method
    Query::replaceVariable(string $variableName, null|string|int|float|array $value): static;
    Query::deleteVariable(string $variableName): static;
}
```

With the following changes:

- with respect to `parse_str`, no mangled data should occur on parsing:

```php
parse_str("foo.bar=baz", $params);
echo $params['foo_bar']; // prints "baz"
array_key_exists('foo.bar', $params); // returns false

$query = \Uri\Query::parseRfc1738String("foo.bar=baz");
$query->getVariable("foo.bar"); // returns "baz"
$query->hasVariable("foo_bar"); // returns false
```

- with respect to `http_build_query`:

  - Only accept scalar values, `null`, and `array`. If an object or a resource is detected, a `ValueError` should be thrown.

```php
echo http_build_query(['a' => tmpfile()]); // prints ''
\Uri\Query::fromVariables(['a' => tmpfile()]); // throws a ValueError
```

  - Remove the addition of indices if the `array` is a list.

```php
echo http_build_query(['a' => [3, 5, 7]]); // prints a%5B0%5D=3&a%5B1%5D=5&a%5B2%5D=7
echo \Uri\Query::fromVariables(['a' => [3, 5, 7]])->toRfc1738String(); // prints a%5B%5D=3&a%5B%5D=5&a%5B%5D=7
```
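
To show that both changes are cheap to express in userland, here is a rough sketch assuming PHP 8.1+ (for `array_is_list`). `buildQuery` is a hypothetical helper and not the RFC's implementation. Rendering `null` as a bare name and booleans as `1`/`0` are assumptions made for the example, and nested lists inside lists are not handled specially.

```php
<?php

/**
 * Hypothetical helper illustrating the two proposed changes; it is not the
 * RFC's implementation. Requires PHP 8.1+ for array_is_list().
 */
function buildQuery(array $variables, string $prefix = ''): string
{
    $isList = $prefix !== '' && array_is_list($variables);
    $pairs = [];
    foreach ($variables as $key => $value) {
        // Lists get empty brackets ("a[]"), maps keep their keys ("a[k]").
        $name = match (true) {
            $prefix === '' => (string) $key,
            $isList => $prefix . '[]',
            default => $prefix . '[' . $key . ']',
        };

        if (is_array($value)) {
            $nested = buildQuery($value, $name);
            if ($nested !== '') {
                $pairs[] = $nested;
            }
        } elseif ($value === null) {
            $pairs[] = urlencode($name); // assumption: null means "name only"
        } elseif (is_scalar($value)) {
            // assumption: booleans are rendered as 1/0, like http_build_query
            $string = is_bool($value) ? (string) (int) $value : (string) $value;
            $pairs[] = urlencode($name) . '=' . urlencode($string);
        } else {
            // objects and resources are rejected instead of silently skipped
            throw new ValueError('Only scalar values, null and arrays are supported.');
        }
    }

    return implode('&', $pairs);
}

echo buildQuery(['a' => [3, 5, 7]]), PHP_EOL; // a%5B%5D=3&a%5B%5D=5&a%5B%5D=7
```

Passing `['a' => tmpfile()]` or `['a' => new stdClass()]` to this helper raises a `ValueError` rather than producing an empty string.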

Best regards,
Ignace