Newsgroups: php.internals
Xref: news.php.net php.internals:129605
Date: Sat, 13 Dec 2025 22:28:33 +0100
From: kocsismate90@gmail.com (Máté Kocsis)
To: Juris Evertovskis
Cc: PHP Internals List
Subject: Re: [PHP-DEV] [RFC] [Discussion] Followup Improvements for ext/uri
In-Reply-To: <007f01dc6799$bd08f3d0$371adb70$@glaive.pro>

Hi,

> 1. Query setters and types of their parameter. Quote from the builder
> example:
>
> ```
>     ->setQuery("a=1&b=2")
>     ->setQueryParams(["a" => 1, "b" => 2]) // Has the same effect as the setQuery() call above
> ```
>
> I’d expect the `setQueryParams` to set the particular params that I
> specified instead of overriding all the query (which I’d expect `setQuery`
> to do).

I'm happy that you raised this concern, because it would never have occurred
to me that the behavior was unclear. My expectation is that a setter
completely overrides the related property. If the intention were to add or
append query params, then something like addQueryParams() should be used
instead. And I think this question highlights why it's a bad idea to remove
the "set" prefix from the Builder methods: doing so would also remove context
about what exactly the method does. Apparently this isn't entirely clear even
with the prefix, but without it, one would have no clue whether it's an
append or a set operation.
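
To make the set-versus-add distinction concrete, here is a minimal PHP
sketch. QueryBuilderSketch is a hypothetical class made up for illustration;
it is not the proposed Uri\Rfc3986\UriBuilder or Uri\WhatWg\UrlBuilder API,
and its addQueryParams() only stands in for the kind of method mentioned
above.

```php
<?php
// Hypothetical sketch only: QueryBuilderSketch is not part of ext/uri.
// It illustrates "set replaces, add merges" semantics for query params.
final class QueryBuilderSketch
{
    /** @var array<string, string> */
    private array $params = [];

    // A setter discards whatever was set before and replaces the whole query.
    public function setQueryParams(array $params): static
    {
        $this->params = array_map('strval', $params);
        return $this;
    }

    // An "add" method keeps existing params; a repeated name overrides the old value.
    public function addQueryParams(array $params): static
    {
        $this->params = array_replace($this->params, array_map('strval', $params));
        return $this;
    }

    // Serializes with RFC 3986 percent-encoding (spaces become %20).
    public function getQuery(): string
    {
        return http_build_query($this->params, '', '&', PHP_QUERY_RFC3986);
    }
}

$replaced = (new QueryBuilderSketch())
    ->setQueryParams(['a' => 1, 'b' => 2])
    ->setQueryParams(['c' => 3]);
echo $replaced->getQuery(), "\n"; // c=3

$merged = (new QueryBuilderSketch())
    ->setQueryParams(['a' => 1, 'b' => 2])
    ->addQueryParams(['c' => 3]);
echo $merged->getQuery(), "\n"; // a=1&b=2&c=3
```

With a "set" prefix, a second call replacing the first is the expected
outcome. An "add" name signals that the earlier params survive.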

> Maybe it would be possible for `setQuery` to accept string, array and
> instances of `*QueryParams`? And `setQueryParams` could either be left out
> or override just the selected params? As far as I see, there is no
> equivalent `withQueryParams` so there’s no precedent on how such a method
> should work, right?

Good call, I've already updated the RFC so that setQueryParams() accepts a
UriQueryParams/UrlQueryParams instance; it was just an oversight on my part.
But I don't really like using union types here (mainly because the method's
behavior would need extra explanation), which is why I chose to add two
distinct methods for query handling.

> Btw I wasn’t able to find docs on the existing Uri classes (had to look it
> up in the prev RFC). Is the search broken or are the docs just not there
> yet?

Yes, unfortunately, the docs are not there yet :( I usually prioritize my work
on php-src, and since I apparently chose a massive undertaking yet again, I
have very limited time for anything else. But I'll try to find time to add the
missing pieces to the documentation.

> 2. Regarding the interface extraction, is there any difference between the
> internal states of `Uri\WhatWg\UrlQueryParams` and
> `Uri\Rfc3986\UriQueryParams` objects? I would naively expect not only a
> common interface, but a common class. A `QueryParams` that could be
> supplied to any of the withers/setters and that would be able to
> `->toRfc3986QueryString()` or `->toWhatWgQueryString()` on use not on
> instantiation.

I have been thinking about this question for a while, and I agree that the two
QueryParams implementations are very similar: the only differences between
them are how they parse the query string into query params and how they
recompose the query params into a query string. So I would be happy to unify
the two implementations; that would be a huge simplification. However, there
are two reasons why I'm very hesitant to do so:
- We must be absolutely sure that a unified implementation cannot cause a
  parsing confusion vulnerability, e.g. when the query params are parsed
  according to one specification and then recomposed according to the other.
  For example, one difference (also mentioned in the RFC) is that WHATWG URL
  removes the leading "?" character during parsing, while RFC 3986 leaves it
  as-is. Such differences must be considered very carefully.
- If we have two dedicated classes, they can evolve separately with
  specification-specific behavior. If it turns out that we want to add support
  for some specification-specific feature, then a specification-specific class
  is better suited for the purpose. I can't come up with many examples, but
  additional percent-encoding capabilities might be needed
  (e.g. getFirstPercentEncoded()).
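
To illustrate the kind of parsing confusion meant in the first point, here is
a minimal PHP sketch of two simplified parsers that differ only in how a
leading "?" is treated. The function names are hypothetical and these are not
the ext/uri implementations. The "+"-as-space decoding follows WHATWG's
application/x-www-form-urlencoded parser; the key/value splitting on the
RFC 3986 side is an assumption, since RFC 3986 does not define query params at
all (it only allows "?" as a data character inside the query).

```php
<?php
// Hypothetical sketch: two simplified query-string parsers, not ext/uri code.

/** Splits "k=v&k2=v2" into [[name, value], ...], skipping empty segments. */
function splitPairs(string $query, callable $decode): array
{
    $pairs = [];
    foreach (explode('&', $query) as $segment) {
        if ($segment === '') {
            continue;
        }
        // Only the first "=" separates the name from the value.
        $parts = explode('=', $segment, 2);
        $pairs[] = [$decode($parts[0]), $decode($parts[1] ?? '')];
    }
    return $pairs;
}

// WHATWG-style (as in URLSearchParams): a leading "?" is stripped, "+" means space.
function parseWhatWgStyle(string $query): array
{
    if (str_starts_with($query, '?')) {
        $query = substr($query, 1);
    }
    return splitPairs($query, 'urldecode');
}

// RFC 3986-style (assumed): "?" is plain data, so it stays in the first name.
function parseRfc3986Style(string $query): array
{
    return splitPairs($query, 'rawurldecode');
}

$input = '?admin=1';
var_dump(parseWhatWgStyle($input)[0][0]);  // string(5) "admin"
var_dump(parseRfc3986Style($input)[0][0]); // string(6) "?admin"
```

If a check is performed on one parse result while the query is later
recomposed or interpreted under the other rules, the two sides disagree on
whether an "admin" param exists at all.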

> Similarly with the builders I fail to see why do I need to decide on
> `Uri\Rfc3986\UriBuilder` and `Uri\WhatWg\UrlBuilder` at the start instead
> of having `Uri\Builder` that could be consumed via `->buildRfc3986Url()` or
> `->buildWhatWgUri()`. Is that UserInfo thing the whole dealbreaker? To me
> all the other parts like specifying host, port and so on seem spec-agnostic
> until serialization.

The userinfo is just one of the dealbreakers. Not long ago, I modified the RFC
text and added more information about how validation works exactly. According
to my plans, the individual setters would also perform some basic validation
(checking the format of the given component), while the build() methods would
make sure that some global rules are also satisfied (there's a bit more info
in the RFC about this). And this is only possible to implement if there are
two different builders.
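
One possible reading of that two-level validation, as a minimal PHP sketch.
Rfc3986BuilderSketch is a hypothetical class, not the proposed
Uri\Rfc3986\UriBuilder, and its setter checks are deliberately simplified. The
global rules in build() are the path rules from RFC 3986, section 3.3; WHATWG
URL has its own, different rules (for instance around special schemes such as
http), so a single spec-agnostic build() could not apply one set of checks.

```php
<?php
// Hypothetical sketch only: setters check a single component's format,
// build() checks rules that span components (RFC 3986, section 3.3).
final class Rfc3986BuilderSketch
{
    private ?string $host = null;
    private ?int $port = null;
    private string $path = '';

    public function setHost(?string $host): static
    {
        // Simplified component check: these delimiters cannot appear in a reg-name host.
        if ($host !== null && strpbrk($host, '/?#@') !== false) {
            throw new InvalidArgumentException("Invalid host: $host");
        }
        $this->host = $host;
        return $this;
    }

    public function setPort(?int $port): static
    {
        // RFC 3986 ports consist of digits only, so negative numbers are rejected.
        if ($port !== null && $port < 0) {
            throw new InvalidArgumentException("Invalid port: $port");
        }
        $this->port = $port;
        return $this;
    }

    public function setPath(string $path): static
    {
        $this->path = $path;
        return $this;
    }

    // Returns only the authority and path, which is enough for this illustration.
    public function build(): string
    {
        $hasAuthority = $this->host !== null || $this->port !== null;
        // With an authority, the path must be empty or begin with "/".
        if ($hasAuthority && $this->path !== '' && !str_starts_with($this->path, '/')) {
            throw new LogicException('Path must be empty or start with "/" when there is an authority');
        }
        // Without an authority, the path must not begin with "//".
        if (!$hasAuthority && str_starts_with($this->path, '//')) {
            throw new LogicException('Path must not start with "//" without an authority');
        }
        $authority = $hasAuthority
            ? '//' . ($this->host ?? '') . ($this->port === null ? '' : ':' . $this->port)
            : '';
        return $authority . $this->path;
    }
}

echo (new Rfc3986BuilderSketch())->setHost('example.com')->setPort(8080)->setPath('/a')->build(), "\n";

// Each setter accepts its input on its own, but build() rejects the combination.
$builder = (new Rfc3986BuilderSketch())->setHost('example.com')->setPath('a');
try {
    $builder->build();
} catch (LogicException $e) {
    echo $e->getMessage(), "\n";
}
```

The point of the split is that "a" is a perfectly valid path and
"example.com" a valid host; only their combination violates a rule, and that
rule is specification-specific.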

Máté