Newsgroups: php.internals
From: nyamsprod@gmail.com (ignace nyamagana butera)
To: Máté Kocsis, PHP Internals List
Date: Thu, 5 Mar 2026 08:09:52 +0100
Subject: Re: [PHP-DEV] [RFC] [Discussion] Followup Improvements for ext/uri

Hi Máté,

As always, I tried to implement a polyfill for the Percent-Encoding and Decoding Support RFC. While doing so, I was able to refactor the enum. Note that the case names are the ones used in the RFC text and *NOT* those in the enum example provided, as the two differ. I will update the names once you have updated them.
Here's my alternate proposal for Uri\Rfc3986. Keep in mind that the same reasoning would apply to the Uri\Whatwg counterpart.

namespace Uri\Rfc3986 {
    enum UriComponent
    {
        case UserInfo;
        case Host;
        case Path;
        case PathSegment;
        case AbsolutePathReferenceFirstSegment;
        case RelativePathReferenceFirstSegment;
        case Query;
        case FormQuery;
        case Fragment;
        case AllReservedCharacters;
        case AllButUnreservedCharacters;

        /**
         * @throws InvalidUriException
         */
        public function encode(string $input): string;

        /**
         * @throws InvalidUriException
         */
        public function decode(string $input): string;
    }
}

As previously stated, I added the encode/decode methods to the enum. This way the feature is fully handled by the enum, and no direct reference to the Uri class via a static method is needed.
The enum is renamed *UriComponent* instead of the current *UriPercentEncodingMode*. The name change highlights the intent of the enum: encoding and decoding URI components, where each enum case represents a defined component context.
Both methods may throw an exception. I do not know whether specific exceptions like UnableToEncodeException and/or UnableToDecodeException should be added, so for now the generic InvalidUriException is used.
This rewrite also greatly simplifies the enum usage.
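
To illustrate the "enum carries the behaviour" idea, here is a minimal userland sketch. It is not the ext/uri implementation and not the polyfill mentioned above: the name UriComponentSketch and the helper allowed() are hypothetical, only three cases are covered, and the allowed character sets are taken from the RFC 3986 grammar (pchar, query, fragment). decode() is left out on purpose, since its exact rules are defined by the RFC under discussion.

```php
<?php
declare(strict_types=1);

// Hypothetical userland sketch, NOT the proposed Uri\Rfc3986\UriComponent enum.
enum UriComponentSketch
{
    case PathSegment;
    case Query;
    case Fragment;

    /**
     * Characters, besides the unreserved set, that may appear literally
     * in this component according to the RFC 3986 grammar.
     */
    private function allowed(): string
    {
        // sub-delims plus ":" and "@" (together with unreserved, this is "pchar")
        $pchar = '!$&\'()*+,;=:@';

        return match ($this) {
            self::PathSegment => $pchar,
            // query and fragment additionally allow "/" and "?"
            self::Query, self::Fragment => $pchar . '/?',
        };
    }

    /** Percent-encodes every byte that is not allowed in this component. */
    public function encode(string $input): string
    {
        // Byte-wise match (no "u" flag), so multibyte characters are encoded per byte.
        $pattern = '/[^A-Za-z0-9\-._~' . preg_quote($this->allowed(), '/') . ']/';

        return preg_replace_callback(
            $pattern,
            static fn (array $m): string => rawurlencode($m[0]),
            $input
        ) ?? throw new \RuntimeException('Encoding failed');
    }
}

echo UriComponentSketch::PathSegment->encode('bar/baz'), PHP_EOL;
echo UriComponentSketch::Fragment->encode('a b/c'), PHP_EOL;
```

The point of the design is visible at the call site: the case alone selects the encoding context, so no static method on a Uri class has to be referenced.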

Below are your examples from the RFC, rewritten:

- Decoding the fragment

$uri = new Uri\Rfc3986\Uri("https://example.com#_%40%2F");
$fragment = $uri->getFragment(); // returns "_%40%2F"
echo Uri\Rfc3986\UriComponent::Fragment->decode($fragment); // returns "_%40/"

- Decoding the query

// with the query component
$uri = new Uri\Rfc3986\Uri("https://example.com/?q=%3A%29");
$query = $uri->getQuery(); // returns "q=%3A%29"
echo Uri\Rfc3986\UriComponent::Query->decode($query); // returns "q=:)"

- Usage with the new Uri::withPathSegments method

$uri = new Uri\Rfc3986\Uri("https://example.com");
$uri = $uri->withPathSegments([
    "foo",
    Uri\Rfc3986\UriComponent::PathSegment->encode("bar/baz"),
]);
$uri->toRawString(); // https://example.com/foo/bar%2Fbaz
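
For comparison with the examples above, PHP's existing rawurldecode() has no notion of a component context: it decodes every percent-encoded octet it finds. The snippet below uses only that built-in function and does not model the RFC's decoding rules; it is a small sketch of why a context-free decoder can be too coarse for URI components.

```php
<?php
declare(strict_types=1);

// rawurldecode() is context-free: every %XX sequence is decoded.
$fragment = rawurldecode('_%40%2F');
$query    = rawurldecode('q=%3A%29');

// For a path segment this loses information: the decoded "/" can no longer
// be told apart from a real segment separator.
$segment  = rawurldecode('bar%2Fbaz');

var_dump($fragment, $query, $segment);
```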

Let me know what you think,
regards,
Ignace

On Tue, Mar 3, 2026 at 10:24 AM ignace nyamagana butera <nyamsprod@gmail.com> wrote:

> Hi Máté,
>
> I just re-read the RFC and I like the updates and precision you've brought to it. Here's my review:
> For the builders, I have nothing more to add design-wise; this is already solid. I may nitpick on the *Builder::clear() method name: I would have gone with *Builder::reset(), but I presume other developers would go with clear. Other than that, the public API is spot on.
>
> For the Enum, my only concern is that the cases serve just as flags and their usage is tightly coupled to the Uri classes. I would add 2 static named constructors, fromUrl and tryFromUrl, just for completeness. I believe the maintenance cost is negligible, while the DX is improved and a broader usage of the Enum becomes possible.
>
> Regarding the path segments usage and constructor, I see you already integrated my Enum suggestions and you have explained why a fully fledged class is not the right approach. So the current design is already solid.
>
> Last but not least, the percent-encoding feature should IMHO be improved by moving the encode/decode methods from being static methods on the URI classes to becoming public API on the Enum. This would indeed imply renaming the enum from Uri\Rfc3986\UriPercentEncodingMode to Uri\Rfc3986\UriPercentEncoder, with two methods, encode/decode. Again, it makes for a more self-contained feature and adds to the DX. Developers will not always have to statically call the URI classes for encoding/decoding strings, as the Enums and their cases already convey the information correctly.
>
> Overall I believe this is going in the right direction.
>
> Regards,
> Ignace
>
> On Sun, Mar 1, 2026 at 11:09 PM Máté Kocsis <kocsismate90@gmail.com> wrote:
>
>> Hey Ignace et al,
>>
>> I have updated the RFC in the past few weeks with a lot of extra info, mostly related to path segment handling: I investigated WHATWG URL's
>> behavior more thoroughly, and it turned out that path segments are handled very interestingly, so there was a significant difference compared
>> to RFC 3986 yet again.
>>
>> Please give the RFC another read, if possible.
>>
>> Regards,
>> Máté