Newsgroups: php.internals
Date: Mon, 28 Apr 2025 22:47:49 +0200
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
From: nyamsprod@gmail.com (ignace nyamagana butera)
To: "Paul M. Jones"
Cc: Máté Kocsis, Internals

Hi Paul,

> The Rfc3986\Uri `raw()` methods present a departure from existing userland expectations when working with URIs. No existing URI package that I'm aware of retains the normalized values as their "main" values; the values are generally retained-as-given (i.e. "raw"). Nor do they afford getting two versions of the retained values (one raw, one normalized).

As a maintainer of a userland URI package, I disagree with this approach. I believe that offering both raw and normalized methods in a single class, while a new approach in PHP, also offers a better representation of URIs in general. The current approach in userland mixes raw and half-normalized components, as well as the RFC 3986 and RFC 3987 specifications, with ambiguity around normalization, input, constructors, and what needs to be encoded where and when. This proposal has successfully avoided that by using the raw and normalized methods.
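
To make the raw/normalized distinction concrete, here is a minimal sketch. The `SketchUri` class is hypothetical and is not the RFC's implementation; only the getter names `getPath()` and `getRawPath()` come from this thread. It assumes the normalized getter applies just two of the RFC 3986 syntax-based normalizations (decoding percent-encoded unreserved characters and uppercasing the remaining hex digits) and leaves out dot-segment removal.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration only: NOT the RFC's Rfc3986\Uri implementation.
final class SketchUri
{
    public function __construct(private readonly string $rawPath) {}

    // Returns the path exactly as it was given.
    public function getRawPath(): string
    {
        return $this->rawPath;
    }

    // Returns a partially normalized path: percent-encoded unreserved
    // characters are decoded, every other escape gets uppercase hex digits.
    public function getPath(): string
    {
        return preg_replace_callback(
            '/%([0-9A-Fa-f]{2})/',
            static function (array $m): string {
                $char = chr((int) hexdec($m[1]));
                return preg_match('/^[A-Za-z0-9\-._~]$/', $char) === 1
                    ? $char
                    : '%' . strtoupper($m[1]);
            },
            $this->rawPath
        ) ?? $this->rawPath; // fall back to the raw value if the regex fails
    }
}

$uri = new SketchUri('/%7Euser/a%2fb');
echo $uri->getRawPath(), PHP_EOL; // /%7Euser/a%2fb
echo $uri->getPath(), PHP_EOL;    // /~user/a%2Fb
```

Both values come from the same object, so the caller chooses explicitly which form is needed.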

> - fulfill existing userland expectations;

Existing userland expectations are mostly built around `parse_url`, which is one of the reasons the RFC exists: to improve the status quo and to introduce into PHP valid parsers for recognizable URI specifications. Yes, some adaptation will be needed to use them in userland, but I believe this work is easy to do, speaking from the POV of a URI package maintainer.

> - replace the toString()/toRawString() with a single idiomatic __toString() in each class;

For all the reasons explained in the RFC, adding a `__toString` method is a bad architectural design for a URI. There are so many ways to represent a URI that having a `__toString` for string representation gives a false sense that "there can be only one true representation for a single URI", which is not true. A URI can be normalized or raw, and can have different representations depending on the context in which it will be used. So again, I believe the RFC made the right call in not implementing the Stringable interface: it forces the developer to make the right call, or to encapsulate the value object in a proper URI representational class or method that can use the exposed raw and normalized representations of each component to produce the expected URI representation.
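
As a rough illustration of why a single string form is ambiguous, consider the sketch below. The `StringFormsSketch` class is hypothetical and not part of the RFC; only the method names `toString()` and `toRawString()` come from this thread. It assumes normalization is limited to lowercasing the scheme and host, which RFC 3986 treats as case-insensitive.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration only; not the API proposed in the RFC.
final class StringFormsSketch
{
    public function __construct(
        private readonly string $scheme,
        private readonly string $host,
        private readonly string $path,
    ) {}

    // Recomposes the URI from the components exactly as given.
    public function toRawString(): string
    {
        return $this->scheme . '://' . $this->host . $this->path;
    }

    // Recomposes the URI with the case-insensitive parts lowercased.
    // The path is case-sensitive and is left untouched.
    public function toString(): string
    {
        return strtolower($this->scheme) . '://' . strtolower($this->host) . $this->path;
    }
}

$uri = new StringFormsSketch('HTTPS', 'Example.COM', '/Docs');
echo $uri->toRawString(), PHP_EOL; // HTTPS://Example.COM/Docs
echo $uri->toString(), PHP_EOL;    // https://example.com/Docs
```

Both strings identify the same resource, so a lone `__toString()` would have to pick one of them silently.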

> - move normalization logic into the NormalizedUri class.

The classes follow specifications that describe how normalization should be done. Why would you split the responsibilities into other classes? What would be the added value?

Again, I understand this is new code and current URI packages, mine included, will have to adapt, but in the long run I believe the proposed API is more predictable and easier to reason about. To quote someone: "Comfort and the fear of change are the greatest enemies of success."

Best regards,
Ignace Nyamagana Butera


On Mon, Apr 28, 2025 at 9:53 PM Paul M. Jones <pmjones@pmjones.io> wrote:
Hi Máté and all,

> On Apr 27, 2025, at 16:47, Máté Kocsis <kocsismate90@gmail.com> wrote:
>
> Hi Tim,
...
>> So it seems to be safer to use the naming without the `raw` and then in
>> the documentation explain what happens with useful examples, just like
>> the RFC already does.
>
> We discussed this off the list, and the recommendation made sense to me at last.

I am glad to see it!

* * *

Removing the `raw()` methods from the Whatwg\Url class opens up another opportunity.

The Rfc3986\Uri `raw()` methods present a departure from existing userland expectations when working with URIs. No existing URI package that I'm aware of retains the normalized values as their "main" values; the values are generally retained-as-given (i.e. "raw"). Nor do they afford getting two versions of the retained values (one raw, one normalized).

This might be solved by renaming the Rfc3986\Uri methods so that the "main" methods return the raw values, and the alternative methods return the normalized versions. For example, getPath() would become getNormalizedPath(), and getRawPath() would become getPath().

But that's pretty verbose, and on considering it further, I think there are two classes combined inside Rfc3986\Uri.

Proposal:

Instead of a single Rfc3986\Uri class that tries to hold *both* raw *and* normalized values and logic at the same time, introduce a NormalizedUri class to operate with normalized values, and treat the current Uri class as operating with raw values. That would, among other things:

- fulfill existing userland expectations;
- eliminate the getRaw() methods;
- replace the toString()/toRawString() with a single idiomatic __toString() in each class;
- move normalization logic into the NormalizedUri class.

Optionally, there could be one additional method on one or both classes, toNormalizedUri(), to create and return a normalized instance. For Uri the return would be a new NormalizedUri; for NormalizedUri, the return would either be itself ($this) or a clone of itself.

If the RFC pursues that approach, it will also lend itself to either an abstract they each extend or (preferably) an interface they each implement. If an interface, I opine it should be called Uri; the current Uri class might become RawUri (with NormalizedUri not needing a rename).

Thoughts?


-- pmj