From: kocsismate90@gmail.com (Máté Kocsis)
Date: Mon, 10 Mar 2025 23:51:45 +0100
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
To: "Gina P. Banyard"
Cc: PHP Internals List

Hi Gina,

1.
> The paragraph at the beginning of the RFC in the "Relevant URI specifications > WHATWG URL" section seems to be incomplete.

Hopefully it's good now, although I know this section doesn't include much information.

2.
> I don't really understand how the UninitializedUriException exception can be thrown?
> Is it somehow possible to create an instance of a URI without initializing it?
> This seems unwise in general.

I think I've already answered this in my previous email (and in the RFC as well), but yes, it's possible via reflection.
I don't really have an idea how this possibility could be avoided without also making the classes final.
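
As a rough illustration of the reflection route, here is a userland sketch. `DemoUri` is a hypothetical stand-in for a non-final URI class (it is not the RFC's class, and its "parsing" is a placeholder); `ReflectionClass::newInstanceWithoutConstructor()` is the existing PHP API that yields an object whose constructor never ran.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a non-final URI class; not the RFC's actual class.
class DemoUri
{
    private string $scheme;

    public function __construct(string $uri)
    {
        // Placeholder "parsing": take everything before the first colon.
        $this->scheme = strstr($uri, ':', true) ?: '';
    }

    public function getScheme(): string
    {
        return $this->scheme;
    }
}

// The constructor is skipped, so $scheme is never initialized.
$uri = (new ReflectionClass(DemoUri::class))->newInstanceWithoutConstructor();

try {
    $uri->getScheme();
    $outcome = 'initialized';
} catch (Error $e) {
    // A userland typed property raises a generic Error in this state; the
    // built-in classes would need their own signal for it, which is the role
    // UninitializedUriException plays in this discussion.
    $outcome = 'uninitialized';
}

echo $outcome, PHP_EOL;
```

For internal classes, this reflection method is only refused when the class is final, which is why the uninitialized state and the final vs non-final question are tied together.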

3.
> I'm not really convinced by using the constructor to be able to create a URI object.
> I think it would be better for it to be private/throwing and have two static constructors, `parse` and `tryParse`,
> mimicking the API that exists for creating an instance of a backed enum from a scalar.

I'm not completely against using parse() and tryParse(), but I think the constructor already makes it clear that it either returns a valid object or throws.
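
For comparison, the suggested shape could look roughly like the sketch below, which mirrors `BackedEnum::from()` and `BackedEnum::tryFrom()`. `SketchUri` is a hypothetical class, and its validation (non-empty and accepted by `parse_url()`) is only a placeholder, not RFC 3986 or WHATWG parsing. It assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical class for illustration; the validation is a placeholder.
final class SketchUri
{
    // Private constructor: instances only come from the two factories.
    private function __construct(private string $uri)
    {
    }

    // Like BackedEnum::from(): returns an instance or throws.
    public static function parse(string $uri): self
    {
        return self::tryParse($uri)
            ?? throw new InvalidArgumentException("Invalid URI: {$uri}");
    }

    // Like BackedEnum::tryFrom(): returns an instance or null.
    public static function tryParse(string $uri): ?self
    {
        // Placeholder check: non-empty and accepted by parse_url().
        if ($uri === '' || parse_url($uri) === false) {
            return null;
        }

        return new self($uri);
    }

    public function toString(): string
    {
        return $this->uri;
    }
}

$ok = SketchUri::parse('https://example.com/');
$bad = SketchUri::tryParse('');

var_dump($ok->toString(), $bad);
```

With a public throwing constructor, only the first of the two behaviours is available; callers who prefer a null result have to wrap the call in try/catch themselves.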

4.
> I think changing the name of the `toString` method to `toRawString` better matches the rest of the proposed API,
> and also removes the question as to why it isn't the magic method `__toString`.

For RFC 3986, we could go with toString() instead of toNormalizedString() and toRawString() instead of toString(), so that we use the same convention as for the getters.

Recently I learnt that, for some reason, WHATWG normalizes the IP address *during* component recomposition, so its toString() is not really the most raw form (at least not in the same way as the "raw getters" are). So for WHATWG, I think keeping toString() and toDisplayString() probably still makes sense.

5.
> I will echo Tim's concerns about the non-final-ity of the URI classes.
> This seems like a recipe for disaster.
> I can _maybe_ see the usefulness of extending Rfc3986\Uri by a subclass Ldap\Uri,
> but being able to extend the WhatWg URI makes absolutely no sense.
> The point of these classes is that if you have an instance of one of these, you *know* that you have a valid URI.
> Being able to subclass a URI and mess with the `equals`, `toString`, `toNormalizedString` methods throws away all the safety guarantees provided by ***possessing*** a Uri instance.

I'm sure that people will find their use-cases for subclassing all these new classes, including the WHATWG implementation. As Nicolas mentioned, his main use-case is adding convenience and new factory methods, which doesn't require all methods to be reimplemented.
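
A minimal sketch of that kind of subclass, assuming a non-final parent: `BaseUri` is a hypothetical stand-in for a built-in URI class, and `AppUri`, `forHost()` and `isSecure()` are invented names. The child adds a factory and a helper and overrides nothing. It assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical non-final base class standing in for a built-in URI class.
class BaseUri
{
    public function __construct(protected string $uri)
    {
    }

    public function toString(): string
    {
        return $this->uri;
    }
}

// The child adds a convenience factory and a helper, and overrides nothing.
class AppUri extends BaseUri
{
    public static function forHost(string $host, string $path = '/'): static
    {
        return new static("https://{$host}{$path}");
    }

    public function isSecure(): bool
    {
        return str_starts_with($this->uri, 'https://');
    }
}

$uri = AppUri::forHost('example.com', '/docs');
echo $uri->toString(), PHP_EOL;
```

Nothing in the language stops a different child class from overriding toString() as well, which is the risk raised in point 5.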

While I share your opinion that leaving the URI classes open for extension is somewhat risky and that it's difficult to assess its impact right now, I can also sympathise with what Nicolas wrote in a later message (https://externals.io/message/123997#126489): that we shouldn't close the door on the public using interchangeable implementations.

I know that going final without any interfaces is the most "convenient" for the PHP project itself, because the solution has much less BC surface to maintain, so we are relatively free and safe to make future changes. This is useful for an API that is in its early days and as large as this one. Besides the interests of the maintainers, we should also take two important things into account:
- Heterogeneous use-cases: there's no question that the current API won't fit all use-cases, especially because we have already identified some follow-up tasks that should be implemented (see the "Future Scope" section in the RFC).
- Interoperability: since URI handling is a very widespread problem, many people and libraries will start to use the new extension once it's available. But for the reason above, many of them will want to use their own abstraction, and that's exactly why a common ground is needed: there's simply no single right implementation, and everyone has their own, given the complexity of the topic.

So we should try to accommodate these factors in one way or another. So far, we have four options:

- Making the classes open for extension: this solution has acknowledged technical challenges (https://github.com/php/php-src/pull/14461#discussion_r1847316607), and it limits our possibilities for making changes the most, but users can effectively add any behavior that they need. Of course, they are free to introduce bugs and spec-incompatible behavior into their own implementation, but none of the other solutions could prevent such bugs either, since people will write their custom code wherever they can: if they can't have it in a child class, then they will have it in MyUri, or in UriHelper, or just in a 200-line function.

Being able to extend the built-in classes also means that child classes get the behavior of their parent by default: there's no need to create wrapper classes around the built-in ones (i.e. to use composition), which is tedious to implement and would also incur some performance penalty because of the extra method calls.

- Making the classes open for extension, but making some methods final: same benefits as above, without the said technical challenges, in theory. I am currently trying to figure out whether there is a combination of methods that could be made final so that the known challenges can no longer be triggered, although I haven't managed to come up with a sensible solution yet.

- Making the classes final: this avoids some edge cases for the built-in classes (most prominently the uninitialized state), while it leaves the most room for making future changes. Projects that want to ship their own abstractions for the two built-in classes can use composition to create their own URI implementations. They can instantiate these implementations however they want to (e.g. $myUri = new MyUri($uri)). If they need to pass a URI to other libraries, then they can extract the wrapped built-in object (e.g. $myUri->getUri()).

On the flip side, backporting methods added in future PHP versions (i.e. polyfills) will become impossible for URIs, to my knowledge, and mocking in PHPUnit will also be lost (I'm not sure whether that's a good or a bad thing, but it may be worth pointing out).

Also, the current built-in implementations may have alternatives that couldn't be used in their place. For example, the ADA URL library (which is mentioned in the RFC) also implements the WHATWG specification, possibly in the very same way as Lexbor, the currently used library, does. These alternative implementations may have different performance characteristics, platform requirements, or levels of maintenance/support, which may make them more suitable for some use-cases than the built-in ones. If we make these classes final, there's no way to use alternative implementations as a replacement for the default ones, even though they all implement the same specification, which has mostly clear semantics.

- Making the classes final, but adding a separate interface for each: the impact of making the built-in classes final would be mitigated by adding one interface for each specification (I didn't like this idea in the past, but it now looks much more useful in light of the final vs non-final debate). The interfaces would provide a common denominator for the different possible implementations. I'm sure that someone would suggest that the community (i.e. PHP-FIG) should come up with such an interface, but I think we shouldn't expect someone else to do the work when *we* are in the best position to do it, as those interfaces should be internal ones, since the built-in URI classes should also implement them.

If we had these interfaces, projects could use whatever abstraction they want via composition, but they could more conveniently pass around the same object everywhere.
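
To make the last two options more concrete, here is a minimal userland sketch of a final class, a shared interface, and a wrapper built by composition. All names (`UriInterface`, `BuiltinUri`, `withTrailingSlash()`, `describe()`) are hypothetical and not taken from the RFC; only `MyUri` and `getUri()` follow the inline examples above. It assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical interface; the name and method set are illustrative only.
interface UriInterface
{
    public function toString(): string;
}

// Stand-in for a final built-in class that implements the shared interface.
final class BuiltinUri implements UriInterface
{
    public function __construct(private string $uri)
    {
    }

    public function toString(): string
    {
        return $this->uri;
    }
}

// A project-specific abstraction built by composition.
final class MyUri implements UriInterface
{
    public function __construct(private BuiltinUri $inner)
    {
    }

    // Escape hatch for libraries that require the built-in type.
    public function getUri(): BuiltinUri
    {
        return $this->inner;
    }

    // Delegation: this is the extra method call that composition costs.
    public function toString(): string
    {
        return $this->inner->toString();
    }

    // Project-specific convenience that the built-in class does not offer.
    public function withTrailingSlash(): self
    {
        return new self(new BuiltinUri(rtrim($this->inner->toString(), '/') . '/'));
    }
}

// Code typed against the interface accepts either implementation.
function describe(UriInterface $uri): string
{
    return 'URI: ' . $uri->toString();
}

$myUri = new MyUri(new BuiltinUri('https://example.com'));
echo describe($myUri), PHP_EOL;
echo describe($myUri->getUri()), PHP_EOL;
```

Without the interface, describe() would have to be typed against BuiltinUri, and every caller holding a MyUri would need to unwrap it first.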

I intentionally don't try to draw a conclusion for now, first of all because it has already taken me a lot of time to compare the different possibilities mostly objectively, and I hope that we can find more pros and cons (or fix my reasoning if I made mistakes somewhere) in order to finally reach some kind of consensus.

> Similarly, I don't understand why the WhatWgError is not final.
> Even if subclassing of the Uri classes is allowed, any error it would have would not be a WhatWg one,
> so why should you be able to extend it.

I made it final now.

Thank you for your comments,
Máté
