Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:126111 X-Original-To: internals@lists.php.net Delivered-To: internals@lists.php.net Received: from php-smtp4.php.net (php-smtp4.php.net [45.112.84.5]) by qa.php.net (Postfix) with ESMTPS id AA6AC1A00BD for ; Thu, 5 Dec 2024 21:49:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=php.net; s=mail; t=1733435209; bh=/BoCRryu8db3cumF+DMUZGrFCQeDURE0QD5X+PH2F+g=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=TrECzN39nUaaVs/P9znsHp/QV+jhU0Hf35HK1SZDpyuIfJjxu2j/ZJKyI7rvMqHt9 Z2GykzI7BOAtnSd9FRf7AWEcAgdebA457nHNhZCsonEZUvPHEdkQEyB1UKd/Zgl8/F M7qVrK6yB9pF4fSBeo9wO7E4z3avr42+VGFjFmkkEMvAbfbW7zv+sDTLvYcTD5+zSB 7SFFTeqMzohblLhox9gppLvmcW3HDKXJI8zStshREMW53wkWIW3XWhhR4LbFGqMH75 L/B4dK+6bXVWOSZNMVko3/bxTs9581Z6r9lwYJNgkj5d5UIw0TXbJgUtZEiogcpEK8 mTArZtl0BEoAQ== Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id C82BA180056 for ; Thu, 5 Dec 2024 21:46:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=0.9 required=5.0 tests=BAYES_50,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,DMARC_PASS, FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,HTML_MESSAGE, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=4.0.0 X-Spam-Virus: No X-Envelope-From: Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Thu, 5 Dec 2024 21:46:48 +0000 (UTC) Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-466897883dcso10236041cf.2 for ; Thu, 05 Dec 2024 13:49:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1733435395; x=1734040195; darn=lists.php.net; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=/BoCRryu8db3cumF+DMUZGrFCQeDURE0QD5X+PH2F+g=; b=Wob6Ll0jFF+uhKg8aeJ759EyUtkvwk+80TH2dwVjpU8QZ90+wwQrU75U4EARdqCWfy Tt1UQVoh2iZHprHxeFrfjSqAtO9Al9JLfrUwr6wIwG24P7QXXdjf7xaPW7+HOu0jnifP ljMGV6G/VzJEh9+xZ9W9s7FGuW6Mk4+WWkJQdnEi5j6cz667uSHBwU+E/9iJ4L5fT2aU AXlR9HxEtHEbnN0bYHNn1xxEhqCWacn9v1rXSfwtYmCqTSi+J/gFZo3YzI/jUlLLHBjP wxZvDqp5XucodImmFCrURrZZQXm3DYt14Kl1sAMzdfDJp10rIyK2w91MCGT9sYt4D1y+ Wgvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733435395; x=1734040195; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=/BoCRryu8db3cumF+DMUZGrFCQeDURE0QD5X+PH2F+g=; b=qt0DW+yJYjf+bdKYpL+2LYBVvVnH/nDi6rc5hi5j52Xr1QiMZt0rkMiZmkTxmyNxjq 2688704m1ksBW0azFLaFhxqaCjLtDXrtx3m1+7Im/p/zeD1X0sz7L7+9vvVoBCULPj2G dGXRVH3UXypo0tG2NhO8QNyablwZI+HZtbeWlvzFXwpH7HONbbIBK1bBCjCx6y+xmVOH 6/nw40lBNCrhsHYsk2lPVhHdRy7YRQnR49k14XYjOoPMToOwQ9kTlcFvvOHW+SRm+o6c xU1ACJ1OQfppXhINbheIn2lsfiS5paG00WGi4wNbKWGbPBDobgwgnq8iWuJiP1RxQdjK Ck/Q== X-Forwarded-Encrypted: i=1; AJvYcCW3cAGbYF0Ea48j9xX1KGYvRxHEs+jW1pdRicAgPEiwtW/DX15jCm00zHWKXi8vKKNYbuUQYD8OMZg=@lists.php.net X-Gm-Message-State: AOJu0YwGFF65B33/WhwV2sQI9qrVuboaDT/08qGS68RTP8PMWQwCbBLl b0Ps8pVsNc83Cuns1MPd5iqiKoJrOqWxLxYf7Y+xXXWdnZecWIqLFFgFDLLO2JEedNwzxGC9cz3 MY8+nr24yqT3I649gz8cYWHV6B0CmHKE6 X-Gm-Gg: ASbGncsoDI9VJ8QbqfadxLoTaPxRSGk4MyTY+knyvL83S+XCRyxT8/dm2cmhuw2OBoS CahDRQYIGj/5idDcLlgg0/0c/xLbV2tc= X-Google-Smtp-Source: AGHT+IETcex7hX4YgGz/P/b/1naXzXwvwEA5XUec1LNDA75jQqOer4O4dmBOW3PSp7PdlsAp4xO45LUaEGjXt4j67ZU= X-Received: by 2002:a05:622a:1a26:b0:45d:82a0:5028 with SMTP id d75a77b69052e-46734ca2bcfmr10867571cf.1.1733435394880; Thu, 05 Dec 2024 13:49:54 -0800 (PST) Precedence: bulk list-help: list-post: List-Id: internals.lists.php.net x-ms-reactions: disallow MIME-Version: 1.0 References: <57d09ef6591a4c70e60569b9de7ce330@bastelstu.be> In-Reply-To: <57d09ef6591a4c70e60569b9de7ce330@bastelstu.be> Date: Thu, 5 Dec 2024 22:49:43 +0100 Message-ID: Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API To: =?UTF-8?Q?Tim_D=C3=BCsterhus?= Cc: ignace nyamagana butera , Niels Dossche , internals@lists.php.net Content-Type: multipart/alternative; boundary="000000000000f3fce506288ce0c2" From: kocsismate90@gmail.com (=?UTF-8?B?TcOhdMOpIEtvY3Npcw==?=) --000000000000f3fce506288ce0c2 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Tim, Thanks for your feedback! > The RFC is not listed in the overview page: https://wiki.php.net/rfc Uh, indeed! I've just fixed it. > 2. > > I agree with Dennis' remark that the `Rfc3986Uri` and `WhatWgUri` > classes must be final. The RFC makes the argument that: > > > Having separate classes for the two standards makes it possible to > > indicate explicit intent at the type level that one specific standard > > is required. > > Developers extending the classes could accidentally violate the > respective standard, which nullifies the benefit of making invalid > states unrepresentable at the type-level. > On the one hand, I also have some concerns about making these classes final or non-final as you probably saw in my last email (the concern came up with a question about an implementation detail: https://github.com/php/php-src/pull/14461#discussion_r1847316607). On the other hand though, if someone overrides an URI implementation, then I assume there's definitely a purpose for doing so (i.e. the child class has additional capabilities, or it can handle additional protocols). If developers cannot achieve this via inheritance, then they will do so otherwise (by using composition, or putting the custom logic in a helper class etc.). It's just not realistic to prevent logical bugs by making classes final. I would rather ask whether it's possible to make the 2 built-in URI implementations having quite some special internal behavior behave consistently with userland classes, even if they are overridden? For now, the answer seems to be yes (especially after hearing Niels' solution in the GitHub thread linked above), but of course new issues may arise later which we don't know about yet. And of course, it's much easier to make a class final first and relax the inheritance rules later, than the other way around... So these are the only reasons why I'd make the classes final, but otherwise it would be useful to be able to extend them. > > This also means that the return type of the =E2=80=9Cwithers=E2=80=9D sho= uld be `self` > instead of `static`, which also means that the =E2=80=9Cwithers=E2=80=9D = in the > interface must be `self`. Perhaps this means that they should not exist > on the interface at all. `DateTimeInterface` only provides the getters, > likely for a similar reason. > Using the `self` return type over `static` would be counterproductive in my opinion: it's mostly because static is the correct type semantically, and it can be useful for forward compatibility later if we once want to remove the final modifier. Regarding the analogy with DateTimeInterface, I think this one is wrong: the ext/uri API is completely immutable, while ext/date has the mutable DateTime implementation, so it's not possible to include setters in the interface, otherwise one couldn't know what to expect after modification. I believe the `UriException` class as the base exception should not be > `abstract`. There is no real benefit to it, especially since it doesn't > specify any additional abstract methods. > I have no hard feelings regarding this. If I make it a concrete class, then likely implementations will start to throw it instead of more specific subclasses. That's probably not an issue, people are not usually interested in the exact reason of an exception. Since ext/date also added a generic parent exception (DateError) recently which wasn't abstract, then I'm fine with doing the same with ext/uri. > I'm not sure I like the `Interface` suffix on the `UriInterface` > interface. Just `Uri\Uri` would be equally expressive. > Yes, I was expecting this debate :) To be honest, I never liked interfaces without an "Interface" suffix, and my dislike didn't go away when I had to use such an interface somewhere, because it was difficult for me to find out what the symbol I was typing actually referred to. But apart from my personal experiences, I prefer to stay with "UriInterface" because the 2 most well known internal PHP interfaces also have the same suffix (DateTimeInterface, SessionInterface), and this name definitely conveys that people should not try to instantiate it. I am not sure about the `*User()` and `*Password()` methods existing on > the interface. As the RFC acknowledges, RFC 3986 only specifies a > =E2=80=9Cuserinfo=E2=80=9D segment. Should the `*User()` and `*Password()= ` methods > perhaps be specific to the `WhatWgUri` class? > Really good question, and I hesitated a lot about the same (even in some of my messages to the mailing list). In fact, RFC 3986 has some notion of user/password, because the specificati= on mentions the "user:password" format as deprecated [in favor of passing authentication information in other places]. So I think the `*User()` and `*Password()` methods are legitimately part of the interface. And it's not even without precedent to have them in an interface: PSR-7 made use of the "user" and "password" notions in the `UriInterface::withUserInfo()` method which accepts a `$user` and a `$password` parameter. I know people on this list generally don't like PSR-7, but t would be useful to know why PHP FIG chose to use these two parameters= . Due to the reasons above, the question for me is really whether we want to add the `*UserInfo()` methods to the interface or at least to Uri/Rfc3986Uri. Since WhatWg doesn't even mention user info (apart from "userinfo percent-encode set" which refers to something else), I'd prefer not to add the methods in question to Uri/UriInterface. If people insist on it, then I'm fine to add the methods to Uri\Rfc3986Uri though. I disagree with this change and believe that with the current > capabilities of PHP the out-parameter is the correct API design choice, > because then the =E2=80=9Cfailure=E2=80=9D case would be returning a fals= y value, which > IMO is pretty idiomatic PHP: Yes, I can live with any of the solutions, I'm just not sure which is less bad. :) If only we had out parameters... But wishful thinking aside, I am fine with whatever the majority of people prefer. Probably being able to unify the API of the two implementations is a good argument no one thought about so far for using passing by reference... Regards, M=C3=A1t=C3=A9 --000000000000f3fce506288ce0c2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Tim,=

Thanks for your feedback!
=C2=A0
The RFC is not listed in the overview page: https://wiki.php.net/rfc

Uh, indeed! I've just fixed it.=C2=A0
=C2=A0
2.

I agree with Dennis' remark that the `Rfc3986Uri` and `WhatWgUri`
classes must be final. The RFC makes the argument that:

> Having separate classes for the two standards makes it possible to > indicate explicit intent at the type level that one specific standard =
> is required.

Developers extending the classes could accidentally violate the
respective standard, which nullifies the benefit of making invalid
states unrepresentable at the type-level.

On the one hand, I also have some concerns about making these classes fi= nal or non-final
as you probably saw in my last email (the concer= n came up with a question about an=C2=A0implementation
= detail:=C2=A0https://github= .com/php/php-src/pull/14461#discussion_r1847316607). On=C2=A0the other = hand though,
if som= eone overrides an URI implementation, then I assume=C2=A0there's definitely a purpose for doing
so (i.e. the child clas= s has additional capabilities, or it can handle additional protocols). If d= evelopers cannot
ac= hieve this via inheritance, then they will do so otherwise (by using compos= ition, or putting the custom logic
in a helper class etc.). It's just not=C2=A0realistic to prevent logical bugs by mak= ing classes final.
=
I would rather= ask whether it's possible to make the 2 built-in URI implementations h= aving
quite some sp= ecial internal behavior behave consistently with userland classes, even if = they are overridden?
For now, the answer seems to be yes (= especially after hearing Niels' solution in the GitHub thread linked ab= ove),
but of course new issues may arise later which we don't= know about yet. And of course, it's much easier to make
a cl= ass final first and relax the inheritance rules later, than the other way a= round... So these are the only reasons
why I'd make the class= es final, but otherwise it would be useful to be able to extend them.
=

This also means that the return type of the =E2=80=9Cwithers=E2=80=9D shoul= d be `self`
instead of `static`, which also means that the =E2=80=9Cwithers=E2=80=9D in= the
interface must be `self`. Perhaps this means that they should not exist on the interface at all. `DateTimeInterface` only provides the getters, likely for a similar reason.

Using the = `self` return type over `static` would be counterproductive in=C2=A0my=C2= =A0opinion:
it's mostl= y=C2=A0because static=C2=A0is the correct type=C2=A0semantically,=C2=A0and it can be useful for
forward compatibility later=C2=A0if we once want to remove the final modifier.=

Regarding the analogy with DateTimeInte= rface, I think this one is=C2= =A0wrong: the ext/uri API is
completely=C2=A0imm= utable, while ext/date=C2=A0h= as the mutable DateTime implementation,
so it's not possible=C2=A0to include setters=C2=A0in the interface, otherwise one couldn't know
what to=C2=A0expect after= =C2=A0modification.

I believe the `UriException` class as t= he base exception should not be
`abstract`. There is no real benefit to it, especially since it doesn't=
specify any additional abstract methods.

I have no hard feelings regarding this. If I make it a concrete class, th= en likely
implementations will start to throw it instead of more = specific subclasses. That's
probably not an issue, people are= not usually interested in the exact reason of an=C2=A0exception.
Since ext/date also added a generic parent exception (DateError) recently = which wasn't abstract,
then I'm fine with doing the same = with ext/uri.
=C2=A0
I'm not sure I like the `Interface` suffix on the `UriInterface`
interface. Just `Uri\Uri` would be equally expressive.

Yes, I was expecting this debate :) To be honest, I never l= iked interfaces without an "Interface"
suffix, and my d= islike didn't go away when I had to use such an interface somewhere, be= cause it
was difficult for me to find out what the symbol I was t= yping actually referred to. But apart from my personal
experience= s, I prefer to stay with "UriInterface" because the 2 most well k= nown internal PHP interfaces
also have the same suffix (DateTimeI= nterface, SessionInterface), and this name definitely conveys=C2=A0that
people should not try to instantiate it.

I am not sure about the `*User()` and `*Password()` methods existing on the interface. As the RFC acknowledges, RFC 3986 only specifies a
=E2=80=9Cuserinfo=E2=80=9D segment. Should the `*User()` and `*Password()` = methods
perhaps be specific to the `WhatWgUri` class?

Really good question, and I hesitated a lot about the same (even in = some of my messages to the mailing list).
In fact, RFC 3986 has some notion of user/password, because= the=C2=A0specification menti= ons the=C2=A0"user:passw= ord"
format as= deprecated [in favor of passing authentication information in other places= ]. So I think the `*User()` and
`*Password()` methods are legitimately part of the interface. = And it's not even without precedent to have them in
an interface:=C2=A0PSR-7 made=C2=A0use of the "user" and "password" notions= in the=C2=A0`UriInterface::<= /span>withUserInfo()` method<= /div>
which=C2=A0accepts=C2=A0a `$user`=C2=A0and a `$password` parameter. I know people on=C2=A0this list generally don't like PSR-7,
but=C2=A0t would be useful=C2=A0to know why PHP FIG chose to use=C2=A0these two param= eters.

<= /div>
Due to the reasons above, the question for me is really whether w= e want to add the `*UserInfo()` methods to the
interface or at le= ast to=C2=A0Uri/Rfc3986Uri. Since Wh= atWg doesn't even mention user info (apart from "userinfo
percent-encode set" wh= ich=C2=A0refers to something = else), I'd prefer not to add the=C2=A0methods in question to=C2=A0Uri/UriInterface.
If people insist on it, then I'= m fine to add the methods to Uri\Rfc3986Uri though.

I disagree with this change and believe that with the= current
capabilities of PHP the out-parameter is the correct API design= choice,
because then the =E2=80=9Cfailure=E2=80=9D case would be return= ing a falsy value, which
IMO is pretty idiomatic PHP:
<= br>
Yes, I can live with any of the solutions, I'm just not s= ure which is less bad. :) If only we had out parameters... But wishful
thinking aside, I am fine with whatever the majority of people prefer= . Probably being able to unify the API of the two
implementations=C2=A0is a good argument no one thought about so far for using pa= ssing by reference...

Regards,
M= =C3=A1t=C3=A9
--000000000000f3fce506288ce0c2--