Newsgroups: php.internals
Message-ID: <883a7cc2-c63c-479b-a8be-3a5fdac43c03@app.fastmail.com>
Date: Sun, 07 Jul 2024 12:40:02 +0200
From: rob@bottled.codes ("Rob Landers")
To: Máté Kocsis, nyamsprod@gmail.com
Cc: internals@lists.php.net, "Stephen Reay"
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API

On Sun, Jul 7, 2024, at 11:13, Máté Kocsis wrote:
> Hi Ignace,
>
>> As far as I understand it, if this RFC were to pass as is it will model
>> PHP URLs to the WHATWG specification. While this specification is
>> getting a lot of traction lately I believe it will restrict URL usage in
>> PHP instead of making developer life easier. While PHP started as a
>> "web" language it is first and foremost a server side general purpose
>> language. The WHATWG spec on the other hand is created by browser
>> vendors and is geared toward browsers (client side) and because of
>> browser history it restricts by design a lot of what PHP developers can
>> currently do using `parse_url`. In my view the `Url` class in
>> PHP should allow dealing with any IANA registered scheme, which is not
>> the case for the WHATWG specification.
>
> Supporting IANA registered schemes is a valid request, and is definitely useful.
> However, I think this feature is not strictly required to have in the current RFC.
> Anyone who needs to support features that are not offered by the WHATWG
> standard can still rely on parse_url(). And of course, we can (and should) add
> support for other standards later. If we wanted to do all these in the same
> RFC, then the scope of the RFC would become way too large IMO. That's why I
> opt for incremental improvements.

It's also worth pointing out (as another reason not to do this) that an IANA registration may or may not be relevant on the network in question. For example, Tor, Handshake, IPFS, Freenet, etc. all have their own DNS schemes and do not (usually) use IANA-registered schemes, and many people create sites that cater to those networks.

>
> Besides, I fail to see why a WHATWG compliant parser wouldn't be useful in PHP:
> yes, PHP is server side, but it still interacts with browsers very heavily. Among other
> use-cases I cannot yet imagine, the major one is most likely validating user-supplied URLs
> for opening in the browser. As far as I see the situation, currently there is no acceptably
> reliable possibility to decide whether a URL can be opened in browsers or not.

Looking at the WHATWG spec, it looks like `example%2Ecom` will be parsed as a valid URL and transformed to `example.com`, while this doesn't currently happen in parse_url(): https://3v4l.org/NtqQm

I don't know whether that is an issue, but it might be if you expect the string to remain URL-encoded.

>
>> - parse_url and parse_str predate RFC3986
>> - URLSearchParams was ratified before PSR-7 BUT the first implementation
>> landed a year AFTER PSR-7 was released and already implemented.
>
> Thank you for the historical context!
>
> Based on your and others' feedback, it has now become clear to me that parse_url()
> is still useful and ext/url needs quite some additional capabilities until this function
> really becomes superfluous. That's why it now seems to me that the behavior of
> parse_url() could be leveraged in ext/url so that it would work with a Url\Url class (e.g.
> we could have a PhpUrlParser class extending Url\UrlParser, or a Url\Url::fromPhpParser()
> method, depending on which object model we choose. Of course the names are TBD).
>
>> For all these arguments I would keep the proposed `Url` free of all
>> these concerns and lean toward a nullable string for the query string
>> representation. And defer this debate to its own RFC regarding query
>> string parsing handling in PHP.
>
> My WIP implementation still uses nullable properties and return types. I only changed those
> when I wrote the RFC. Since I see that PSR-7 compatibility is very low prio for everyone
> involved in the discussion, I think making these types nullable is fine. It was not my
> top prio either, but I had to start the object design somewhere, so I went with this.

The spec lists the URL components and their types. It would be good to adhere to the spec (this also simplifies documentation):

 1. scheme may be null or empty string
 2. port may be null
 3. path is never null, but may be empty string
 4. query may be null
 5. fragment may be null
 6. user/password may be null (to differentiate between an empty password and no password)
 7. host may be null (for relative URLs)

>
> Again, thank you for your constructive criticism.
>
> Regards,
> Máté

— Rob
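
To make the `example%2Ecom` point concrete, here is a minimal sketch of the parse_url() side of the comparison. The input strings are assumptions chosen for illustration and are not necessarily the ones behind the 3v4l link; the WHATWG side is not shown because the proposed API is still under discussion.

```php
<?php
// Illustrative inputs; they are assumptions, not the exact snippet behind the 3v4l link.
$withScheme = parse_url('https://example%2Ecom/path');
$bare       = parse_url('example%2Ecom');

// parse_url() only splits the string; it does not percent-decode any component.
echo $withScheme['host'], "\n"; // example%2Ecom
echo $bare['path'], "\n";       // example%2Ecom (no scheme, so it is treated as a path)

// Decoding has to be requested explicitly by the caller.
echo rawurldecode($withScheme['host']), "\n"; // example.com
```

Code that stores or compares the raw host string would therefore see a different value once a parser that decodes the host is swapped in.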
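
The nullability list in the message could translate into property types roughly as follows. This is only a sketch assuming PHP 8.1 or later: the class name `UrlComponents` and its property names are hypothetical and are not taken from the RFC or from the WIP implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical value object; all names here are illustrative, not the RFC's API.
final class UrlComponents
{
    public function __construct(
        public readonly ?string $scheme,   // null or '' allowed
        public readonly ?string $user,     // null means "no user given"
        public readonly ?string $password, // null means "no password", '' means "empty password"
        public readonly ?string $host,     // null for relative URLs
        public readonly ?int $port,        // null when no port is given
        public readonly string $path = '', // never null, may be empty
        public readonly ?string $query = null,
        public readonly ?string $fragment = null,
    ) {}
}

// An empty password and a missing password stay distinguishable.
$emptyPassword = new UrlComponents('https', 'rob', '', 'example.com', null, '/');
$noPassword    = new UrlComponents('https', 'rob', null, 'example.com', null, '/');

var_dump($emptyPassword->password === $noPassword->password); // bool(false)
```

With non-nullable strings both cases would collapse to `''`, which is the distinction item 6 of the list is meant to preserve.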