Newsgroups: php.internals
From: rob@bottled.codes ("Rob Landers")
Date: Sun, 07 Jul 2024 12:59:45 +0200
To: Máté Kocsis, nyamsprod@gmail.com
Cc: internals@lists.php.net, "Stephen Reay"
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Message-ID: <2a0f8ffd-481d-42ae-9f6d-9082d7b63262@app.fastmail.com>
In-Reply-To: <883a7cc2-c63c-479b-a8be-3a5fdac43c03@app.fastmail.com>

On Sun, Jul 7, 2024, at 12:40, Rob Landers wrote:
> On Sun, Jul 7, 2024, at 11:13, Máté Kocsis wrote:
>> Hi Ignace,

>>> As far as I understand it, if this RFC were to pass as is, it would model
>>> PHP URLs on the WHATWG specification. While this specification is
>>> getting a lot of traction lately I believe it will restrict URL usage in
>>> PHP instead of making developers' lives easier. While PHP started as a
>>> "web" language it is first and foremost a server side general purpose
>>> language. The WHATWG spec on the other hand is created by browser
>>> vendors and is geared toward browsers (client side) and because of
>>> browsers' history it restricts by design a lot of what PHP developers can
>>> currently do using `parse_url`. In my view the `Url` class in
>>> PHP should allow dealing with any IANA registered scheme, which is not
>>> the case for the WHATWG specification.

>> Supporting IANA registered schemes is a valid request, and is definitely useful.
>> However, I think this feature is not strictly required to have in the current RFC.
>> Anyone who needs to support features that are not offered by the WHATWG
>> standard can still rely on parse_url(). And of course, we can (and should) add
>> support for other standards later. If we wanted to do all of this in the same
>> RFC, then the scope of the RFC would become way too large IMO. That's why I
>> opt for incremental improvements.

> It's also worth pointing out (as another reason not to do this) that IANA
> registration may or may not matter on the network in use. For example,
> Tor, Handshake, IPFS, Freenet, etc. all have their own DNS schemes and do
> not (usually) use IANA registered schemes, and many people create sites
> that cater to those networks.


>> Besides, I fail to see why a WHATWG compliant parser wouldn't be useful in PHP:
>> yes, PHP is server side, but it still interacts with browsers very heavily. Among other
>> use-cases I cannot yet imagine, the major one is most likely validating user-supplied URLs
>> for opening in the browser. As far as I see the situation, currently there is no acceptably
>> reliable way to decide whether a URL can be opened in browsers or not.
> Looking at the WHATWG spec, it looks like `example%2Ecom` will be parsed
> as a valid URL and transformed to `example.com`, while this doesn't
> currently happen in parse_url():
>
> https://3v4l.org/NtqQm
>
> I don't know whether that is an issue, but it might be if you are
> expecting the string to remain URL-encoded.
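
A minimal sketch of the difference described above, using only the built-in parse_url() and rawurldecode(). The manual decode merely imitates the percent-decoding step of the WHATWG host parser so the two results can be compared; it is an illustration, not the RFC's API, and a real WHATWG parser does more (IDNA processing, validation).

```php
<?php
// parse_url() splits the string but does not percent-decode components.
$parts = parse_url('https://example%2Ecom/path');
var_dump($parts['host']); // string(13) "example%2Ecom"

// Illustration only: imitate the percent-decoding that a WHATWG host
// parser would apply before validating the host.
$decodedHost = rawurldecode($parts['host']);
var_dump($decodedHost); // string(11) "example.com"
```

Code that compares hosts as raw strings would see two different values here, which is where the behavior change could matter.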

>>> - parse_url and parse_str predate RFC 3986
>>> - URLSearchParams was ratified before PSR-7 BUT the first implementation
>>> landed a year AFTER PSR-7 was released and already implemented.

>> Thank you for the historical context!

>> Based on your and others' feedback, it has now become clear to me that parse_url()
>> is still useful and ext/url needs quite some additional capabilities until this function
>> really becomes superfluous. That's why it now seems to me that the behavior of
>> parse_url() could be leveraged in ext/url so that it would work with a Url\Url class (e.g.
>> we could have a PhpUrlParser class extending Url\UrlParser, or a Url\Url::fromPhpParser()
>> method, depending on which object model we choose. Of course the names are TBD).

>>> For all these arguments I would keep the proposed `Url` free of all
>>> these concerns and lean toward a nullable string for the query string
>>> representation. And defer this debate to its own RFC regarding query
>>> string parsing handling in PHP.

>> My WIP implementation still uses nullable properties and return types. I only changed those
>> when I wrote the RFC. Since I see that PSR-7 compatibility is very low prio for everyone
>> involved in the discussion, I think making these types nullable is fine. It was not my
>> top prio either, but somewhere I had to start the object design, so I went with this.

> The spec contains elements and their types. It would be good to adhere
> to the spec (simplifies documentation):
>
> 1. scheme may be null or empty string
> 2. port may be null
> 3. path is never null, but may be empty string
> 4. query may be null
> 5. fragment may be null
> 6. user/password may be null (to differentiate between an empty
>    password and no password)
> 7. host may be null (for relative URLs)
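
The list above could translate into property types roughly like this. It is a hypothetical sketch, assuming PHP 8.1+ (readonly promoted properties); the class name `UrlComponents` and its property names are made up for illustration and are not the RFC's API.

```php
<?php
// Hypothetical value object; names are illustrative, not the RFC's API.
final class UrlComponents
{
    public function __construct(
        public readonly ?string $scheme,   // may be null or ''
        public readonly ?string $user,     // may be null
        public readonly ?string $password, // null = no password, '' = empty password
        public readonly ?string $host,     // may be null (relative URLs)
        public readonly ?int $port,        // may be null
        public readonly string $path,      // never null, may be ''
        public readonly ?string $query,    // may be null
        public readonly ?string $fragment, // may be null
    ) {}
}

// https://user:@example.com carries an empty password, not a missing one.
$url = new UrlComponents('https', 'user', '', 'example.com', null, '', null, null);
var_dump($url->password === null); // bool(false)
```

Keeping `path` non-nullable while everything else is nullable mirrors items 1 to 7 directly, which is what would keep the documentation simple.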


>> Again, thank you for your constructive criticism.
>>
>> Regards,
>> Máté

> — Rob

Here's a list of examples worth adding to the RFC:

//example.com?
ftp://user@example.com/path/to/ffile
https://user:@example.com
https://user:pass@example%2Ecom/?something=other&bool#heading

etc.
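
For a baseline to compare against, here is a rough sketch that runs the examples above through today's parse_url() and prints whichever components it reports. It makes no claim about what the proposed API would return, and the output for empty components (such as the bare `?`) can differ between PHP versions, so no expected output is shown.

```php
<?php
$examples = [
    '//example.com?',
    'ftp://user@example.com/path/to/ffile',
    'https://user:@example.com',
    'https://user:pass@example%2Ecom/?something=other&bool#heading',
];

foreach ($examples as $example) {
    echo $example, "\n";
    $parts = parse_url($example);
    // parse_url() returns false on seriously malformed input.
    if ($parts === false) {
        echo "  (could not be parsed)\n";
        continue;
    }
    // Only components that parse_url() found are present as keys.
    foreach ($parts as $name => $value) {
        printf("  %-8s %s\n", $name, var_export($value, true));
    }
}
```

Checking `array_key_exists('pass', $parts)` rather than `empty()` is what lets a caller tell an absent component from an empty one.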

— Rob