Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:124280
Precedence: bulk
List-Id: internals.lists.php.net
MIME-Version: 1.0
References:
<71a73b87-cc2f-4ee5-a961-7bf2b191fbb6@gmail.com> <5159E0AB-C8B0-4A54-9654-986C1D9C858F@koalephant.com> <07160e83-7333-44a1-81f2-b121e2cf0ffd@gmail.com>
Date: Mon, 8 Jul 2024 09:51:27 +0200
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
To: Máté Kocsis
Cc: nyamsprod@gmail.com, internals@lists.php.net, Stephen Reay
Content-Type: multipart/alternative; boundary="000000000000f2c1f0061cb7ae2e"
From: nicolas.grekas+php@gmail.com (Nicolas Grekas)

--000000000000f2c1f0061cb7ae2e
Content-Type: text/plain; charset="UTF-8"

Hi Máté,

Fantastic RFC :)

On Sun, Jul 7, 2024 at 11:17, Máté Kocsis wrote:

> Hi Ignace,
>
>> As far as I understand it, if this RFC were to pass as is, it would model
>> PHP URLs on the WHATWG specification. While this specification is
>> getting a lot of traction lately, I believe it will restrict URL usage in
>> PHP instead of making developers' lives easier. While PHP started as a
>> "web" language, it is first and foremost a server-side general-purpose
>> language. The WHATWG spec, on the other hand, is created by browser
>> vendors and is geared toward browsers (client side), and because of
>> browser history it restricts by design a lot of what PHP developers can
>> currently do using `parse_url`. In my view the `Url` class in
>> PHP should allow dealing with any IANA registered scheme, which is not
>> the case for the WHATWG specification.
>
> Supporting IANA registered schemes is a valid request, and is definitely
> useful.
> However, I think this feature is not strictly required in the
> current RFC.
> Anyone who needs to support features that are not offered by the WHATWG
> standard can still rely on parse_url().

If I may, parse_url is showing its age, and issues like https://github.com/php/php-src/issues/12703 make it unreliable. We need an escape plan from it.

FYI, we're discussing whether a Uri component should make it into Symfony precisely to work around parse_url's issues in https://github.com/php/php-src/issues/12703
Your RFC would be the perfect answer to this discussion, but IRI would need to be part of it.
I agree with everything Ignace said. Supporting RFC 3986 from day one would be absolutely great!

Note that we use parse_url for HTTP URLs, but also to parse DSNs like redis://localhost and the like.

> And of course, we can (and should) add
> support for other standards later. If we wanted to do all these in the same
> RFC, then the scope of the RFC would become way too large IMO. That's why I
> opt for incremental improvements.
>
> Besides, I fail to see why a WHATWG compliant parser wouldn't be useful in
> PHP:
> yes, PHP is server side, but it still interacts with browsers very
> heavily. Among other
> use-cases I cannot yet imagine, the major one is most likely validating
> user-supplied URLs
> for opening in the browser. As far as I see the situation, there is
> currently no acceptably
> reliable way to decide whether a URL can be opened in browsers or
> not.
>
>> - parse_url and parse_str predate RFC 3986
>> - URLSearchParams was ratified before PSR-7 BUT the first implementation
>> landed a year AFTER PSR-7 was released and already implemented.
>
> Thank you for the historical context!
>
> Based on your and others' feedback, it has now become clear to me that
> parse_url()
> is still useful and ext/url needs quite some additional capabilities until
> this function
> really becomes superfluous. That's why it now seems to me that the
> behavior of
> parse_url() could be leveraged in ext/url so that it would work with a
> Url\Url class (e.g.
> we could have a PhpUrlParser class extending Url\UrlParser, or a
> Url\Url::fromPhpParser()
> method, depending on which object model we choose. Of course the names are
> TBD).
>
>> For all these arguments I would keep the proposed `Url` free of all
>> these concerns and lean toward a nullable string for the query string
>> representation. And defer this debate to its own RFC regarding query
>> string parsing handling in PHP.
>
> My WIP implementation still uses nullable properties and return types. I
> only changed those
> when I wrote the RFC. Since I see that PSR-7 compatibility is very low
> prio for everyone
> involved in the discussion, I think making these types nullable is
> fine. It wasn't my
> top prio either, but I had to start the object design somewhere, so I went with
> this.
>
> Again, thank you for your constructive criticism.
>
> Regards,
> Máté
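
As a purely illustrative aside on the DSN use case and the nullable query string discussed above, here is a minimal sketch of how today's parse_url() behaves. The DSN and URLs are made-up examples, and the absent-versus-empty query distinction assumes PHP 8.0 or later; nothing here reflects the API proposed in the RFC.

```php
<?php
// Illustrative only: the DSN and URLs below are hypothetical examples.

// parse_url() accepts non-HTTP schemes, which is why it gets used for DSNs.
$dsn = parse_url('redis://localhost:6379/0');

// Same URL, once without a query and once with an empty one (bare "?").
$withoutQuery = parse_url('https://example.com/path');
$emptyQuery   = parse_url('https://example.com/path?');

var_dump($dsn['scheme'], $dsn['host'], $dsn['port'], $dsn['path']);

// Assumption: PHP >= 8.0, where an absent query produces no "query" key
// while an empty query produces an empty string.
var_dump(array_key_exists('query', $withoutQuery));
var_dump($emptyQuery['query'] ?? null);
```

A nullable query string on a `Url` object would presumably let callers keep that same distinction: null when there is no query at all, an empty string when the URL ends with a bare `?`.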
--000000000000f2c1f0061cb7ae2e--