Newsgroups: php.internals
Date: Sun, 7 Jul 2024 12:13:58 +0300
From: kocsismate90@gmail.com (Máté Kocsis)
To: nyamsprod@gmail.com
Cc: internals@lists.php.net, Stephen Reay
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
In-Reply-To: <07160e83-7333-44a1-81f2-b121e2cf0ffd@gmail.com>
References: <71a73b87-cc2f-4ee5-a961-7bf2b191fbb6@gmail.com> <5159E0AB-C8B0-4A54-9654-986C1D9C858F@koalephant.com> <07160e83-7333-44a1-81f2-b121e2cf0ffd@gmail.com>

Hi Ignace,

> As far as I understand it, if this RFC were to pass as is it will model
> PHP URLs to the WHATWG specification. While this specification is
> getting a lot of traction lately I believe it will restrict URL usage in
> PHP instead of making developer life easier. While PHP started as a
> "web" language it is first and foremost a server side general purpose
> language. The WHATWG spec on the other hand is created by browsers
> vendors and is geared toward browsers (client side) and because of
> browsers history it restricts by design a lot of what PHP developers can
> currently do using `parse_url`. In my view the `Url` class in
> PHP should allow dealing with any IANA registered scheme, which is not
> the case for the WHATWG specification.

Supporting IANA registered schemes is a valid request, and is definitely useful.
However, I don't think this feature is strictly required in the current RFC.
Anyone who needs features that the WHATWG standard does not offer can still
rely on parse_url(). And of course, we can (and should) add support for other
standards later. If we wanted to do all of this in the same RFC, its scope
would become way too large IMO. That's why I opt for incremental improvements.

Besides, I fail to see why a WHATWG compliant parser wouldn't be useful in PHP:
yes, PHP is server side, but it still interacts with browsers very heavily. Among
other use cases I cannot yet imagine, the major one is most likely validating
user-supplied URLs before they are opened in the browser. As far as I can see,
there is currently no acceptably reliable way to decide whether a URL can be
opened in browsers or not.
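
As a rough illustration of that gap (a sketch that uses only the existing
parse_url(); the sample inputs are made up for the example): parse_url() splits
a string into components, but it neither validates nor normalizes it, so a
successful call says little about what a browser would do with the same input.

```php
<?php
// parse_url() returns the components exactly as they were written.
$parts = parse_url('HTTP://EXAMPLE.COM/a/../b');

echo $parts['scheme'], "\n"; // HTTP (not lowercased)
echo $parts['host'], "\n";   // EXAMPLE.COM (not lowercased)
echo $parts['path'], "\n";   // /a/../b (dot segments are kept)

// A string that is clearly not an absolute URL is still accepted:
// it simply comes back as a path component.
$loose = parse_url('not a url');
var_dump($loose !== false);  // bool(true)
```

A WHATWG compliant parser would serialize the first input as
http://example.com/b and would reject the second one when no base URL is given.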

> - parse_url and parse_str predate RFC3986
> - URLSearchParams was ratified before PSR-7 BUT the first implementation
> landed a year AFTER PSR-7 was released and already implemented.

Thank you for the historical context!

Based on your and others' feedback, it has now become clear to me that parse_url()
is still useful and that ext/url needs quite a few additional capabilities before
this function really becomes superfluous. That's why it now seems to me that the
behavior of parse_url() could be leveraged in ext/url so that it works with a
Url\Url class (e.g. we could have a PhpUrlParser class extending Url\UrlParser, or a
Url\Url::fromPhpParser() method, depending on which object model we choose. Of
course, the names are TBD).

> For all these arguments I would keep the proposed `Url` free of all
> these concerns and lean toward a nullable string for the query string
> representation. And defer this debate to its own RFC regarding query
> string parsing handling in PHP.

My WIP implementation still uses nullable properties and return types; I only changed
those when I wrote the RFC. Since PSR-7 compatibility seems to be a very low priority
for everyone involved in the discussion, I think making these types nullable is fine.
It wasn't my top priority either, but I had to start the object design somewhere, so I
went with this.
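
To make the nullable case concrete, here is a small sketch using parse_url() on
PHP 8.0 or later, where an absent query and an empty query are reported
differently. The example URLs are placeholders; nothing here is the proposed
ext/url API.

```php
<?php
// Requires PHP 8.0+: parse_url() distinguishes absent from empty queries.
$absent = parse_url('https://example.com/path', PHP_URL_QUERY);
$empty  = parse_url('https://example.com/path?', PHP_URL_QUERY);
$filled = parse_url('https://example.com/path?a=1', PHP_URL_QUERY);

var_dump($absent); // NULL          (no "?" in the URL at all)
var_dump($empty);  // string(0) ""  ("?" present, nothing after it)
var_dump($filled); // string(3) "a=1"
```

A non-nullable string would have to fold the first two cases into "", so the
original URL could no longer be reproduced exactly.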

Again, thank you for your constructive criticism.

Regards,
Máté