Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:124047
X-Original-To: internals@lists.php.net
Delivered-To: internals@lists.php.net
Precedence: bulk
list-help: <mailto:internals+help@lists.php.net>
list-unsubscribe: <mailto:internals+unsubscribe@lists.php.net>
list-post: <mailto:internals@lists.php.net>
List-Id: internals.lists.php.net
MIME-Version: 1.0
Date: Sat, 29 Jun 2024 19:19:12 +0300
To: =?UTF-8?Q?M=C3=A1t=C3=A9_Kocsis?= <kocsismate90@gmail.com>
Cc: PHP Internals List <internals@lists.php.net>
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
In-Reply-To: <CAH5C8xUb1O20ZDrOQNC=ckFxHUUWSK7sw_njQQzFBd0qgQqoww@mail.gmail.com>
References: <CAH5C8xUb1O20ZDrOQNC=ckFxHUUWSK7sw_njQQzFBd0qgQqoww@mail.gmail.com>
User-Agent: Roundcube Webmail/1.4.8
Message-ID: <570363f9043e017025deea69131313ba@glaive.pro>
X-Sender: juris@glaive.pro
Organization: SIA "Glaive.pro"
Content-Type: multipart/alternative;
 boundary="=_a0f7d47845e25131530e12c344bbbfea"
X-AuthUser: juris@glaive.pro
From: juris@glaive.pro (Juris Evertovskis)

--=_a0f7d47845e25131530e12c344bbbfea
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8;
 format=flowed

On 2024-06-28 23:06, Máté Kocsis wrote:

> Hi Everyone,
> 
> I've been working on a new RFC for a while now, and time has come to 
> present it to a wider audience.
> 
> Last year, I learnt that PHP doesn't have built-in support for parsing 
> URLs according to any well established standards (RFC 1738 or the 
> WHATWG URL living standard), since the parse_url() function is 
> optimized for performance instead of correctness.
> 
> In order to improve compatibility with external tools consuming URLs 
> (like browsers), my new RFC would add a WHATWG compliant URL parser 
> functionality to the standard library. The API itself is not final by 
> any means, the RFC only represents how I imagined it first.
> 
> You can find the RFC at the following link: 
> https://wiki.php.net/rfc/url_parsing_api
> 
> Regards,
> Máté

Hey,

It's great that you've made the Url class readonly. Immutability is 
reliable. And I fully agree that a better parser is needed.

I agree with the others that

- the enum might be fine without a backing type, if it's needed at all
- I'm not convinced a separate UrlParser is needed; 
Url::someFactory($str) should be enough
- getters seem unnecessary; they should only be added if you can be sure 
they are going to be used for compatibility with PSR-7
- treating $query as a single string is clumsy; some kind of bag or at 
least an array to represent it would be cooler and easier to build and 
manipulate
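
A minimal userland sketch of the factory and getter points, under loud assumptions: `SimpleUrl` and `fromString()` are hypothetical names, not the RFC's API, and `parse_url()` is used only as a placeholder parser (it is exactly the non-compliant function the RFC wants to move away from).

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in class; not the API proposed in the RFC.
class SimpleUrl
{
    // Public readonly properties make separate getters unnecessary.
    public function __construct(
        public readonly ?string $scheme = null,
        public readonly ?string $host = null,
        public readonly ?string $path = null,
        public readonly ?string $query = null,
    ) {}

    // Static factory on the class itself, so no separate parser class is needed.
    // Placeholder: parse_url() is NOT WHATWG compliant.
    public static function fromString(string $url): static
    {
        $parts = parse_url($url);
        if ($parts === false) {
            throw new InvalidArgumentException("Cannot parse URL: $url");
        }

        return new static(
            $parts['scheme'] ?? null,
            $parts['host'] ?? null,
            $parts['path'] ?? null,
            $parts['query'] ?? null,
        );
    }
}

$parsed = SimpleUrl::fromString('https://example.com/docs?page=2');
$host = $parsed->host;   // read directly, no getHost()
$query = $parsed->query; // still a single string here
```

The last line shows the clumsy part: the query is still one string that has to be split by hand.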

I wanted to add that it might be more useful to make all the Url 
constructor arguments optional, either nullable or with reasonable 
defaults. Then you could write `$url = new Url(path: 'robots.txt'); 
foreach ($domains as $d) $r[] = 
file_get_contents($url->withHost($d));` and stuff like that.
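
Spelled out as a userland sketch, that could look roughly like the following. Everything here is an assumption for illustration: the property list, the `https` default, and the `__toString()` rules are made up, and the class is a stand-in rather than the RFC's `Url`.

```php
<?php
declare(strict_types=1);

// Hypothetical userland stand-in; property names and defaults are assumptions.
class Url
{
    public function __construct(
        public readonly string $scheme = 'https',
        public readonly ?string $host = null,
        public readonly ?int $port = null,
        public readonly string $path = '',
        public readonly ?string $query = null,
        public readonly ?string $fragment = null,
    ) {}

    // Returns a modified copy; the original instance is left untouched.
    public function withHost(string $host): static
    {
        return new static($this->scheme, $host, $this->port, $this->path, $this->query, $this->fragment);
    }

    // Naive serialisation, enough for the example; no escaping or validation.
    public function __toString(): string
    {
        $path = $this->path;
        if ($this->host === null) {
            $url = $path;
        } else {
            $url = $this->scheme . '://' . $this->host;
            if ($this->port !== null) {
                $url .= ':' . $this->port;
            }
            if ($path !== '' && $path[0] !== '/') {
                $path = '/' . $path;
            }
            $url .= $path;
        }
        if ($this->query !== null) {
            $url .= '?' . $this->query;
        }
        if ($this->fragment !== null) {
            $url .= '#' . $this->fragment;
        }
        return $url;
    }
}

// One template URL, re-targeted at several hosts.
$url = new Url(path: 'robots.txt');
$targets = [];
foreach (['example.com', 'example.org'] as $d) {
    $targets[] = (string) $url->withHost($d);
}
```

Each entry in `$targets` could then be passed to `file_get_contents()`, as in the one-liner above.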

Similar modifiers would be very useful for the query stuff, e.g. `$u = 
Url::current(); return $u->withQueryParam('page', $u->queryParam->page + 
1);`.
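
A minimal sketch of such a query modifier, assuming a hypothetical immutable `QueryBag` class (the name and its methods are illustrative, not part of the RFC). It builds on the real `parse_str()` and `http_build_query()` functions and starts from a plain string instead of `Url::current()` so it can run on the command line.

```php
<?php
declare(strict_types=1);

// Hypothetical immutable query bag; not part of the RFC as proposed.
class QueryBag
{
    public function __construct(private readonly array $params = []) {}

    // Builds a bag from a raw query string such as "page=2&sort=name".
    public static function fromString(string $query): static
    {
        parse_str($query, $params);
        return new static($params);
    }

    public function get(string $name, mixed $default = null): mixed
    {
        return $this->params[$name] ?? $default;
    }

    // Returns a copy with one parameter set; existing keys keep their position.
    public function with(string $name, mixed $value): static
    {
        $params = $this->params;
        $params[$name] = $value;
        return new static($params);
    }

    public function __toString(): string
    {
        return http_build_query($this->params);
    }
}

// "Next page" link: read the current page, add one, keep everything else.
$current = QueryBag::fromString('page=2&sort=name');
$next = $current->with('page', (int) $current->get('page', 1) + 1);
$nextQuery = (string) $next;
```

A `withQueryParam()` on the Url itself could simply delegate to a bag like this and return a new Url.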

Sure, all of that can be done in userland as long as you drop 
`final` :)

BR,
Juris

--=_a0f7d47845e25131530e12c344bbbfea--