Newsgroups: php.internals
From: juris@glaive.pro (Juris Evertovskis)
Date: Sat, 29 Jun 2024 19:19:12 +0300
To: Máté Kocsis
Cc: PHP Internals List
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Message-ID: <570363f9043e017025deea69131313ba@glaive.pro>

On 2024-06-28 23:06, Máté Kocsis wrote:

> Hi Everyone,
>
> I've been working on a new RFC for a while now, and time has come to present it to a wider audience.
>
> Last year, I learnt that PHP doesn't have built-in support for parsing URLs according to any well established standards (RFC 1738 or the WHATWG URL living standard), since the parse_url() function is optimized for performance instead of correctness.
>
> In order to improve compatibility with external tools consuming URLs (like browsers), my new RFC would add a WHATWG compliant URL parser functionality to the standard library. The API itself is not final by any means, the RFC only represents how I imagined it first.
>
> You can find the RFC at the following link: https://wiki.php.net/rfc/url_parsing_api
>
> Regards,
> Máté

Hey,

It's great that you've made the Url class readonly. Immutability is reliable. And I fully agree that a better parser is needed.

I agree with the others that

- the enum might be fine without the backing, if it's needed at all
- I'm not convinced a separate UrlParser is needed, Url::someFactory($str) should be enough
- getters seem unnecessary, they should only be added if you can be sure they are going to be used for compatibility with PSR-7
- treating $query as a single string is clumsy, having some kind of bag or at least an array to represent it would be cooler and easier to build and manipulate

I wanted to add that it might be more useful to make all the Url constructor arguments optional. Either nullable or with reasonable defaults. So you could `$url = new Url(path: 'robots.txt'); foreach ($domains as $d) $r[] = file_get_contents($url->withHost($d))` and stuff like that.
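
To make the idea concrete, here is a minimal userland sketch of optional constructor arguments combined with a `withHost()` modifier. The class name `SketchUrl`, its three properties, the `https` default and the `__toString()` format are all assumptions made for illustration; none of this is the API from the RFC, and no real URL parsing or validation is attempted.

```php
<?php
declare(strict_types=1);

// Hypothetical userland class for illustration only; NOT the RFC's Url class.
// Requires PHP 8.2+ for readonly classes.
readonly class SketchUrl
{
    // Every argument is optional: nullable or with an assumed default.
    public function __construct(
        public ?string $scheme = 'https',
        public ?string $host = null,
        public string $path = '',
    ) {}

    // Returns a modified copy; the original instance is left untouched.
    public function withHost(string $host): self
    {
        return new self($this->scheme, $host, $this->path);
    }

    // Naive string form: "scheme://host/path", or just "/path" without a host.
    public function __toString(): string
    {
        $authority = $this->host === null ? '' : "{$this->scheme}://{$this->host}";
        return $authority . '/' . ltrim($this->path, '/');
    }
}

// One path-only template, reused for several hosts.
$url = new SketchUrl(path: 'robots.txt');
$targets = [];
foreach (['example.com', 'example.org'] as $domain) {
    $targets[] = (string) $url->withHost($domain);
}
// $targets is ['https://example.com/robots.txt', 'https://example.org/robots.txt']
```

Each string in `$targets` could then be passed to `file_get_contents()`, as in the one-liner above; the network call is left out so the sketch runs offline.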

Similar modifiers would be very useful for the query stuff, e.g. `$u = Url::current(); return $u->withQueryParam('page', $u->queryParam->page + 1);`.
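
A rough sketch of such a query modifier, assuming the query is stored as an array rather than a single string. The class name `QueryUrl`, the `$query` property and the `withQueryParam()` signature are hypothetical and not taken from the RFC; `Url::current()` is left out because it would depend on the request context.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: the query is kept as an array, not as one string.
// Requires PHP 8.2+ for readonly classes.
readonly class QueryUrl
{
    /** @param array<string, string|int> $query */
    public function __construct(
        public string $path = '/',
        public array $query = [],
    ) {}

    // Returns a copy with one query parameter added or replaced.
    public function withQueryParam(string $name, string|int $value): self
    {
        $query = $this->query;   // arrays are copied by value
        $query[$name] = $value;
        return new self($this->path, $query);
    }

    public function __toString(): string
    {
        // Pass the separator explicitly so php.ini settings do not change the result.
        $qs = http_build_query($this->query, '', '&');
        return $qs === '' ? $this->path : $this->path . '?' . $qs;
    }
}

$u = new QueryUrl('/articles', ['page' => 2, 'sort' => 'date']);
$next = $u->withQueryParam('page', $u->query['page'] + 1);

echo $next, "\n"; // /articles?page=3&sort=date
echo $u, "\n";    // /articles?page=2&sort=date (original unchanged)
```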

Sure, all of that can be done in the userland as long as you drop `final` :)

BR,
Juris
