Newsgroups: php.internals
Date: Thu, 27 Mar 2025 22:04:27 +0100
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
From: kocsismate90@gmail.com (Máté Kocsis)
To: Ignace Nyamagana Butera
Cc: PHP Internals List

Hi Ignace,

> While implementing the polyfill, I am finding it easier, DX-wise, to make the constructor private and use named constructors for instantiation instead. I would be in favor of
>
> `Uri::parse` and `Uri::tryParse`, as is currently done with Enum and its `from` and `tryFrom` named constructors.
>
> My reasoning is as follows:
>
> There is no right or wrong way to instantiate a URI; there are only contexts. While the parse method is all about parsing a string, one could legitimately use other named constructors like `Uri::fromComponents`, which would take, for instance, the result of parse_url to build a new URI. This can come in handy in the case of an RFC 3986 URI if you need to create a new URI that is not related to the http scheme and that does not use all the components, such as the email, data or FTP schemes.
>
> Allowing URIs to be created from their respective component values makes the class easier for devs to use. This also means that if we want a balanced API, a `toComponents` method should come hand in hand with the named constructor.
>
> I would understand if the idea of adding both component-related methods is rejected, since they could be implemented in userland, but the main point was to show that, from the VO or the developer POV, in the absence of a clearly defined instantiation process, a traditional constructor fails to convey all the different ways to create a URI.
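
For reference, the convention mentioned in the quote can be sketched as follows. The enum part uses PHP's built-in `from()`/`tryFrom()`; the `SketchUri` class, its placeholder validation, and its property name are purely hypothetical and are not the RFC's API. It only illustrates what a private constructor with `parse`/`tryParse` could look like, assuming the pair mirrors the enum behavior (throw vs. return null).

```php
<?php
declare(strict_types=1);

// Built-in convention: backed enums expose from() (throws on an unknown
// value) and tryFrom() (returns null on an unknown value).
enum Scheme: string
{
    case Http = 'http';
    case Https = 'https';
}

// Hypothetical value object, NOT the class proposed in the RFC.
final class SketchUri
{
    // Private constructor: instances can only come from named constructors.
    private function __construct(public readonly string $value) {}

    // Throwing variant, analogous to Enum::from().
    public static function parse(string $uri): self
    {
        return self::tryParse($uri)
            ?? throw new InvalidArgumentException("Invalid URI: $uri");
    }

    // Nullable variant, analogous to Enum::tryFrom().
    public static function tryParse(string $uri): ?self
    {
        // Placeholder check only: accept anything that starts with a scheme.
        return preg_match('/^[a-z][a-z0-9+.-]*:/i', $uri) === 1
            ? new self($uri)
            : null;
    }
}

var_dump(Scheme::tryFrom('ftp'));            // NULL
var_dump(SketchUri::tryParse('no scheme'));  // NULL
echo SketchUri::parse('https://example.com')->value, "\n";
```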

A few things came to mind:
- Currently, the underlying C libraries don't support a `fromComponents` feature. Naively, I would imagine this working by recomposing the components into a URI string using the relevant algorithm (for RFC 3986: https://datatracker.ietf.org/doc/html/rfc3986#section-5.3), and then parsing and validating that string. Unfortunately, I recently realized that this approach may leave room for a kind of parsing confusion attack: when, for example, the scheme is "https", the authority is empty, and the path is "example.com", the result is the URI https://example.com, in which "example.com" is reparsed as the authority rather than the path. I believe a similar bug is not possible with the rest of the components because they have their own delimiters. So some other solution may be needed, or perhaps some additional validation (?).

- Nicolas made me aware that if URIs didn't have a proper constructor, one wouldn't be able to use URI objects as parameter default values (default values may use `new`, but not static method calls such as a named constructor), like below:
    function (Uri $foo = new Uri('blah')) {}
I think this omission would cause a usability regression. For this reason, it may make sense to have a distinguished way of instantiating a Uri.

- I feel similarly about a toComponents() method as I do about a named constructor replacing __construct(): I am not completely against it, but I'm not totally convinced either.
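
To make the first point above concrete, here is a minimal sketch of the naive approach: recompose per RFC 3986 section 5.3, then reparse. The function names `recompose()` and `isPathValidForAuthority()` are hypothetical and exist neither in the RFC implementation nor in the underlying C libraries. The extra check is taken from RFC 3986 section 3.3; whether it would be sufficient on its own is an open question, not something this sketch settles. `parse_url()` is used only as a convenient stand-in parser.

```php
<?php
declare(strict_types=1);

/**
 * Recompose components following RFC 3986 section 5.3.
 * null means "component undefined"; '' means "present but empty".
 * Hypothetical helper for illustration only.
 */
function recompose(
    ?string $scheme,
    ?string $authority,
    string $path,
    ?string $query = null,
    ?string $fragment = null
): string {
    $uri = '';
    if ($scheme !== null) {
        $uri .= $scheme . ':';
    }
    if ($authority !== null) {
        $uri .= '//' . $authority;
    }
    $uri .= $path;
    if ($query !== null) {
        $uri .= '?' . $query;
    }
    if ($fragment !== null) {
        $uri .= '#' . $fragment;
    }
    return $uri;
}

/**
 * Path rules from RFC 3986 section 3.3: with an authority, the path must be
 * empty or start with "/"; without one, the path must not start with "//".
 */
function isPathValidForAuthority(?string $authority, string $path): bool
{
    if ($authority !== null) {
        return $path === '' || $path[0] === '/';
    }
    return !str_starts_with($path, '//');
}

// Empty authority plus a path without a leading slash:
$uri = recompose('https', '', 'example.com');
echo $uri, "\n";                          // https://example.com

// After reparsing, the former path has become the host.
var_dump(parse_url($uri, PHP_URL_HOST));  // string(11) "example.com"

// The section 3.3 check would reject these components before recomposition.
var_dump(isPathValidForAuthority('', 'example.com')); // bool(false)
```

With an undefined (null) authority, the same components recompose to `https:example.com`, which keeps "example.com" as the path, so the confusion depends on treating an empty authority as present.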

Máté
