From: kocsismate90@gmail.com (Máté Kocsis)
To: Ignace Nyamagana Butera
Cc: internals@lists.php.net
Date: Fri, 14 Mar 2025 20:45:23 +0100
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Newsgroups: php.internals

Hi Ignace,

> > All URI components - with the exception of the host - can be
> > retrieved in two formats:
>
> I believe you mean - with the exception of the port

Even though I specifically meant WHATWG's host, which is only available in
one format, you are right: the port is never available in two formats. So
I've changed the wording accordingly.

> 0 - It is unfortunate that there's no IDNA support for RFC 3986. I
> understand the reasoning behind that decision, but I was wondering if it
> was possible to opt in to its use when the ext-intl extension is present?

Good question. I think the dependency on ext-intl is probably not the main
concern. My specific concern is that RFC 3987 is around the same length as
RFC 3986; in a lot of cases it uses the exact wording of the initial RFC but
changes URI to IRI, and of course adds the IDNA-specific parts. Maybe it's
just me, but it's not easy to find out exactly what has to be implemented on
top of RFC 3986, and also how it can best be achieved. By extending the class
for RFC 3986? By creating a totally separate class that can transform itself
into an RFC 3986 URI? These and quite a few other questions have to be
answered first, which I would like to postpone.

> 1 - Does it mean that if/when Rfc3986\Uri gets RFC 3987 support, it
> will also get `Uri::toDisplayString` and `Uri::getHostForDisplay`?
> Maybe this should be stated in the Future Scope?

I also asked myself this question. For now, I'd say that Rfc3986\Uri
shouldn't have these methods, since it doesn't support any such
capabilities. But Rfc3986\Iri should likely have these methods.

> 4 - For consistency I would use toRawString and toString, just like it is
> done for components.

I'm fine with this. I also think doing so would reasonably continue the
convention the getters follow.

> 5 - Can the returned array from __debugInfo be used in a "normal" method
> like `toComponents` (naming can be changed/improved) to ease migration from
> parse_url, or is this left for userland libraries?

I intend to add the __debugInfo() method purely to help debugging. Without
this method, even I had a hard time comparing the expected vs. actual URIs
in my tests.

More importantly, the recomposed string is sometimes not enough to
understand exactly what value each component has. For example, one can
naively assume that the "mailto:kocsismate@php.net" URI has a user(info)
component of "kocsismate" and a hostname of "php.net" (I probably also did
so before reading the RFCs). The representation provided by __debugInfo()
quickly shows that "kocsismate@php.net" is in fact the path. One could call
the individual getters to find the needed component, but a method like
__debugInfo() gives a much clearer picture of the anatomy of the URI.

Otherwise, I don't know how useful this method would be. Is there anything
else it would help with besides the migration?

Regards,
Máté
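To illustrate the mailto: point above, here is a minimal sketch. It uses Python's standard urllib.parse purely as a stand-in generic URI parser; it is not the API proposed in the RFC, and the components() helper is a hypothetical name chosen for this example. It only shows how dumping all components at once makes it obvious that the address lands in the path.

```python
from urllib.parse import urlsplit


def components(uri: str) -> dict:
    """Return the parsed pieces of a URI as a plain dict for inspection.

    Hypothetical helper for illustration only; it mimics the kind of
    overview a debug dump of a URI object would give.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "username": parts.username,  # None when there is no userinfo
        "host": parts.hostname,      # None when there is no authority
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# No "//" after the scheme, so there is no authority: everything after
# "mailto:" is the path.
mailto = components("mailto:kocsismate@php.net")

# With "//", the same text is parsed as userinfo and host.
https = components("https://kocsismate@php.net/")

print(mailto)
print(https)
```

Comparing the two dumps side by side shows the difference that is easy to miss when looking only at the recomposed string.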