Newsgroups: php.internals
Xref: news.php.net php.internals:127118
From: kocsismate90@gmail.com (Máté Kocsis)
To: Tim Düsterhus
Cc: Internals
Date: Tue, 15 Apr 2025 23:55:25 +0200
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
References: <1BCB4144-231D-45EA-A914-98EE8F0F503A@automattic.com> <8E614C9C-BA85-45D8-9A4E-A30D69981C5D@automattic.com> <8df04e01-deac-404b-beb7-cd982423db63@bastelstu.be> <33427cd03035ef084245c44290b56a55@bastelstu.be>
In-Reply-To: <33427cd03035ef084245c44290b56a55@bastelstu.be>

Hi Tim,

> Perhaps the correct solution would be to offer only the non-raw methods
> for WHATWG URL and to not attempt any additional percent-decoding there?
> My reasoning is that the WHATWG URL is a living standard anyways, so
> trying to add additional semantics on top will result in sadness. My
> understanding is also that it is primarily intended for interaction with
> web browsers or to embed these URLs into HTML. For access control, e.g.
> in your framework the RFC3986 URI should be used. It's what HTTP uses
> internally and it supports well-defined normalization.
>
> What do you think?

This was one of my (unspoken) ideas as well. I used to think there must
be a correct logic for percent-decoding WHATWG components, but if none of
us can come up with a sensible approach, then I agree it's best not to
try.

> Unintuitive probably is not the best word. But I expect users to primarily
> interact with the path component of a URL (e.g. within their
> framework’s router). So I think it makes sense to be extra explicit with
> examples there. As an example, I recently learned that Symfony's router
> does not support (encoded) slashes within a component:
>
>      #[Route('/test/{message}', name: 'test')]
>
> will work for http://localhost:8000/test/foo, but not for
> http://localhost:8000/test/foo%2fbar, resulting in:
>
>      No route found for "GET http://localhost:8000/test/foo%2fbar"
>
> So if you would just extend the: “Let's have a look at some other tricky
> example with Uri\Rfc3986\Uri:” to my suggestion, I would be happy :-)

Alright, I'll add it. It won't hurt for sure!

> Note: I believe there is a small mistake in the example when you last
> modified it. It says:
>
>      echo $uri->getHost(); // [2001:0db8:0001:0000:0000:0ab9:C0a8:0102]
>
> Should the 'C' in 'C0a8' also be lowercased?

Yes, nice catch! I swear I double-checked multiple times whether there
were any uppercase letters that should be lowercased...

> So it indeed seems to be a limitation of the WHATWG specification and
> your PHP implementation is consistent with node.js. That is a good thing
> and when a user stumbles upon this, we can point them towards node.js /
> the spec. Not great, but this is workable!

Thank you for the test! To be honest, I don't much like how the WHATWG
setters are specified; they seem to behave in a very "ad hoc" way, based
on what I have seen so far. :(

Regards,
Máté
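
P.S. To illustrate why the encoded slash in the quoted /test/foo%2fbar
example is tricky, here is a minimal sketch that uses only plain PHP string
functions. It is not the RFC's Uri\Rfc3986\Uri API and makes no claim about
how Symfony's router works internally; the helper names pathSegments() and
naiveSegments() are made up for this illustration. The point is only that
the order of "split on /" and "percent-decode" changes the result.

```php
<?php
// Hypothetical helper: split a raw (still percent-encoded) path on the
// literal "/" first, then percent-decode every segment on its own.
function pathSegments(string $rawPath): array
{
    $segments = explode('/', ltrim($rawPath, '/'));
    return array_map('rawurldecode', $segments);
}

// Hypothetical counter-example: decode the whole path first and split
// afterwards, so an encoded "%2f" turns into an extra path separator.
function naiveSegments(string $rawPath): array
{
    return explode('/', ltrim(rawurldecode($rawPath), '/'));
}

echo implode(' | ', pathSegments('/test/foo%2fbar')), "\n";  // test | foo/bar
echo implode(' | ', naiveSegments('/test/foo%2fbar')), "\n"; // test | foo | bar
```

With the first helper the message stays a single segment ("foo/bar"); with
the second it is indistinguishable from the path /test/foo/bar, which is why
explicit examples for the path component seem worthwhile.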