Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:125299
Message-ID: <1BCB4144-231D-45EA-A914-98EE8F0F503A@automattic.com>
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_E926BF61-55B3-49B2-B7A6-B6F7AEBA79C4"
Precedence: bulk
list-help: <mailto:internals+help@lists.php.net>
list-unsubscribe: <mailto:internals+unsubscribe@lists.php.net>
list-post: <mailto:internals@lists.php.net>
List-Id: internals.lists.php.net
x-ms-reactions: disallow
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3776.700.51\))
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Date: Mon, 26 Aug 2024 17:25:35 -0500
In-Reply-To: <CAH5C8xV67frqOBCvLt73RM7QO86_pu40sHP5h71_kDRbKPtA8Q@mail.gmail.com>
Cc: Internals <internals@lists.php.net>
To: =?utf-8?B?TcOhdMOpIEtvY3Npcw==?= <kocsismate90@gmail.com>
References: <CAH5C8xUb1O20ZDrOQNC=ckFxHUUWSK7sw_njQQzFBd0qgQqoww@mail.gmail.com>
 <dd61999c-1ebd-4765-9add-cd8065968965@gmail.com>
 <CAOV5rgYZ_s1of-igLFEK7oqqh6=3HeYv0=JvvKvvjkcp0F6Q9Q@mail.gmail.com>
 <CAH5C8xV67frqOBCvLt73RM7QO86_pu40sHP5h71_kDRbKPtA8Q@mail.gmail.com>
X-Mailer: Apple Mail (2.3776.700.51)
From: dennis.snell@automattic.com (Dennis Snell)


--Apple-Mail=_E926BF61-55B3-49B2-B7A6-B6F7AEBA79C4
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

> Hi Everyone,
>=20
> I've been working on a new RFC for a while now, and time has come to=20=

> present it to a wider audience.
>=20
> Last year, I learnt that PHP doesn't have built-in support for parsing =
URLs=20
> according to any well established standards (RFC 1738 or the WHATWG =
URL=20
> living standard), since the parse_url() function is optimized for=20
> performance instead of correctness.
>=20
> In order to improve compatibility with external tools consuming URLs =
(like=20
> browsers), my new RFC would add a WHATWG compliant URL parser =
functionality=20
> to the standard library. The API itself is not final by any means, the =
RFC=20
> only represents how I imagined it first.
>=20
> You can find the RFC at the following link:=20
> https://wiki.php.net/rfc/url_parsing_api
>=20
> Regards,=20
> M=C3=A1t=C3=A9
>=20
M=C3=A1t=C3=A9, thanks for putting this together.

Whenever I need to work with URLs there are a few things missing that I =
would love to see incorporated into any change in PHP that brings us a =
spec-compliant parsing class.

First of all, I typically care most about WHATWG URLs because the PHP
code I'm working with is making decisions about HTML that a browser
will interpret. Paramount above all other concerns is that code on the
server understands content in the same way that browsers will;
otherwise we invite security issues. People may have valid critiques of
the WHATWG specification, but it's also the most relevant specification
for users of much or most of the PHP code we write, and it's valuable
because it lets us talk about URLs in the same way a browser would.

I'm worried about the side effects that a global uri.default_handler
could have, with code running differently for no apparent reason, or
differently depending on what is calling it. If someone is writing code
for a controlled system I could see this being valuable, but if someone
is writing a framework like WordPress and has no control over the
environments in which the code runs, it seems dangerous to hope that
every plugin and every host runs compatible system configurations.
Nobody is going to check `ini_get( 'uri.default_handler' )` before
every line that parses URLs. Beyond this, even just allowing a
pluggable parser invites broken deployments, because PHP code that
reads from a browser or sends output to one needs to speak the language
the browser is speaking, not some arbitrary language that's similar to
it.

> One thing I feel is missing, is a method to parse a (partial) URL =
relative to another


Being able to parse a relative URL and to know whether a URL is
relative or absolute would help WordPress, which often makes decisions
differently based on this property (for instance, when reading the
`href` attribute of a link). I know these aren't spec-compliant URLs,
but they still represent valid values for URL fields in HTML, and
knowing whether they are relative currently requires parsing specific
details everywhere instead of in a class that already parses URLs.
Effectively, this would imply that PHP's new URL parser decodes
`document.querySelector( 'a' ).getAttribute( 'href' )`, which should be
the same as `document.querySelector( 'a' ).href`, and indicates whether
it found a full URL or only a portion of one.

  * `$url->is_relative` or `$url->is_absolute`
  * `$url->specificity =3D URL::Relative | URL::Absolute`
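
As a rough illustration of the check that user-space code repeats today
(not the RFC's API), a scheme test on the raw attribute value might look
like the sketch below. The name `looks_absolute()` is made up for this
example, and the regular expression only approximates the WHATWG scheme
rules.

```php
<?php
// Hypothetical helper, not part of the RFC or of PHP itself.
// Reports whether a raw attribute value starts with a URL scheme:
// an ASCII letter, then letters, digits, "+", "-" or ".", then ":".
function looks_absolute(string $value): bool
{
    return (bool) preg_match('~^[a-z][a-z0-9+.-]*:~i', trim($value));
}

var_dump(looks_absolute('https://example.com/a.png')); // bool(true)
var_dump(looks_absolute('../images/a.png'));           // bool(false)
var_dump(looks_absolute('//cdn.example.com/a.png'));   // bool(false)
```

A scheme-relative value such as `//cdn.example.com/a.png` counts as
relative here, which is one of the details a built-in parser could
settle once for everyone.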

> the URI parser libraries used don't support modification of the URI

Having methods to add query arguments, change the path, and so on would
be a great way to simplify user-space code working with URLs. For
instance, read a URL and then add a query argument if some condition
within the URL warrants it (for example, the path ends in `.png`).
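
For comparison, here is roughly what that looks like today with only
`parse_url()` and `http_build_query()`. The helper name
`add_args_to_png()` is hypothetical, it needs PHP 8.0 for
`str_ends_with()`, and it ignores fragments, which is the kind of gap a
built-in modifier method could close.

```php
<?php
// Hypothetical helper built only on existing PHP functions; the RFC
// does not define it. Appends query arguments when the path of $url
// ends in ".png". Fragments are not handled in this sketch.
function add_args_to_png(string $url, array $args): string
{
    if (!str_ends_with((string) parse_url($url, PHP_URL_PATH), '.png')) {
        return $url;
    }

    // Use "&" when a query string already exists, "?" otherwise.
    return $url
        . (is_null(parse_url($url, PHP_URL_QUERY)) ? '?' : '&')
        . http_build_query($args);
}
```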

Is the intent to add this to the RFC before it's finalized?

> I would not make Url final. "OMG but then people can extend it!" =
Exactly.

My counterpoint to this argument is that I see security exploits appear
wherever functions that implement specifications are pluggable and
extendable. It's easy to see the need for a class that limits the
possible URLs, but that doesn't require extending a class. A class can
wrap a URL parser just as it could extend one. Magic methods would make
it even easier.
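
A minimal sketch of the wrapping approach, assuming PHP 8.0 or later.
`HttpsOnlyUrl`, its https-only rule, and the `scheme` property it reads
from the wrapped object are all invented for illustration; the wrapped
object stands in for whatever the RFC ends up providing.

```php
<?php
// Sketch only: $inner stands in for whatever URL object the RFC ends
// up providing. Nothing here is the RFC's API.
final class HttpsOnlyUrl
{
    public function __construct(private object $inner)
    {
        // Hypothetical restriction: accept https URLs and nothing else.
        if (!in_array($inner->scheme, ['https'], true)) {
            throw new InvalidArgumentException('Only https is allowed');
        }
    }

    // Forward every other method call to the wrapped object, so the
    // wrapper adds rules without subclassing the parser.
    public function __call(string $name, array $arguments): mixed
    {
        return $this->inner->$name(...$arguments);
    }
}
```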

A problem that can arise when adding rules on top of a specification
like this is that the subclass gets used in more places than it should,
and then somewhere some PHP code allows a malicious URL because it
failed to parse and the inspection rules were never applied.

----

Finally, I frequently need to consider a URL in both the display
context and the serialization context. With Ada we have
`normalize_url()`, `parse_search_params()`, and the IDNA functions to
convert between the two representations. To keep strong boundaries
between security domains, it would be nice if PHP could expose the two
variations: one is an encoded form of a URL that machines can easily
parse, while the other is a "plain string" in PHP that's easier for
humans to read but which might not even be a valid URL. Part of the
reason for this need is that I often see user-space code treating an
entire URL as a single text span that requires one set of rules for
full decoding, when it is really multiple segments that each have their
own decoding rules.

 - Original [ https://xn--google.com/secret/../search?q=3D=F0=9F=8D=94 ]
 - `$url->normalize()` [ https://xn--google.com/search?q=3D%F0%9F%8D%94 =
]
 - `$url->for_display()` Displayed [ =
https://=E4=95=AE=E4=95=B5=E4=95=B6=E4=95=B1.com/search?q=3D=F0=9F=8D=94 =
]

Having this in the RFC would give everyone the tools they need to =
effectively and safely set links within an HTML document.
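
On the point that each segment has its own decoding rules, two existing
PHP functions already show the difference. This only demonstrates how
`+` is treated under form-style query decoding versus plain
percent-decoding; it is not a proposal for the API.

```php
<?php
// "+" means a space in form-encoded query strings but stays a literal
// plus sign under plain percent-decoding, as used for path segments.
var_dump(rawurldecode('a+b%20c')); // string(5) "a+b c"
var_dump(urldecode('a+b%20c'));    // string(5) "a b c"
```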

----

All the best,
Dennis Snell


--Apple-Mail=_E926BF61-55B3-49B2-B7A6-B6F7AEBA79C4--