Newsgroups: php.internals
From: kocsismate90@gmail.com (Máté Kocsis)
Date: Fri, 12 Dec 2025 13:30:57 +0100
Subject: Re: [PHP-DEV] [RFC] [Discussion] Followup Improvements for ext/uri
To: ignace nyamagana butera
Cc: PHP Internals List, Tim Düsterhus

Hi Ignace,

> > The getter methods return null if the path is empty
> > (https://example.com), an empty array when the path consists of a
> > single slash (https://example.com/), and a non-empty array otherwise.

Yes, that's correct!

> Instead, I would rather always get a single type, the array as return
> value. The issue you are facing is that you want to convey via your
> return type if the path is absolute or not. But, we already have access
> to this information via the UriType Enum, at least in the case of the
> Uri\Rfc3986\Uri class.

The UriType enum in its current form is not really suitable, because it can
only distinguish relative and absolute path references ("foo" vs "/foo"),
but not absolute URIs ("https://example.com" vs "https://example.com/").
"https://example.com" and "https://example.com/" are both absolute URIs,
and the former one has an empty path.

In order to find out the correct behavior, I think we should first try to
dig deeper into the definition of path segments.

Also, in order to have some inspiration, I checked how similar
functionality works in other languages, C# notably:
https://learn.microsoft.com/en-us/dotnet/api/system.uri.segments?view=net-10.0#system-uri-segments

Making the leading "/" its own segment feels a little bit off at first
sight (not to mention that the "/" characters are part of the segments),
because RFC 3986 specifies that path segments start after the leading "/"
due to the following ABNF rule:

path-abempty  = *( "/" segment )

That is, for URIs containing **an authority component**, the path is either
empty, or consists of a "/" followed by a segment, repeated one or more
times. Segments, in turn, have the following syntax:

segment       = *pchar

That is, segments are composed of zero or more characters in the "pchar"
charset (the exact values don't matter in this case). So let's see some
basic examples with absolute URIs:

"https://example.com" -> no path segments: []
"https://example.com/foo" -> one path segment "foo": ["foo"]

Consequently:

"https://example.com/" -> one path segment which is empty: [""]
"https://example.com/foo/" -> two path segments: ["foo", ""]

Then the behavior of C# starts to make some sense - at least when the path
only consists of a "/" character (IMO it doesn't make sense for other cases
like "/foo").
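
The reading of path-abempty above can be sketched in a few lines of PHP.
This is only an illustration of the ABNF rule, not the ext/uri
implementation; the function name splitPathAbempty() is made up for this
example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: splits a path matching RFC 3986 "path-abempty"
 * (the path of a URI that has an authority) into its segments.
 *
 * @return list<string>
 */
function splitPathAbempty(string $path): array
{
    if ($path === '') {
        // Zero repetitions of ( "/" segment ): no segments at all.
        return [];
    }

    if ($path[0] !== '/') {
        throw new InvalidArgumentException('path-abempty must be empty or start with "/"');
    }

    // Every "/" introduces one (possibly empty) segment, so drop the
    // first "/" and split on the remaining ones.
    return explode('/', substr($path, 1));
}

echo json_encode(splitPathAbempty('')), "\n";      // []
echo json_encode(splitPathAbempty('/foo')), "\n";  // ["foo"]
echo json_encode(splitPathAbempty('/')), "\n";     // [""]
echo json_encode(splitPathAbempty('/foo/')), "\n"; // ["foo",""]
```

The empty path and "/" end up with different results ([] vs [""]), which
is exactly the distinction the examples above rely on.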

Now let's see what to do with path references:

"" (empty string) -> no path segments: []
"/foo" -> one path segment "foo": ["foo"]
"foo" -> one path segment "foo": ["foo"]
"foo/" -> two path segments: ["foo", ""]
"/" -> one path segment which is empty: [""]

Unfortunately, this is not all: there are a few other special cases for
absolute URIs:

"https://" -> there's an authority, but it's empty; the path is also
empty, therefore no path segments -> []
"https:/" -> there's no authority, and the path is "/", therefore one
path segment which is empty -> [""]
"https:" -> there's no authority, and the path is "", therefore no path
segments -> []

As far as I can see, this behavior is completely logical and satisfies the
definitions of RFC 3986. However, one case may need disambiguation in
relation to the withPathSegments() method: "/foo" vs "foo", since both
yield ["foo"]. (P.S. the uriparser library had to use a special field for
tracking exactly these cases.)

That being said, I agree with you that the currently suggested signatures
should be changed. However, having withPathSegments() accept an additional
UriType parameter wouldn't be correct, because I've just demonstrated that
the behavior doesn't depend on whether a URI is absolute or relative, but
on whether the authority component is defined or not.
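
To double-check the cases above, here is a rough sketch that uses the
regular expression from Appendix B of RFC 3986 to separate the authority
from the path and then applies the splitting rule. The function name
describePath() and the shape of its return value are invented for this
example; it does not reflect how ext/uri or uriparser work internally, and
it assumes PHP 7.4 or later for PREG_UNMATCHED_AS_NULL.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, for illustration only.
 *
 * @return array{hasAuthority: bool, path: string, segments: list<string>}
 */
function describePath(string $uriReference): array
{
    // Regular expression from RFC 3986, Appendix B. Group 3 is
    // "//authority" (null when no authority is present), group 5 is the path.
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uriReference, $m, PREG_UNMATCHED_AS_NULL);

    $path = $m[5] ?? '';

    if ($path === '') {
        $segments = [];
    } else {
        // A single leading "/" only introduces the first segment.
        $segments = explode('/', $path[0] === '/' ? substr($path, 1) : $path);
    }

    return [
        'hasAuthority' => ($m[3] ?? null) !== null,
        'path' => $path,
        'segments' => $segments,
    ];
}

foreach (['', '/foo', 'foo', 'foo/', '/', 'https://', 'https:/', 'https:'] as $input) {
    $info = describePath($input);
    printf(
        "%-12s authority: %-3s segments: %s\n",
        '"' . $input . '"',
        $info['hasAuthority'] ? 'yes' : 'no',
        json_encode($info['segments'])
    );
}
```

Note that "/foo" and "foo" produce the same segment list here, which is the
ambiguity mentioned above.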

So my alternative idea for disambiguating the above-mentioned case is the
following: adding a 2nd parameter, $addLeadingSlashForNonEmptyRelativeUri,
to the withPathSegments() method (I know this param name is insanely long,
so I'm happy to get recommendations). A leading slash would then be added
to the path if and only if all three criteria are satisfied:

- the $addLeadingSlashForNonEmptyRelativeUri boolean parameter is true
- the first item in the $pathSegments array parameter is non-empty
- the target URI is relative

This means that calling $uri->withPathSegments(["", "foo"], false) and
$uri->withPathSegments(["foo"], true) would result in the same path
reference ("/foo") when $uri doesn't have an authority. I'm fine with
bikeshedding/fine-tuning these rules, but I do think we should go with
something along the lines of this.

> For the Uri\WhatWg\Uri the information is less crucial as the validation
> and normalization rules of the WHATWG specifications will autocorrect
> the path if needed.

Yes, true.

Máté
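
As a rough illustration of the three criteria above (not an implementation
of ext/uri), the path of a URI without an authority could be built like
this. The function name buildPathWithoutAuthority(), the shorter parameter
name $addLeadingSlash and the $isRelative flag are placeholders, and
joining the segments with "/" is an assumption about how
withPathSegments() would serialize them.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the proposed rule for a URI without an authority.
 *
 * @param list<string> $pathSegments
 */
function buildPathWithoutAuthority(
    array $pathSegments,
    bool $addLeadingSlash,
    bool $isRelative
): string {
    // Assumption: segments are serialized by joining them with "/".
    $path = implode('/', $pathSegments);

    $firstIsNonEmpty = $pathSegments !== [] && $pathSegments[0] !== '';

    // The leading slash is added only if all three criteria hold.
    if ($addLeadingSlash && $firstIsNonEmpty && $isRelative) {
        $path = '/' . $path;
    }

    return $path;
}

echo buildPathWithoutAuthority(['', 'foo'], false, true), "\n"; // /foo
echo buildPathWithoutAuthority(['foo'], true, true), "\n";      // /foo
echo buildPathWithoutAuthority(['foo'], false, true), "\n";     // foo
```

Only the first two calls are pinned down by the proposal; what other inputs
such as [""] should produce is left open in this sketch.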