Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
From: nyamsprod@gmail.com (Ignace Nyamagana Butera)
Date: Fri, 4 Apr 2025 19:46:55 +0200
To: Máté Kocsis
Cc: Tim Düsterhus, PHP Internals List


On 02/04/2025 19:59, Máté Kocsis wrote:
> Hi Ignace,
>
> I spotted another inconsistency in the normalization under RFC3986
>
> Thanks for spotting this: apparently, it is due to a small bug in the
> uriparser library, which I managed to fix locally, PR is on its way to
> upstream.
>
> Máté


Hi Máté,

I have a couple of questions regarding Uri\Rfc3986\Uri:

- I believe that, during normalization of an IPv6 host, the letters a-f should be lowercased in accordance with the RFCs: RFC 3986 follows https://www.rfc-editor.org/rfc/rfc3513, which has been replaced by https://www.rfc-editor.org/rfc/rfc4291, which is in turn updated by https://www.rfc-editor.org/rfc/rfc5952#section-4.3, which recommends lowercasing the letters. (Yeah, that was quite a bit of digging, I know 🙂)

- Since the withers expect well-encoded components, does that mean the same holds for the constructor? What is the expected result of the following code?

```php
// The query below contains unencoded "[" and "]" characters.
$uri = new Uri\Rfc3986\Uri("https://example.com/?foo[]=1&foo[]=2");
```

Should the above trigger an exception because the query component contains invalid characters, or
is it acceptable? I am asking because our dear old parse_url currently does not fail on this, and
most PHP developers probably expect it not to fail.
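
To make the comparison concrete, here is a minimal sketch of what parse_url does with such a URL today. The host `example.com` and the `str_replace` step are only my illustration; the last lines show the percent-encoded query that a strict RFC 3986 parser would accept, not something the new class is confirmed to produce.

```php
<?php
// parse_url() splits the string and does not reject the unencoded brackets.
$parts = parse_url('https://example.com/?foo[]=1&foo[]=2');

// The square brackets come back untouched in the query component.
echo $parts['query'], PHP_EOL; // foo[]=1&foo[]=2

// Illustration only: the same query with "[" and "]" percent-encoded,
// which is valid under the RFC 3986 query grammar.
$encoded = str_replace(['[', ']'], ['%5B', '%5D'], $parts['query']);
echo $encoded, PHP_EOL; // foo%5B%5D=1&foo%5B%5D=2
```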

IMHO it should fail, to give a consistent experience when using the class; otherwise you
introduce an inconsistency between the constructor's behaviour and the rest of the class
API.
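
Coming back to the first point, the IPv6 normalization I have in mind could be sketched roughly as follows. `lowercaseIpv6Host` is a hypothetical helper written for this mail, not part of the RFC's API or of uriparser, and it assumes the host has already been validated as an IP literal elsewhere.

```php
<?php
// Hypothetical helper for illustration; not part of Uri\Rfc3986\Uri.
// Lowercases the hex digits of a bracketed IPv6 literal, in line with
// RFC 5952 section 4.3. Any other host is returned unchanged.
function lowercaseIpv6Host(string $host): string
{
    // Loose shape check only: brackets around hex digits, ":" and "."
    // (the dot covers an embedded IPv4 suffix). This is not a validator.
    if (preg_match('/^\[[0-9A-Fa-f:.]+\]$/', $host) !== 1) {
        return $host;
    }

    return strtolower($host);
}

echo lowercaseIpv6Host('[2001:DB8::FF00:42:8329]'), PHP_EOL; // [2001:db8::ff00:42:8329]
echo lowercaseIpv6Host('Example.COM'), PHP_EOL;              // Example.COM
```

Registered names are deliberately left alone in this sketch, since their case normalization is a separate rule from the IPv6 one discussed here.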

Best regards,
Ignace Nyamagana Butera

