Newsgroups: php.internals
Message-ID: <6430b9ed-638d-4247-9fa9-d1a9148c382b@gmail.com>
Date: Fri, 14 Mar 2025 23:26:04 +0100
List-Id: internals.lists.php.net
From: nyamsprod@gmail.com (ignace nyamagana butera)
Reply-To: nyamsprod@gmail.com
To: Máté Kocsis, PHP Internals List
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
References: <1BCB4144-231D-45EA-A914-98EE8F0F503A@automattic.com> <8E614C9C-BA85-45D8-9A4E-A30D69981C5D@automattic.com> <9bf11a89-39d9-457b-b0ea-789fd07d7370@gmail.com>

On 14/03/2025 20:45, Máté Kocsis wrote:
> Hi Ignace,
>
>>> All URI components - with the exception of the host - can be
>>> retrieved in two formats:
>>
>> I believe you mean - with the exception of the port
>
> Even though I specifically meant WHATWG's host, which is available in
> only one format, you are right, the port is never available in two
> formats. So I've changed the wording accordingly.
>
>> 0 - It is unfortunate that there's no IDNA support for RFC 3986. I
>> understand the reasoning behind that decision, but I was wondering
>> whether it would be possible to opt in to its use when the ext-intl
>> extension is present?
>
> Good question, I think it's probably not the main concern. My specific
> concern is that RFC 3987 has around the same length as RFC 3986; in a
> lot of cases it uses the exact wording of the initial RFC but changes
> URI to IRI, and of course adds the IDNA-specific parts. Maybe it's just
> me, but it's not easy to find out exactly what has to be implemented
> on top of RFC 3986, and also how it can best be achieved. By extending
> the class for RFC 3986? By creating a totally separate class that can
> transform itself into an RFC 3986 URI? These and quite a few other
> questions have to be answered first, which I would like to postpone.
>
>> 1 - Does it mean that if/when Rfc3986/Uri gets RFC 3987 support, it
>> will also get `Uri::toDisplayString` and `Uri::getHostForDisplay`?
>> Maybe this should be stated in the Future Scope?
>
> It's a question that I also asked myself. For now, I'd say that
> Rfc3986/Uri shouldn't have these methods, since it doesn't support any
> such capabilities. But Rfc3986\Iri should likely have these toString
> methods.
>
>> 4 - For consistency I would use toRawString and toString, just like it
>> is done for components.
>
> I'm fine with this, I also think doing so would reasonably continue the
> convention the getters follow.
>
>> 5 - Can the array returned from __debugInfo be used in a "normal"
>> method like `toComponents` (the naming can be changed/improved) to ease
>> migration from parse_url, or is this left to userland libraries?
>
> I intend to add the __debugInfo() method purely to help debugging.
> Without this method, even I had a hard time when trying to compare the
> expected vs actual URIs in my tests.
>
> But more importantly, sometimes the recomposed string is not enough to
> get a good understanding of exactly what value each component has. For
> example, one can naively assume that the "mailto:kocsismate@php.net"
> URI has a user(info) component of "kocsismate" and a hostname of
> "php.net" (I probably also did so before reading the RFCs). The
> representation provided by __debugInfo() can quickly highlight that
> "kocsismate@php.net" is in fact the path. One could try to call the
> individual getters to find the needed component, but having a method
> like __debugInfo() provides a much clearer picture of the anatomy of
> the URI.
>
> But otherwise I don't know how useful this method would be. Is there
> anything else besides helping the migration?
>
> Regards,
> Máté

Thanks for the clarification. I have other questions upon further reading:

1) Around `Uri\UninitializedUriException`

If I look at the behaviour of `DateTimeImmutable` in the same scenario, or at that of a userland object, an Error is thrown instead of an exception, see:

- https://3v4l.org/d4VrY
- https://3v4l.org/Wn7En

Shouldn't the URI feature follow the same path for consistency? Instead of throwing an exception, it should at least throw an Error when the object is uninitialized.

2) Around normalization

In the case of query normalization, sorting the query string is not mentioned. Does that mean that, with the current feature, `http://example.com?foo=bar&foo=rab` is different from `http://example.com?foo=rab&foo=bar`?
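
To make the question in 2) concrete, here is a minimal sketch. It uses Python's standard urllib.parse purely as a stand-in, not the proposed PHP API, and the helper name `query_pairs` as well as the "sorted" comparison are assumptions for illustration only; nothing here claims that the RFC performs or should perform such sorting.

```python
from urllib.parse import urlsplit, parse_qsl


def query_pairs(url: str) -> list[tuple[str, str]]:
    """Return the query's key/value pairs in the order they appear."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


a = "http://example.com?foo=bar&foo=rab"
b = "http://example.com?foo=rab&foo=bar"

pairs_a = query_pairs(a)
pairs_b = query_pairs(b)

# Order-sensitive comparison: the pairs are kept as written.
same_as_written = pairs_a == pairs_b

# Order-insensitive comparison: a hypothetical "sort the query" step.
same_when_sorted = sorted(pairs_a) == sorted(pairs_b)

print(same_as_written, same_when_sorted)
```

Without a sorting step the two URLs compare as different, and they only match once the pairs are sorted. Note that sorting discards the relative order of repeated keys such as `foo`, which some applications treat as significant.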