Newsgroups: php.internals
Xref: news.php.net php.internals:126490
Date: Mon, 24 Feb 2025 13:48:14 +0100
From: tim@bastelstu.be (Tim Düsterhus)
To: Nicolas Grekas
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Message-ID: <804ddb57fee36c23839c5b5a50ddd51f@bastelstu.be>

Hi

On 2025-02-24 12:08, Nicolas Grekas wrote:
> The situation I'm talking about is when one will accept an argument
> described as
>
> function (\Uri\WhatWg\Url $url)
>
> If the Url class is final, this signature means only one possible
> implementation can ever be passed: the native one. Composition cannot
> be achieved because there's no type to compose.

Yes, that's the point: the behavior and the type are intimately tied
together. The Uri/Url classes represent values, not services. You
wouldn't extend an int either. For DateTimeImmutable, the fact that
inheritance is allowed causes a ton of needless bugs (especially around
serialization behavior).

> Fine-tuning the behavior provided by the RFC is what we might be most
> interested in, but we should not forget that we also ship a type. By
> making

For a given specification (RFC 3986 / WHATWG) there is exactly one
correct interpretation of a given URL. “Fine-tuning” means that you are
no longer following the specification.
> the type non-final, we keep things open enough for userland to build on
> it.

This works:

    final class HttpUrl
    {
        // Composition: wrap the native value object instead of extending it.
        private readonly \Uri\Rfc3986\Uri $uri;

        public function __construct(string $uri)
        {
            // The native class parses and validates the input.
            $this->uri = new \Uri\Rfc3986\Uri($uri);

            // Additional userland constraint layered on top.
            if ($this->uri->getScheme() !== 'http') {
                throw new ValueError('Scheme must be http');
            }
        }

        // Export back to the native type for code that expects it.
        public function toRfc3986(): \Uri\Rfc3986\Uri
        {
            return $this->uri;
        }
    }

Userland can easily build convenience wrappers around the classes; the
wrappers just need to export to the native classes, which then guarantee
that the result is fully validated and actually a valid URI/URL. Keep in
mind that the ext/uri extension will always be available, so users can
rely on the native implementation.

> By making the classes non-final, there will be one base type to build
> upon for userland.
> (the alternative would be to define native UrlInterface, but that'd
> increase complexity for little to no gain IMHO - although that'd solve
> my main concern).

Mate already explained why a native UriInterface was intentionally
removed from the RFC in https://news-web.php.net/php.internals/126425.

> The RFC is also missing whether __debugInfo returns raw or non-raw
> components. Then, I'm wondering if we need this per-component break for
> debugging at all? It might be less confusing (on this encoding aspect)
> to dump basically what __serialize() returns (under another key than
> __uri of course).

That would also work for me.

>> It can make sense to normalize a hostname, but not the path. My usual
>> example against normalizing the path is that SAML signs the *encoded*
>> URI instead of the payload and changing the case in percent-encoded
>> characters is sufficient to break the signature
>
> I would be careful with this argument: signature validation should be
> done on raw bytes. Requiring an object to preserve byte-level accuracy
> while the very purpose of OOP is to provide abstractions might be
> conflicting.
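To make the percent-encoding point concrete, here is a minimal sketch in plain PHP (no ext/uri needed). It assumes an HMAC-SHA256 over the exact URI string as a simplified stand-in for a signature computed over the encoded URI, roughly what the SAML HTTP-Redirect binding does with the URL-encoded query string. The key, the example URI, and both helper functions are hypothetical:

```php
<?php
// Hypothetical stand-in for a signature over the *encoded* URI:
// an HMAC-SHA256 of the exact URI bytes.
function signUri(string $uri, string $key): string
{
    return hash_hmac('sha256', $uri, $key);
}

// Hypothetical helper applying RFC 3986 case normalization: hex digits in
// percent-encodings are uppercased. The resulting URI is equivalent.
function normalizePercentEncodingCase(string $uri): string
{
    return preg_replace_callback(
        '/%[0-9a-fA-F]{2}/',
        static fn (array $m): string => strtoupper($m[0]),
        $uri
    );
}

$key = 'hypothetical-shared-secret';
$raw = 'https://idp.example/sso?SAMLRequest=abc%2fdef%3d';
$normalized = normalizePercentEncodingCase($raw);

// Same octets after decoding...
var_dump(rawurldecode($raw) === rawurldecode($normalized)); // bool(true)
// ...but different bytes on the wire, so the signature no longer matches.
var_dump(signUri($raw, $key) === signUri($normalized, $key)); // bool(false)
```

Both strings are equivalent URIs under RFC 3986 (hex digits in percent-encodings are case-insensitive), yet a byte-level signature check fails once the case is normalized, which is why the raw form of the path needs to stay accessible.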
> The signing topic can be solved by keeping the raw signed payload
> around.

Yes, the SAML signature behavior is wrong, but I did not write the SAML
specification. I just pointed it out as a possible use case where
choosing the raw or normalized form depends on the component, and where
a “get all components” function would be dangerous.

Best regards
Tim Düsterhus