Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Date: Sun, 30 Mar 2025 14:42:33 +0200
From: tim@bastelstu.be (Tim Düsterhus)
To: Ignace Nyamagana Butera
Cc: Máté Kocsis, PHP Internals List
Message-ID: <7d715757cc2dfd71019d106b01c69aed@bastelstu.be>

Hi

On 2025-03-27 23:49, Ignace Nyamagana Butera wrote:
> Hi Máté,
>
> for RFC 3986:
> https://datatracker.ietf.org/doc/html/rfc3986#section-5.3), and then
> this string is parsed and validated. Unfortunately, I recently
> realized that this approach may leave room for some kind of parsing
> confusion attack, namely when the scheme is for example "https", the
> authority is empty, and the path is "example.com". This will result
> in a https://example.com URI. I believe a similar bug is not possible
> with the rest of the components because they have their delimiters.
> So possibly some other solution will be needed, or maybe adding some
> additional validation (?).
>
> This is not correct according to RFC3986
> https://datatracker.ietf.org/doc/html/rfc3986#section-3
>
> *When authority is present, the path must either be empty or begin with
> a slash ("/") character. When authority is not present, the path cannot
> begin with two slash characters ("//").*
>
> So in your example it should throw a Uri\InvalidUriException 🙂 for
> RFC3986, and in case of the WhatwgUrl algorithm it should trigger a soft
> error and correct the behaviour for the http(s) schemes.
> This is also one of the many reasons why, at least for RFC3986, the path
> component can never be `null`, but that's another discussion. Like I
> said, having a `fromComponents` named constructor would allow the
> "removal" of the need for a UriBuilder (in your future section) and
> would IMHO be useful outside of the context of the http(s) scheme, but I
> can understand it being left out of the current implementation; it might
> be brought back for future improvements.

I just tested this with the implementation and it also appears to not
yet be correct:

    var_dump((new Uri\Rfc3986\Uri("example.com"))->getHost());
    // NULL
    var_dump((new Uri\Rfc3986\Uri("example.com"))->withScheme('https')->getHost());
    // string(11) "example.com"
    var_dump((new Uri\Rfc3986\Uri("example.com"))->withScheme('https')->toRawString());
    // string(19) "https://example.com"

and

    var_dump((new Uri\Rfc3986\Uri("foo/bar"))->withPath('//foo/bar')->getHost());
    // string(3) "foo"

Best regards
Tim Düsterhus
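
For reference, the two RFC 3986 rules discussed above (component recomposition from section 5.3 and the path/authority constraint quoted from section 3) can be sketched in plain PHP without the proposed extension. This is only an illustrative sketch: `recomposeUri()` is a hypothetical helper, not part of the RFC's API or of the implementation under test, and it treats an empty-string authority as "present" and `null` as "absent". It does not cover the other syntax checks a full parser would perform.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: recompose URI components following RFC 3986
 * section 5.3, rejecting path/authority combinations that would be
 * re-parsed differently.
 */
function recomposeUri(
    ?string $scheme,
    ?string $authority,
    string $path,
    ?string $query = null,
    ?string $fragment = null
): string {
    if ($authority !== null) {
        // Authority present: the path must be empty or begin with "/".
        if ($path !== '' && !str_starts_with($path, '/')) {
            throw new InvalidArgumentException(
                'Path must be empty or begin with "/" when an authority is present.'
            );
        }
    } elseif (str_starts_with($path, '//')) {
        // Authority absent: a leading "//" would be re-parsed as an authority.
        throw new InvalidArgumentException(
            'Path cannot begin with "//" when no authority is present.'
        );
    }

    $result = '';
    if ($scheme !== null) {
        $result .= $scheme . ':';
    }
    if ($authority !== null) {
        // "//" is emitted only when an authority is defined.
        $result .= '//' . $authority;
    }
    $result .= $path;
    if ($query !== null) {
        $result .= '?' . $query;
    }
    if ($fragment !== null) {
        $result .= '#' . $fragment;
    }

    return $result;
}

// No authority: the path stays a path, so no "//" is emitted.
echo recomposeUri('https', null, 'example.com'), "\n"; // https:example.com
```

Under these assumptions, `recomposeUri('https', '', 'example.com')` and `recomposeUri(null, null, '//foo/bar')` both throw instead of producing a string whose host would change on re-parsing.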