From: nicolas.grekas+php@gmail.com (Nicolas Grekas)
To: internals@lists.php.net
Date: Mon, 24 Feb 2025 12:08:07 +0100
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API

Hi,

Thanks for all the efforts making this RFC happen, it'll be a game changer in the domain!

I'm seeing a push to make the classes final. Please don't!
To me, this would badly break the open/closed principle.

When shipping a new class, one ships two things: a behavior and a type. The behavior is what some want to close by making the class final. But the result is that the type will also be final. And this would lead to a situation where people tightly couple their code to one single implementation: the internal one.

The situation I'm talking about is when one accepts an argument declared as
function (\Uri\WhatWg\Url $url)

If the Url class is final, this signature means only one possible implementation can ever be passed: the native one. Composition cannot be achieved because there's no type to compose.
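
To make this concrete, here is a minimal sketch. FinalUrl, OpenUrl, TracedUrl and render() are hypothetical stand-ins I made up for illustration; they are not the RFC's classes and say nothing about its actual API. The point is only that a userland decorator can satisfy the parameter type when the class is open, and cannot when it is final.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for illustration only; not the RFC's classes.
final class FinalUrl
{
    public function __construct(private string $href) {}
    public function toString(): string { return $this->href; }
}

class OpenUrl
{
    public function __construct(private string $href) {}
    public function toString(): string { return $this->href; }
}

// Userland decorator: it satisfies the OpenUrl type while delegating to
// another instance. The same class could not be written against FinalUrl,
// because "extends FinalUrl" is a compile-time fatal error.
class TracedUrl extends OpenUrl
{
    public int $reads = 0;

    public function __construct(private OpenUrl $inner)
    {
        parent::__construct($inner->toString());
    }

    public function toString(): string
    {
        $this->reads++;
        return $this->inner->toString();
    }
}

// A consumer that type-hints the concrete class, as in the signature above.
function render(OpenUrl $url): string
{
    return '<a href="' . $url->toString() . '">link</a>';
}

$traced = new TracedUrl(new OpenUrl('https://example.com/'));
$html = render($traced);
$finalIsClosed = (new ReflectionClass(FinalUrl::class))->isFinal();

echo $html, PHP_EOL;
echo 'reads: ', $traced->reads, PHP_EOL;
```

With the open class, render() accepts the decorator unchanged. With the final one, the only value that can ever reach render() is the class itself.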

Fine-tuning the behavior provided by the RFC is what we might be most interested in, but we should not forget that we also ship a type. By making the type non-final, we keep things open enough for userland to build on it. If not, we're going to end up with a fragmented community: some will tightly couple to the native Url implementation, some others will define a UriInterface of their own and will compose it with the native implementation, all these with non-interoperable base types of course, because interop is hard.

By making the classes non-final, there will be one base type to build upon for userland.
(The alternative would be to define a native UrlInterface, but that'd increase complexity for little to no gain IMHO, although that'd solve my main concern.)
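
For comparison, the interface alternative could look roughly like the sketch below. UrlInterface, NativeUrl, WithoutFragment and describe() are all names I'm assuming for the example; nothing like them is defined by the RFC. Here the implementation can stay final, because consumers type-hint the interface instead of the class.

```php
<?php
declare(strict_types=1);

// Hypothetical interface; assumed for this example only.
interface UrlInterface
{
    public function toString(): string;
}

// Stand-in for a final native implementation.
final class NativeUrl implements UrlInterface
{
    public function __construct(private string $href) {}
    public function toString(): string { return $this->href; }
}

// Userland composition without inheritance: wraps any UrlInterface
// and drops everything from the first "#" onwards.
final class WithoutFragment implements UrlInterface
{
    public function __construct(private UrlInterface $inner) {}

    public function toString(): string
    {
        return explode('#', $this->inner->toString(), 2)[0];
    }
}

// Consumers depend on the interface, not on one implementation.
function describe(UrlInterface $url): string
{
    return 'URL: ' . $url->toString();
}

$described = describe(new WithoutFragment(new NativeUrl('https://example.com/a#top')));
echo $described, PHP_EOL;
```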

> > 5 - Can the returned array from __debugInfo be used in a "normal"
> > method like `toComponents` naming can be changed/improve to ease
> > migration from parse_url or is this left for userland library ?
>
> I would prefer not expose this functionality for the same reason that
> there are no raw properties provided: The user must make an explicit
> choice whether they are interested in the raw or in the normalized
> version of the individual components.

The RFC also doesn't say whether __debugInfo returns raw or non-raw components. Then, I'm wondering if we need this per-component breakdown for debugging at all? It might be less confusing (on this encoding aspect) to dump basically what __serialize() returns (under another key than __uri of course).
This would also close the avenue of calling __debugInfo() directly (at the cost of making it possibly harder to move away from parse_url(), but I don't think we need to make this simpler: getting familiar with the new API first would be required, and welcome actually.)

> It can make sense to normalize a hostname, but not the path. My usual
> example against normalizing the path is that SAML signs the *encoded*
> URI instead of the payload and changing the case in percent-encoded
> characters is sufficient to break the signature

I would be careful with this argument: signature validation should be done on raw bytes. Requiring an object to preserve byte-level accuracy while the very purpose of OOP is to provide abstractions might be conflicting. The signing topic can be solved by keeping the raw signed payload around.
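
A small sketch of what I mean, using an HMAC as a stand-in for whatever signature scheme is actually in play (the key, the URI and the choice of HMAC-SHA256 are assumptions for the example, not how SAML itself signs). Verification passes on the raw bytes that were signed and fails as soon as the case of a percent-encoded octet changes, which is why the raw string should be kept next to any parsed object.

```php
<?php
declare(strict_types=1);

// Assumed values for the example.
$key = 'demo-secret';
$rawUri = 'https://idp.example/sso?RelayState=a%2fb';

// The signer works on the raw bytes it was given.
$signature = hash_hmac('sha256', $rawUri, $key);

// A harmless-looking normalization: uppercase the hex digits of
// percent-encoded octets (%2f becomes %2F).
$normalized = (string) preg_replace_callback(
    '/%[0-9a-fA-F]{2}/',
    static fn (array $m): string => strtoupper($m[0]),
    $rawUri
);

// Verify against the raw bytes, then against the normalized form.
$rawOk = hash_equals($signature, hash_hmac('sha256', $rawUri, $key));
$normalizedOk = hash_equals($signature, hash_hmac('sha256', $normalized, $key));

var_dump($rawOk);        // bool(true)
var_dump($normalizedOk); // bool(false)
```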
