Newsgroups: php.internals
List-Id: internals.lists.php.net
Date: Wed, 23 Apr 2025 12:50:44 +0200
From: tim@bastelstu.be (Tim Düsterhus)
To: Máté Kocsis
Cc: Internals
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Message-ID: <76d96ea8a78c6025128c0a4b01c94c0a@bastelstu.be>

Hi

On 2025-04-17 13:18, Máté Kocsis wrote:
>> Sweet. I believe this was/is the last remaining blocker for the RFC or
>> is there still anyone else from your side that needs to be discussed? I
>> need to give the RFC another read once you made the adjustment to remove
>> the WhatWg raw methods (and adjusted the corresponding explanations),
>> but I think I'm happy then :-)
>
> No, I also think that was the last one, as I don't have any questions left.
> Although, we should finalize what the WHATWG getters should be named? I like
> the explicit "raw" that you suggested, but I can also see that it may be
> confusing for some people. Altogether I think I prefer adding "raw" so that
> it's clear that they behave similarly how the raw RFC 3986 getters do.

In https://news-web.php.net/php.internals/127114 I suggested providing only the "non-raw" methods, so I believe you misread that. I've just given the RFC another read and thought about the naming, and I believe I still prefer not having the "raw" in the name:

- Having the `raw` in the name makes the API very clunky / verbose to use.
- Other implementations, such as in browsers or node.js, also simply use the component name without any indication of the output being raw.
- Future changes to the WHATWG URL specification might introduce some normalization for components that currently don't have normalization. This would make the `raw` naming a misnomer and might require new methods / deprecations on PHP's end.

So it seems safer to use the naming without the `raw` and then explain in the documentation what happens, with useful examples, just like the RFC already does.

------------

Other than that, I noticed the following small issues:

1. The `UrlValidationError` class is `final` in the implementation, but not in the RFC text. I assume that is an oversight.

2. In the "Advanced examples" section, the "another tricky example": There is a duplicate `?foo=bar%26baz%3Dqux` in the query-string. I assume that is unintentional and not part of the example.

3. In the "Advanced examples" section, the "another tricky example": I think it would be useful to have an explicit comparison to the output of the WHATWG URL, especially around the IPv6 normalization. I've seen that this is also mentioned later, but it's probably useful to have here as well.

4. In the "Component modification" section, for the "In order to offer consistent behavior with the parsing rules of RFC 3986, withers of Uri\Rfc3986\Uri also only accept properly formatted input," example: There is a `echo $uri->getRawHost(); // [2001:0db8:0001:0000:0000:0ab9:C0A8:0102]` call, but the host is never modified. That appears to be an error.

5. In the "Serialization" section: The explanation of the serialization format is overly specific regarding the implementation details. I would simplify that to just say "it supports serialization by using the toRawString() output and performs strict checks during unserialization" or similar. The reason is that I want to make some suggestions for the serialization format during the technical review of the implementation, to provide greater flexibility for future changes :-)

------------

I did not test the implementation again, since with the removal of the percent-decoding for WHATWG, the RFC just does what the other specifications already require. So this all makes sense to me, and any differences would simply be a regular bug in the code, rather than in the RFC text.

Best regards
Tim Düsterhus
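
To illustrate the naming point about browsers and node.js, here is a minimal TypeScript sketch using the standard WHATWG `URL` class, which is available as a global in Node.js and in browsers. The example URL is made up for this illustration and is not taken from the RFC. The sketch only shows that the getters use plain component names while still returning the percent-encoded form.

```typescript
// Illustrative only: this URL is a hypothetical example, not one from the RFC.
const url = new URL("https://example.com/foo%2Fbar?key=a%26b#frag%20ment");

// None of these getters has "raw" in its name, and none of them
// percent-decodes the component it returns.
const components = {
  host: url.hostname, // "example.com"
  path: url.pathname, // "/foo%2Fbar"
  query: url.search,  // "?key=a%26b"
  fragment: url.hash, // "#frag%20ment"
};

console.log(components);
```

Code that needs decoded values has to decode them explicitly, for example with `decodeURIComponent`, which is the kind of behaviour the documentation examples would have to spell out if the PHP getters drop the `raw` prefix.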