Newsgroups: php.internals
From: pmjones@pmjones.io ("Paul M. Jones")
To: Máté Kocsis
Cc: PHP Internals List
Date: Wed, 19 Mar 2025 10:13:42 -0500
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API

Hi Máté,

> On Mar 18, 2025, at 15:15, Máté Kocsis wrote:
>
> There's no way I would have written an implementation from scratch. I'm using the url module of the Lexbor C library (https://github.com/lexbor/lexbor/) for handling WHATWG URLs. It's already bundled in core, and it's also battle tested, and it has exceptional maintenance.

I did not mean to imply writing a parser from scratch; my apologies for phrasing it poorly.

> All I had to implement is the glue between userland and the C library.

That is more what I was getting at. Rowbot has a lot of what looks to be good design work on structures that come out of the parsing, in addition to a separate parser class.

The RFC might benefit from an explicit and intentional review of, and maybe incorporation of, some of the pre-existing Rowbot design work. At least one thing from Rowbot is absolutely not applicable to the RFC (the PSR-3 logging, for example); maybe none of the rest of it will be applicable either, but as prior art from someone acknowledged in the WHATWG-URL spec, I think it bears your close attention.

As an overview, the following is a brief comparison between Rowbot and the RFC; any missed or misrepresented functionality is unintentional.

* * *

## RFC

One non-final readonly Url class:

- 5 getRaw...() methods, 8 get...() methods, and one get...ForDisplay() method
- immutability via 8 with...() methods, broadly expecting properly-encoded arguments, and soft-erroring on invalid characters
- a static parse() method, with relative parsing capability and a place to capture errors
- equals() to compare two URLs
- toString() for machine-friendly string recomposition
- toDisplayString() for human-friendly string recomposition
- resolve() to resolve a relative URL using the current URL as the base
- serialize/deserialize; "the serialized form only includes the recomposed URI itself exposed as the `__uri` field, but the individual properties or URI components are not present."
- no URLSearchParams implementation

## Rowbot

(None of the classes are readonly or final; these look to hew closely to the WHATWG-URL spec.)

A BasicURLParser class:

- affords relative parsing capability and an optional parameter for the target URLRecord
- returns a URLRecord

A URLRecord class:

- public mutable properties for the URL components
- $scheme is a Scheme implementation with equals() and other is...() methods
- $host is a HostInterface (and implementations) with equals() and other is...() methods
- $path is a PathInterface (and PathList implementation) with PathSegment manipulation methods
- setUsername() and setPassword() mutators
- serializing
- getOrigin(), includesCredentials(), isEqual()

A URL class:

- Composed of a URLRecord and a URLSearchParams object
- Constructor takes a string, parses it to a URLRecord, and retains the URLRecord
- a static parse() method with relative parsing, as a convenience method
- __toString() and toString() return the serialized URLRecord
- Virtual properties for $href, $origin, $protocol, $username, $password, $host, $hostname, $port, $pathname, $search, $searchParams, $hash
- Mutability of virtual properties via magic __set()
- Readability of virtual properties via magic __get()

A URLSearchParams class:

- search params manipulation methods
- implements Countable, Iterator, Stringable
- composed of a QueryList implementation and (optionally) the originating URLRecord

* * *

-- pmj
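
As a rough illustration of the two styles compared above (in-place mutation through properties versus with...()-style methods that return a new object), here is a minimal sketch in TypeScript. It uses the standard WHATWG `URL` class available in browsers and Node.js, whose property names Rowbot's virtual properties appear to mirror. The `withPath` helper is a hypothetical name invented for this example; it is not the RFC's API, only an approximation of the immutable pattern.

```typescript
// Mutable style: the WHATWG URL API as exposed in browsers and Node.js.
const mutable = new URL("https://example.com/a?x=1");
mutable.pathname = "/b";                // changes the object in place
mutable.searchParams.append("y", "2");  // URLSearchParams writes back to its URL

// Immutable style: a hypothetical helper in the spirit of with...() methods.
// `withPath` is illustrative only; it is not part of the RFC or any library.
function withPath(url: URL, path: string): URL {
  const copy = new URL(url.href); // re-parse the serialization for an independent copy
  copy.pathname = path;
  return copy;                    // the caller's object is left untouched
}

const original = new URL("https://example.com/a?x=1");
const changed = withPath(original, "/c");

console.log(mutable.href);
console.log(original.href);
console.log(changed.href);
```

In the mutable style every holder of the object sees the change; in the immutable style `original` keeps its value and only `changed` carries the new path.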