Newsgroups: php.internals
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Date: Fri, 28 Mar 2025 10:44:14 -0500
Message-ID: <1A7E42FA-27EA-404B-85EF-25190AFFDE79@pmjones.io>
To: Máté Kocsis
Cc: PHP Internals List
From: pmjones@pmjones.io (Paul M. Jones)

Hi Máté and all,

> On Mar 25, 2025, at 03:45, Máté Kocsis wrote:

Regarding Rowbot's slowness compared to the RFC:

> I can only assume that the excessive usage of objects makes the library much slower than what's possible
> even for a userland library (obviously, an internal C implementation will always be faster). According to my results, the RFC's implementation was
> **two orders of magnitude** faster than the Rowbot library for parsing a very basic "https://example.com" URL 1000 times (~0.002 sec vs ~0.56 sec).

I would not presume that the dedicated value objects are what "makes the [Rowbot] library much slower" than the RFC -- instead, my first intuition is that the *parsing* operations are slower in userland than in C, and that they are primarily responsible for the comparative slowness. Speedwise, creating multiple objects from the parsed results would be a rounding error compared to the parsing itself.

> What I want to say with this is that it's perfectly fine to optimize a userland library for ergonomics and for the usage of advanced OOP in mind, but an internal
> implementation should also keep efficiency in mind besides developer experience. That's why I don't see myself implement separate objects for some of
> the components for now. But nothing would block us from doing it later, if we found out it's necessary.

I think that's fair. The main thing that stands out to me is not the Scheme, Host, etc. value objects, but that the RFC presents no UrlRecord -- which is very definitely part of the WHATWG-URL specification. That is, from reading the spec, I'd expect to see a UrlRecord, and a Url composed from it.
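To make that concrete, here is a rough sketch (assuming PHP 8.1+) of a record holding the fields of the spec's "URL record", with a Url composed from it. The class names, the toString() method, and the choice to store the host as an already-serialized string are all hypothetical; they come neither from the RFC nor from Rowbot. The serializer is a simplified form of the spec's URL serializer algorithm and skips IPv4/IPv6 host serialization.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: names are illustrative, not from the RFC.
// Fields mirror the WHATWG "URL record" struct.
final class UrlRecord
{
    /**
     * @param list<string>|string $path list of segments, or a string for an opaque path
     */
    public function __construct(
        public readonly string $scheme,
        public readonly string $username = '',
        public readonly string $password = '',
        public readonly ?string $host = null,   // assumed already serialized (no IPv4/IPv6 handling)
        public readonly ?int $port = null,
        public readonly array|string $path = [],
        public readonly ?string $query = null,
        public readonly ?string $fragment = null,
    ) {}
}

// A Url composed from a UrlRecord: the record holds the data, the Url presents it.
final class Url
{
    public function __construct(private readonly UrlRecord $record) {}

    public function getRecord(): UrlRecord
    {
        return $this->record;
    }

    // Simplified version of the spec's "URL serializer" algorithm.
    public function toString(bool $excludeFragment = false): string
    {
        $r = $this->record;
        $out = $r->scheme . ':';

        if ($r->host !== null) {
            $out .= '//';
            if ($r->username !== '' || $r->password !== '') {
                $out .= $r->username;
                if ($r->password !== '') {
                    $out .= ':' . $r->password;
                }
                $out .= '@';
            }
            $out .= $r->host;
            if ($r->port !== null) {
                $out .= ':' . $r->port;
            }
        }

        if (is_string($r->path)) {
            // Opaque path (e.g. "mailto:..."): emitted as-is.
            $out .= $r->path;
        } else {
            // Without a host, a leading empty segment would otherwise read as "//host".
            if ($r->host === null && count($r->path) > 1 && $r->path[0] === '') {
                $out .= '/.';
            }
            foreach ($r->path as $segment) {
                $out .= '/' . $segment;
            }
        }

        if ($r->query !== null) {
            $out .= '?' . $r->query;
        }
        if (!$excludeFragment && $r->fragment !== null) {
            $out .= '#' . $r->fragment;
        }
        return $out;
    }
}

$url = new Url(new UrlRecord(scheme: 'https', host: 'example.com', path: [''], query: 'a=1', fragment: 'top'));
echo $url->toString(), "\n"; // https://example.com/?a=1#top
```

With that split, a parser only has to produce a UrlRecord, and the Url can stay a thin, immutable view over it.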
> I believe the most fundamental difference between the Rowbot library and the RFC is that the RFC has native support for percent-decoding (because
> most properties are accessible in 2 variants), while the library completely leaves this task for the user.

I have some thoughts on that, but I'll save them for later.

Meanwhile, AFAICT, neither Rowbot nor the RFC provides a percent-*en*coding mechanism for consumers to put together properly encoded values. Have I missed it in the RFC, or is it somehow not necessary, or something else?

> This RFC is a synthesis of almost a year of discussion and refinement, collaborated by some very clever folks, who have a lot of hands-on experience of
> URL parsing and handling.

I would not presume otherwise! Even so, everyone makes mistakes and oversights from time to time, including very clever folks of the kind you describe above.

> That's why I would say that input from Trevor Rowbotham is also welcome in the discussion (especially his experience of some edge cases he had to deal with)

I agree -- it would be great for the RFC team to seek him out and invite him to comment in this thread.

> but the said library is nowhere near as widely adopted for it to qualify as something we must definitely take into consideration
> when designing PHP's new URL parsing API.

Not to be too blunt, but the Rowbot library is far more widely adopted than the RFC currently is; I think Rowbot represents an intersection of theory and practice that one would be unwise to discard without intentional and extensive consideration.

>> A URLSearchParams class:
>
> I like this concept too. And in fact, support for such a class is on my to-do list, and is mentioned in the "Future Scope".

Because it is part of the WHATWG-URL spec, I think it deserves first-class treatment in this RFC ...

> I just didn't want to make the RFC even longer, because we already have a lot of details to discuss.

...
but yeah, the sheer volume of the RFC makes it difficult to review and pick apart.

Which leads to my last point: I would really like to see at least two separate RFCs here. They would be a lot easier to review and critique that way:

- one for dealing with URIs as they exist now, especially one that honors the ways of working that exist in userland; and,

- one for dealing with WHATWG-URL in its entirety, with all its differences (some subtle, some not) from URIs.

I can see arguments for either one being the "base" on which the other would build.

-- 
pmj