Newsgroups: php.internals
From: kocsismate90@gmail.com (Máté Kocsis)
Date: Tue, 25 Mar 2025 09:45:12 +0100
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
To: "Paul M. Jones"
Cc: PHP Internals List

Hi Paul,

> ## Rowbot
>
> (None of the classes are readonly or final; these look to hew closely to the WHATWG-URL spec.)
>
> A BasicURLParser class:
>
> - affords relative parsing capability and an option parameter for the target URLRecord
> - returns a URLRecord
>
> A URLRecord class:
>
> - public mutable properties for the URL components
> - $scheme is a Scheme implementation with equals() and other is...() methods
> - $host is a HostInterface (and implementations) with equals() and other is...() methods
> - $path is a PathInterface (and PathList implementation) with PathSegment manipulation methods
> - setUsername() and setPassword() mutators
> - serializing
> - getOrigin(), includesCredentials(), isEqual()
>
> A URL class:
>
> - Composed of a URLRecord and a URLSearchParams object
> - Constructor takes a string, parses it to a URLRecord, and retains the URLRecord
> - a static parse() method with relative parsing, as a convenience method
> - __toString() and toString() return the serialized URLRecord
> - Virtual properties for $href, $origin, $protocol, $username, $password, $host, $hostname, $port, $pathname, $search, $searchParams, $hash
> - Mutability of virtual properties via magic __set()
> - Readability of virtual properties via magic __get()

I like some of the solutions this library uses, such as the dedicated value objects for some components
(Scheme, HostInterface, PathInterface), but these features are what make the implementation extremely slow
compared to the one the RFC proposes. I didn't dig into the details when I performed a very quick benchmark
last week, so I can only assume that the excessive use of objects makes the library much slower than what's
possible even for a userland library (obviously, an internal C implementation will always be faster).
According to my results, the RFC's implementation was **two orders of magnitude** faster than the Rowbot
library when parsing a very basic "https://example.com" URL 1000 times (~0.002 sec vs ~0.56 sec).
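
The script behind these numbers is not included here. Purely as an illustration, a measurement of this kind could be taken with a small timing harness like the sketch below. The function name `benchmarkParser` is hypothetical, and the built-in `parse_url()` is used only as a stand-in so the sketch runs on any PHP install; it is neither the RFC's parser nor the Rowbot library.

```php
<?php
declare(strict_types=1);

/**
 * Call $parse on $url $iterations times and return the elapsed
 * wall-clock time in seconds. (Hypothetical helper, for illustration.)
 */
function benchmarkParser(callable $parse, string $url, int $iterations = 1000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds

    for ($i = 0; $i < $iterations; $i++) {
        $parse($url);
    }

    return (hrtime(true) - $start) / 1e9;
}

// Stand-in parser: replace 'parse_url' with a closure that calls the
// implementation you actually want to measure.
$seconds = benchmarkParser('parse_url', 'https://example.com');
printf("1000 iterations: %.6f sec\n", $seconds);
```

To compare two implementations, run the same harness once per parser with the same URL and iteration count, ideally several times, since a single short run like this is sensitive to warm-up and autoloading costs.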

My point is that it's perfectly fine to optimize a userland library for ergonomics and advanced OOP, but an
internal implementation should also keep efficiency in mind besides developer experience. That's why I don't
see myself implementing separate objects for some of the components for now. Nothing would stop us from
doing it later, though, if we find it necessary.

I believe the most fundamental difference between the Rowbot library and the RFC is that the RFC has native
support for percent-decoding (because most properties are accessible in 2 variants), while the library leaves
this task entirely to the user. Apart from that, the mutable design of the library is fragile for the same
reason that the DateTime class is not safe to use in most cases, so that's definitely a no-go for me.
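
The DateTime comparison can be made concrete with PHP's built-in date classes. This is a minimal sketch with illustrative variable names; it shows only the aliasing problem of a mutable value object, not anything specific to either URL implementation.

```php
<?php
declare(strict_types=1);

// Mutable: modify() changes the object in place and returns the same
// instance, so every holder of the reference sees the change.
$mutableStart = new DateTime('2025-03-25');
$mutableEnd = $mutableStart->modify('+1 day');
var_dump($mutableStart === $mutableEnd);      // bool(true)
echo $mutableStart->format('Y-m-d'), "\n";    // 2025-03-26

// Immutable: modify() returns a new object and leaves the original alone.
$immutableStart = new DateTimeImmutable('2025-03-25');
$immutableEnd = $immutableStart->modify('+1 day');
echo $immutableStart->format('Y-m-d'), "\n";  // 2025-03-25
echo $immutableEnd->format('Y-m-d'), "\n";    // 2025-03-26
```

A URL object with public mutable properties, when passed to another function or stored in two places, can be changed behind the caller's back in the same way.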

This RFC is a synthesis of almost a year of discussion and refinement, with contributions from some very
clever folks who have a lot of hands-on experience with URL parsing and handling. That's why I would say that
input from Trevor Rowbotham is also welcome in the discussion (especially his experience with the edge cases
he had to deal with), but the library is nowhere near widely adopted enough to qualify as something we must
definitely take into consideration when designing PHP's new URL parsing API.


> A URLSearchParams class:
>
> - search params manipulation methods
> - implements Countable, Iterator, Stringable
> - composed of a QueryList implementation and (optionally) the originating URLRecord


I like this concept too. In fact, support for such a class is on my to-do list, and it is mentioned in the
"Future Scope" section. I just didn't want to make the RFC even longer, because we already have a lot of
details to discuss.

Máté