From: nyamsprod@gmail.com (nyamsprod the funky webmaster)
To: internals@lists.php.net, Máté Kocsis
Date: Sat, 29 Jun 2024 10:20:11 +0200
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Message-ID: <45d5a1cc-cb14-4e88-a202-a81554f06979@gmail.com>

On 28/06/2024 22:06, Máté Kocsis wrote:
> Hi Everyone,
>
> I've been working on a new RFC for a while now, and time has come to
> present it to a wider audience.
>
> Last year, I learnt that PHP doesn't have built-in support for parsing
> URLs according to any well established standards (RFC 1738 or the WHATWG
> URL living standard), since the parse_url() function is optimized for
> performance instead of correctness.
>
> In order to improve compatibility with external tools consuming URLs
> (like browsers), my new RFC would add a WHATWG compliant URL parser
> functionality to the standard library. The API itself is not final by
> any means, the RFC only represents how I imagined it first.
>
> You can find the RFC at the following link:
> https://wiki.php.net/rfc/url_parsing_api
>
> Regards,
> Máté

As a maintainer of a PHP userland URI toolkit I have a couple of questions/remarks on the proposal.

First, I look forward to finally having a real URL parser AND validator in PHP core. Any effort in that direction is always welcome news.
As far as I understand it, if this RFC were to pass as is, it would model PHP URLs on the WHATWG specification. While this specification has been getting a lot of traction lately, I believe it will restrict URL usage in PHP instead of making developers' lives easier. While PHP started as a "web" language, it is first and foremost a server-side, general-purpose language. The WHATWG spec, on the other hand, is written by browser vendors and is geared toward browsers (client side), and because of browser history it restricts by design a lot of what PHP developers can currently do using `parse_url`. In my view the `Url` class in PHP should allow dealing with any IANA-registered scheme, which is not the case for the WHATWG specification.

Therefore, I would rather suggest we ALSO properly support the RFC 3986 and RFC 3987 specifications, give both specs a go (at the same time!), and provide a clear way to instantiate your `Url` with one spec or the other. To be clear, my ideal situation would be to add to the parser at least 2 named constructors, `UrlParser::fromRFC3986` and `UrlParser::fromWHATWG`, or something similar (the names can be changed or improved).

While this is an old article by Daniel Stenberg (https://daniel.haxx.se/blog/2017/01/30/one-url-standard-please/), it conveys, with a more in-depth analysis, my issues with the WHATWG spec and with its usage in PHP if it were to be used as the ONLY available URL parser in PHP core.

The PSR-7 relation is also unfortunate from my POV: PSR-7 `UriInterface` is designed to be, at its core, an HTTP URI representation (so it shares the same type of issue as the WHATWG spec!), meaning that in the absence of a scheme it falls back to HTTP scheme validation. This is why the interface can forgo nullable components: the HTTP spec allows it, while other schemes do not. For instance, the FTP scheme prohibits the presence of the query and fragment components, which means they MUST be `null` in that case.
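To make the nullable point concrete, here is a minimal sketch built on today's `parse_url`. The `components()` helper is a name I made up for illustration; it is not part of the RFC or of any existing API. It only shows how an absent component can be reported as `null` rather than as an empty string:

```php
<?php
// Hypothetical helper (not part of the RFC): normalise parse_url() output so
// that every component key is present and absent components are null.
function components(string $url): array
{
    $defaults = [
        'scheme' => null, 'user' => null, 'pass' => null, 'host' => null,
        'port' => null, 'path' => null, 'query' => null, 'fragment' => null,
    ];

    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url() returns false on seriously malformed input.
        throw new InvalidArgumentException("Unparseable URL: $url");
    }

    // Array union keeps the parsed values and fills in the missing keys.
    return $parts + $defaults;
}

$ftp = components('ftp://user:secret@example.com/pub/file.txt');

var_dump($ftp['query']);    // NULL: the component is absent, not empty
var_dump($ftp['fragment']); // NULL
var_dump($ftp['user']);     // string(4) "user"
var_dump($ftp['pass']);     // string(6) "secret"
```

A representation that only ever returns strings cannot express the difference between "no query" and "empty query", which is exactly what matters for schemes that forbid the component.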
By removing the PSR-7 constraints we could add:

 - the `Url::(get|to)Components` method: it would mimic the return value of `parse_url` and as such ease migration from `parse_url`;
 - the `Url::getUsername` and `Url::getPassword` methods, to access the username and password components individually. You would still use the `withUserInfo` method to update them, but the developer would be able to read both components directly from the `Url` object.

These additions would remove the need for:

 - `UrlParser::parseUrlToArray`
 - `UrlParser::parseUrlComponent`
 - the `UrlComponent` enum

Cheers,
Ignace