From: nyamsprod@gmail.com (ignace nyamagana butera)
To: internals@lists.php.net, Máté Kocsis
Date: Sun, 7 Jul 2024 12:55:18 +0200
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
Newsgroups: php.internals

Hi Máté,

> Supporting IANA registered schemes is a valid request, and is definitely useful. However, I think this feature is not strictly required to have in the current RFC.

True. Having a WHATWG compliant parser in the PHP source code gets a big +1 from me; I have nothing against that inclusion.

> Based on your and others' feedback, it has now become clear for me that parse_url() is still useful and ext/url needs quite some additional capabilities until this function really becomes superfluous.

`parse_url` can only be deprecated once an RFC 3986 compliant parser is added to php-src, which is why I insist on having that parser present too. I will also add that everything in PHP up to now uses RFC 3986 as the basis for generating or representing URLs (cURL extension, streams, etc.). Having the first and only OOP representation of a URL in the language not follow that same specification seems odd to me.
It opens the door to inconsistencies that will only be resolved once an equivalent RFC 3986 URL object makes its way into the source code.

On the public API side I would recommend the following:

- If you are to strictly follow the WHATWG specification, no URI component can be null; they must all be strings. If we plan to use the same object for an RFC 3986 compliant parser, then all components should be nullable except the path component, which can never be null because it is always present.

- As others have mentioned, we should add a method to resolve a URI against a base URI, something like Url::resolve(string $url, Url|string|null $baseUrl), where the $baseUrl argument should be an absolute URL if present. If it is absent, the $url argument must be absolute; otherwise an exception should be thrown.

- Last but not least, the WHATWG specification is not only a URL parser but also a URL validator; it can apply some "corrections" to a malformed URL and report them. The specification has a provision for a structure to report malformed URL errors. I failed to see this mechanism mentioned anywhere in the RFC. Will the URL only throw exceptions, or will it also trigger warnings? For inspiration, the excellent PHP userland WHATWG URL parser from Trevor Rowbotham (https://github.com/TRowbotham/URL-Parser) allows using a PSR-3 logger to record those errors.

Best regards,
Ignace
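
To illustrate the nullable-component point above, here is a minimal sketch using the existing parse_url() function. It assumes PHP 8.0 or later, where parse_url() distinguishes an absent query from an empty one; the example URLs and variable names are only illustrative and are not part of the RFC.

```php
<?php
// Assumption: PHP 8.0+, where parse_url() distinguishes absent from
// empty query and fragment components.

// No "?" delimiter at all: the query component is absent.
$absent = parse_url('http://example.com/foo', PHP_URL_QUERY);

// A "?" delimiter followed by nothing: the query is present but empty.
$empty = parse_url('http://example.com/foo?', PHP_URL_QUERY);

var_dump($absent); // NULL
var_dump($empty);  // string(0) ""
```

An API whose components are always strings cannot express this difference between "no query" and "empty query", which is why nullable components matter if the same object is meant to serve an RFC 3986 parser.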