Message-ID: <2252ef52-ab94-475b-a527-e517ad17844a@gmail.com>
Date: Mon, 15 Jul 2024 21:31:27 +0200
List-Id: internals.lists.php.net
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
To: Máté Kocsis, internals@lists.php.net
References: <71a73b87-cc2f-4ee5-a961-7bf2b191fbb6@gmail.com> <5159E0AB-C8B0-4A54-9654-986C1D9C858F@koalephant.com> <07160e83-7333-44a1-81f2-b121e2cf0ffd@gmail.com>
From: nyamsprod@gmail.com (Ignace Nyamagana Butera)

On 15/07/2024 11:20, Máté Kocsis wrote:
> Hey Ignace, Nicolas,
>
> Based on your request for adding support for RFC 3986 spec compatible
> parsing, I evaluated another library
> (https://github.com/uriparser/uriparser/) in the recent days in order
> to add support for the requested functionality. As far as I can tell,
> the results were very promising, so I'm ok to include this into my
> proposal (I haven't pushed my changes yet and haven't updated the RFC
> yet).
>
> Regarding the reference resolution
> (https://uriparser.github.io/doc/api/latest/#resolution) feature which
> has also already been asked for, I'm genuinely wondering what the
> use-case is? But in any case, I'm fine with incorporating this as well
> into the RFC, since apparently both Lexbor and uriparser support this
> (naturally).
>
> What I became puzzled about is the correct object structure and naming.
> Now that uriparser which can deal with URIs came into the picture,
> while Lexbor can parse URLs, I don't know if it's a good idea to have a
> dedicated URI and a URL class extending the former one... If it is,
> then in my opinion, the logical behavior would be that Lexbor always
> instantiates URL classes, while uriparser would have to decide if the
> passed-in URI is actually an URL, and choose the instantiated class
> based on this factor... But in this case the differences between the
> RFC 3986 and WHATWG specifications couldn't be spelled out, since URL
> objects could hold URLs parsed based on both specs (and therefore
> having a unified interface is required).
>
> Or rather we should have a separate URI and a WhatwgUrl class so that
> the former one would always be created by uriparser, while the latter
> one by Lexbor? This way we could have a dedicated object interface for
> both standards (e.g. the RFC 3986 related one could have a
> getUserInfo() method, while the WHATWG related one could have both
> getUser() and getPassword() methods). But then the question is how
> interchangeable these classes should be? I.e. should we be able to
> convert them back and forth, or should there be an interface that is
> implemented by the two classes?
>
> I'd appreciate any suggestions regarding these questions.
>
> P.S. due to its bad receptance, I got rid of the UrlParser class as
> well as the UrlComponent enum from my implementation in the meantime.
>
> Regards,
> Máté

Hi Máté,

> As far as I can tell, the results were very promising, so I'm ok to include this into my proposal (I haven't pushed my changes yet and haven't updated the RFC yet).

This is great news. If it is indeed possible to release support for both specifications at the same time, that would be really great.

> Regarding the reference resolution (https://uriparser.github.io/doc/api/latest/#resolution) feature which has also already been asked for, I'm genuinely wondering what the use-case is?

Resolution is common when using an HTTP client: you define a base URI once, and then construct each subsequent URI from that base URI using resolution.

> What I became puzzled about is the correct object structure and naming. Now that uriparser which can deal with URIs came into the picture, while Lexbor can parse URLs, I don't know if it's a good idea to have a dedicated URI and a URL class extending the former one...

Both specifications parse URLs, and in both cases the result can be represented by a URL value object. The main differences between the two are around normalization and encoding. RFC 3986 only allows non-destructive normalization, which is not true of the WHATWG spec.

Here's a simple example to illustrate the differences, using `HttPs://0300.0250.0000.0001/path?query=foo%20bar`:

 - with RFC 3986 you will end up with `https://0300.0250.0000.0001/path?query=foo%20bar`
 - with WHATWG you will end up with `https://192.168.0.1/path?query=foo%20bar`, and the query becomes `query=foo+bar` once it is re-serialized through `URLSearchParams`

In the case of WHATWG the host is changed (its octal parts are rewritten as a dotted-decimal IPv4 address), and the query string follows a distinct encoding spec (application/x-www-form-urlencoded) as soon as it goes through `URLSearchParams`.

From my POV you have two choices: either you use one URL object for both specifications, with distinct named constructors `fromRFC3986` and `fromWhatwg`, or you have one interface and two distinct implementations. I do not think that one class can be extended to create the other one; at least that's my POV.

Hope this helps you in your implementation.

Best regards,
Ignace
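
P.S. As a rough illustration of the two points above (resolution against a base URI, and non-destructive versus destructive normalization), here is a minimal Python sketch. It is not the API proposed in the RFC. The function names `resolve`, `normalize_rfc3986` and `whatwg_style_ipv4` and the `api.example.com` host are made up for this example, the standard library's `urljoin` stands in for an RFC 3986 resolver, and the IPv4 helper only handles the four-part octal/hex case rather than the full WHATWG host parser.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def resolve(base: str, reference: str) -> str:
    """Resolve a relative reference against a base URI.

    urljoin approximates the algorithm of RFC 3986, section 5.
    """
    return urljoin(base, reference)


def normalize_rfc3986(uri: str) -> str:
    """Non-destructive normalization: lowercase the scheme and the host only.

    Userinfo, path and query are left untouched (hypothetical helper).
    """
    parts = urlsplit(uri)
    # Keep any "user:password@" prefix as-is; lowercase only host[:port].
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


def whatwg_style_ipv4(host: str) -> str:
    """Simplified version of the WHATWG IPv4 step (assumption: four parts).

    A leading "0x" marks a hexadecimal part, a leading "0" an octal part.
    This rewrites the host, which is why it counts as destructive.
    """
    octets = []
    for part in host.split("."):
        if part.lower().startswith("0x"):
            octets.append(int(part[2:] or "0", 16))
        elif len(part) > 1 and part.startswith("0"):
            octets.append(int(part, 8))
        else:
            octets.append(int(part))
    return ".".join(str(octet) for octet in octets)


# HTTP-client style usage: one base URI, many relative references.
base = "https://api.example.com/v1/"
print(resolve(base, "users/42"))      # https://api.example.com/v1/users/42
print(resolve(base, "../v2/status"))  # https://api.example.com/v2/status

# The example from this email.
print(normalize_rfc3986("HttPs://0300.0250.0000.0001/path?query=foo%20bar"))
# https://0300.0250.0000.0001/path?query=foo%20bar
print(whatwg_style_ipv4("0300.0250.0000.0001"))  # 192.168.0.1
```

The point of keeping the two normalizers separate is the same as in the design question: the RFC 3986 result can always be mapped back to what the user typed, while the WHATWG-style host rewrite cannot.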