Newsgroups: php.internals
Date: Mon, 15 Jul 2024 11:20:02 +0200
Subject: Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API
To: nyamsprod@gmail.com
Cc: internals@lists.php.net
From: kocsismate90@gmail.com (Máté Kocsis)

Hey Ignace, Nicolas,

Based on your request to support RFC 3986-compliant parsing, I spent the
past few days evaluating another library
(https://github.com/uriparser/uriparser/) that could provide it. As far as
I can tell, the results are very promising, so I'm OK with including this
in my proposal (I haven't pushed my changes or updated the RFC yet).

Regarding reference resolution
(https://uriparser.github.io/doc/api/latest/#resolution), which has also
been asked for already: I'm genuinely wondering what the use case is. In
any case, I'm fine with incorporating it into the RFC as well, since
apparently both Lexbor and uriparser support it (naturally).

What puzzles me is the correct object structure and naming. Now that
uriparser, which can deal with URIs, has come into the picture alongside
Lexbor, which parses URLs, I don't know if it's a good idea to have a
dedicated URI class and a URL class extending it... If it is, then in my
opinion the logical behavior would be that Lexbor always instantiates URL
objects, while uriparser would have to decide whether the passed-in URI is
actually a URL and choose the class to instantiate accordingly... But in
that case the differences between the RFC 3986 and WHATWG specifications
couldn't be spelled out, since URL objects could hold URLs parsed according
to either spec (and therefore a unified interface would be required).
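
To make this first option concrete, here is a rough sketch of the
"URL extends URI" layout. It is written in Python only because that keeps
it short and runnable; it is not the proposed PHP API. All names (Uri, Url,
parse_with_uriparser, parse_with_lexbor) are hypothetical, both parsers are
stand-ins built on Python's urllib.parse rather than on uriparser or Lexbor,
and the "has an authority" rule for deciding what counts as a URL is just a
placeholder assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Uri:
    """Base class: any URI (hypothetical)."""
    scheme: str
    authority: str  # empty string when the URI has no authority
    path: str


class Url(Uri):
    """Subclass for URIs that are also URLs (hypothetical)."""


def _build(text: str, cls: type) -> Uri:
    # Shared helper: split the input and wrap it in the requested class.
    parts = urlsplit(text)
    return cls(parts.scheme, parts.netloc, parts.path)


def parse_with_uriparser(text: str) -> Uri:
    # Stand-in for the RFC 3986 parser. Placeholder rule (an assumption):
    # treat the input as a URL only if it has an authority component.
    cls = Url if urlsplit(text).netloc else Uri
    return _build(text, cls)


def parse_with_lexbor(text: str) -> Url:
    # Stand-in for the WHATWG parser: it always yields a Url.
    return _build(text, Url)


a = parse_with_uriparser("https://example.com/a")
b = parse_with_lexbor("https://example.com/a")
c = parse_with_uriparser("urn:isbn:0451450523")
```

In this sketch a and b are both Url instances and compare equal, so nothing
on the object says which specification was applied, while c stays a plain
Uri. That is the ambiguity described above.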

Or should we rather have separate URI and WhatwgUrl classes, so that the
former is always created by uriparser and the latter by Lexbor? This way we
could have a dedicated object interface for each standard (e.g. the RFC 3986
one could have a getUserInfo() method, while the WHATWG one could have both
getUser() and getPassword() methods). But then the question is how
interchangeable these classes should be. I.e. should we be able to convert
them back and forth, or should there be an interface that both classes
implement?

I'd appreciate any suggestions regarding these questions.

P.S. Due to their poor reception, I have meanwhile removed both the
UrlParser class and the UrlComponent enum from my implementation.

Regards,
Máté