> Hi Everyone,
> I've been working on a new RFC for a while now, and time has come to
> present it to a wider audience.
> Last year, I learnt that PHP doesn't have built-in support for parsing
> URLs according to any well established standards (RFC 1738 or the WHATWG
> living standard), since the parse_url() function is optimized for
> performance instead of correctness.
> In order to improve compatibility with external tools consuming URLs
> (like browsers), my new RFC would add a WHATWG compliant URL parser
> to the standard library. The API itself is not final by any means, the
> RFC only represents how I imagined it first.
> You can find the RFC at the following link:
> Regards,
> Máté

Máté, thanks for putting this together.

Whenever I need to work with URLs there are a few things missing that I
would love to see incorporated into any change in PHP that brings us a
spec-compliant parsing class.

First of all, I typically care most about WHATWG URLs because the PHP
code I’m working with is making decisions about HTML that a browser
will interpret. Paramount above all other concerns is that code on the
server can understand content in the same way that browsers will;
otherwise we invite security issues. People may have valid critiques of
the WHATWG specification, but it’s also the most relevant specification
for users of much or most of the PHP code we write, and it’s valuable
because it allows us to talk about URLs in the same way a browser would.

I’m worried about the side effects that a global uri.default_handler
could have, with code running differently for no apparent reason, or
differently based on what is calling it. If someone is writing code for
a controlled system I could see this being valuable, but if someone is
writing a framework like WordPress and has no control over the
environments in which code runs, it seems dangerous to hope that every
plugin and every host runs compatible system configurations. Nobody is
going to check `ini_get( 'uri.default_handler' )` before every line that
parses URLs. Beyond this, even just allowing a pluggable parser invites
broken deployments because PHP code that is reading from a browser or
sending output to one needs to speak the language the browser is
speaking, not some arbitrary language that’s similar to it.

> One thing I feel is missing, is a method to parse a (partial) URL
> relative to another

Being able to parse a relative URL and know if a URL is relative or
absolute would help WordPress, which often makes decisions differently
based on this property (for instance, when reading the `href` attribute
of a link). I know these aren’t spec-compliant URLs, but they still
represent valid values for URL fields in HTML, and knowing if they are
relative or not requires parsing some specific details everywhere, vs.
once in a class that already parses URLs. Effectively, this would imply
that PHP’s new URL parser decodes
`document.querySelector( 'a' ).getAttribute( 'href' )`, which once
resolved should be the same as `document.querySelector( 'a' ).href`, and
indicates whether it found a full URL or only a portion of one.

  * `$url->is_relative` or `$url->is_absolute`
  * `$url->specificity = URL::Relative | URL::Absolute`
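
As a rough illustration of the relative-or-absolute check, here is a
minimal user-space sketch. `classify_href()` is a hypothetical helper,
not part of the RFC, and it only approximates the WHATWG rule that an
absolute URL begins with a scheme followed by `:`. It does not resolve
anything against a base URL.

```php
<?php
// Hypothetical helper (an assumption for illustration, not the RFC's API).
function classify_href(string $href): string
{
    // The WHATWG parser strips leading and trailing C0 control characters
    // and spaces, and removes ASCII tab and newline characters anywhere.
    $href = trim($href, "\x00..\x20");
    $href = str_replace(["\t", "\n", "\r"], '', $href);

    // A scheme is an ASCII letter followed by letters, digits, "+", "-",
    // or ".", and is terminated by ":".
    $hasScheme = preg_match('/^[A-Za-z][A-Za-z0-9+.\-]*:/', $href) === 1;

    return $hasScheme ? 'absolute' : 'relative';
}
```

Scheme-relative values such as `//cdn.example.com/x.js` come back as
relative here, which is the kind of detail a built-in property could
settle once instead of every caller deciding for itself.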

> the URI parser libraries used don't support modification of the URI

Having methods to add query arguments, change the path, etc. would be a
great way to simplify user-space code working with URLs. For instance,
read a URL and then add a query argument if some condition within the
URL warrants it (for example, the path ends in `.png`).
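
To show how much user-space code this takes today, here is a sketch of
the `.png` example built on `parse_url()`, `parse_str()`, and
`http_build_query()`. The function name `add_query_arg_if_png()` is
hypothetical, the code assumes PHP 8.0+ for `str_ends_with()`, and it
inherits the known quirks of those functions (for example, `parse_str()`
rewrites dots and spaces in argument names to underscores).

```php
<?php
// Hypothetical helper; parse_url() is not spec-compliant, so treat this
// as an approximation rather than a safe general-purpose rewriter.
function add_query_arg_if_png(string $url, string $key, string $value): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['path'])
        || !str_ends_with(strtolower($parts['path']), '.png')) {
        return $url; // Leave anything else untouched.
    }

    // Decode the existing query, add the new argument, and re-encode.
    parse_str($parts['query'] ?? '', $args);
    $args[$key] = $value;

    // Reassemble the URL by hand from the pieces parse_url() returned.
    $result = isset($parts['scheme']) ? $parts['scheme'] . ':' : '';
    if (isset($parts['host'])) {
        $result .= '//';
        if (isset($parts['user'])) {
            $result .= $parts['user'];
            $result .= isset($parts['pass']) ? ':' . $parts['pass'] : '';
            $result .= '@';
        }
        $result .= $parts['host'];
        $result .= isset($parts['port']) ? ':' . $parts['port'] : '';
    }
    $result .= $parts['path'] . '?' . http_build_query($args);
    $result .= isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return $result;
}
```

Most of the function is reassembly, which is the part a modifiable URL
object would make unnecessary.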

Was it intended to add this to the RFC before it’s finalized?

> I would not make Url final. "OMG but then people can extend it!"

My counter-point to this argument is that I see security exploits appear
wherever functions that implement specifications are pluggable and
extensible. It’s easy to see the need to create a class that limits
possible URLs, but that also doesn’t require extending a class. A class
can wrap a URL parser just as it could extend one. Magic methods would
make it even easier.
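
A minimal sketch of the wrapping approach, assuming PHP 8.1+. Both
classes are hypothetical: `ParsedUrl` is only a stand-in for whatever
final URL class PHP ends up shipping (here backed by `parse_url()`), and
`AllowedHostUrl` is an example of a restricting wrapper that forwards
property reads through `__get()`.

```php
<?php
// Stand-in for a final, spec-compliant URL class (an assumption).
final class ParsedUrl
{
    public function __construct(
        public readonly string $scheme,
        public readonly string $host,
        public readonly string $path,
    ) {}

    public static function parse(string $input): ?self
    {
        $parts = parse_url($input);
        if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
            return null;
        }
        return new self(
            strtolower($parts['scheme']),
            strtolower($parts['host']),
            $parts['path'] ?? '/',
        );
    }
}

// Wraps the parser instead of extending it, adding an allow-list rule.
final class AllowedHostUrl
{
    private function __construct(private ParsedUrl $url) {}

    public static function parse(string $input, array $allowedHosts): ?self
    {
        $url = ParsedUrl::parse($input);
        // Fail closed: a URL that does not parse is rejected outright.
        if ($url === null || $url->scheme !== 'https'
            || !in_array($url->host, $allowedHosts, true)) {
            return null;
        }
        return new self($url);
    }

    // Magic getter forwards property reads to the wrapped result.
    public function __get(string $name): mixed
    {
        return $this->url->$name;
    }
}
```

Because the wrapper can only be constructed through its own `parse()`,
every instance has passed both the parser and the extra rule.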

A problem that can arise with adding additional rules onto a
specification like this is that the subclass gets used in more places
than it should, and then somewhere some PHP code allows a malicious URL
because it failed to parse and then the inspection rules weren’t
applied.

Finally, I frequently find the need to be able to consider a URL in both
the display context and the serialization context. With Ada we have
`normalize_url()`, `parse_search_params()`, and the IDNA functions to
convert between the two representations. In order to keep strong
boundaries between security domains, it would be nice if PHP could
expose the two variations: one is an encoded form of a URL that machines
can easily parse, while the other is a “plain string” in PHP that’s
easier for humans to read but which might not even be a valid URL. Part
of the reason for this need is that I often see user-space code treating
an entire URL as a single text span that requires one set of rules for
full decoding, when it’s really multiple segments that each have their
own decoding rules.

 - Original [ ]
 - `$url->normalize()` [ ]
 - `$url->for_display()` Displayed [ ]

Having this in the RFC would give everyone the tools they need to
effectively and safely set links within an HTML document.
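
To illustrate why a URL can’t be decoded as a single text span, here is
a small sketch that decodes path segments and query arguments
separately. `decode_for_display()` is a hypothetical helper built on
`parse_url()`, not one of the methods proposed above, and it ignores
hosts (IDNA) and fragments for brevity.

```php
<?php
// Hypothetical helper: each component is split first, then decoded.
function decode_for_display(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        return [];
    }

    // Split on literal "/" before decoding, so an encoded "%2F" stays
    // inside its segment instead of turning into a path separator.
    $path     = ltrim($parts['path'] ?? '', '/');
    $segments = array_map('rawurldecode', explode('/', $path));

    // parse_str() splits on "&" and "=" before decoding names and values.
    parse_str($parts['query'] ?? '', $query);

    return ['segments' => $segments, 'query' => $query];
}

$url     = 'https://example.com/docs/a%2Fb?q=1%262';
$decoded = decode_for_display($url);
// $decoded['segments'] is ['docs', 'a/b'] and $decoded['query'] is
// ['q' => '1&2'], whereas rawurldecode($url) on the whole string gives
// 'https://example.com/docs/a/b?q=1&2', which changes the structure.
```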


All the best,
Dennis Snell

