Hi,
Following our discussions we had on the subject of the new Curl URL API,
and other curl improvements. I decided to only focus on adding the new Curl
URL API and put aside all other improvements. Here is the RFC that reflects
our current conversations.
https://wiki.php.net/rfc/curl-url-api
Feel free to give any feedback, concern or support :-)
Regards,
Pierrick
Here is the RFC that reflects our current conversations.
https://wiki.php.net/rfc/curl-url-api
Feel free to give any feedback, concern or support :-)
Thanks Pierrick,
I think this is a good approach to add the URL functionality to PHP 8.2 with a fairly simple CurlUrl object, just like CurlFile. This would allow developers to avoid the parsing vulnerability issues (as in, using two different libraries that can parse a URL in different ways)... then the completely new, and fully OOP version, would be done at a later date, after a lot more discussion / planning.
Craig
Hi,
Following our discussions we had on the subject of the new Curl URL API,
and other curl improvements. I decided to only focus on adding the new Curl
URL API and put aside all other improvements. Here is the RFC that reflects
our current conversations.https://wiki.php.net/rfc/curl-url-api
Feel free to give any feedback, concern or support :-)
I've two things:
-
The new CurlUrl class should probably be immutable from the start. It was my biggest mistake not to do that with DateTime.
-
What happens if the curl library available on the system does not have the features and functions that this new class relies on? I would expect the class to not be available either, but the RFC does not mention that.
cheers
Derick
Hi Derick,
- The new CurlUrl class should probably be immutable from the start. It
was my biggest mistake not to do that with DateTime.
Thanks for sharing your lessons learned. But I still see some use cases
where mutable objects are easier to use. From the experience you had with
DateTime, do you think that having CurlUrl
being immutable and providing
a MutableCurlUrl
would make sense ? I see some cases where you "navigate"
on a website using the same CurlHandle and just want to manipulate the
MutableCurlUrl handle to change urls.
- What happens if the curl library available on the system does not have
the features and functions that this new class relies on? I would expect
the class to not be available either, but the RFC does not mention that.
Good point. As you expected, if the functions are not available in libcurl,
the class will not be available. Same thing for each constant/feature. The
extension will "adapt" to the curl version. I will add this to the RFC.
Pierrick
any particular reason CurlUrl::getPort() defaults to 0 rather than one of
the valid options? (that being CurlUrl::DEFAULT_PORT
and CurlUrl::NO_DEFAULT_PORT )
Hi Derick,
- The new CurlUrl class should probably be immutable from the start. It
was my biggest mistake not to do that with DateTime.Thanks for sharing your lessons learned. But I still see some use cases
where mutable objects are easier to use. From the experience you had with
DateTime, do you think that havingCurlUrl
being immutable and providing
aMutableCurlUrl
would make sense ? I see some cases where you "navigate"
on a website using the same CurlHandle and just want to manipulate the
MutableCurlUrl handle to change urls.
- What happens if the curl library available on the system does not have
the features and functions that this new class relies on? I would expect
the class to not be available either, but the RFC does not mention that.Good point. As you expected, if the functions are not available in libcurl,
the class will not be available. Same thing for each constant/feature. The
extension will "adapt" to the curl version. I will add this to the RFC.Pierrick
nitpicking but I kind-of doubt the description for CurlUrl::NO_DEFAULT_PORT
is correct, quote:
Instructs the method to return null if the port matches the default port
for the scheme.
makes it sound like these would return null: http://localhost:80/
https://localhost:443/ ftps://localhost:22/
Is that correct? I would imagine it returns null if the port isn't
specified, rather than null if the port when specified matches the default
port?
On Wed, 22 Jun 2022 at 16:46, Hans Henrik Bergan divinity76@gmail.com
wrote:
any particular reason CurlUrl::getPort() defaults to 0 rather than one of
the valid options? (that being CurlUrl::DEFAULT_PORT
and CurlUrl::NO_DEFAULT_PORT )Hi Derick,
- The new CurlUrl class should probably be immutable from the start. It
was my biggest mistake not to do that with DateTime.Thanks for sharing your lessons learned. But I still see some use cases
where mutable objects are easier to use. From the experience you had with
DateTime, do you think that havingCurlUrl
being immutable and providing
aMutableCurlUrl
would make sense ? I see some cases where you
"navigate"
on a website using the same CurlHandle and just want to manipulate the
MutableCurlUrl handle to change urls.
- What happens if the curl library available on the system does not have
the features and functions that this new class relies on? I would expect
the class to not be available either, but the RFC does not mention that.Good point. As you expected, if the functions are not available in
libcurl,
the class will not be available. Same thing for each constant/feature. The
extension will "adapt" to the curl version. I will add this to the RFC.Pierrick
(dammit, mixed sftp:// with ftps:// there, ignore that, i meant sftp:// )
On Wed, 22 Jun 2022 at 16:53, Hans Henrik Bergan divinity76@gmail.com
wrote:
nitpicking but I kind-of doubt the description
for CurlUrl::NO_DEFAULT_PORT is correct, quote:Instructs the method to return null if the port matches the default port
for the scheme.makes it sound like these would return null: http://localhost:80/
https://localhost:443/ ftps://localhost:22/Is that correct? I would imagine it returns null if the port isn't
specified, rather than null if the port when specified matches the default
port?On Wed, 22 Jun 2022 at 16:46, Hans Henrik Bergan divinity76@gmail.com
wrote:any particular reason CurlUrl::getPort() defaults to 0 rather than one of
the valid options? (that being CurlUrl::DEFAULT_PORT
and CurlUrl::NO_DEFAULT_PORT )Hi Derick,
- The new CurlUrl class should probably be immutable from the start. It
was my biggest mistake not to do that with DateTime.Thanks for sharing your lessons learned. But I still see some use cases
where mutable objects are easier to use. From the experience you had with
DateTime, do you think that havingCurlUrl
being immutable and
providing
aMutableCurlUrl
would make sense ? I see some cases where you
"navigate"
on a website using the same CurlHandle and just want to manipulate the
MutableCurlUrl handle to change urls.
- What happens if the curl library available on the system does not
have
the features and functions that this new class relies on? I would
expect
the class to not be available either, but the RFC does not mention
that.Good point. As you expected, if the functions are not available in
libcurl,
the class will not be available. Same thing for each constant/feature.
The
extension will "adapt" to the curl version. I will add this to the RFC.Pierrick
HI Hans,
any particular reason CurlUrl::getPort() defaults to 0 rather than one of
the valid options? (that being CurlUrl::DEFAULT_PORT
and CurlUrl::NO_DEFAULT_PORT )
This is because the default is none of those 2 behaviours, If the port
wasn't set it will return null, but if the port is the default port for the
scheme it will still return it.
makes it sound like these would return null: http://localhost:80/
https://localhost:443/ ftps://localhost:22/
Is that correct? I would imagine it returns null if the port isn't
specified, rather than null if the port when specified matches the default
port?
That's correct, if you use CurlUrl::NO_DEFAULT_PORT. The behaviour you
were expecting is the default one ($flags = 0)
Hi,
Following our discussions we had on the subject of the new Curl URL API,
and other curl improvements. I decided to only focus on adding the new Curl
URL API and put aside all other improvements. Here is the RFC that reflects
our current conversations.https://wiki.php.net/rfc/curl-url-api
Feel free to give any feedback, concern or support :-)
Regards,
Pierrick
IMO, this should mirror the low-level curl url API very directly. The
basis of my opinion is that we do not own libcurl; we are merely
adapting it for use in PHP. We cannot anticipate changes in their
design, nor do we have authority to do so if we feel something should
change. Touching it as little as possible makes it easier to track
upstream changes, etc.
Based on that, I think the naming should be closer to libcurl.:
- CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE
- CurlUrl::URL_DECODE should be CurlUrl::URLDECODE
And so on, only differing if necessary because something is a reserved
word. The API should be as exact as possible to what libcurl provides.
The "helpers" getHost, getPassword, etc should be removed and should
expose curl_url_get
more directly.
Of course, it should be object based instead of resource based, but that's it.
A nicer API can be built on top of it, but I don't think that's the
role this particular API should play.
IMO, this should mirror the low-level curl url API very directly. The
basis of my opinion is that we do not own libcurl; we are merely
adapting it for use in PHP. We cannot anticipate changes in their
design, nor do we have authority to do so if we feel something should
change. Touching it as little as possible makes it easier to track
upstream changes, etc.Based on that, I think the naming should be closer to libcurl.:
- CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE
- CurlUrl::URL_DECODE should be CurlUrl::URLDECODE
And so on, only differing if necessary because something is a reserved
word. The API should be as exact as possible to what libcurl provides.
The "helpers" getHost, getPassword, etc should be removed and should
exposecurl_url_get
more directly.Of course, it should be object based instead of resource based, but that's it.
A nicer API can be built on top of it, but I don't think that's the
role this particular API should play.
For the record, I disagree with literally everything you've said here.
PHP indeed does not own libcurl, and nor does libcurl own PHP. We are targeting a different audience, and have a different set of facilities available to us. We have our own documentation, so there is no reason a user of PHP should know or care what the underlying implementation looks like, any more than they should know how the memory allocation works.
If the underlying library adds a feature, it will be as easy to add a new method as a new constant. If the underlying library changes behaviour, we will want to make our own decision on whether that meets our compatibility policy, and whether to emulate the older behaviour (or indeed emulate the newer behaviour on systems with an older library).
Twenty years ago, maybe PHP programmers were used to it being a veneer over C. I don't think that is or should be the expectation today, unless you're using FFI.
Regards,
--
Rowan Tommins
[IMSoP]
Hi all, and thanks Levi for your feedback,
If you look at the first thread (Discussion about new Curl URL API and
ext/curl improvements) you'll see that this was my first approach. I even
proposed to "OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is
the actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !
With previous thread answers, I was under the impression that I really was
the only one liking the approach of following the cURL api as much as
possible and leaving the more high level API to things like Guzzle & co
(for example by adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's
important to have this new Curl URL API in 8.2 since it fixes security
issues and that's of course not ideal but I would rather have an
implementation that would not be my first choice, than not having it at all.
What do you think ? I would love more people to give feedback on what they
are expecting, if they don't care if they prefer one approach or the other
and of course why ? I was thinking about doing a vote on this, but I'm not
sure it's a good idea. What do you all think ?
Regards,
Pierrick
Le jeu. 23 juin 2022, à 12 h 49, Levi Morrison levi.morrison@datadoghq.com
a écrit :
On Tue, Jun 21, 2022 at 10:38 PM Pierrick Charron pierrick@php.net
wrote:Hi,
Following our discussions we had on the subject of the new Curl URL API,
and other curl improvements. I decided to only focus on adding the new
Curl
URL API and put aside all other improvements. Here is the RFC that
reflects
our current conversations.https://wiki.php.net/rfc/curl-url-api
Feel free to give any feedback, concern or support :-)
Regards,
PierrickIMO, this should mirror the low-level curl url API very directly. The
basis of my opinion is that we do not own libcurl; we are merely
adapting it for use in PHP. We cannot anticipate changes in their
design, nor do we have authority to do so if we feel something should
change. Touching it as little as possible makes it easier to track
upstream changes, etc.Based on that, I think the naming should be closer to libcurl.:
- CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE
- CurlUrl::URL_DECODE should be CurlUrl::URLDECODE
And so on, only differing if necessary because something is a reserved
word. The API should be as exact as possible to what libcurl provides.
The "helpers" getHost, getPassword, etc should be removed and should
exposecurl_url_get
more directly.Of course, it should be object based instead of resource based, but that's
it.A nicer API can be built on top of it, but I don't think that's the
role this particular API should play.
Hello Pierrick,
A thin wrapper would be the most flexible. Someone can always write a "friendlier" class using your thin wrapper, as you mentioned, but one cannot easily go the other direction.
-Jeff
-----Original Message-----
From: Pierrick Charron pierrick@php.net
Sent: Friday, June 24, 2022 9:49 AM
To: Levi Morrison levi.morrison@datadoghq.com
Cc: PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API
Hi all, and thanks Levi for your feedback,
If you look at the first thread (Discussion about new Curl URL API and ext/curl improvements) you'll see that this was my first approach. I even proposed to "OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is the actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !
With previous thread answers, I was under the impression that I really was the only one liking the approach of following the cURL api as much as possible and leaving the more high level API to things like Guzzle & co (for example by adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's important to have this new Curl URL API in 8.2 since it fixes security issues and that's of course not ideal but I would rather have an implementation that would not be my first choice, than not having it at all.
What do you think ? I would love more people to give feedback on what they are expecting, if they don't care if they prefer one approach or the other and of course why ? I was thinking about doing a vote on this, but I'm not sure it's a good idea. What do you all think ?
Regards,
Pierrick
Le jeu. 23 juin 2022, à 12 h 49, Levi Morrison levi.morrison@datadoghq.com a écrit :
On Tue, Jun 21, 2022 at 10:38 PM Pierrick Charron pierrick@php.net
wrote:Hi,
Following our discussions we had on the subject of the new Curl URL
API, and other curl improvements. I decided to only focus on adding
the new
Curl
URL API and put aside all other improvements. Here is the RFC that
reflects
our current conversations.https://wiki.php.net/rfc/curl-url-api
Feel free to give any feedback, concern or support :-)
Regards,
PierrickIMO, this should mirror the low-level curl url API very directly. The
basis of my opinion is that we do not own libcurl; we are merely
adapting it for use in PHP. We cannot anticipate changes in their
design, nor do we have authority to do so if we feel something should
change. Touching it as little as possible makes it easier to track
upstream changes, etc.Based on that, I think the naming should be closer to libcurl.:
- CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE
- CurlUrl::URL_DECODE should be CurlUrl::URLDECODE
And so on, only differing if necessary because something is a reserved
word. The API should be as exact as possible to what libcurl provides.
The "helpers" getHost, getPassword, etc should be removed and should
exposecurl_url_get
more directly.Of course, it should be object based instead of resource based, but
that's it.A nicer API can be built on top of it, but I don't think that's the
role this particular API should play.
A thin wrapper would be the most flexible. Someone can always write a
"friendlier" class using your thin wrapper, as you mentioned, but one
cannot easily go the other direction.
I think this argument has some validity when talking about the entire API
of a library like curl. But we're not, we're talking about a tiny bounded
context.
The underlying implementation consists of 6 functions, 4 of which are
trivial:
- An argument-less "constructor" https://curl.se/libcurl/c/curl_url.html
- An equivalent to PHP "clone" https://curl.se/libcurl/c/curl_url_dup.html
- A cleanup method, which would be automatic in any PHP implementation
https://curl.se/libcurl/c/curl_url_cleanup.html - A function for looking up error strings from codes
https://curl.se/libcurl/c/curl_url_strerror.html
That leaves the main getter and setter functions
https://curl.se/libcurl/c/curl_url_get.html and
https://curl.se/libcurl/c/curl_url_set.html which take the same arguments:
- A resource handle
- The URL part to handle
- The new value for set, or an output parameter for get
- A set of flags
If we really believe the goal is to expose the C API in PHP, the get
function would look like this:
/**
- @return int Error code
*/
function curl_url_get(resource $url, int $what, string &$part, int $flags):
int
That would be ... awful, and I hope nobody's really suggesting that. So we
have to map each part to some concept in PHP, which is what the proposal
does:
- The resource handle becomes $this
- The $what argument becomes the method name
- The output parameter becomes the return value
- The error code becomes an exception
There's no "flexibility" which has been lost, and no "future changes" which
are being prevented.
The only other change is a few renamed constants, which I suggested on the
PR. I can see an argument for making them match the library exactly; but
it's the values that actually matter, so I don't see why we shouldn't
choose our own names if they're descriptive of the functionality.
Regards,
Rowan Tommins
[IMSoP]
Hi
If you look at the first thread (Discussion about new Curl URL API and
ext/curl improvements) you'll see that this was my first approach. I even
proposed to "OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is
the actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !With previous thread answers, I was under the impression that I really was
the only one liking the approach of following the cURL api as much as
possible and leaving the more high level API to things like Guzzle & co
(for example by adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's
important to have this new Curl URL API in 8.2 since it fixes security
issues and that's of course not ideal but I would rather have an
implementation that would not be my first choice, than not having it at all.What do you think ? I would love more people to give feedback on what they
are expecting, if they don't care if they prefer one approach or the other
and of course why ? I was thinking about doing a vote on this, but I'm not
sure it's a good idea. What do you all think ?
I agree with Levi. The curl extension should be a thin wrapper over libcurl.
A high level API that is more convenient to use may be provided by PHP,
but it should not be called curl. It should have a generic name with the
fact that curl does the heavy lifting only being an implementation detail.
Best regards
Tim Düsterhus
Please do not add yet another thin wrapper of an underlying C API. PHP is
not a drop-in replacement of C. PHP is a much higher-level language.
Developers should not have to understand how the underlying library works
to be able to use PHP. We need to move away from thin C wrappers as a
language. PHP should abstract more, not less, of C code.
The new API doesn't have to have exactly the same names as the C library.
Please follow PHP naming conventions and implement OOP-based API.
Hi all, and thanks Levi for your feedback,
If you look at the first thread (Discussion about new Curl URL API and
ext/curl improvements) you'll see that this was my first approach. I even
proposed to "OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is
the actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !With previous thread answers, I was under the impression that I really was
the only one liking the approach of following the cURL api as much as
possible and leaving the more high level API to things like Guzzle & co
(for example by adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's
important to have this new Curl URL API in 8.2 since it fixes security
issues and that's of course not ideal but I would rather have an
implementation that would not be my first choice, than not having it at all.What do you think ? I would love more people to give feedback on what they
are expecting, if they don't care if they prefer one approach or the other
and of course why ? I was thinking about doing a vote on this, but I'm not
sure it's a good idea. What do you all think ?Regards,
Pierrick
The root question, I think, is who the consumer is for this RFC?
Is the consumer PHP developers? Then it should be a good PHP API.
Is the consumer the Guzzle team and everyone else just uses Guzzle and ignores Curl/CurlUrl? Then it should probably adhere as closely as feasible to Curl's API, awful as it is, and be documented out the wazoo.
There likely isn't time for 8.2 to do the first option, but probably is for the second.
--Larry Garfield