Discussion about new Curl URL API and ext/curl improvements

3 years ago by Pierrick Charron — view source

unread

Hi internals,

Since its version 6.62.0 [1], libcurl features a brand new URL API [2] that
can be used to parse and generate URLs, using libcurl’s own parser. One of
the goals of this API is to tighten a problematic vulnerable area for
applications where the URL parser library would believe one thing and
libcurl another. This could and has sometimes led to security problems [3].
The new API is handled base and composed of 5 new functions [4] (and one
more since 7.80.0 to get more verbose information upon error).

I started to work on an implementation [5] to expose this API to PHP
userland in the curl extension so that PHP users can benefit from it. My
first reflex was to stay consistent on how the extension is currently
working and do a one to one mapping of the new curl functions. I got
feedback from both Ilija and Christoph saying that the API is, I quote, "a
bit clunky" which I can't disagree with.

For a long time, ext/curl worked with handles as "curl resources" and
functions mapped to the libcurl functions, but since PHP8.0 "curl
resources" were converted to opaque classes with no methods. So my main
goal here is to come up with a solution for the new Curl URL API, but maybe
also to be consistent, make some changes to existing curl classes like
CurlHandle to add methods and improve the current state of the extension.

Here is the solution I would propose :

First of course keep all the current ext/curl functions as is not to
break any compatibility (Seems obvious but just to be sure everybody is on
the same page).
For consistency expose the new Curl URL API as functions mapped one to
one to libcurl functions :

function curl_url(?string $url = null): CurlUrl|false {}
function curl_url_set(CurlUrl $url, int $part, string $content, int $flags
= 0): bool {}
function curl_url_get(CurlUrl $url, int $part, int $flags = 0):
string|false {}
function curl_url_errno(CurlUrl $url): int {}
function curl_url_strerror(int $error_code): ?string {}

Add methods to the CurlUrl object to make it less opaque and expose an
object oriented style API. I would keep it minimal and let userlanAd API
provide higher level APIs as guzzle for example. (You can see the current
implementation [5])

final class CurlUrl implements Stringable
{
public function __construct(?string $url = null) {}
public function get(int $part, int $flags = 0): string {}
public function set(int $part, string $content, int $flags = 0): void {}
public function getErrno(): int {}
public function __toString(): string {}
public function __clone() {}
}

It would also be nice to add this object oriented API for existing
CurlHandle, CurlMultiHandle and CurlShareHandle classes. For example the
CurlHandle class would look like that (First implementation [6])

final class CurlHandle
{
public function __construct(?string $url = null) {}
public function setOpt(int $option, mixed $value): bool {}
public function getInfo(?int $option = null): mixed {}
public function exec(): string|bool {}
public function escape(string $string): string|false {}
public function unescape(string $string): string|false {}
public function pause(int $flags): int {}
public function getErrno(): int {}
public function reset(): void;
public function setOptArray(array $options): bool {}
public function upkeep(): bool {}
public function __clone() {}
}

As of right now I still have some unanswered questions like how should we
handle errors on the new CurlUrl API ?

Throw CurlUrlException on both the procedural and object oriented style
API (that's how current implementation works [5])
Throw CurlUrlException with the oriented style API, but return for
example boolean/null on errors when user uses the procedural API ?
Always return null/booleans using both object oriented API and
procedural API ?

The same question applies if we add a new object oriented style API on
existing classes. Current functions MUST stay unchanged but should we throw
CurlException when the user uses the object oriented style API ? (That's
what I would do) Or should we return the same result from OO and procedural
API ? Should we make the new CurlApi consistent with that ?

What are your thoughts about all this ? Any feedback on any of those
subjects are more than welcome.

Pierrick

[1]
https://daniel.haxhttps://github.com/adoy/php-src/tree/curl-object-apix.se/blog/2018/10/31/curl-7-62-0-moar-stuff/
https://daniel.haxx.se/blog/2018/10/31/curl-7-62-0-moar-stuff/
[2] https://daniel.haxx.se/blog/2018/09/09/libcurl-gets-a-url-api/
[3]
https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf
[4] https://github.com/curl/curl/blob/master/include/curl/urlapi.h
[5] https://github.com/php/php-src/pull/8770
[6] https://github.com/adoy/php-src/tree/curl-object-api

3 years ago by Deleu — view source

unread

Hi internals,

Since its version 6.62.0 [1], libcurl features a brand new URL API [2] that
can be used to parse and generate URLs, using libcurl’s own parser. One of
the goals of this API is to tighten a problematic vulnerable area for
applications where the URL parser library would believe one thing and
libcurl another. This could and has sometimes led to security problems [3].
The new API is handled base and composed of 5 new functions [4] (and one
more since 7.80.0 to get more verbose information upon error).

I started to work on an implementation [5] to expose this API to PHP
userland in the curl extension so that PHP users can benefit from it. My
first reflex was to stay consistent on how the extension is currently
working and do a one to one mapping of the new curl functions. I got
feedback from both Ilija and Christoph saying that the API is, I quote, "a
bit clunky" which I can't disagree with.

For a long time, ext/curl worked with handles as "curl resources" and
functions mapped to the libcurl functions, but since PHP8.0 "curl
resources" were converted to opaque classes with no methods. So my main
goal here is to come up with a solution for the new Curl URL API, but maybe
also to be consistent, make some changes to existing curl classes like
CurlHandle to add methods and improve the current state of the extension.

Here is the solution I would propose :

First of course keep all the current ext/curl functions as is not to
break any compatibility (Seems obvious but just to be sure everybody is on
the same page).

For consistency expose the new Curl URL API as functions mapped one to
one to libcurl functions :

function curl_url(?string $url = null): CurlUrl|false {}
function curl_url_set(CurlUrl $url, int $part, string $content, int $flags
= 0): bool {}
function curl_url_get(CurlUrl $url, int $part, int $flags = 0):
string|false {}
function curl_url_errno(CurlUrl $url): int {}
function curl_url_strerror(int $error_code): ?string {}

Add methods to the CurlUrl object to make it less opaque and expose an
object oriented style API. I would keep it minimal and let userlanAd API
provide higher level APIs as guzzle for example. (You can see the current
implementation [5])

final class CurlUrl implements Stringable
{
public function __construct(?string $url = null) {}
public function get(int $part, int $flags = 0): string {}
public function set(int $part, string $content, int $flags = 0): void
{}
public function getErrno(): int {}
public function __toString(): string {}
public function __clone() {}
}

It would also be nice to add this object oriented API for existing
CurlHandle, CurlMultiHandle and CurlShareHandle classes. For example the
CurlHandle class would look like that (First implementation [6])

final class CurlHandle
{
public function __construct(?string $url = null) {}
public function setOpt(int $option, mixed $value): bool {}
public function getInfo(?int $option = null): mixed {}
public function exec(): string|bool {}
public function escape(string $string): string|false {}
public function unescape(string $string): string|false {}
public function pause(int $flags): int {}
public function getErrno(): int {}
public function reset(): void;
public function setOptArray(array $options): bool {}
public function upkeep(): bool {}
public function __clone() {}
}

As of right now I still have some unanswered questions like how should we
handle errors on the new CurlUrl API ?

Throw CurlUrlException on both the procedural and object oriented style
API (that's how current implementation works [5])

Throw CurlUrlException with the oriented style API, but return for
example boolean/null on errors when user uses the procedural API ?

Always return null/booleans using both object oriented API and
procedural API ?

The same question applies if we add a new object oriented style API on
existing classes. Current functions MUST stay unchanged but should we throw
CurlException when the user uses the object oriented style API ? (That's
what I would do) Or should we return the same result from OO and procedural
API ? Should we make the new CurlApi consistent with that ?

What are your thoughts about all this ? Any feedback on any of those
subjects are more than welcome.

Pierrick

[1]
https://daniel.haxhttps://
github.com/adoy/php-src/tree/curl-object-apix.se/blog/2018/10/31/curl-7-62-0-moar-stuff/
https://daniel.haxx.se/blog/2018/10/31/curl-7-62-0-moar-stuff/
[2] https://daniel.haxx.se/blog/2018/09/09/libcurl-gets-a-url-api/
[3]

https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf
[4] https://github.com/curl/curl/blob/master/include/curl/urlapi.h
[5] https://github.com/php/php-src/pull/8770
[6] https://github.com/adoy/php-src/tree/curl-object-api

I understand why breaking changes would definitely be a no go here, but if
someone is adapting their code from procedural to OOP I would say they can
catch exceptions instead of comparing against false. To me it would even be
desirable. Any new code that didn't exist before show prefer throwing over
returning false, imo.

3 years ago by Mel Dafert — view source

unread

[...]

For consistency expose the new Curl URL API as functions mapped one to
one to libcurl functions :

function curl_url(?string $url = null): CurlUrl|false {}
function curl_url_set(CurlUrl $url, int $part, string $content, int $flags
= 0): bool {}
function curl_url_get(CurlUrl $url, int $part, int $flags = 0):
string|false {}
function curl_url_errno(CurlUrl $url): int {}
function curl_url_strerror(int $error_code): ?string {}

Add methods to the CurlUrl object to make it less opaque and expose an
object oriented style API. I would keep it minimal and let userlanAd API
provide higher level APIs as guzzle for example. (You can see the current
implementation [5])

final class CurlUrl implements Stringable
{
public function __construct(?string $url = null) {}
public function get(int $part, int $flags = 0): string {}
public function set(int $part, string $content, int $flags = 0): void {}
public function getErrno(): int {}
public function __toString(): string {}
public function __clone() {}
}

Hello,

When IntlDatePatternGenerator was implemented, I initially also supplied both
a procedural and OO-style API, but was told that this duplication was historical
and not necessary for new APIs.
See the thread following this message: [1]
I would argue that this also applies to curl, and that it might make sense to only
supply the OO-style API here as well.
(With curl_url_strerror being exposed as either a static method on CurlUrl or as getErrorMessage, I suppose.)

It would also be nice to add this object oriented API for existing
CurlHandle, CurlMultiHandle and CurlShareHandle classes. For example the
CurlHandle class would look like that (First implementation [6])

I think this would be a good idea either way.

Regards,
Mel

[1] https://externals.io/message/114473#114658

3 years ago by tim@bastelstu.be — view source

unread

As of right now I still have some unanswered questions like how should we
handle errors on the new CurlUrl API ?

Throw CurlUrlException on both the procedural and object oriented style
API (that's how current implementation works [5])

Throw CurlUrlException with the oriented style API, but return for
example boolean/null on errors when user uses the procedural API ?

Always return null/booleans using both object oriented API and
procedural API ?

Definitely not this one. I want Exceptions for an OO API. I don't
particularly care about the prodecural API, as it is not likely that I
would use it, given that it is more clunky.

The same question applies if we add a new object oriented style API on
existing classes. Current functions MUST stay unchanged but should we throw
CurlException when the user uses the object oriented style API ? (That's
what I would do) Or should we return the same result from OO and procedural
API ? Should we make the new CurlApi consistent with that ?

Same here: Exceptions for OO, please.

What are your thoughts about all this ? Any feedback on any of those
subjects are more than welcome.

Something else you didn't mention: Would it be possible to make the int $part parameter an enum?

Best regards
Tim Düsterhus

3 years ago by Kamil Tekiela — view source

unread

First of all, thank you for working on this. I wanted OO API for Curl for a
long time.

Exceptions all the way. Code that's not using exceptions is usually more
clunky and error-prone. The new OO API shouldn't have the possibility to
return null/false on error. This is considered a thing of the past nowadays
and the majority of developers prefer exceptions on error.

Please do not add any more procedural APIs. The existing ones should be
deprecated and removed at some point in the future.

Regards
Kamil

3 years ago by Robert Landers — view source

unread

First of all, thank you for working on this. I wanted OO API for Curl for a
long time.

Exceptions all the way. Code that's not using exceptions is usually more
clunky and error-prone. The new OO API shouldn't have the possibility to
return null/false on error. This is considered a thing of the past nowadays
and the majority of developers prefer exceptions on error.

Please do not add any more procedural APIs. The existing ones should be
deprecated and removed at some point in the future.

Regards
Kamil

This is considered a thing of the past nowadays
and the majority of developers prefer exceptions on error.

This is certainly not true. A nice thing about errors in returns vs.
an exception is that you don't need to react to the error immediately
or abandon the current flow of the code by jumping out of it. I feel
like I see a lot of devs using exceptions for flow control instead of
actually exceptional behavior. A bad URL is probably exceptional -- if
it's actually malformed vs. something the parser just doesn't
understand yet, such as a new scheme -- while a failed connection is
almost certainly not, that's just TCP doing its thing and should be
expected in most circumstances.

-- Rob

3 years ago by Jeffrey Chimene — view source

unread

First of all, thank you for working on this. I wanted OO API for Curl for a
long time.

Exceptions all the way. Code that's not using exceptions is usually more
clunky and error-prone. The new OO API shouldn't have the possibility to
return null/false on error. This is considered a thing of the past nowadays
and the majority of developers prefer exceptions on error.

Please do not add any more procedural APIs. The existing ones should be
deprecated and removed at some point in the future.

Regards
Kamil
This is considered a thing of the past nowadays
and the majority of developers prefer exceptions on error.

This is certainly not true. A nice thing about errors in returns vs.
an exception is that you don't need to react to the error immediately
or abandon the current flow of the code by jumping out of it. I feel
like I see a lot of devs using exceptions for flow control instead of
actually exceptional behavior. A bad URL is probably exceptional -- if
it's actually malformed vs. something the parser just doesn't
understand yet, such as a new scheme -- while a failed connection is
almost certainly not, that's just TCP doing its thing and should be
expected in most circumstances.

-- Rob

IMNSHO, php is far more similar to basic than c. php longs to be c, but
because it's an interpreter, it can never be c. This discussion is
interesting in that it reflects the collision of interpreted code and
objects.

ON ERROR GOTO was as reprehensible in basic as longjmp in c. That's
why local error handling was gradually adopted in basic, and so should
it be for php.
The days of returning null or false on constructor error are long
behind us. Such implementation reflects php's interpreter aspect. In the
OO world, constructors shouldn't return a nugatory object on failure.
They should resignal the failure, and the coder should handle the error
locally to ease the technical debt. Branch on value rather than branch
on exception should be in the rear view mirror.
re: "...using exceptions for flow control instead of actually
exceptional behavior..." It's not the language's responsibility to be a
nanny. That's why we use IDEs instead of EDT

Thanks for reading,
jec

3 years ago by Larry Garfield — view source

unread

Hi internals,

Since its version 6.62.0 [1], libcurl features a brand new URL API [2] that
can be used to parse and generate URLs, using libcurl’s own parser. One of
the goals of this API is to tighten a problematic vulnerable area for
applications where the URL parser library would believe one thing and
libcurl another. This could and has sometimes led to security problems [3].
The new API is handled base and composed of 5 new functions [4] (and one
more since 7.80.0 to get more verbose information upon error).

I started to work on an implementation [5] to expose this API to PHP
userland in the curl extension so that PHP users can benefit from it. My
first reflex was to stay consistent on how the extension is currently
working and do a one to one mapping of the new curl functions. I got
feedback from both Ilija and Christoph saying that the API is, I quote, "a
bit clunky" which I can't disagree with.

For a long time, ext/curl worked with handles as "curl resources" and
functions mapped to the libcurl functions, but since PHP8.0 "curl
resources" were converted to opaque classes with no methods. So my main
goal here is to come up with a solution for the new Curl URL API, but maybe
also to be consistent, make some changes to existing curl classes like
CurlHandle to add methods and improve the current state of the extension.

Here is the solution I would propose :

First of course keep all the current ext/curl functions as is not to
break any compatibility (Seems obvious but just to be sure everybody is on
the same page).

For consistency expose the new Curl URL API as functions mapped one to
one to libcurl functions :

function curl_url(?string $url = null): CurlUrl|false {}
function curl_url_set(CurlUrl $url, int $part, string $content, int $flags
= 0): bool {}
function curl_url_get(CurlUrl $url, int $part, int $flags = 0):
string|false {}
function curl_url_errno(CurlUrl $url): int {}
function curl_url_strerror(int $error_code): ?string {}

Add methods to the CurlUrl object to make it less opaque and expose an
object oriented style API. I would keep it minimal and let userlanAd API
provide higher level APIs as guzzle for example. (You can see the current
implementation [5])

final class CurlUrl implements Stringable
{
public function __construct(?string $url = null) {}
public function get(int $part, int $flags = 0): string {}
public function set(int $part, string $content, int $flags = 0): void {}
public function getErrno(): int {}
public function __toString(): string {}
public function __clone() {}
}

It would also be nice to add this object oriented API for existing
CurlHandle, CurlMultiHandle and CurlShareHandle classes. For example the
CurlHandle class would look like that (First implementation [6])

final class CurlHandle
{
public function __construct(?string $url = null) {}
public function setOpt(int $option, mixed $value): bool {}
public function getInfo(?int $option = null): mixed {}
public function exec(): string|bool {}
public function escape(string $string): string|false {}
public function unescape(string $string): string|false {}
public function pause(int $flags): int {}
public function getErrno(): int {}
public function reset(): void;
public function setOptArray(array $options): bool {}
public function upkeep(): bool {}
public function __clone() {}
}

As of right now I still have some unanswered questions like how should we
handle errors on the new CurlUrl API ?

Throw CurlUrlException on both the procedural and object oriented style
API (that's how current implementation works [5])

Throw CurlUrlException with the oriented style API, but return for
example boolean/null on errors when user uses the procedural API ?

Always return null/booleans using both object oriented API and
procedural API ?

The same question applies if we add a new object oriented style API on
existing classes. Current functions MUST stay unchanged but should we throw
CurlException when the user uses the object oriented style API ? (That's
what I would do) Or should we return the same result from OO and procedural
API ? Should we make the new CurlApi consistent with that ?

What are your thoughts about all this ? Any feedback on any of those
subjects are more than welcome.

Pierrick

Like most of the responders so far, I would say skip the procedural API, just go OOP and be done with it. However, I would go a step further and say it should be a good OOP API, not just the Curl procedural API with funny syntax.

For example:

public function get(int $part, int $flags = 0): string {}

Absolutely not. :-) It should be public function getHost(), public function getScheme(), etc. The int $part bit is frankly bad design even by procedural standards, and has no place in a modern library. I don't even know what the $flags are or could be, which means that's also a bad design. They should either be omitted, handled via separate methods, or replaced with named arguments with reasonable defaults. (We can do that now, yay!) Translating those into Curl's nutty flag API is something that should be done once, in the wrapping extension/object, and no PHP user space developer should have to think about it.

Similarly, getErrno() should be replaced. If a URL cannot even be parsed at all, that's an exception. Not every error should be an exception (I recently wrote extensively on the subject here: https://peakd.com/hive-168588/@crell/much-ado-about-null), but in this case "I don't know WTF to do with this string you gave my constructor" is an exception. If instead of a constructor it had a static factory method (or multiple), then we could discuss those returning some other sentinel value instead to indicate failure, but "there was an error, go call this other thing to find out what it was" is a fundamentally flawed API design that should not be replicated.

In short, don't put a wrapper on the existing Curl API that happens to use ->. Design a good, usable, PHP-developer-friendly API, and put Curl behind it. Trying to just expose a crappy API and make it slightly less crappy is not a good use of time.

--Larry Garfield

3 years ago by Pierrick Charron — view source

unread

Thanks for taking the time to answer and give feedback, that's really
appreciated.

Firstly, about the procedural API, if everyone agrees that we should not
create it (and it looks like it) I will definitely not go against it. This
was just to make it consistent with the current ext/curl api. Reading the
thread about IntlDatePatternGenerator (thanks Mel), it's clear that the
procedural API for the new API will be discarded.

About making a "Good OOP API", of course the goal is to make a Good OOP
API. But there are things to take into consideration. The proposal here is
yes to expose the new Curl URL API which is quite small, but also to modify
the existing curl API to make it a little bit more digest for developers.
Good or bad the existing ext/curl is here and will continue to exist for
quite some time. It was designed (maybe on purpose or not) to be a low
level API with tons [1] of low level features, and it was clearly designed
as a thin wrapper/bridge over libcurl. I mean the fact is currently each
constant is the exact same one as in libcurl, and each function is a
wrapper over the exact same function in libcurl. Is it OK to have one part
that is a thin wrapper and another part that is a "remodeling" of the API ?
(This is not a rhetorical question, I really ask myself the question).

Yes we could create a completely new high level API using curl underneath
but I think that if this is what we want, we should create a brand new
extension and start from scratch and design it as a nice High level API
from the start. But that's clearly not what I'm pretending to do.

We could also keep the existing API as is and not improve it living only
with the procedural API, but I think that with minimal effort, we could get
some small (but still) benefits. Yes, CurlHandle for example would just be
a wrapper on the procedural API that happens to use -> and throw
exceptions instead of returning null/false, but one of the gains would be
to help developers with autocomplete to find methods they can call on the
handle.

I'm sorry if the fact that this proposal includes two different subjects is
confusing but I think that how we want to deal with the current API will
have an effect on how we want to implement the new one and keep the thing
consistent.

You only talked about the new CurlUrl API in your previous mail, but I'm
curious what you would do with the existing API ?

Pierrick

[1] https://github.com/php/php-src/blob/master/ext/curl/interface.c#L382

Le jeu. 16 juin 2022, à 15 h 48, Larry Garfield larry@garfieldtech.com a
écrit :

Hi internals,

Since its version 6.62.0 [1], libcurl features a brand new URL API [2]
that
can be used to parse and generate URLs, using libcurl’s own parser. One
of
the goals of this API is to tighten a problematic vulnerable area for
applications where the URL parser library would believe one thing and
libcurl another. This could and has sometimes led to security problems
[3].
The new API is handled base and composed of 5 new functions [4] (and one
more since 7.80.0 to get more verbose information upon error).

I started to work on an implementation [5] to expose this API to PHP
userland in the curl extension so that PHP users can benefit from it. My
first reflex was to stay consistent on how the extension is currently
working and do a one to one mapping of the new curl functions. I got
feedback from both Ilija and Christoph saying that the API is, I quote,
"a
bit clunky" which I can't disagree with.

For a long time, ext/curl worked with handles as "curl resources" and
functions mapped to the libcurl functions, but since PHP8.0 "curl
resources" were converted to opaque classes with no methods. So my main
goal here is to come up with a solution for the new Curl URL API, but
maybe
also to be consistent, make some changes to existing curl classes like
CurlHandle to add methods and improve the current state of the
extension.

Here is the solution I would propose :

First of course keep all the current ext/curl functions as is not to
break any compatibility (Seems obvious but just to be sure everybody is
on
the same page).

For consistency expose the new Curl URL API as functions mapped one to
one to libcurl functions :

function curl_url(?string $url = null): CurlUrl|false {}
function curl_url_set(CurlUrl $url, int $part, string $content, int
$flags
= 0): bool {}
function curl_url_get(CurlUrl $url, int $part, int $flags = 0):
string|false {}
function curl_url_errno(CurlUrl $url): int {}
function curl_url_strerror(int $error_code): ?string {}

Add methods to the CurlUrl object to make it less opaque and expose an
object oriented style API. I would keep it minimal and let userlanAd API
provide higher level APIs as guzzle for example. (You can see the current
implementation [5])

final class CurlUrl implements Stringable
{
public function __construct(?string $url = null) {}
public function get(int $part, int $flags = 0): string {}
public function set(int $part, string $content, int $flags = 0):
void {}
public function getErrno(): int {}
public function __toString(): string {}
public function __clone() {}
}

It would also be nice to add this object oriented API for existing
CurlHandle, CurlMultiHandle and CurlShareHandle classes. For example the
CurlHandle class would look like that (First implementation [6])

final class CurlHandle
{
public function __construct(?string $url = null) {}
public function setOpt(int $option, mixed $value): bool {}
public function getInfo(?int $option = null): mixed {}
public function exec(): string|bool {}
public function escape(string $string): string|false {}
public function unescape(string $string): string|false {}
public function pause(int $flags): int {}
public function getErrno(): int {}
public function reset(): void;
public function setOptArray(array $options): bool {}
public function upkeep(): bool {}
public function __clone() {}
}

As of right now I still have some unanswered questions like how should we
handle errors on the new CurlUrl API ?

Throw CurlUrlException on both the procedural and object oriented
style
API (that's how current implementation works [5])

Throw CurlUrlException with the oriented style API, but return for
example boolean/null on errors when user uses the procedural API ?

Always return null/booleans using both object oriented API and
procedural API ?

The same question applies if we add a new object oriented style API on
existing classes. Current functions MUST stay unchanged but should we
throw
CurlException when the user uses the object oriented style API ?
(That's
what I would do) Or should we return the same result from OO and
procedural
API ? Should we make the new CurlApi consistent with that ?

What are your thoughts about all this ? Any feedback on any of those
subjects are more than welcome.

Pierrick

Like most of the responders so far, I would say skip the procedural API,
just go OOP and be done with it. However, I would go a step further and
say it should be a good OOP API, not just the Curl procedural API with
funny syntax.

For example:

public function get(int $part, int $flags = 0): string {}

Absolutely not. :-) It should be public function getHost(), public
function getScheme(), etc. The int $part bit is frankly bad design even
by procedural standards, and has no place in a modern library. I don't
even know what the $flags are or could be, which means that's also a bad
design. They should either be omitted, handled via separate methods, or
replaced with named arguments with reasonable defaults. (We can do that
now, yay!) Translating those into Curl's nutty flag API is something that
should be done once, in the wrapping extension/object, and no PHP user
space developer should have to think about it.

Similarly, getErrno() should be replaced. If a URL cannot even be parsed
at all, that's an exception. Not every error should be an exception (I
recently wrote extensively on the subject here:
https://peakd.com/hive-168588/@crell/much-ado-about-null), but in this
case "I don't know WTF to do with this string you gave my constructor" is
an exception. If instead of a constructor it had a static factory method
(or multiple), then we could discuss those returning some other sentinel
value instead to indicate failure, but "there was an error, go call this
other thing to find out what it was" is a fundamentally flawed API design
that should not be replicated.

In short, don't put a wrapper on the existing Curl API that happens to use
->. Design a good, usable, PHP-developer-friendly API, and put Curl behind
it. Trying to just expose a crappy API and make it slightly less crappy is
not a good use of time.

--Larry Garfield

--

To unsubscribe, visit: https://www.php.net/unsub.php

3 years ago by Lynn — view source

unread

Good timezone!

About making a "Good OOP API", of course the goal is to make a Good OOP
API. But there are things to take into consideration. The proposal here is
yes to expose the new Curl URL API which is quite small, but also to modify
the existing curl API to make it a little bit more digest for developers.
Good or bad the existing ext/curl is here and will continue to exist for
quite some time. It was designed (maybe on purpose or not) to be a low
level API with tons [1] of low level features, and it was clearly designed
as a thin wrapper/bridge over libcurl. I mean the fact is currently each
constant is the exact same one as in libcurl, and each function is a
wrapper over the exact same function in libcurl. Is it OK to have one part
that is a thin wrapper and another part that is a "remodeling" of the API ?
(This is not a rhetorical question, I really ask myself the question).

I'm not a C developer, and the only other place I've ever used cURL is on
the command line. I honestly care very little for it resembling whatever
the C library is, as C is a different language with no OOP. The procedural
API for cURL is really annoying to deal with, and I think it would be worth
designing an OOP counterpart. I think most of this would go into a new
design for setting options among things, as a list of constants with
possible values is one of the worst ways to have to configure something in
my opinion. I know that both the Symfony Http Client and Guzzle are popular
libraries, perhaps design wise they could serve as an inspiration?

I'm personally not too experienced with cURL nor the level of details that
go into http libraries, but I do see this as an opportunity to make a
really good implementation from scratch. In my opinion it's worth spending
some time on this.

Larry said:

Like most of the responders so far, I would say skip the procedural API,

just go OOP and be done with it.

I'm okay implementing one last feature in the procedural API if it's really
desired to have this functionality, especially if this is beneficial for
security and reduces bugs. The reason being is that even if we have the
best possible OOP library for cURL, every single procedural API would
benefit from this change without having to completely rewrite their
implementation, which will be a problem if the OOP interface isn't just a
1:1 copy into object form. That said, we should avoid having: cURL
procedural + cURL procedural in objects + cURL OOP. Having 2 different
object based libraries to do the same thing PHP is confusing and will just
end up in way too many Stack Overflow questions.

Perhaps it is best to split this into 2 separate RFCs?

Regards,
Lynn

3 years ago by Craig Francis — view source

unread

... That said, we should avoid having: cURL procedural + cURL procedural in objects + cURL OOP. Having 2 different object based libraries to do the same thing PHP is confusing and will just end up in way too many Stack Overflow questions.

Perhaps it is best to split this into 2 separate RFCs?

tbh, I'd worry the OOP version would need a fair amount of discussion, and experimentation... e.g. what the methods/parameters should be, deciding how errors/oddities are handled (where exceptions should be used, can NULL be returned (aka the 9.0 grenade), or other fancy approaches like monads?), and how curl_multi_* functionality would work.

Could the OOP work be done separately, maybe starting in userland, so those decisions could be made... and in the mean time, ext/curl gets a quick update to expose this new URL API in the procedural style (keeping consistency, and keeping it as a thin wrapper for now)?

Craig

3 years ago by Pierrick Charron — view source

unread

Hi Lynn,

You have to be careful, even though most developers associate curl with an
http client and only use this feature, it's only part of what it is. It's
a multiprotocol file transfer library that supports DICT, FILE, FTP, FTPS,
GOPHER, GOPHERS, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, MQTT, POP3, POP3S,
RTMP, RTMPS, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET and TFTP.

Again I'm not against a new extension that deals nicely with HTTP and has a
super nice intuitive OOP API. But for me that's just not the curl
extension. That would be a different project like ext/http.

Pierrick

Le ven. 17 juin 2022, à 04 h 27, Lynn kjarli@gmail.com a écrit :

Good timezone!

On Thu, Jun 16, 2022 at 11:44 PM Pierrick Charron pierrick@php.net
wrote:

About making a "Good OOP API", of course the goal is to make a Good OOP
API. But there are things to take into consideration. The proposal here is
yes to expose the new Curl URL API which is quite small, but also to
modify
the existing curl API to make it a little bit more digest for developers.
Good or bad the existing ext/curl is here and will continue to exist for
quite some time. It was designed (maybe on purpose or not) to be a low
level API with tons [1] of low level features, and it was clearly designed
as a thin wrapper/bridge over libcurl. I mean the fact is currently each
constant is the exact same one as in libcurl, and each function is a
wrapper over the exact same function in libcurl. Is it OK to have one part
that is a thin wrapper and another part that is a "remodeling" of the API
?
(This is not a rhetorical question, I really ask myself the question).

I'm not a C developer, and the only other place I've ever used cURL is on
the command line. I honestly care very little for it resembling whatever
the C library is, as C is a different language with no OOP. The procedural
API for cURL is really annoying to deal with, and I think it would be worth
designing an OOP counterpart. I think most of this would go into a new
design for setting options among things, as a list of constants with
possible values is one of the worst ways to have to configure something in
my opinion. I know that both the Symfony Http Client and Guzzle are popular
libraries, perhaps design wise they could serve as an inspiration?

I'm personally not too experienced with cURL nor the level of details that
go into http libraries, but I do see this as an opportunity to make a
really good implementation from scratch. In my opinion it's worth spending
some time on this.

Larry said:

Like most of the responders so far, I would say skip the procedural API,

just go OOP and be done with it.

I'm okay implementing one last feature in the procedural API if it's
really desired to have this functionality, especially if this is beneficial
for security and reduces bugs. The reason being is that even if we have the
best possible OOP library for cURL, every single procedural API would
benefit from this change without having to completely rewrite their
implementation, which will be a problem if the OOP interface isn't just a
1:1 copy into object form. That said, we should avoid having: cURL
procedural + cURL procedural in objects + cURL OOP. Having 2 different
object based libraries to do the same thing PHP is confusing and will just
end up in way too many Stack Overflow questions.

Perhaps it is best to split this into 2 separate RFCs?

Regards,
Lynn

3 years ago by Rowan Tommins — view source

unread

Hi Pierrick,

Is it OK to have one part
that is a thin wrapper and another part that is a "remodeling" of the API ?
(This is not a rhetorical question, I really ask myself the question).

I agree with others that the new API should be exposed via a new,
well-designed object, with no procedural version or low-level equivalence
to libcurl.

Something that nobody's brought up yet is that the curl extension actually
already has a facility designed this way: CurlFile and CurlStringFile are
high-level PHP objects which allow users to interact with file uploads,
without any knowledge of what libcurl's API for them looks like.

I think it's perfectly reasonable for CurlUrl to be designed the same way,
regardless of what else we do with the extension.

We could also keep the existing API as is and not improve it living only
with the procedural API, but I think that with minimal effort, we could get
some small (but still) benefits. Yes, CurlHandle for example would just be
a wrapper on the procedural API that happens to use -> and throw
exceptions instead of returning null/false, but one of the gains would be
to help developers with autocomplete to find methods they can call on the
handle.

This feels like a very small gain, since users can already type "curl_" and
get a list of all curl functions. It also has real costs: users may be
confused why there are two ways of writing the same thing; and it sets the
precedent that methods on CurlHandle should mirror curl_* functions, even
though we know those have terrible usability.

I think if we do add methods to the CurlHandle class, they should be
higher-level methods for common tasks that are currently unnecessarily
awkward. I don't think we'd need to add everything at once, just establish
some general design principles.

(I'm honestly surprised that CURLOPT_RETURNTRANSFER, and curl_setopt in
general, doesn't make it onto more lists of the worst parts of PHP.)

Regards,

Rowan Tommins
[IMSoP]

3 years ago by Pierrick Charron — view source

unread

Hi Rowan, all,

Thanks for your responses and comments.

I think it's perfectly reasonable for CurlUrl to be designed the same way,
regardless of what else we do with the extension.

As suggested by most of you, I created a new version of the proposed new
URL API with a friendlier interface [1]. A quick descriptions of the
changes I made :

The curl_url_* procedural API was removed
The CurlUrl::set() and ::get() methods with the CURLUPART_ constants were
replaced by more explicit setters and getters (ex: setHost() and getHost()).
All the CURLU_* constants are now CurlUrl class constants (ex:
CURLU_GUESS_SCHEME => CurlUrl::GUESS_SCHEME)
All the CURLUE_* constants are now CurlUrlException class constants (ex:
CURLUE_MALFORMED_INPUT => CurlUrlException::MALFORMED_INPUT)

Does it look more like what you were expecting ? Any other improvements
that should be made ?

This feels like a very small gain, since users can already type "curl_" and

get a list of all curl functions. It also has real costs: users may be
confused why there are two ways of writing the same thing; and it sets the
precedent that methods on CurlHandle should mirror curl_* functions, even
though we know those have terrible usability.

This is not the only change that I would like to introduce with a new OO
API. One other change that I would like to introduce is a better type
checking of CurlHandle::setOpt() when declaring strict_types to 1 (I know
Sara once opened a PR about this [2]). The current signature of the
procedural API is curl_setopt(CurlHandle $handle, int $option, mixed $value): bool. The 3rd argument being mixed never throws any TypeError,
and adding this to the current procedural API would be a breaking change.
If we add a new OOP API this kind of change is possible and we can improve
the error message like : "Argument 2 passed to CurlHandle::setOpt() for
option CURLOPT_BUFFERSIZE must be of type int, string given"

However, knowing that the feature freeze for 8.2 is in a month, I will put
more emphasis on the Curl URL API than on a new OO API for existing Curl
API.

Thanks again for your feedback

Pierrick

[1] https://github.com/adoy/php-src/pull/1/files
[2] https://github.com/php/php-src/pull/2495

3 years ago by Sara Golemon — view source

unread

For a long time, ext/curl worked with handles as "curl resources" and
functions mapped to the libcurl functions, but since PHP8.0 "curl
resources" were converted to opaque classes with no methods. So my main
goal here is to come up with a solution for the new Curl URL API, but maybe
also to be consistent, make some changes to existing curl classes like
CurlHandle to add methods and improve the current state of the extension.

This is so bizarre, I know I wrote an OOPified cURL extension some years
ago (called it curli), and now I can't find it anywhere. What universe am
I even in?

Anyway, +1 on making a whole new OOPified cURL implementation. $ch =
(new CurlHandle($uri))->setOpt(Curl::OPT_METHOD, 'post')->perform();
Let it throw exceptions, do proper type checks, all the goodness. Let the
procedural APIs behave differently than the object methods and that's okay.

cURL is a foundational extension in PHP and I respect the choices that were
made in its API design, but it's 2022, and we deserve a better ext/curl.

-Sara

3 years ago by Robert Landers — view source

unread

For a long time, ext/curl worked with handles as "curl resources" and
functions mapped to the libcurl functions, but since PHP8.0 "curl
resources" were converted to opaque classes with no methods. So my main
goal here is to come up with a solution for the new Curl URL API, but maybe
also to be consistent, make some changes to existing curl classes like
CurlHandle to add methods and improve the current state of the extension.

This is so bizarre, I know I wrote an OOPified cURL extension some years
ago (called it curli), and now I can't find it anywhere. What universe am
I even in?

Anyway, +1 on making a whole new OOPified cURL implementation. $ch =
(new CurlHandle($uri))->setOpt(Curl::OPT_METHOD, 'post')->perform();
Let it throw exceptions, do proper type checks, all the goodness. Let the
procedural APIs behave differently than the object methods and that's okay.

cURL is a foundational extension in PHP and I respect the choices that were
made in its API design, but it's 2022, and we deserve a better ext/curl.

-Sara

Also, would it be unreasonable to ask it to take advantage of the
Fiber constructs if in a Fiber context? That would be simply amazing
if this high-level OOP implementation were async-ish by default.

If anything should get native Fiber superpowers, it should be this
extension, IMHO. Also, IMHO, this is an unreasonable request, but
perhaps a separate RFC once this original proposal is accepted and
figured out?

3 years ago by Pierrick Charron — view source

unread

Hi Sara

This is so bizarre, I know I wrote an OOPified cURL extension some years
ago (called it curli), and now I can't find it anywhere. What universe am
I even in?

I don't know but if you pass through the parallel universe where Curli is,
I'm interested.

Anyway, +1 on making a whole new OOPified cURL implementation. $ch =
(new CurlHandle($uri))->setOpt(Curl::OPT_METHOD, 'post')->perform();
Let it throw exceptions, do proper type checks, all the goodness. Let the
procedural APIs behave differently than the object methods and that's okay.

cURL is a foundational extension in PHP and I respect the choices that
were made in its API design, but it's 2022, and we deserve a better
ext/curl.

I hope you don't mind, I took some of your code from your "Enable
strict_types checking for curl_setopt()" pull request [1] to do some test
on introducing this but only on the OOP API. It's working very well [2].
Can I know why this PR was closed ? Any issues ? Or was it more because it
was too much of a change for the procedural API ?

Thanks
Pierrick

[1] https://github.com/php/php-src/pull/2495
[2] https://github.com/adoy/php-src/pull/2

3 years ago by Rowan Tommins — view source

unread

I hope you don't mind, I took some of your code from your "Enable
strict_types checking for curl_setopt()" pull request [1] to do some test
on introducing this but only on the OOP API. It's working very well [2].
Can I know why this PR was closed ? Any issues ? Or was it more because it
was too much of a change for the procedural API ?

While a few type errors there might be useful, it still feels a bit like polishing a turd.

I count about 180 different options that can be passed to curl_setopt. Some of them are used by nearly every user of PHP, and some are for very obscure niche uses. Many of them are useless on their own, and exist only to tweak the behaviour of some other option, or need to be used in groups like usernames and passwords. Others have unintuitive and unhelpful defaults, that need to be overridden with a different option. The manual page has long topped the league table for user comments, currently standing at 102. It is simply not a practical design for what most users of PHP need.

I think one way out of this mess is to start grouping options together, and providing specific methods to set them, with named, type-safe parameters. For instance, CurlHandle::setProxyOptions(string $url, ?string $authUserName, ?string $authPassword, ...) and so on.

If the long lists of parameters still feel a bit messy, we could go a step further and make objects for building sets of options, with methods taking short lists of genuinely dependent options, e.g. CurlProxyOptions::setAuth(string $userName, string $password) and CurlProxyOptions::disableAuth()

Some of the options should be rethought in the process to make more sense to PHP's common usage, rather than libcurl's. As just one example, enabling CURLOPT_VERBOSE sends some debug information to stderr, but on a web server that is normally a log file where it will be mixed in with other information. To capture it instead you need to also set CURLOPT_STDERR, which requires an open file handle. What most users actually want is for that information to be available to them as a string after the request is executed.

Another thing that might be useful is some new factory methods that set sensible defaults, like CurlHandle::createForHttp(string|CurlUrl $url, string $method)

If we can build these things on top of the existing CurlHandle, we can release a few things first, and progressively add features. Rather than having to decide which extension to use, users will still have access to curl_setopt for things there isn't a nicer interface for yet. We can even leave it in place as something like CurlHandle::setRawOption for when users want fine control of the library, or some really obscure setting.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Pierrick Charron — view source

unread

Rowan, all,

Thanks again for your comments. PHP8.2 is going to be feature freeze in
exactly one month now and as the currently proposed improvements have not
received any interest (and that's OK), I will put those aside for the
moment and come back with a new approach that I hope will be better for the
future release (8.3 ?)

However, about the new Curl URL API, I think it's still time to finalize
the discussions and include it in the 8.2 release as it allows us to solve
some potential security issues.

What do you think ? Do you see any interest in adding the new CurlUrl class
in PHP8.2 ? Again currently proposed implementation can be found here [1]
and is waiting for your new comments.

Pierrick

[1]
https://github.com/adoy/php-src/pull/1/files#diff-5bd329a70f563b5b6eded3b710514a4fac605329087a7bd118e40f80b9fc8ee3

Le dim. 19 juin 2022, à 09 h 55, Rowan Tommins rowan.collins@gmail.com a
écrit :

I hope you don't mind, I took some of your code from your "Enable
strict_types checking for curl_setopt()" pull request [1] to do some test
on introducing this but only on the OOP API. It's working very well [2].
Can I know why this PR was closed ? Any issues ? Or was it more because it
was too much of a change for the procedural API ?

While a few type errors there might be useful, it still feels a bit like
polishing a turd.

I count about 180 different options that can be passed to curl_setopt.
Some of them are used by nearly every user of PHP, and some are for very
obscure niche uses. Many of them are useless on their own, and exist only
to tweak the behaviour of some other option, or need to be used in groups
like usernames and passwords. Others have unintuitive and unhelpful
defaults, that need to be overridden with a different option. The manual
page has long topped the league table for user comments, currently standing
at 102. It is simply not a practical design for what most users of PHP need.

I think one way out of this mess is to start grouping options together,
and providing specific methods to set them, with named, type-safe
parameters. For instance, CurlHandle::setProxyOptions(string $url, ?string
$authUserName, ?string $authPassword, ...) and so on.

If the long lists of parameters still feel a bit messy, we could go a step
further and make objects for building sets of options, with methods taking
short lists of genuinely dependent options, e.g.
CurlProxyOptions::setAuth(string $userName, string $password) and
CurlProxyOptions::disableAuth()

Some of the options should be rethought in the process to make more sense
to PHP's common usage, rather than libcurl's. As just one example, enabling
CURLOPT_VERBOSE sends some debug information to stderr, but on a web server
that is normally a log file where it will be mixed in with other
information. To capture it instead you need to also set CURLOPT_STDERR,
which requires an open file handle. What most users actually want is for
that information to be available to them as a string after the request is
executed.

Another thing that might be useful is some new factory methods that set
sensible defaults, like CurlHandle::createForHttp(string|CurlUrl $url,
string $method)

If we can build these things on top of the existing CurlHandle, we can
release a few things first, and progressively add features. Rather than
having to decide which extension to use, users will still have access to
curl_setopt for things there isn't a nicer interface for yet. We can even
leave it in place as something like CurlHandle::setRawOption for when users
want fine control of the library, or some really obscure setting.

Regards,

--
Rowan Tommins
[IMSoP]

--

To unsubscribe, visit: https://www.php.net/unsub.php

3 years ago by Christoph M. Becker — view source

unread

Thanks again for your comments. PHP8.2 is going to be feature freeze in
exactly one month now and as the currently proposed improvements have not
received any interest (and that's OK), I will put those aside for the
moment and come back with a new approach that I hope will be better for the
future release (8.3 ?)

However, about the new Curl URL API, I think it's still time to finalize
the discussions and include it in the 8.2 release as it allows us to solve
some potential security issues.

What do you think ? Do you see any interest in adding the new CurlUrl class
in PHP8.2 ? Again currently proposed implementation can be found here [1]
and is waiting for your new comments.

Pierrick

[1]
https://github.com/adoy/php-src/pull/1/files#diff-5bd329a70f563b5b6eded3b710514a4fac605329087a7bd118e40f80b9fc8ee3

Given that designing a new OO API obviously isn't uncontroversial, and
that there are only four weeks left till feature freeze, it might be
best to go with a simple procedural API for PHP 8.2 only. This can be
marked as being experimental, and can still be deprecated in one of the
next minors. That's not nice for early adopters, but still seems better
than to not introduce the CURL URL API in PHP 8.2, and certainly is
better than to rush some OO API.

Christoph

3 years ago by Craig Francis — view source

unread

about the new Curl URL API, I think it's still time to finalize the discussions and include it in the 8.2 release as it allows us to solve some potential security issues.

Given that designing a new OO API obviously isn't uncontroversial, and that there are only four weeks left till feature freeze, it might be best to go with a simple procedural API for PHP 8.2 only.

+1... while I'd also like an OO API, it will be a much bigger project, and would need a fair amount of thought/discussion.

As to the URL API, it solves the problem of developers making their own URL parsers that often work differently to the cURL parser (those differences can cause security issues); and considering it's possible for cURL 7.62.0 (Oct 2018) to do that parsing, and it's a fairly simple API, that would work in a similar way to the existing cURL functionality in PHP, it would be good for PHP developers to use that in PHP 8.2.

Craig

3 years ago by Rowan Tommins — view source

unread

Given that designing a new OO API obviously isn't uncontroversial, and
that there are only four weeks left till feature freeze, it might be
best to go with a simple procedural API for PHP 8.2 only.

I haven't read back through the thread, but my impression was that for
the curl URL facility specifically the opposite was the case: a simple
object with no procedural equivalent would be everyone's preference.
CurlFile provides enough of a precedent for adding that IMO.

Either way, we have to agree on some naming details, so I've left some
comments on the PR.

(By the way, the PR you linked is to merge into your own fork's master,
not the actual central php-src repo. Not sure if that was deliberate.)

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Pierrick Charron — view source

unread

I haven't read back through the thread, but my impression was that for
the curl URL facility specifically the opposite was the case: a simple
object with no procedural equivalent would be everyone's preference.
CurlFile provides enough of a precedent for adding that IMO.

Either way, we have to agree on some naming details, so I've left some
comments on the PR.

Your suggested names were good, so I used them except for
NON_SUPPORT_SCHEME that I replaced with ALLOW_UNSUPPORTED_SCHEME.

(By the way, the PR you linked is to merge into your own fork's master,
not the actual central php-src repo. Not sure if that was deliberate.)

I didn't do it intentionally. Sorry about that.

Regards
Pierrick

3 years ago by Sara Golemon — view source

unread

I hope you don't mind, I took some of your code from your "Enable
strict_types checking for curl_setopt()" pull request [1] to do some test
on introducing this but only on the OOP API. It's working very well [2].
Can I know why this PR was closed ? Any issues ? Or was it more because it
was too much of a change for the procedural API ?

Don't mind in the slightest. :)
Fair to assume I dropped it due to ADHD. I have a habit of
SQUIRREL!!!!!!!

-Sara