Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:117958
Return-Path: <pierrick@webstart.fr>
Delivered-To: mailing list internals@lists.php.net
Received: (qmail 84247 invoked from network); 16 Jun 2022 05:22:26 -0000
Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5)
  by pb1.pair.com with SMTP; 16 Jun 2022 05:22:26 -0000
Received: from php-smtp4.php.net (localhost [127.0.0.1])
	by php-smtp4.php.net (Postfix) with ESMTP id 9A584180212
	for <internals@lists.php.net>; Thu, 16 Jun 2022 00:10:22 -0700 (PDT)
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net
X-Spam-Level: 
X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,
	RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE
	autolearn=no autolearn_force=no version=3.4.2
X-Spam-ASN: AS15169 209.85.128.0/17
X-Spam-Virus: No
X-Envelope-From: <pierrick@webstart.fr>
Received: from mail-lj1-f179.google.com (mail-lj1-f179.google.com [209.85.208.179])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by php-smtp4.php.net (Postfix) with ESMTPS
	for <internals@lists.php.net>; Thu, 16 Jun 2022 00:10:21 -0700 (PDT)
Received: by mail-lj1-f179.google.com with SMTP id l20so556382lji.0
        for <internals@lists.php.net>; Thu, 16 Jun 2022 00:10:21 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
        bh=gLOu2WXGsJxLYa4lMN8dGK0SUTJUGcrxLhOt4zJDgt8=;
        b=U6W1Z/94qchOUFv9lBSKnm5pvFeMbEHqOOgYkcZcfBNVHCsT73dIO1HAztfRS8LP2j
         o5JZ4TdBALZw7JcUVEzifMtv13e0xRFP3/bu6losNbkWOQTHgeRDQgsdSAi+xYdrslJ4
         CoaWaZ90V5Qndi0JfH5KzTLd3NAIO9FgiBQ8zQcR+JBZwfHYwlowwy0smuIMNK+QNBhS
         NMavxjS4CrBBdk9ysYLdC2PUGHc8CdcGDVO8rNaJJoYQkep/QolMXyEEQ1/NYa4z166M
         Y4ngdwm6dP8OEdILWRvYP9ZLCVKbWGfu9nPkQzT9HIifPXWeU9KzTBlB0EtUj5BSI642
         egZQ==
X-Gm-Message-State: AJIora+uV6fhLIVLlleolJU62XUtaUOcNlxPoYelGg5Pu9Foxl1R1g1m
	TrlBMhkRdnlyxeSHv1wGExL+JRQyM7brRZXGRJYG8xwMGNXeMw==
X-Google-Smtp-Source: AGRyM1t4Qm0MCaPe6L4DcxOrsqwSLtu8Y1ygDL3Bf1tRer64NtoISYq0sGxHiYklT+HZiZyqbKIyEzesaaA2pJtLVFM=
X-Received: by 2002:a05:651c:893:b0:249:4023:3818 with SMTP id
 d19-20020a05651c089300b0024940233818mr1899232ljq.44.1655363419780; Thu, 16
 Jun 2022 00:10:19 -0700 (PDT)
MIME-Version: 1.0
Date: Thu, 16 Jun 2022 03:10:08 -0400
Message-ID: <CAOfKkdRzoo7JuD07F8nkZBYvS5eDMiV2r6cNV-FgeLxBrdCsiQ@mail.gmail.com>
To: PHP internals <internals@lists.php.net>
Content-Type: multipart/alternative; boundary="0000000000009c4dbb05e18b5485"
Subject: Discussion about new Curl URL API and ext/curl improvements
From: pierrick@php.net (Pierrick Charron)

--0000000000009c4dbb05e18b5485
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi internals,

Since its version 6.62.0 [1], libcurl features a brand new URL API [2] that
can be used to parse and generate URLs, using libcurl=E2=80=99s own parser.=
 One of
the goals of this API is to tighten a problematic vulnerable area for
applications where the URL parser library would believe one thing and
libcurl another. This could and has sometimes led to security problems [3].
The new API is handled base and composed of 5 new functions [4] (and one
more since 7.80.0 to get more verbose information upon error).

I started to work on an implementation [5] to expose this API to PHP
userland in the curl extension so that PHP users can benefit from it. My
first reflex was to stay consistent on how the extension is currently
working and do a one to one mapping of the new curl functions. I got
feedback from both Ilija and Christoph saying that the API is, I quote, "a
bit clunky" which I can't disagree with.

For a long time, ext/curl worked with handles as "curl resources" and
functions mapped to the libcurl functions, but since PHP8.0 "curl
resources" were converted to opaque classes with no methods. So my main
goal here is to come up with a solution for the new Curl URL API, but maybe
also to be consistent, make some changes to existing curl classes like
`CurlHandle` to add methods and improve the current state of the extension.

Here is the solution I would propose :
- First of course keep all the current ext/curl functions as is not to
break any compatibility (Seems obvious but just to be sure everybody is on
the same page).
- For consistency expose the new Curl URL API as functions mapped one to
one to libcurl functions :

function curl_url(?string $url =3D null): CurlUrl|false {}
function curl_url_set(CurlUrl $url, int $part, string $content, int $flags
=3D 0): bool {}
function curl_url_get(CurlUrl $url, int $part, int $flags =3D 0):
string|false {}
function curl_url_errno(CurlUrl $url): int {}
function curl_url_strerror(int $error_code): ?string {}

- Add methods to the CurlUrl object to make it less opaque and expose an
object oriented style API. I would keep it minimal and let userlanAd API
provide higher level APIs as guzzle for example. (You can see the current
implementation [5])

final class CurlUrl implements Stringable
{
    public function __construct(?string $url =3D null) {}
    public function get(int $part, int $flags =3D 0): string {}
    public function set(int $part, string $content, int $flags =3D 0): void=
 {}
    public function getErrno(): int {}
    public function __toString(): string {}
    public function __clone() {}
}

- It would also be nice to add this object oriented API for existing
CurlHandle, CurlMultiHandle and CurlShareHandle classes. For example the
CurlHandle class would look like that (First implementation [6])

final class CurlHandle
{
    public function __construct(?string $url =3D null) {}
    public function setOpt(int $option, mixed $value): bool {}
    public function getInfo(?int $option =3D null): mixed {}
    public function exec(): string|bool {}
    public function escape(string $string): string|false {}
    public function unescape(string $string): string|false {}
    public function pause(int $flags): int {}
    public function getErrno(): int {}
    public function reset(): void;
    public function setOptArray(array $options): bool {}
    public function upkeep(): bool {}
    public function __clone() {}
}

As of right now I still have some unanswered questions like how should we
handle errors on the new CurlUrl API ?
- Throw `CurlUrlException` on both the procedural and object oriented style
API (that's how current implementation works [5])
- Throw `CurlUrlException` with the oriented style API, but return for
example boolean/null on errors when user uses the procedural API ?
- Always return null/booleans using both object oriented API and
procedural API ?

The same question applies if we add a new object oriented style API on
existing classes. Current functions MUST stay unchanged but should we throw
`CurlException` when the user uses the object oriented style API ? (That's
what I would do) Or should we return the same result from OO and procedural
API ? Should we make the new CurlApi consistent with that ?

What are your thoughts about all this ? Any feedback on any of those
subjects are more than welcome.

Pierrick

[1]
https://daniel.haxhttps://github.com/adoy/php-src/tree/curl-object-apix.se/=
blog/2018/10/31/curl-7-62-0-moar-stuff/
<https://daniel.haxx.se/blog/2018/10/31/curl-7-62-0-moar-stuff/>
[2] https://daniel.haxx.se/blog/2018/09/09/libcurl-gets-a-url-api/
[3]
https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-E=
xploiting-URL-Parser-In-Trending-Programming-Languages.pdf
[4] https://github.com/curl/curl/blob/master/include/curl/urlapi.h
[5] https://github.com/php/php-src/pull/8770
[6] https://github.com/adoy/php-src/tree/curl-object-api

--0000000000009c4dbb05e18b5485--