Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:125862 X-Original-To: internals@lists.php.net Delivered-To: internals@lists.php.net Received: from php-smtp4.php.net (php-smtp4.php.net [45.112.84.5]) by qa.php.net (Postfix) with ESMTPS id 2C8721A00BD for ; Fri, 25 Oct 2024 14:29:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=php.net; s=mail; t=1729866719; bh=3JKIB1U7dLGwBLi04g0Oz7pmjeMquF8MOZoiube22rQ=; h=References:In-Reply-To:Reply-To:From:Date:Subject:To:From; b=ik3byUkGZu4Ji7ZhdWc7lBc9hPkJlozuvNN10nKYQWI2sZa6c4EZ33PYKvJ7mabwd k2Csp0hnrEuiq2pr8iHtKdri6iLLyGlBtMeHaQZcxi7bNbM5k/wNc1EIUxnLypK0Zw ndZ0dqr7V5DFCL/RFhb6wC41NqSDhDyIDv+FhWvWgRLwweAdhzWZ+wV3qyDeiB8fPR jMVrecvKiPr5tE1iB7hoSMSKnNDYY+1pks30M5mVQm33k7w9gWXq+nohm316I/OUPd 7AKU2jGcqvxHu/iofP3uPGjc+gCtgxQk8mpk+rPmA5jScGwCg6FTW9CuBD5iptux0T KbPwQtj3DMknA== Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 3A2E2180061 for ; Fri, 25 Oct 2024 14:31:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on php-smtp4.php.net X-Spam-Level: * X-Spam-Status: No, score=1.6 required=5.0 tests=BAYES_50,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,DMARC_PASS,FREEMAIL_FROM, FREEMAIL_REPLYTO,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=4.0.0 X-Spam-Virus: No X-Envelope-From: Received: from mail-il1-f181.google.com (mail-il1-f181.google.com [209.85.166.181]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Fri, 25 Oct 2024 14:31:57 +0000 (UTC) Received: by mail-il1-f181.google.com with SMTP id e9e14a558f8ab-3a3b6b281d4so8486465ab.0 for ; Fri, 25 Oct 2024 07:29:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729866571; x=1730471371; darn=lists.php.net; h=content-transfer-encoding:to:subject:message-id:date:from:reply-to :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=3JKIB1U7dLGwBLi04g0Oz7pmjeMquF8MOZoiube22rQ=; b=HvEig7Yr6J4Zbw1evrFiJ7z3ox9XQ2qFBd8f+vGeM2ilHGU6361xsiaa/8CTJk9GvT R0A24EeDc/abbC9H+LYHVZkr7LPusMkfU4lzSOsoQ1KK6SmjEFOCYJqSuky0XG/5qYVc 2oHLEQ3fmfdCx3OzEPlwhn9xx3sPRJIbvWdpHcOX1JmNX+NlnmVm/f4o+n/VBOnxu4BI rxfidyFgSaS1r9Jpj399xq7GaLjnOPIYUcD8DD/NIk7Vtu6+QTPna2r+ZAAz9iT3UDHO 5Vj0x+nc65RslmnpAED6HlH82GHdanT805lg1777Fj8iUAfmJxdz17kThaAfoNkxPkS8 YRiA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729866571; x=1730471371; h=content-transfer-encoding:to:subject:message-id:date:from:reply-to :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3JKIB1U7dLGwBLi04g0Oz7pmjeMquF8MOZoiube22rQ=; b=Uu2bVtx2QT0EIif6NmbHFKVf36+w4Nq30AICgLi0htgY+/kC32b8B1F2rxuMSX34Pz VM7yEFFMdXs0L2bJBSJSYyKsM/3Ny3uiSInvffULv/XnUGTFwSLyHqweix6Br9gI4HO+ QAERQYYqiLjJn/mAN02XF54ZBXNOEkthH84FezOqExdlDDQGuZcyHKn49KpLq76Y14wQ 8MWNgw8LgNjkkhRjKbhKDV/h9Q+sm1/nc9E3gjTBc2LlsDZ5GBD01+i6xcfJ3Cc+8OCC qpIMutHt8nSa2H97bkjyBFvxQIIttoQPDzlsnlLDf5BLXealdhyHo5dcnJhdCRfRxu/R mEdg== X-Gm-Message-State: AOJu0YwgacAVaJiOOao04qTTYiggqAoEZJ/hOgnZ4LZOXLK3bWn4CHaK NbybZQgUI/mGcTRRZXi8Ja4oZP6hdQYsVBCfbJtG3VdlI59LmqqQlqnHvfBFpGCaNV5qGMr5qU1 ZcuIZZy75DWdC+Ko7X1o6KuezPdpHMu/x X-Google-Smtp-Source: AGHT+IExyaTY4Vl/KPdbRCuKw+gPxCb4jimLYP4VGeiSY7HBSDZv/hlyKPgwKeXy2TKlQBFJIhnsQwxOEhO+qas80jI= X-Received: by 2002:a92:8712:0:b0:3a4:e452:c42c with SMTP id e9e14a558f8ab-3a4e452c718mr28249865ab.6.1729866570688; Fri, 25 Oct 2024 07:29:30 -0700 (PDT) Precedence: bulk list-help: list-post: List-Id: internals.lists.php.net x-ms-reactions: disallow MIME-Version: 1.0 References: <6e533102b9a2e9c8f6a2183440b2601a@bastelstu.be> In-Reply-To: <6e533102b9a2e9c8f6a2183440b2601a@bastelstu.be> Reply-To: erictnorris@gmail.com Date: Fri, 25 Oct 2024 10:29:14 -0400 Message-ID: Subject: Re: [PHP-DEV] [VOTE] Add persistent curl share handles To: PHP internals Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable From: eric.t.norris@gmail.com (Eric Norris) > On Fri, Oct 25, 2024 at 3:34=E2=80=AFAM Tim D=C3=BCsterhus wrote: > Apologies, I wanted to chime in before the vote started, but I was too > busy. I appreciate that you took the time to respond at all, so thank you. > Persistent handles / resources / objects violate PHP=E2=80=99s shared-not= hing > request model, which is one fundamental part of how PHP works from my > mental model. That's why I generally believe that persisting data across > requests is generally unsafe, especially when used with stateful > connections like a database connection. HTTP is generally stateless and > I generally trust curl to do the right thing, which makes this somewhat > safer, nevertheless the RFC would still be exposing a configurable (and > thus stateful) object to an unknown number of future requests. I am a strong believer of PHP's shared-nothing request model as well! That said, I think it's great that applications can opt-in to persistence where it benefits them. For example, Etsy, a monolithic PHP web application, makes heavy use of persistence with memcached connections - our application performance would not be the same without it. Our application also makes heavy use of "fan-out" by using curl to issue parallel downstream requests to assemble and render a page, and persistence here with the curl share interface would again benefit the design. For what it's worth, one thing I like about persistence in PHP is that while it's not exactly shared-nothing, it is still only shared by a single worker. This means that you don't have to think about locking, etc. like you may have to with other languages, which is great! > I'm especially concerned, because the documentation for > `curl_share_init()` uses `CURL_LOCK_DATA_COOKIE` as the example. I would > also assume that sharing a cookie jar amongst several requests is the > primary use case for leveraging a curl share handle, given that curl > already performs in-request connection reuse when reusing a single > CurlHandle or when performing requests with a CurlMultiHandle. I don't believe that sharing a cookie jar is the primary use case; it certainly wouldn't be for us. Our primary use case is DNS and connection sharing, for the aforementioned fanout. > Accidentally sharing a cookie jar for unrelated requests due to a badly > chosen `$persistent_id` sounds like a vulnerability to is bound to > happen to someone. I'll admit that I don't have a good response to this, since while I agree this is possible, I don't think it outweighs the benefit that persistence gives to the applications that need it (again, we won't be using the cookie jar). I see you've noted concerns about documentation, would this be something you'd at least feel more comfortable with if curl persistence documentation included a strong warning? > Your RFC then lists "DNS lookups" as one of the possible things that > would be sped up. Caching DNS lookups is already a solved problem: > Install a caching resolver on your server. These commonly come with > features such as pre-fetching, ensuring that commonly-performed lookups > will stay cached at all times. Interestingly, one of the reasons that I began this work was that we found a measurable performance impact of talking to the caching resolver on localhost, let alone the connection overhead. Essentially nearly all of our requests will subsequently turn around and issue multiple downstream requests, and each one to a separate host will involve DNS lookups and establishing connections. When I looked into mitigating this, I found the "share" interface, and here we are. So I agree that a caching resolver can reduce the performance cost of doing DNS lookups anew on each request, but as it does not completely eliminate it, nor does it address the cost of connections, there's still some performance gains left on the table. > As for sharing a TLS or HTTP connection, > this might or might not lead to vulnerabilities similarly to cookie > sharing, when TLS client certificates are used for authentication. I believe curl itself handles this - it won't re-use a connection if the settings aren't the same. > Lastly there does not appear to be a way to close a persistent share > handle either implicitly or explicitly, making it hard to reset the > share handle to a well-defined state / to know what state the share > handle is in. I had originally asked this in the "open questions" of the RFC, but as I received no response, I figured that we could address that at the pull-request level; apologies if that was presumptuous. M=C3=A1t=C3=A9 Kocs= is asked this very question in the pull request and I expected to re-raise it there. We do already have a curl_share_close method, so the RFC would not need to add a new method, and since the method is called curl_share_close, it's not really changing the meaning of the method either. > Also badly chosen `$persistent_id`s might result in a > large number of handles accumulating, without any kind of garbage > collection. For the existing persistent handles (e.g. for database > connections), the ID is chosen by PHP itself, ensuring a somewhat > predictable behavior. I agree that this may be a risk, but it's not clear to me that this is a risk that people would realistically run into, considering you'd want to have a small number of persistent IDs in order to benefit from re-using them. Dynamically picking your persistent IDs would lead to less re-use. > All in all, this RFC sounds like a foot-gun just waiting to go off and > the RFC text is little more than a stub which does not inspire > confidence that all the possible consequences have been fully > thought-through. I, of course, disagree that this is a "foot-gun just waiting to go off". I would maybe agree if persistence was the default behavior, but that is not what I am proposing - a developer must opt-in. From a cursory GitHub search, curl_share_init is very rare, and not something that developers are reaching for. Frankly, I see very little reason to use it without persistence, considering that curl_multi_init does the same thing, and is simpler to use. Regarding the RFC text, I apologize for not filling it out more, but it seemed straightforward to me: curl supports the share interface as a primary feature, and there are no glaring warnings in the documentation (https://curl.se/libcurl/c/libcurl-share.html) indicating that you might hold the share interface incorrectly. As I noted, I believe that TLS reuse is safe per the library itself. The only thing new to PHP here is the persistence aspect, which I noted would impact the SAPI by increasing memory usage "proportional to the number of persistent curl share handles." > After all the pros and cons would also need to be > documented in PHP=E2=80=99s documentation and the RFC text would be a goo= d > source for the documentation team. My plan was to help add the documentation (and tests) after this passed; it was unclear to me that documentation may need to be sorted out up-front in the RFC process. That said, I believe I can summarize the remaining cons from your email: - A developer could reuse a cookie jar with cookies from user A in a subsequent request from user B to origin Z if they opted in to persistence, used CURL_LOCK_DATA_COOKIE, and used the same persistent handle for both users. - A developer could balloon memory usage by using dynamic persistent IDs, especially so if curl_share_close doesn't close them. If I were to add notes about these both to the documentation, and curl_share_close closed persistent share handles, would you change your vote for this or a future RFC? If not, and you're fundamentally opposed to persistent handles, I can only urge you to reconsider based on the impact this would have on applications that benefit from persistence. I'd like to share this one piece of feedback from Matthieu Napoli (https://github.com/php/php-src/pull/15603#issuecomment-2348583997): > [...] I think this could have HUGE benefits to applications out there. > > Switching from PHP-FPM to a runtime like FrankenPHP, Laravel Octane (Road= runner/Swoole), or similar, usually gives the following benefits: > > - no longer need to boot the framework on every request (saves a signific= ant amount of time) > - but also no longer need to open new HTTPS connections on every request > > The second part is much less talked about, but very important in terms of= performance. Opening an HTTPS request can sometimes take 100ms or more! > > I believe such a feature could be a performance game changer for the PHP = ecosystem, without having to rewrite apps so that they work in long-lived p= rocesses (assuming libraries like Guzzle, etc. take advantage of it ofc). At Etsy and likely other organizations, it would be infeasible to switch to these frameworks as the effort required would be herculean. Persistence in PHP allows applications like ours to benefit from avoiding duplicative work, while still maintaining the fundamentals of the shared-nothing runtime we're familiar with. Thanks again for your response, Eric