Hello all,
After receiving some feedback about
https://github.com/php/php-src/pull/15603, I'm formally proposing an
RFC to add persistent curl share handles here:
https://wiki.php.net/rfc/curl_share_persistence
Thanks to those who have provided feedback so far! Of note: the
implementation introduces a global variable to the curl extension via
ZEND_BEGIN_MODULE_GLOBALS; this appears preferred over
EG(persistent_list). Should the implementation stick with the module
global, or should it go back to EG(persistent_list) until we've
created a formal non-resource API (in another RFC)?
I'd like to keep the future of persistent resources out of scope for
the decision on whether to add curl share handle persistence to PHP,
but it is worth deciding what form the persistence takes for now.
Hello again,
It has been two weeks, and unless there are any further comments I
plan on converting the above question into a voteable sub-question and
opening the RFC for votes tomorrow, Thursday October 24th.
Thanks!
Hello again,
It has been two weeks, and unless there are any further comments I
plan on converting the above question into a voteable sub-question and
opening the RFC for votes tomorrow, Thursday October 24th.Thanks!
Hello,
It might be a good idea to get a “strong opinion” on the signature of curl_share_init. This RFC may be reviewed years from now and it should describe why and how things should work from a developers point of view. People are likely to vote “no” if it is too exhaustive or not exhaustive enough (from watching RFCs on here for a bit) and shouldn’t (generally) go into implementation details because then changing them will require an RFC.
Also, it might be worth saying the “next minor version of PHP” just in case there isn’t an 8.5 and goes to 9.0. Dunno if people will be that pedantic, but you never know.
— Rob
Apologies Rob, I don't seem to be receiving individual emails from the
mailing list, so I'll have to respond to my own email here. I've
changed my subscription to receive all emails so hopefully I'll be
able to respond directly to future emails.
It might be a good idea to get a “strong opinion” on the signature of curl_share_init.
I think you are right. I've chatted with Rasmus and we both agree that
changing the signature of curl_share_init is preferable to adding a
new function - it's more in line with how we do persistence elsewhere.
I'll update the RFC to remove the signature question.
Were you also suggesting that I shouldn't ask a question about the
implementation regarding using EG(persistent_list) vs. a module
global? I'm torn; I agree that it is an implementation detail but I
was hoping to actually get wider input since it generated discussion
in my pull request. Since no one responded here I felt like it'd be a
lightweight way to get feedback on what people felt, but the lack of
response probably means that people don't feel strongly enough about
it.
That said, after reflecting I think I would agree that it is
unnecessary to add this as a question, and I'll continue with the
implementation as-is in the PR. If people would like to send feedback
on using EG(persistent_list) instead, feel free to do so here or in
the PR.
Also, it might be worth saying the “next minor version of PHP” just in case there isn’t an 8.5 and goes to 9.0.
Makes sense to me, I'll make this change.
Apologies Rob, I don't seem to be receiving individual emails from the
mailing list, so I'll have to respond to my own email here. I've
changed my subscription to receive all emails so hopefully I'll be
able to respond directly to future emails.It might be a good idea to get a “strong opinion” on the signature of curl_share_init.
I think you are right. I've chatted with Rasmus and we both agree that
changing the signature of curl_share_init is preferable to adding a
new function - it's more in line with how we do persistence elsewhere.
I'll update the RFC to remove the signature question.Were you also suggesting that I shouldn't ask a question about the
implementation regarding using EG(persistent_list) vs. a module
global? I'm torn; I agree that it is an implementation detail but I
was hoping to actually get wider input since it generated discussion
in my pull request. Since no one responded here I felt like it'd be a
lightweight way to get feedback on what people felt, but the lack of
response probably means that people don't feel strongly enough about
it.That said, after reflecting I think I would agree that it is
unnecessary to add this as a question, and I'll continue with the
implementation as-is in the PR. If people would like to send feedback
on using EG(persistent_list) instead, feel free to do so here or in
the PR.Also, it might be worth saying the “next minor version of PHP” just in case there isn’t an 8.5 and goes to 9.0.
Makes sense to me, I'll make this change.
For what it is worth, I think you should go for the EG(persistent_list) method. Over on FrankenPHP, we are looking into some improvements to TSRM (so that Go and PHP can share some of the same locks, among other things) and EG uses TSRM while I think a module global would not.
— Rob
Were you also suggesting that I shouldn't ask a question about the
implementation regarding using EG(persistent_list) vs. a module
global? I'm torn; I agree that it is an implementation detail but I
was hoping to actually get wider input since it generated discussion
in my pull request. Since no one responded here I felt like it'd be a
lightweight way to get feedback on what people felt, but the lack of
response probably means that people don't feel strongly enough about
it.That said, after reflecting I think I would agree that it is
unnecessary to add this as a question, and I'll continue with the
implementation as-is in the PR. If people would like to send feedback
on using EG(persistent_list) instead, feel free to do so here or in
the PR.For what it is worth, I think you should go for the EG(persistent_list) method. Over on FrankenPHP, we are looking into some improvements to TSRM (so that Go and PHP can share some of the same locks, among other things) and EG uses TSRM while I think a module global would not.
For practical purposes, EG(persistent_list)
is a module global (can
bikeshed whether the engine is a module). In particular, both use TLS
for ZTS builds, and neither can access entries of other processes and
threads, respectively.
I still think we should not add new stuff to EG(persistent_list)
;
after all, resources are scheduled for removal, and likely
EG(persistent_list)
will finally be removed, too, and then we would
need to convert to a curl module global anyway.
Christoph
Were you also suggesting that I shouldn't ask a question about the
implementation regarding using EG(persistent_list) vs. a module
global? I'm torn; I agree that it is an implementation detail but I
was hoping to actually get wider input since it generated discussion
in my pull request. Since no one responded here I felt like it'd be a
lightweight way to get feedback on what people felt, but the lack of
response probably means that people don't feel strongly enough about
it.That said, after reflecting I think I would agree that it is
unnecessary to add this as a question, and I'll continue with the
implementation as-is in the PR. If people would like to send feedback
on using EG(persistent_list) instead, feel free to do so here or in
the PR.For what it is worth, I think you should go for the EG(persistent_list) method. Over on FrankenPHP, we are looking into some improvements to TSRM (so that Go and PHP can share some of the same locks, among other things) and EG uses TSRM while I think a module global would not.
For practical purposes,
EG(persistent_list)
is a module global (can
bikeshed whether the engine is a module). In particular, both use TLS
for ZTS builds, and neither can access entries of other processes and
threads, respectively.I still think we should not add new stuff to
EG(persistent_list)
;
after all, resources are scheduled for removal, and likely
EG(persistent_list)
will finally be removed, too, and then we would
need to convert to a curl module global anyway.Christoph
Sounds like a perfectly good reason to not use EG(persistent_list) to me.
— Rob
Hello all,
After receiving some feedback about
https://github.com/php/php-src/pull/15603, I'm formally proposing an
RFC to add persistent curl share handles here:
https://wiki.php.net/rfc/curl_share_persistenceThanks to those who have provided feedback so far! Of note: the
implementation introduces a global variable to the curl extension via
ZEND_BEGIN_MODULE_GLOBALS; this appears preferred over
EG(persistent_list). Should the implementation stick with the module
global, or should it go back to EG(persistent_list) until we've
created a formal non-resource API (in another RFC)?I'd like to keep the future of persistent resources out of scope for
the decision on whether to add curl share handle persistence to PHP,
but it is worth deciding what form the persistence takes for now.
I have voted against the RFC as it lacks details and there seems to have been no analysis on what sort of issues this can cause.
This has been pointed out by Tim already, and this should have been caught earlier, but I don't know if it just me, but it seems your email kinda went into some void and got unnoticed.
Persistent handlers don't really fit PHP's model, and are practically untestable from a PHPT test perspective as far as I know (at least I have never seen any proper tests for the existing ones, but I am happy to be corrected).
As such I am very reluctant to adding more of these things to the language, especially when this seems to just have been hacked together quickly, we still have at least 7 months before feature freeze and I would prefer to actually be done properly.
Especially that it is always easier to add new features than to deprecate and remove bad ones (which is kinda backwards but not relevant to this discussion).
Best regards,
Gina P. Banyard
Hello all,
After receiving some feedback about
https://github.com/php/php-src/pull/15603, I'm formally proposing an
RFC to add persistent curl share handles here:
https://wiki.php.net/rfc/curl_share_persistenceThanks to those who have provided feedback so far! Of note: the
implementation introduces a global variable to the curl extension via
ZEND_BEGIN_MODULE_GLOBALS; this appears preferred over
EG(persistent_list). Should the implementation stick with the module
global, or should it go back to EG(persistent_list) until we've
created a formal non-resource API (in another RFC)?I'd like to keep the future of persistent resources out of scope for
the decision on whether to add curl share handle persistence to PHP,
but it is worth deciding what form the persistence takes for now.I have voted against the RFC as it lacks details and there seems to have been no analysis on what sort of issues this can cause.
I appreciate your feedback, but from my perspective there aren't
really any major issues that this can cause. We already support curl
share handles via curl_share_init, and adding persistence doesn't
fundamentally change the issues you may have with reusing a curl
handle. Tim pointed out that libraries may use this functionality -
and I hope they do, as it can improve performance - in a way that is
dangerous for users, but I frankly don't see that as likely. HTTP
request libraries will likely be interested in the connection sharing
aspect of this, not the cookie jar. If they were to enable cookie
sharing, they would need to do that in a way that was safe for users,
much like they would for any other setting in PHP. It's worth noting
that setting CURLOPT_COOKIEJAR
is already possible with PHP today.
This has been pointed out by Tim already, and this should have been caught earlier, but I don't know if it just me, but it seems your email kinda went into some void and got unnoticed.
Thank you for taking the time to respond at all, late or not.
Persistent handlers don't really fit PHP's model, and are practically untestable from a PHPT test perspective as far as I know (at least I have never seen any proper tests for the existing ones, but I am happy to be corrected).
Persistence is important to successfully scaling Etsy's monolithic PHP
application. As I mentioned in a response to Tim, we make use of
persistence for things like memcached connections in order to reduce
end-user latency, and we'd like to take that further by persisting the
connections of downstream curl requests we make.
This is not just an Etsy-specific issue;
https://blog.jetbrains.com/phpstorm/2024/10/php-annotated-october-2024/#%F0%9F%93%8A-rfc-add-persistent-curl-share-handles
notes that other large-scale PHP users are interested in persistent
HTTP(S) connections. I'd implore you to consider that this would have
a real-world impact on companies that make use of ext/curl in PHP,
without needing to use alternative runtimes or external software.
As such I am very reluctant to adding more of these things to the language, especially when this seems to just have been hacked together quickly, we still have at least 7 months before feature freeze and I would prefer to actually be done properly.
I'd like to note that I don't feel like this was "hacked together
quickly". I take pride in writing high-quality code, and I've
attempted to request feedback multiple times in both the PR and in the
internals mailing list. I plan on implementing tests as well as I
possibly can assuming the RFC passes, but I did not want to invest in
doing so unless it did pass. If there are any quality issues you see
in the PR, please feel free to point them out! @kocksismate already
noted a few changes that I planned on making.
Especially that it is always easier to add new features than to deprecate and remove bad ones (which is kinda backwards but not relevant to this discussion).
If we truly feel that persistence is wrong for PHP, we would need to
deprecate and remove things like persistent PDO connections; I would
expect that would be a perfectly fine time to also remove support for
persistent curl share handles, if they existed.
Thanks,
Eric
Hi
Am 2024-11-01 21:07, schrieb Eric Norris:
really any major issues that this can cause. We already support curl
share handles via curl_share_init, and adding persistence doesn't
fundamentally change the issues you may have with reusing a curl
Yes, it does. See the reply that I just sent in the voting thread.
Persistent handlers don't really fit PHP's model, and are practically
untestable from a PHPT test perspective as far as I know (at least I
have never seen any proper tests for the existing ones, but I am happy
to be corrected).Persistence is important to successfully scaling Etsy's monolithic PHP
application. As I mentioned in a response to Tim, we make use of
persistence for things like memcached connections in order to reduce
end-user latency, and we'd like to take that further by persisting the
connections of downstream curl requests we make.
As far as I know the memcache protocol is stateless - except possibly
for authentication. In that case every connection would be equal /
immutable, making it safe to reuse. MySQL - as an example - is
different, because changing any of the connection-related variables,
e.g. with SET sql_mode='foo'
might ruin your day for follow-up
connections.
If we truly feel that persistence is wrong for PHP, we would need to
deprecate and remove things like persistent PDO connections; I would
expect that would be a perfectly fine time to also remove support for
persistent curl share handles, if they existed.
I believe that stateful persistence is wrong and will lead to
hard-to-debug situations sooner or later, yes.
Best regards
Tim Düsterhus
Hi,
Hello all,
After receiving some feedback about
https://github.com/php/php-src/pull/15603, I'm formally proposing an
RFC to add persistent curl share handles here:
https://wiki.php.net/rfc/curl_share_persistence
I think what's really needed for this is to have some infrastructure for
connection pooling. The current persistent connections in streams (which
includes mysqlnd, redis and its other users) are quite primitive and not
that well managed. This might occasionally result in incorrect clean up and
possibly having too many connections open.
What I think we need is a new internal API that would be handled by SAPI.
So basically some generic API that SAPI would hook into and manage the
connections. For extension it could work by registering the pool with some
connect and clean up callbacks and various parameters (e.g. timeout and
maybe something else). Then it would be just getting the connections from
it. It should be quite generic so it can support also curl handles so the
actual connection should be opaque. The point here is that SAPI should
handle the pool and deal with timeouts / clean up. This could work pretty
well for the threaded SAPI's. FPM is a bit limited as it's per worker but
it could potentially set its own limits and could still better handle
timeouts and clean ups.
I think there should be not such a big oppositions to the general concept
because this is already rooted in PHP in the form of persistent resources
and this just tries to implement the same for objects which we might
eventually need for streams if they ever get them converted to objects.
There are likely many application successfully using persistent resources
and it could be quite problematic to drop the whole concept.
Basically what I want to say is that we have got chance to implement it
properly for objects and we should not end up with the same sort of way how
it's done currently for resources. The better way is to have SAPI managed
API IMHO.
Regards
Jakub
Hi,
On Wed, Oct 9, 2024 at 10:18 PM Eric Norris eric.t.norris@gmail.com
wrote:Hello all,
After receiving some feedback about
https://github.com/php/php-src/pull/15603, I'm formally proposing an
RFC to add persistent curl share handles here:
https://wiki.php.net/rfc/curl_share_persistenceI think what's really needed for this is to have some infrastructure for
connection pooling. The current persistent connections in streams (which
includes mysqlnd, redis and its other users) are quite primitive and not
that well managed. This might occasionally result in incorrect clean up and
possibly having too many connections open.What I think we need is a new internal API that would be handled by SAPI.
So basically some generic API that SAPI would hook into and manage the
connections. For extension it could work by registering the pool with some
connect and clean up callbacks and various parameters (e.g. timeout and
maybe something else). Then it would be just getting the connections from
it. It should be quite generic so it can support also curl handles so the
actual connection should be opaque. The point here is that SAPI should
handle the pool and deal with timeouts / clean up. This could work pretty
well for the threaded SAPI's. FPM is a bit limited as it's per worker but
it could potentially set its own limits and could still better handle
timeouts and clean ups.I think there should be not such a big oppositions to the general concept
because this is already rooted in PHP in the form of persistent resources
and this just tries to implement the same for objects which we might
eventually need for streams if they ever get them converted to objects.
There are likely many application successfully using persistent resources
and it could be quite problematic to drop the whole concept.Basically what I want to say is that we have got chance to implement it
properly for objects and we should not end up with the same sort of way how
it's done currently for resources. The better way is to have SAPI managed
API IMHO.
Just for the reference I voted yes on RFC because that's just about the
concept and user API which I think is fine and it's inline with persistence
that we already have. But note that it doesn't mean that we will accept
implementation in the current form.
Regards
Jakub
Hi,
Hello all,
After receiving some feedback about
https://github.com/php/php-src/pull/15603, I'm formally proposing an
RFC to add persistent curl share handles here:
https://wiki.php.net/rfc/curl_share_persistenceI think what's really needed for this is to have some infrastructure for connection pooling. The current persistent connections in streams (which includes mysqlnd, redis and its other users) are quite primitive and not that well managed. This might occasionally result in incorrect clean up and possibly having too many connections open.
What I think we need is a new internal API that would be handled by SAPI. So basically some generic API that SAPI would hook into and manage the connections. For extension it could work by registering the pool with some connect and clean up callbacks and various parameters (e.g. timeout and maybe something else). Then it would be just getting the connections from it. It should be quite generic so it can support also curl handles so the actual connection should be opaque. The point here is that SAPI should handle the pool and deal with timeouts / clean up. This could work pretty well for the threaded SAPI's. FPM is a bit limited as it's per worker but it could potentially set its own limits and could still better handle timeouts and clean ups.
I think there should be not such a big oppositions to the general concept because this is already rooted in PHP in the form of persistent resources and this just tries to implement the same for objects which we might eventually need for streams if they ever get them converted to objects. There are likely many application successfully using persistent resources and it could be quite problematic to drop the whole concept.
Basically what I want to say is that we have got chance to implement it properly for objects and we should not end up with the same sort of way how it's done currently for resources. The better way is to have SAPI managed API IMHO.
Just for the reference I voted yes on RFC because that's just about the concept and user API which I think is fine and it's inline with persistence that we already have. But note that it doesn't mean that we will accept implementation in the current form.
Thank you for your feedback, and for your vote! I'm open to
implementation changes, so feel free to leave comments on the pull
request in the event this passes.
I'm also open to your idea of having a connection pooling mechanism in
the SAPI, and would be interested in helping draft that as an RFC if
that is necessary. I had noted something like this in the 'Future
Scope' section of this RFC, but wanted to aim small since this was my
first RFC.
Eric