Hello Internals,
PHP suffers from several issues related to timeout and signal handling,
especially when built with ZTS enabled.
- The current implementation of timeouts on UNIX builds seems
"fundamentally incompatible with ZTS" (
https://bugs.php.net/bug.php?id=79464#1589205685) and more anecdotally
conflicts with some Go features (
https://github.com/golang/go/issues/56260#issuecomment-1281040802) - "Zend Signals" causes segmentation faults and other problems in
multi-threaded environments (
https://github.com/php/php-src/issues/9649#issuecomment-1264330874,
https://github.com/php/php-src/pull/5591#issuecomment-650064098), and seems
useless anyway since PHP 7.1 (
https://github.com/php/php-src/pull/5591#issuecomment-645428002)
In 2020, Alex Dowad started a major refactoring to improve these parts (
https://github.com/php/php-src/pull/5570,
https://github.com/php/php-src/pull/5591,
https://github.com/php/php-src/pull/5710), but he stopped working on it.
Instead of doing a major one-time refactoring like that, I propose moving
forward little by little to limit the risks and the potential backward
compatibility breaks.
Here is the plan:
- Switch to timer_create() for timeouts, but only on Linux (it's not
supported on other platforms yet), when ZTS is enabled and Zend Signals is
disabled. As the feature is currently entirely broken when ZTS is on, this
can be considered a bug fix and cannot be worse than it currently is.
1bis. Can be done independently of 1., and even in parallel, optional as
long as we keep the --disable-zend-signals flag: Remove Zend Signals
entirely, because even if it can be partially fixed (I proposed a patch
fixing segfaults: https://github.com/php/php-src/pull/9766), it seems now
useless and causes unfixable issues with some signals such asSIGINT
(
https://github.com/php/php-src/issues/9649#issuecomment-1265811930) - Switch to Grand Central Dispatch on macOS and FreeBSD when ZTS is
enabled and Zend Signals is disabled (if not removed at this point), which
provides a feature similar to timer_create() for these platforms. - Probably in a future major version, optional: switch to
timer_create()/GCD even for non-ZTS builds to uniformize and simplify the
code.
What do you think about this plan? Apart from the technical aspects, what's
the best way forward? Submit patches? Propose an RFC? Do both? (pardon my
ignorance of internals processes).
Thank you,
Kévin Dunglas
Here is the plan:
- Switch to timer_create() for timeouts, but only on Linux (it's not
supported on other platforms yet), when ZTS is enabled and Zend Signals is
disabled. As the feature is currently entirely broken when ZTS is on, this
can be considered a bug fix and cannot be worse than it currently is.
1bis. Can be done independently of 1., and even in parallel, optional as
long as we keep the --disable-zend-signals flag: Remove Zend Signals
entirely, because even if it can be partially fixed (I proposed a patch
fixing segfaults: https://github.com/php/php-src/pull/9766), it seems now
useless and causes unfixable issues with some signals such asSIGINT
(
https://github.com/php/php-src/issues/9649#issuecomment-1265811930)- Switch to Grand Central Dispatch on macOS and FreeBSD when ZTS is
enabled and Zend Signals is disabled (if not removed at this point), which
provides a feature similar to timer_create() for these platforms.- Probably in a future major version, optional: switch to
timer_create()/GCD even for non-ZTS builds to uniformize and simplify the
code.
Sounds good to me, for what it's worth.
I spent some time studying PR #5591 since it feels like I should have
some expertise in this area by now. Timeout handling in PHP 5 was a
joke. The list of library functions which can be safely re-entered
from a signal handler is listed on man signal-safety(7). Most notably
it does not include malloc(). You cannot just longjmp() out of a
signal handler and re-enter userspace code, as PHP 5 was doing.
I partly solved this problem just for Wikimedia in 2011 in a
hilariously hacky custom extension (wmerrors). It defensively produced
an error page, guarded by a second timer, then killed the process with
abort(). This helped to reduce the rate of deadlocks and segfaults in
Wikimedia production.
So I was pretty happy when the new system for signal and timeout
handling was introduced in PHP 7.1. I extended that system by writing
the Excimer extension, which provides userspace timer facilities with
timer_create() and vm_interrupt. We're using Excimer in production to
implement timeouts.
My understanding of PR #5591 is that it is removing the remnants of
the old signal handling system. There is a lot of discussion of the
correct implementation of HANDLE_BLOCK_INTERRUPTIONS(), but that macro
was from the outset an inadequate approach to signal safety. Maybe it
helped to reduce the rate of segfaults, but at Wikimedia's scale it
was still a segfault firehose. If your signal handler calls malloc(),
then obviously every malloc() call and every call of a library
function that may call malloc() becomes a critical section.
So in my view, the critical sections should all be removed and all
signal handlers should be implemented safely, by setting a flag that
is checked from a safe context. The same recommendation applies
whether you use timer_create() or setitimer().
In Excimer we are using timer_create() with SIGEV_THREAD. PHP request
threads are registered in a global hashtable with long-lived integer
handles, because there is no way to prevent events from being
delivered after the relevant request has terminated. You can't store a
pointer in sival_ptr if it will be freed at request shutdown.
SIGEV_THREAD is less intrusive than SIGEV_SIGNAL or SIGEV_THREAD_ID
since it doesn't interrupt syscalls.
What do you think about this plan? Apart from the technical aspects, what's
the best way forward? Submit patches? Propose an RFC? Do both? (pardon my
ignorance of internals processes).
Submit PRs I guess. I've only submitted a handful of PRs to PHP, and I
haven't got push access, so I'm not an expert. But I think the usual
strategies apply. Answer the nitpickers, get to know people, nag,
escalate, etc.
(I did have Subversion write access, which I got by taking over
maintainership of a PEAR package, but I was never brave enough to push
things to the PHP core.)
-- Tim Starling
This kind of thing requires careful thought. My first glance through
it looks good and I definitely want PHP to stop abusing the SIGPROF
signal because it prevents usage by actual profilers, which is
annoying. Full disclosure: I write a profiler for my job.
In addition, I think that there should probably be an internals API
for handing out POSIX realtime signal numbers for platforms which
support it. Quoting from the signal man page:
Unlike standard signals, real-time signals have no predefined
meanings: the entire set of real-time signals can be used for
application-defined purposes.
One could interpret this as meaning we should also make it accessible
to userland, but in line with the one step at a time approach, we can
do that later if we decide we need to. But this is something an
extension can't choose to do by itself: it must be done in core.
I'll review the proposal sometime more carefully. Thanks for bring it up.