All,
We were discussing a range of bugs today with the garbage collector. For
example: https://bugs.php.net/bug.php?id=64827
After quite a bit of digging, it appears what's happening is that the
garbage collector is running during the shutdown of PHP. So the destructors
are fired, and the variables are being destroyed when the GC run happens.
This means that the GC, while walking the variable tree runs into a
partially destructed object (where an entry of the hash table has already
been freed). This causes a segfault, and fun ensues.
Under normal conditions (not during shutdown), this does not appear to be
an issue, because the zval is destructed prior to the object destruction.
This means that there should never be a case where the GC hits a partially
freed object during normal execution.
From what I can see, there are two possible fixes. The first would be to
change how object destruction works in the first place, to tie the variable
into the destruction process (basically refactor the object delref API to
also accept the current zval). That way the part of the code that makes the
decision to nuke the object can nuke the zval first (and hence prevent this
condition). However, this is a major API change and would effect a lot of
extensions that are using or tieing into this hook.
The other option would be to simply disable the GC on shutdown. Considering
all of the variables are going to be thrown away anyway, having the GC run
during shutdown seems a bit wasteful anyway. So if we can kill two birds
with one stone, why not?
I've prepared a basic patch:
https://github.com/ircmaxell/php-src/compare/gc_deactivate_on_shutdown
I did confirm with odoucet (one of the original reporters) that this does
clear up his issue: https://gist.github.com/odoucet/5796378 (along with
trying a bunch of other things).
There are a few out standing questions though:
-
Technically, all we need to do is force GC_G(gc_enabled) = 0 in
shutdown. But we could also use zend_alter_ini_entry which has the same
effect. The question comes is there any reason to go through the overhead
of altering the ini entry instead of the global directly? We do access the
global directly in other areas (but it's typically only set via ini)... -
I put it in php_request_shutdown() after deactivate_ticks, but before it
calls shutdown functions. I could see it being moved after the shutdown
function call, but I'm not sure if it's worth it (either way). thoughts? -
Can anyone think of a reason we'd want the GC enabled during the request
shutdown? I can't think of any...
Additionally, considering that this does solve a segfault, is it worth
nominating this for 5.3? Or is it too risky (or something else I'm
missing)...
Thanks,
Anthony
- Can anyone think of a reason we'd want the GC enabled during the request
shutdown? I can't think of any...
I don't see any reason to to GC once we start the request shutdown dance
either and this is a segfault I am seeing a couple of times per day on
extremely busy production servers (5 or 6 out of hundreds of millions of
requests)
-Rasmus
- Technically, all we need to do is force GC_G(gc_enabled) = 0 in
shutdown. But we could also use zend_alter_ini_entry which has the same
effect. The question comes is there any reason to go through the overhead
of altering the ini entry instead of the global directly? We do access the
global directly in other areas (but it's typically only set via ini)...
If possible try to prevent users from shooting their feet by reenabling via ini_set()
or gc_collect_cycles()
or such.
Additionally, considering that this does solve a segfault, is it worth
nominating this for 5.3? Or is it too risky (or something else I'm
missing)...
See my previous mail to the list ...
johannes
Johannes,
On Wed, Jun 19, 2013 at 2:43 PM, Johannes Schlüter
johannes@schlueters.dewrote:
- Technically, all we need to do is force GC_G(gc_enabled) = 0 in
shutdown. But we could also use zend_alter_ini_entry which has the same
effect. The question comes is there any reason to go through the overhead
of altering the ini entry instead of the global directly? We do access
the
global directly in other areas (but it's typically only set via ini)...If possible try to prevent users from shooting their feet by reenabling
viaini_set()
orgc_collect_cycles()
or such.
Yeah, for that very reason I pushed it after the destructor functions. It
could technically go down a step further below the output buffer as well,
but I don't thing that's too necessary...
Additionally, I have switched it to use the ini behavior setting...
Additionally, considering that this does solve a segfault, is it worth
nominating this for 5.3? Or is it too risky (or something else I'm
missing)...See my previous mail to the list ...
That's why I actually called out 5.3 specifically. I'm not sold either way,
but I can see why some would want to get this into the last release... And
I could see why not...
Thanks,
Anthony
All,
We were discussing a range of bugs today with the garbage collector. For
example: https://bugs.php.net/bug.php?id=64827After quite a bit of digging, it appears what's happening is that the
garbage collector is running during the shutdown of PHP. So the destructors
are fired, and the variables are being destroyed when the GC run happens.
This means that the GC, while walking the variable tree runs into a
partially destructed object (where an entry of the hash table has already
been freed). This causes a segfault, and fun ensues.
Hey:
Sorry, but I don't this this explain is right.
if there is more than one refcount to a zval, then it should never be freed
and if a zval is freed, then it must also be removed from the gc roots.
according to your explain, the gc segfault while walking through
a hashtable of a object.
but that doesn't make any sense, since if it segfault in walking,
then it should also segfault when trying to free the hash table later
while dtor the object.
disable GC in shutdown is okey for me. but that is just try to
cover the bug somewhere in the refcount handler.. not the right fix.
thanks
Under normal conditions (not during shutdown), this does not appear to be
an issue, because the zval is destructed prior to the object destruction.
This means that there should never be a case where the GC hits a partially
freed object during normal execution.From what I can see, there are two possible fixes. The first would be to
change how object destruction works in the first place, to tie the variable
into the destruction process (basically refactor the object delref API to
also accept the current zval). That way the part of the code that makes the
decision to nuke the object can nuke the zval first (and hence prevent this
condition). However, this is a major API change and would effect a lot of
extensions that are using or tieing into this hook.The other option would be to simply disable the GC on shutdown. Considering
all of the variables are going to be thrown away anyway, having the GC run
during shutdown seems a bit wasteful anyway. So if we can kill two birds
with one stone, why not?I've prepared a basic patch:
https://github.com/ircmaxell/php-src/compare/gc_deactivate_on_shutdownI did confirm with odoucet (one of the original reporters) that this does
clear up his issue: https://gist.github.com/odoucet/5796378 (along with
trying a bunch of other things).There are a few out standing questions though:
Technically, all we need to do is force GC_G(gc_enabled) = 0 in
shutdown. But we could also use zend_alter_ini_entry which has the same
effect. The question comes is there any reason to go through the overhead
of altering the ini entry instead of the global directly? We do access the
global directly in other areas (but it's typically only set via ini)...I put it in php_request_shutdown() after deactivate_ticks, but before it
calls shutdown functions. I could see it being moved after the shutdown
function call, but I'm not sure if it's worth it (either way). thoughts?Can anyone think of a reason we'd want the GC enabled during the request
shutdown? I can't think of any...Additionally, considering that this does solve a segfault, is it worth
nominating this for 5.3? Or is it too risky (or something else I'm
missing)...Thanks,
Anthony
--
Laruence Xinchen Hui
http://www.laruence.com/
All,
We were discussing a range of bugs today with the garbage collector. For
example: https://bugs.php.net/bug.php?id=64827After quite a bit of digging, it appears what's happening is that the
garbage collector is running during the shutdown of PHP. So the destructors
are fired, and the variables are being destroyed when the GC run happens.
This means that the GC, while walking the variable tree runs into a
partially destructed object (where an entry of the hash table has already
been freed). This causes a segfault, and fun ensues.
Hey:Sorry, but I don't this this explain is right. if there is more than one refcount to a zval, then it should never be freed and if a zval is freed, then it must also be removed from the gc roots. according to your explain, the gc segfault while walking through
a hashtable of a object.
but that doesn't make any sense, since if it segfault in walking,
then it should also segfault when trying to free the hash table later
while dtor the object.disable GC in shutdown is okey for me. but that is just try to
cover the bug somewhere in the refcount handler.. not the right fix.
thanks
And actually, I prefer not doing this(disable gc in shutdown) for now,
because maybe someday we can get a simple reproduce script(as we all
have see, it is very rare).
then the segfault will be fixed, in the right way.
thanks
Under normal conditions (not during shutdown), this does not appear to be
an issue, because the zval is destructed prior to the object destruction.
This means that there should never be a case where the GC hits a partially
freed object during normal execution.From what I can see, there are two possible fixes. The first would be to
change how object destruction works in the first place, to tie the variable
into the destruction process (basically refactor the object delref API to
also accept the current zval). That way the part of the code that makes the
decision to nuke the object can nuke the zval first (and hence prevent this
condition). However, this is a major API change and would effect a lot of
extensions that are using or tieing into this hook.The other option would be to simply disable the GC on shutdown. Considering
all of the variables are going to be thrown away anyway, having the GC run
during shutdown seems a bit wasteful anyway. So if we can kill two birds
with one stone, why not?I've prepared a basic patch:
https://github.com/ircmaxell/php-src/compare/gc_deactivate_on_shutdownI did confirm with odoucet (one of the original reporters) that this does
clear up his issue: https://gist.github.com/odoucet/5796378 (along with
trying a bunch of other things).There are a few out standing questions though:
Technically, all we need to do is force GC_G(gc_enabled) = 0 in
shutdown. But we could also use zend_alter_ini_entry which has the same
effect. The question comes is there any reason to go through the overhead
of altering the ini entry instead of the global directly? We do access the
global directly in other areas (but it's typically only set via ini)...I put it in php_request_shutdown() after deactivate_ticks, but before it
calls shutdown functions. I could see it being moved after the shutdown
function call, but I'm not sure if it's worth it (either way). thoughts?Can anyone think of a reason we'd want the GC enabled during the request
shutdown? I can't think of any...Additionally, considering that this does solve a segfault, is it worth
nominating this for 5.3? Or is it too risky (or something else I'm
missing)...Thanks,
Anthony
--
Laruence Xinchen Hui
http://www.laruence.com/
--
Laruence Xinchen Hui
http://www.laruence.com/
Sorry, but I don't this this explain is right.
>
> if there is more than one refcount to a zval, then it should never be
> freed
>
> and if a zval is freed, then it must also be removed from the gc roots.
>
The point here is that the GC is run *while* the zval is being freed.
Check out the backtrace here: https://bugs.php.net/bug.php?id=64827 ,
specifically zval pointer 0x272afb8
It appears 4 times recursively being passed into
zend_objects_free_object_storage before the GC is triggered and it
segfaults.
> according to your explain, the gc segfault while walking through
> a hashtable of a object.
>
Yes, that is what's happening here. zval_mark_grey() is trying to walk
through the object's hash table, but the first bucket is already freed, so
when it tries to access it bad things happen.
> but that doesn't make any sense, since if it segfault in walking,
> then it should also segfault when trying to free the hash table later
> while dtor the object.
>
That's the point. The dtor is already happening on that object when the GC
tries to run over it again.
> disable GC in shutdown is okey for me. but that is just try to
> cover the bug somewhere in the refcount handler.. not the right fix.
>
Looking back through, we still have the problem where we can't null out the
zval before destructing the object (like we do with arrays) to prevent
this. That's why I suggested one alternative would be to modify
zend_objects_store_del_ref_by_handle_ex to also accept the zval, so it can
be nulled if the object is going to be freed.
However, we could set the object's bucket to invalid in the delref handler
before we call free_storage. And then modify the GC to check for a valid
bucket... I'll try that patch today to see if it solves the original issue
(as it shouldn't result in an API change either)...
Thanks for the thoughts...
Anthony
Hi!
Yes, that is what's happening here. zval_mark_grey() is trying to walk
through the object's hash table, but the first bucket is already freed, so
when it tries to access it bad things happen.
Why is this specific to shutdown? Hashtables are freed all the time,
what specific shutdown is doing different from all others so that this
bug only happens on shutdown?
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Stas,
Why is this specific to shutdown? Hashtables are freed all the time,
what specific shutdown is doing different from all others so that this
bug only happens on shutdown?
Honestly, I am not sure. Every report that I've seen has it happening at
shutdown. Could very well be a coincidence.
As far as what's happening, figure out more.
Basically, zend_deactivate() (which gets fired long after destructors)
tries to destroy the object store:
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_execute_API.c#293
This iterates through the objects and tries to free them.
Well, during this process, any properties which are still alive are
dtor'ed. However, at a certain point, the garbage collector is fired.
That's when things get funky.
With the latest dump, what appears to be happening is that something
between the zend_deactivate call, and the GC being fired, is overwriting
one of the object zval's with some data. Here's a sample dump of the zval:
(gdb) print (zval_gc_info) *pz
$1 = {z = {value = {lval = 31337624, dval = 1.5482843440690148e-316,
str = {val = 0x1de2c98 "0", len = 20823032}, ht = 0x1de2c98, obj = {
handle = 31337624, handlers = 0x13dbbf8}}, refcount__gc =
4294967295, type = 5 '\005', is_ref__gc = 0 '\000'}, u = {
buffered = 0x2, next = 0x2}}
As you can see, nasty. (Here's the full BT:
https://gist.github.com/odoucet/5796378#comment-848723 )
Well, when the GC hits that zval, it tries to access the object handle, and
throws a segfault (as it's WAY beyond the end of allocated memory).
I have a patch which prevents the segfault:
https://github.com/ircmaxell/php-src/compare/invalidate_object_on_dtor
However, that's not really fixing the situation either, as the zval is
still getting nuked (but only partially).
I am still trying to replicate the issue locally, and if I can, then I can
try to setup watches to check for what's overwriting the zval. But for now,
this is the current progress...
And no, I'm withdrawing the original concept of disabling the GC during
shutdown. The current patch I have works, but it's still just a bandaid on
a gunshot wound, and I'm going to try to figure out what's actually
overwriting the zval..
Anthony
Hi!
Honestly, I am not sure. Every report that I've seen has it happening at
shutdown. Could very well be a coincidence.
Well, if we don't know why or if it's shutdown only, disabling on
shutdown wouldn't do much good.
I have a patch which prevents the
segfault: https://github.com/ircmaxell/php-src/compare/invalidate_object_on_dtorHowever, that's not really fixing the situation either, as the zval is
still getting nuked (but only partially).
If there's a memory overwrite or use-after-free is going on, this patch
is not a complete solution - it relies on the fact that "bad" data will
be always out of range of "good" data. I see no way to ensure that - so
if there's an overwrite that writes garbage inside the object there will
be situations where the garbage looks exactly like a valid object ID and
it will still crash, but it would be significantly harder to reproduce.
So I think before patching it we need to get to the root cause and
figure out why it happens and what causes it, instead of partially
fixing the symptoms.
And no, I'm withdrawing the original concept of disabling the GC during
shutdown. The current patch I have works, but it's still just a bandaid
on a gunshot wound, and I'm going to try to figure out what's actually
overwriting the zval..
That'd be great, if you have any script that reproduces - always or at
least frequently - the problem, please post it.
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
-----Original Message-----
From: Stas Malyshev [mailto:smalyshev@sugarcrm.com]
Sent: Thursday, June 20, 2013 8:20 PM
To: Anthony Ferrara
Cc: Laruence; internals@lists.php.net
Subject: Re: [PHP-DEV] Disabling the GC during shutdownHowever, that's not really fixing the situation either, as the zval is
still getting nuked (but only partially).If there's a memory overwrite or use-after-free is going on, this patch
is not a
complete solution - it relies on the fact that "bad" data will be always
out of
range of "good" data. I see no way to ensure that - so if there's an
overwrite
that writes garbage inside the object there will be situations where the
garbage looks exactly like a valid object ID and it will still crash, but
it would
be significantly harder to reproduce.
So I think before patching it we need to get to the root cause and figure
out
why it happens and what causes it, instead of partially fixing the
symptom
I agree with that. I think it'd be a mistake to submit any patch without
us understanding root cause.
We may cover up a bug which will resurface elsewhere...
Hopefully you can find a way to pin it down.
Thanks for putting this much effort into it!
Andi
Hi,
This bug may be related (and has a reproducing script) :
https://bugs.php.net/bug.php?id=63734
-----Original Message-----
From: Stas Malyshev [mailto:smalyshev@sugarcrm.com]
Sent: Thursday, June 20, 2013 8:20 PM
To: Anthony Ferrara
Cc: Laruence; internals@lists.php.net
Subject: Re: [PHP-DEV] Disabling the GC during shutdownHowever, that's not really fixing the situation either, as the zval is
still getting nuked (but only partially).If there's a memory overwrite or use-after-free is going on, this patch
is not a
complete solution - it relies on the fact that "bad" data will be always
out of
range of "good" data. I see no way to ensure that - so if there's an
overwrite
that writes garbage inside the object there will be situations where the
garbage looks exactly like a valid object ID and it will still crash, but
it would
be significantly harder to reproduce.
So I think before patching it we need to get to the root cause and figure
out
why it happens and what causes it, instead of partially fixing the
symptomI agree with that. I think it'd be a mistake to submit any patch without
us understanding root cause.
We may cover up a bug which will resurface elsewhere...
Hopefully you can find a way to pin it down.Thanks for putting this much effort into it!
Andi
On Sat, Jun 22, 2013 at 12:06 PM, Arnaud Le Blanc arnaud.lb@gmail.comwrote:
Hi,
This bug may be related (and has a reproducing script) :
https://bugs.php.net/bug.php?id=63734
... and is private.
Nikita
Hi,
This bug may be related (and has a reproducing script) :
https://bugs.php.net/bug.php?id=63734
Hey:
I probably has fixed this bug(actually, according to the reporter
this bug is due to Opcache):
according to the latest feedback from the reporter:
https://github.com/zendtech/ZendOptimizerPlus/issues/95#issuecomment-19854473
"Incredible, but it works !
Tested ZendOptimizerPlus latest version (ba6c5f3) : segfault
Applied your patch : fully works"
so, I think this bug is probable like the previous one we already fixed:
https://github.com/php/php-src/commit/e88cdaa0
thanks
-----Original Message-----
From: Stas Malyshev [mailto:smalyshev@sugarcrm.com]
Sent: Thursday, June 20, 2013 8:20 PM
To: Anthony Ferrara
Cc: Laruence; internals@lists.php.net
Subject: Re: [PHP-DEV] Disabling the GC during shutdownHowever, that's not really fixing the situation either, as the zval is
still getting nuked (but only partially).If there's a memory overwrite or use-after-free is going on, this patch
is not a
complete solution - it relies on the fact that "bad" data will be always
out of
range of "good" data. I see no way to ensure that - so if there's an
overwrite
that writes garbage inside the object there will be situations where the
garbage looks exactly like a valid object ID and it will still crash, but
it would
be significantly harder to reproduce.
So I think before patching it we need to get to the root cause and figure
out
why it happens and what causes it, instead of partially fixing the
symptomI agree with that. I think it'd be a mistake to submit any patch without
us understanding root cause.
We may cover up a bug which will resurface elsewhere...
Hopefully you can find a way to pin it down.Thanks for putting this much effort into it!
Andi--
--
Laruence Xinchen Hui
http://www.laruence.com/