die('zend_mm_heap corrupted'); // TODO: Better error message

6 years ago by Rowan Collins — view source

unread

Hi all,

We recently had an error in production which was hard to reproduce, hard to
debug, and turned out to be a bug in an extension. One of the oddest
symptoms was that the PHP code seemed to run to completion, but the request
was never logged by Apache.

The only thing logged was the words "zend_mm_heap corrupted", in the Apache
error log. As far as I can see, this ultimately comes from zend_mm_panic
[1], which (on non-Windows non-debug builds) is simply:

fprintf(stderr, "%s\n", message);
exit(1);

Frustratingly, this error:

Doesn't contain a timestamp
Doesn't contain any context of where the error happened (not even a hint
that this happened during shutdown rather than execution)
Doesn't mention PHP, or look like the messages PHP would normally log
Doesn't log to the PHP error log
Doesn't register as an error in Apache itself

The server in question was running PHP as an Apache module in pre-fork
mode; in that configuration, stderr gets slurped into the Apache error log
without annotation, and the exit(1) silently terminates the connection and
logs nothing.

I suspect this is one of those last-ditch errors where there is a risk of
causing further errors if global data is accessed, but is there any way
this message can be improved, at least to format it more like other PHP
errors?

[1]
https://github.com/php/php-src/blob/1154f6f70a3e4ee083f1bc42d83c190ad68f3862/Zend/zend_alloc.c#L348

Regards,

Rowan Collins
[IMSoP]

6 years ago by Dmitry Stogov — view source

unread

Hi Rowan,

This error message is triggered when Zend MM used in improper way, e.g. invalid pointer free, etc.

MM can't provide useful information about the context and reason.

The best way to find the reason of the problem is running php under valgrind.

$ USE_ZEND_ALLOC=0 valgrind php ...

In most cases valgrind will show incorrect MM operations and their history.

Thanks. Dmitry.

6 years ago by Rowan Collins — view source

unread

Hi Rowan,

This error message is triggered when Zend MM used in improper way, e.g.
invalid pointer free, etc.

MM can't provide useful information about the context and reason.

I get that it can't provide the reason, but that doesn't mean there can't
be any context:

Will it always occur during shutdown? If so, the message could include
"during shutdown". If not, could the message indicate whether it is or not?
The ZEND_MM_CHECK macro takes a message, but all 15 uses of it pass the
same message, "zend_mm_heap corrupted"; are they really identical? Could
making each message distinct help improve bug reports?
The printf format in zend_mm_panic could add something other than a
newline, e.g. "PHP Engine Error: %s", to make this line clearer in the logs
to people who have never heard of "Zend MM".
Ideally, it would also include a timestamp; without this, it can be very
hard to line up which requests are failing, or even how often it is
happening.

So, rather than the current message:

zend_mm_heap corrupted

It would be great to see:

[09-Oct-2018 09:33:03 Europe/London] PHP Fatal Error: zend_mm_heap
corrupted (invalid page offset) during process shutdown.

Even better would be for this to drop into the PHP error log, rather than
echoing to stderr, but I realise that might be more complicated during a
"panic" scenario like this.

The best way to find the reason of the problem is running php under
valgrind.

$ USE_ZEND_ALLOC=0 valgrind php ...

In most cases valgrind will show incorrect MM operations and their history.

I'm not proficient enough with C to get far with that, so my main concerns
are 1) ensuring these errors are clearly visible in the logs; and 2)
helping provide useful context in a bug report against core or an extension.

Regards,

Rowan Collins
[IMSoP]

6 years ago by Julien Pauli — view source

unread

On Tue, Oct 9, 2018 at 10:56 AM Rowan Collins rowan.collins@gmail.com
wrote:

Hi Rowan,

This error message is triggered when Zend MM used in improper way, e.g.
invalid pointer free, etc.

MM can't provide useful information about the context and reason.

I get that it can't provide the reason, but that doesn't mean there can't
be any context:

Will it always occur during shutdown? If so, the message could include
"during shutdown". If not, could the message indicate whether it is or not?

The ZEND_MM_CHECK macro takes a message, but all 15 uses of it pass the
same message, "zend_mm_heap corrupted"; are they really identical? Could
making each message distinct help improve bug reports?

The printf format in zend_mm_panic could add something other than a
newline, e.g. "PHP Engine Error: %s", to make this line clearer in the logs
to people who have never heard of "Zend MM".

Ideally, it would also include a timestamp; without this, it can be very
hard to line up which requests are failing, or even how often it is
happening.

So, rather than the current message:

zend_mm_heap corrupted

It would be great to see:

[09-Oct-2018 09:33:03 Europe/London] PHP Fatal Error: zend_mm_heap
corrupted (invalid page offset) during process shutdown.

Even better would be for this to drop into the PHP error log, rather than
echoing to stderr, but I realise that might be more complicated during a
"panic" scenario like this.

The best way to find the reason of the problem is running php under
valgrind.

$ USE_ZEND_ALLOC=0 valgrind php ...

In most cases valgrind will show incorrect MM operations and their
history.

I'm not proficient enough with C to get far with that, so my main concerns
are 1) ensuring these errors are clearly visible in the logs; and 2)
helping provide useful context in a bug report against core or an
extension.

I think we can improve a little bit, but seeing this message , basically
means that all the memory image (heap) of the current PHP process is
corrupted.

That means that we cannot access any memory area safely, so that means : we
can't call for a function in PHP (like the traditionnal log function, that
would lead to a nice formatted error message).
The only thing we can do, is exit the process. Any attempts to continue
running it, will crash it for sure as crucial memory areas have been
corrupted.

We are in the last low level code area before exiting here, when MM is
paniced , its like the Earth is beeing destroyed : you have no other choice
of waiting your own death, or shorten it by killing yourself.
What we could do , is create a little safe memory zone at PHP startup,
wherein we could live when ZendMM is paniced, and we could try to do better
things than just printf() to stderr and exit.

This ZendMM Panic message is the worst situation PHP could live in. You
should immediately turn off external extensions leading to such a message.
By activating them one by one, you'll find the guilty one I'm sure ^^

After, if you can get a reproducer, then GDB will tell you any little piece
of thing you'd like to know about when crashing ;-)

If you dont want Apache to dumbly give up with the current request, you
should separate your processes by running php-fpm.
This way, you'll get an answer from Apache, like a 503 or so , and probably
some logs in php-fpm logs.

Julien.Pauli

6 years ago by bjorn.x.larsson@telia.com — view source

unread

Den 2018-10-11 kl. 23:25, skrev Julien Pauli:

On Tue, Oct 9, 2018 at 10:56 AM Rowan Collins rowan.collins@gmail.com
wrote:

Hi Rowan,

This error message is triggered when Zend MM used in improper way, e.g.
invalid pointer free, etc.

MM can't provide useful information about the context and reason.

I get that it can't provide the reason, but that doesn't mean there can't
be any context:

Will it always occur during shutdown? If so, the message could include
"during shutdown". If not, could the message indicate whether it is or not?

The ZEND_MM_CHECK macro takes a message, but all 15 uses of it pass the
same message, "zend_mm_heap corrupted"; are they really identical? Could
making each message distinct help improve bug reports?

The printf format in zend_mm_panic could add something other than a
newline, e.g. "PHP Engine Error: %s", to make this line clearer in the logs
to people who have never heard of "Zend MM".

Ideally, it would also include a timestamp; without this, it can be very
hard to line up which requests are failing, or even how often it is
happening.

So, rather than the current message:

zend_mm_heap corrupted

It would be great to see:

[09-Oct-2018 09:33:03 Europe/London] PHP Fatal Error: zend_mm_heap
corrupted (invalid page offset) during process shutdown.

Even better would be for this to drop into the PHP error log, rather than
echoing to stderr, but I realise that might be more complicated during a
"panic" scenario like this.

The best way to find the reason of the problem is running php under
valgrind.

$ USE_ZEND_ALLOC=0 valgrind php ...

In most cases valgrind will show incorrect MM operations and their
history.

I'm not proficient enough with C to get far with that, so my main concerns
are 1) ensuring these errors are clearly visible in the logs; and 2)
helping provide useful context in a bug report against core or an
extension.

I think we can improve a little bit, but seeing this message , basically
means that all the memory image (heap) of the current PHP process is
corrupted.

That means that we cannot access any memory area safely, so that means : we
can't call for a function in PHP (like the traditionnal log function, that
would lead to a nice formatted error message).
The only thing we can do, is exit the process. Any attempts to continue
running it, will crash it for sure as crucial memory areas have been
corrupted.

We are in the last low level code area before exiting here, when MM is
paniced , its like the Earth is beeing destroyed : you have no other choice
of waiting your own death, or shorten it by killing yourself.
What we could do , is create a little safe memory zone at PHP startup,
wherein we could live when ZendMM is paniced, and we could try to do better
things than just printf() to stderr and exit.

This ZendMM Panic message is the worst situation PHP could live in. You
should immediately turn off external extensions leading to such a message.
By activating them one by one, you'll find the guilty one I'm sure ^^

After, if you can get a reproducer, then GDB will tell you any little piece
of thing you'd like to know about when crashing ;-)

If you dont want Apache to dumbly give up with the current request, you
should separate your processes by running php-fpm.
This way, you'll get an answer from Apache, like a 503 or so , and probably
some logs in php-fpm logs.

Julien.Pauli

Would it then be possible to improve printf output a little by
including time & date and stating it's a PHP engine error?

r//Björn L