Hi,
I ran into a problem with my server crashing with SIGBUS.
After long testing/tracing I found:
https://bugs.php.net/bug.php?id=52752
which seems to be exactly the same problem, but in a different environment.
Unfortunately that bug report only offers a workaround: disabling
mmap(). I have CentOS 6 / x86_64, PHP 5.3.3.
As that report is OLD and in "Feedback" state, and this message is LONG, I decided to
send this email in the hope that some developer will check it out.
It's not just 32-bit, CentOS 5 and old PHP.
test3.php might need a couple of runs to get the bus error; sometimes
it just runs, but most times it crashes, and fast. The suggested
workaround of disabling mmap also seems to work. But what's the performance
loss in that? Disabling it would need a recompile, and I don't know about
getting Red Hat to change their PHP to disable mmap because of this.
A real fix/patch would be nice and really appreciated! :)
I will post a shorter comment on that bug, but here is the stuff that
interests the developers:
So how's the latest 5.3.19:
Core was generated by `sapi/cli/php test3.php'.
Program terminated with signal 7, Bus error.
#0 lex_scan (zendlval=<value optimized out>)
at Zend/zend_language_scanner.l:1709
1709 switch (*YYCURSOR++) {
(gdb) list
1704 }
1705
1706
1707 <ST_IN_SCRIPTING>"#"|"//" {
1708 while (YYCURSOR < YYLIMIT) {
1709 switch (*YYCURSOR++) {
1710 case '\r':
1711 if (*YYCURSOR == '\n') {
1712 YYCURSOR++;
1713 }
(gdb) bt
#0 lex_scan (zendlval=<value optimized out>)
at Zend/zend_language_scanner.l:1709
#1 0x0000000000636640 in zendlex (zendlval=0x7fff2476cb90)
at /root/php-5.3.19/Zend/zend_compile.c:4975
#2 0x0000000000620e66 in zendparse ()
at /root/php-5.3.19/Zend/zend_language_parser.c:3285
#3 0x000000000062bb52 in compile_file (file_handle=0x7fff2476ce80,
type=<value optimized out>) at Zend/zend_language_scanner.l:364
#4 0x00000000005362d1 in phar_compile_file (file_handle=0x7fff2476ce80,
type=2) at /root/php-5.3.19/ext/phar/phar.c:3394
#5 0x000000000062b3de in compile_filename (type=2, filename=0x185ac58)
at Zend/zend_language_scanner.l:407
#6 0x000000000067c63e in ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER (
execute_data=0x7fe9b5916050)
at /root/php-5.3.19/Zend/zend_vm_execute.h:1967
#7 0x0000000000675a30 in execute (op_array=0x184f358)
at /root/php-5.3.19/Zend/zend_vm_execute.h:107
#8 0x000000000064f86f in zend_execute_scripts (type=8, retval=0x0,
file_count=3) at /root/php-5.3.19/Zend/zend.c:1259
#9 0x00000000005fcd67 in php_execute_script (primary_file=0x7fff24770780)
at /root/php-5.3.19/main/main.c:2316
#10 0x00000000006da002 in main (argc=2, argv=0x7fff24770a18)
at /root/php-5.3.19/sapi/cli/php_cli.c:1189
So it's still there and no need to blame my PHP 5.3.3.
So how about PHP 5.4.9?
Core was generated by `sapi/cli/php test3.php'.
Program terminated with signal 7, Bus error.
#0 lex_scan (zendlval=<value optimized out>)
at Zend/zend_language_scanner.l:1904
1904 switch (*YYCURSOR++) {
(gdb) list
1899 }
1900
1901
1902 <ST_IN_SCRIPTING>"#"|"//" {
1903 while (YYCURSOR < YYLIMIT) {
1904 switch (*YYCURSOR++) {
1905 case '\r':
1906 if (*YYCURSOR == '\n') {
1907 YYCURSOR++;
1908 }
(gdb) bt
#0 lex_scan (zendlval=<value optimized out>)
at Zend/zend_language_scanner.l:1904
#1 0x000000000063fd90 in zendlex (zendlval=0x7fff4739ebf0)
at /root/php-5.4.9/Zend/zend_compile.c:6707
#2 0x0000000000628ba4 in zendparse ()
at /root/php-5.4.9/Zend/zend_language_parser.c:3430
#3 0x0000000000634d4d in compile_file (file_handle=0x7fff4739ef40,
type=<value optimized out>) at Zend/zend_language_scanner.l:582
#4 0x0000000000539ae1 in phar_compile_file (file_handle=0x7fff4739ef40,
type=2) at /root/php-5.4.9/ext/phar/phar.c:3388
#5 0x00000000006344ae in compile_filename (type=2, filename=0x7f66ed826d20)
at Zend/zend_language_scanner.l:625
#6 0x00000000006acb6b in ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER (
execute_data=0x7f66ed7ea060) at
/root/php-5.4.9/Zend/zend_vm_execute.h:2608
#7 0x00000000006c98a0 in execute (op_array=0x7f66ed81f938)
at /root/php-5.4.9/Zend/zend_vm_execute.h:410
#8 0x00000000006608cd in zend_execute_scripts (type=8, retval=0x0,
file_count=3) at /root/php-5.4.9/Zend/zend.c:1309
#9 0x0000000000603e27 in php_execute_script (primary_file=0x7fff473a2680)
at /root/php-5.4.9/main/main.c:2482
#10 0x000000000070aeac in do_cli (argc=2, argv=0x7fff473a2a88)
at /root/php-5.4.9/sapi/cli/php_cli.c:988
#11 0x000000000070b608 in main (argc=2, argv=0x7fff473a2a88)
at /root/php-5.4.9/sapi/cli/php_cli.c:1364
Still there.. so how about trunk?
Core was generated by `sapi/cli/php test3.php'.
Program terminated with signal 7, Bus error.
#0 lex_scan (zendlval=<value optimized out>)
at Zend/zend_language_scanner.l:1917
1917 switch (*YYCURSOR++) {
(gdb) list
1912 }
1913
1914
1915 <ST_IN_SCRIPTING>"#"|"//" {
1916 while (YYCURSOR < YYLIMIT) {
1917 switch (*YYCURSOR++) {
1918 case '\r':
1919 if (*YYCURSOR == '\n') {
1920 YYCURSOR++;
1921 }
(gdb) bt
#0 lex_scan (zendlval=<value optimized out>)
at Zend/zend_language_scanner.l:1917
#1 0x0000000000641c30 in zendlex (zendlval=0x7fff34ca46c0)
at /root/php-trunk-201212191230/Zend/zend_compile.c:6881
#2 0x000000000062a713 in zendparse ()
at /root/php-trunk-201212191230/Zend/zend_language_parser.c:3428
#3 0x0000000000636d75 in compile_file (file_handle=0x7fff34ca4a30,
type=<value optimized out>) at Zend/zend_language_scanner.l:585
#4 0x000000000053a921 in phar_compile_file (file_handle=0x7fff34ca4a30,
type=2) at /root/php-trunk-201212191230/ext/phar/phar.c:3388
#5 0x000000000063641e in compile_filename (type=2, filename=0x7f6444584978)
at Zend/zend_language_scanner.l:628
#6 0x00000000006d48eb in ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER (
execute_data=0x7f64445481e0)
at /root/php-trunk-201212191230/Zend/zend_vm_execute.h:2695
#7 0x00000000006d4b40 in execute_ex (execute_data=0x7f64445481e0)
at /root/php-trunk-201212191230/Zend/zend_vm_execute.h:356
#8 0x00000000006634d9 in zend_execute_scripts (type=8, retval=0x0,
file_count=3) at /root/php-trunk-201212191230/Zend/zend.c:1309
#9 0x0000000000605ed9 in php_execute_script (primary_file=0x7fff34ca8180)
at /root/php-trunk-201212191230/main/main.c:2468
#10 0x0000000000710d7c in do_cli (argc=2, argv=0x7fff34ca8588)
at /root/php-trunk-201212191230/sapi/cli/php_cli.c:988
#11 0x00000000007114d8 in main (argc=2, argv=0x7fff34ca8588)
at /root/php-trunk-201212191230/sapi/cli/php_cli.c:1364
Hi!
I did come up with a problem in my server crashing with SIGBUS.
After long testing/tracing found:
Just tried to reproduce it on Centos 6.2 install (without APC), works
just fine for me. I suspect it's some APC issue, does it reproduce for
you without APC loaded?
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Hi!
I did come up with a problem in my server crashing with SIGBUS.
After long testing/tracing found:
Just tried to reproduce it on Centos 6.2 install (without APC), works
just fine for me. I suspect it's some APC issue, does it reproduce for
you without APC loaded?
It doesn't seem APC-related since it is happening on the first pass. It
is unusual for a non-cached script to cause problems in APC, but it is
of course still possible. A test without APC would be helpful to narrow
it down.
-Rasmus
https://bugs.php.net/bug.php?id=52752
Just tried to reproduce it on Centos 6.2 install (without APC), works
just fine for me. I suspect it's some APC issue, does it reproduce for
you without APC loaded?
Yes, as I mentioned in my previous message and in the comments on that bug.
I'll highlight the relevant backtrace in this message.
Did you try running it a couple of times? In my tests on one machine
it sometimes seemed to run without any problems, but after Ctrl-C and running
it again I got the SIGBUS.
Copy & paste from bugs.php.net:
5.3.19:
Core was generated by `sapi/cli/php test3.php'.
Program terminated with signal 7, Bus error.
#0 lex_scan (zendlval=<value optimized out>)
at Zend/zend_language_scanner.l:1709
1709 switch (*YYCURSOR++) {
(gdb) list
1704 }
1705
1706
1707 <ST_IN_SCRIPTING>"#"|"//" {
1708 while (YYCURSOR < YYLIMIT) {
1709 switch (*YYCURSOR++) {
1710 case '\r':
1711 if (*YYCURSOR == '\n') {
1712 YYCURSOR++;
1713 }
(gdb) bt
#0 lex_scan (zendlval=<value optimized out>)
at Zend/zend_language_scanner.l:1709
#1 0x0000000000636640 in zendlex (zendlval=0x7fff2476cb90)
at /root/php-5.3.19/Zend/zend_compile.c:4975
#2 0x0000000000620e66 in zendparse ()
at /root/php-5.3.19/Zend/zend_language_parser.c:3285
#3 0x000000000062bb52 in compile_file (file_handle=0x7fff2476ce80,
type=<value optimized out>) at Zend/zend_language_scanner.l:364
#4 0x00000000005362d1 in phar_compile_file (file_handle=0x7fff2476ce80,
type=2) at /root/php-5.3.19/ext/phar/phar.c:3394
#5 0x000000000062b3de in compile_filename (type=2, filename=0x185ac58)
at Zend/zend_language_scanner.l:407
#6 0x000000000067c63e in ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER (
execute_data=0x7fe9b5916050)
at /root/php-5.3.19/Zend/zend_vm_execute.h:1967
#7 0x0000000000675a30 in execute (op_array=0x184f358)
at /root/php-5.3.19/Zend/zend_vm_execute.h:107
#8 0x000000000064f86f in zend_execute_scripts (type=8, retval=0x0,
file_count=3) at /root/php-5.3.19/Zend/zend.c:1259
#9 0x00000000005fcd67 in php_execute_script (primary_file=0x7fff24770780)
at /root/php-5.3.19/main/main.c:2316
#10 0x00000000006da002 in main (argc=2, argv=0x7fff24770a18)
at /root/php-5.3.19/sapi/cli/php_cli.c:1189
The test file was also in that bug's comments; I modified it a little
so that it uses the version I compiled.
cat test3.php
<?php
if ($argv[1] > 0) {
while ($argv[1]--) file_put_contents('test.tpl', "<?php
#".str_repeat('A', mt_rand(4000, 5000))." ?>\n", LOCK_EX);
} else {
$p2 = popen("sapi/cli/php test3.php 100", "r");
while (1) include 'test.tpl';
}
?>
Hi,
If someone wants access to a machine where this can be reproduced,
send me an email. It's not just one machine where I can reproduce
this, so the hardware can't be blamed.
I reproduced this on CentOS 6.3 with the default php, php-5.3.19, php-5.4.9
and php-trunk-201212200830.
What I noticed is that with trunk the crash happens at different places in
the .l file, so it seems to me that something is really wrong with
the mmap functionality and it somehow gets broken files?
This is the most frequent place:
1917 switch (*YYCURSOR++) {
But I've also got these:
2267 switch (yych) {
1087 if (yych != '<') goto yy4;
I see two possibilities here: either PHP uses mmap in a way that makes
this possible, or there's a data corruption
bug in mmap itself.
Let's tackle the latter with the test from http://ltp.sourceforge.net/:
$ ./mmap-corruption01 -h1 -m1 -s1
mmap-corruption will run for=> 3661, seconds
mmap-corruption PASSED
Well, no corruption according to that test, so that leaves me with
the first option.
This is the script "runme.sh" I used to reproduce the crashes.
It usually needs only 1-4 iterations to get the crash:
#!/bin/sh
# Touch so first run aren't full of include() errors
touch test.tpl
# Remove old core
/bin/rm core.* 1>/dev/null 2>/dev/null
# Do crash
NOCRASH=1
while [ $NOCRASH -eq 1 ];do
    sapi/cli/php -n test3.php &
    PHPPID=$!
    echo "Running php with pid $PHPPID..."
    sleep 5
    ps $PHPPID >/dev/null && NOCRASH=1 || NOCRASH=0
    if [ $NOCRASH -eq 1 ];then
        echo "Killing php...";
        kill -9 $PHPPID;
    fi;
done
# Crash done
ls -la core.*
# Debug
gdb sapi/cli/php core.*
test3.php is almost the same, but with -n added so that neither APC nor any
other extension is to blame:
<?php
if ($argv[1] > 0) {
while ($argv[1]--) file_put_contents('test.tpl', "<?php
#".str_repeat('A', mt_rand(4000, 5000))." ?>\n", LOCK_EX);
} else {
$p2 = popen("sapi/cli/php -n test3.php 100", "r");
while (1) include 'test.tpl';
}
?>
With this patch, which disables mmap, I've not been able to reproduce the
crash even after it has been running for 10 minutes:
--- php-trunk-201212200830/main/main.c 2012-12-05 11:59:32.000000000 +0200
+++ php-trunk-patched-201212200830/main/main.c 2012-12-20 12:18:10.491302651 +0200
@@ -1324,7 +1324,7 @@
 /* can we mmap immeadiately? */
 memset(&handle->handle.stream.mmap, 0, sizeof(handle->handle.stream.mmap));
 len = php_zend_stream_fsizer(stream TSRMLS_CC);
-if (len != 0
+if (0 && len != 0
 #if HAVE_MMAP || defined(PHP_WIN32)
 && ((len - 1) % page_size) <= page_size - ZEND_MMAP_AHEAD
 #endif
Without that patch the crash comes in under half a minute.
And now I'm out of ideas about what more I could do, with my knowledge
and skills, to help get this issue fixed.
https://bugs.php.net/bug.php?id=52752
Just tried to reproduce it on Centos 6.2 install (without APC), works
just fine for me. I suspect it's some APC issue, does it reproduce for
you without APC loaded?
Hi!
<?php
if ($argv[1] > 0) {
while ($argv[1]--) file_put_contents('test.tpl', "<?php
#".str_repeat('A', mt_rand(4000, 5000))." ?>\n", LOCK_EX);
} else {
$p2 = popen("sapi/cli/php -n test3.php 100", "r");
while (1) include 'test.tpl';
}
?>
Yes, I can now reproduce this on my machine too. Not sure what I did
wrong last time, but now I get a bus error. I suspect there's some race
condition between mmap and rewriting the file that creates the problem.
The error seems to happen at offset exactly 0x1000 from the start of the
map, which leads me to think that maybe the problem is that the page
needs to be loaded, but since the file is not there, being overwritten,
it cannot be loaded anymore.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
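The page arithmetic behind the 0x1000 observation can be sketched as follows. This is only an illustration, assuming a 4096-byte page size (typical for x86_64 Linux, but not stated in the thread) and assuming the generated template carries 11 bytes of overhead around the run of 'A's. pagesSpanned() is a hypothetical helper name, not a PHP or Zend function.

```php
<?php
// Hypothetical helper: number of memory pages needed to map $len bytes.
// The 4096-byte default is an assumption (typical x86_64 Linux page size).
function pagesSpanned($len, $pageSize = 4096)
{
    return (int) ceil($len / $pageSize);
}

// Assumed overhead of the generated template around the run of 'A's:
// the open tag, one whitespace character and "#" in front, and a space,
// the close tag and a newline behind.
$overhead = strlen("<?php\n#") + strlen(" ?>\n");

// test3.php writes between 4000 and 5000 'A's per file.
foreach (array(4000, 4085, 4086, 5000) as $count) {
    $len = $count + $overhead;
    printf("%d 'A's -> %d bytes -> %d page(s)\n", $count, $len, pagesSpanned($len));
}
```

Under these assumptions most generated files occupy two pages, and byte offset 0x1000 (4096) is the first byte of the second page. A fault exactly there would be consistent with the reader having mapped a two-page file that the writer has since truncated below that point.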
On Thursday 20 December 2012 10:40:32 Stas Malyshev wrote:
Hi!
<?php
if ($argv[1] > 0) {
while ($argv[1]--) file_put_contents('test.tpl', "<?php
#".str_repeat('A', mt_rand(4000, 5000))." ?>\n", LOCK_EX);
} else {
$p2 = popen("sapi/cli/php -n test3.php 100", "r");
while (1) include 'test.tpl';
}
?>
Yes, I can now reproduce this on my machine too. Not sure what I did
wrong last time, but now I get bus error. I suspect there's some race
condition between mmap and rewriting the file that creates the problem.
The error seems to happen at offset exactly 0x1000 from the start of the
map, which leads me to thinking that maybe the problem is that the page
needs to be loaded, but since the file is not there, being overwritten,
it can not be loaded anymore.
Is include supposed to take a LOCK_EX somehow? I can neither see that in
php-src (5.4.9) nor APC-trunk, doing a cursory grepping.
APC takes a LOCK_EX in exactly one place, apc_bin_dumpfile(), which does not
look to me like it's related to "include". The usage there is fishy anyway,
not exactly sure, but I think the open that happens before taking LOCK_EX,
will have truncated the file already, leading to the same type of problem as
discussed above wrt concurrent readers.
The only chance to get the above piece of code to work reliably in theory,
would be for the "include" to take a LOCK_EX before looking at anything
related to the file structure, including the stat call that determines how
much to mmap.
The prudent approach, which should avoid the problem altogether and not need
any LOCK_EX, dictates to ALWAYS write a temporary file (new inode), then
rename it when the write completely succeeds. Otherwise any reader, like
"include", runs the chance of seeing a partially written file, and even
without include using mmap internally, syntax errors would happen from time to
time.
So, my conclusion would be that it is the code snippet above, and not any part
of PHP or the kernel, that is at fault.
best regards
Patrick
Hi!
Is include supposed to take a LOCK_EX somehow? I can neither see that in
php-src (5.4.9) nor APC-trunk, doing a cursory grepping.
I'm not sure how any lock would help, since locks are optional, meaning
you still can do the same thing without the locks.
The prudent approach, which should avoid the problem altogether and not need
any LOCK_EX, dictates to ALWAYS write a temporary file (new inode), then
rename it when the write completely succeeds. Otherwise any reader, like
"include", runs the chance of seeing a partially written file, and even
without include using mmap internally, syntax errors would happen from time to
time.
This looks like very good advice, regardless of bus errors.
So, my conclusion would be that it is the code snippet above, and not any part
of PHP or the kernel, that is at fault.
We could probably add an option to skip mmaps, but since, as you pointed
out, that doesn't really fix the issue completely, the better idea indeed is
to fix the code.
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
On Thursday 20 December 2012 23:23:43 Stas Malyshev wrote:
Hi!
Is include supposed to take a LOCK_EX somehow? I can neither see that in
php-src (5.4.9) nor APC-trunk, doing a cursory grepping.
I'm not sure how any lock would help, since locks are optional, meaning
you still can do the same thing without the locks.
I'm well aware of the advisory nature of file locks. include taking a LOCK_EX
could have helped - when any .tpl file modifying code also does LOCK_EX.
I do NOT want to propose include doing that!
It's just that the original poster or whoever created that piece of code with
LOCK_EX in the file_put_contents(), apparently thought it might work out that
way.
tmpfiles and rename are definitely the correct way to handle that.
best regards
Patrick
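The advisory nature of flock() discussed above can be sketched in a few lines. This is a minimal illustration under the assumption of a POSIX system (Windows locks behave differently); the scratch file created by tempnam() is a placeholder and not a file from this thread.

```php
<?php
// Sketch only: shows that flock() is advisory on POSIX systems.
// The scratch file is a placeholder created just for this demonstration.
$path = tempnam(sys_get_temp_dir(), 'lockdemo');
file_put_contents($path, "original contents\n");

// "Writer": takes an exclusive lock and keeps holding it.
$writer = fopen($path, 'r+');
$locked = flock($writer, LOCK_EX);

// "Reader" that never asks for a lock: it is not blocked by the
// writer's lock and simply reads whatever is in the file right now.
$seen = file_get_contents($path);

printf("lock held: %s, reader saw: %s", $locked ? 'yes' : 'no', $seen);

flock($writer, LOCK_UN);
fclose($writer);
unlink($path);
```

A lock only coordinates parties that all ask for it. A reader that opens the file without calling flock(), which is what include appears to do according to the grep above, gets no protection from the LOCK_EX in file_put_contents().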
Hi,
APC takes a LOCK_EX in exactly one place, apc_bin_dumpfile(), which does not
look to me like it's related to "include". The usage there is fishy anyway,
not exactly sure, but I think the open that happens before taking LOCK_EX,
will have truncated the file already, leading to the same type of problem as
discussed above wrt concurrent readers.
The usage is fishy, but it's small and it reproduces the error. I don't really
know what code hits that in my production environment, or whether
it is even the same problem, but it seems to have a similar backtrace.
So, my conclusion would be that it is the code snippet above, and not any part
of PHP or the kernel, that is at fault.
Oh? Did I understand you correctly? If you can code PHP that crashes
PHP, it's that code's fault, not PHP's fault? I've always thought PHP
to be a high-level programming language where PHP handles things for
you and you can't code anything that crashes it like that with
a bus error?
I think that it should at least gracefully exit and log the error: what caused
it and where. A better option would be that it just works.
And if you don't see anything to be done in PHP, I think then at
least the documentation of include/require should have a big warning
explaining that PHP doesn't guarantee that include/require will not
crash PHP itself.
Hi!
Oh? Did I understand you correctly? If you can code PHP that crashes
PHP, it's that code's fault not PHP's fault? I've always thought PHP
to be high level programming language where PHP handles things for
you and you can't code anything that crashes it like that with
bus error?
There are a number of ways that you could cause a crash in PHP. Say,
some infinite loops can end up in crashes. Calling some functions with
specific parameters on some systems could end up in crashes. Some
libraries in some versions can lead to crashes. Etc, etc. We live in
an imperfect world, and that includes software which necessarily relies on
other software. Making it perfectly 100% crash proof would be impractical.
If you have any proposal on how to solve this particular problem, you
are welcome to propose a patch. Otherwise, a much more practical solution
would be to fix that code.
I think that it should at least gracefully exit, log error, what caused
what and where. Better option would be that it just works.
We can't really gracefully exit when the OS produces a bus error on a missing
part of the file because you changed it non-atomically. The only way to
avoid it would be to not use mmap, which would be a performance hit and
also not very helpful as you'd just get a mangled file instead.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
On Friday 21 December 2012 10:41:59 Jani Ollikainen wrote:
So, my conclusion would be that it is the code snippet above, and not any
part of PHP or the kernel, that is at fault.
Oh? Did I understand you correctly? If you can code PHP that crashes
PHP, it's that code's fault not PHP's fault? I've always thought PHP
to be high level programming language where PHP handles things for
you and you can't code anything that crashes it like that with
bus error?
I understand your sentiment.
To follow up the sentiment in code, there would be two options:
- capture SIGBUS (and SIGSEGV probably, depending on platform), and in the
  signal handler, do lots of funny dances to guess from the stack what kind of
  error message would be helpful to the end user or end developer.
- remove any attempt to use mmap to speed up reading, generally, or as an ini
  option.
I would personally expect mmap-reading not to make a huge difference in
performance anyway, as the short segment double buffering that it avoids, will
probably be swamped by any kind of parsing of the content anyway. But probably
when people went to the lengths of implementing mmap based reading in the PHP
core, they had some good reasons to do so...
I think that it should at least gracefully exit, log error, what caused
what and where. Better option would be that it just works.
It CANNOT just work. It wouldn't work reliably without any use of mmap,
either. Concurrently modifying and reading one and the same file, will always
run into consistency problems, and doing that is CERTAINLY the fault of the
code that does it.
best regards
Patrick
Hi,
I understand your sentiment.
And I try to understand your technical point of view. I had thought
that PHP handles concurrency so that this kind of thing
won't happen and the user won't need, for example, mutexes
to do stuff.
But I think I've been wrong, and for performance's sake there is
nothing like that.
Did I understand you correctly that using a temp file and then rename
should fix that? Like this?
<?php
if ($argv[1] > 0) {
while ($argv[1]--)
{
file_put_contents('test.tpl.tmp', "<?php #".str_repeat('A',
mt_rand(4000, 5000))." ?>\n", LOCK_EX);
rename('test.tpl.tmp','test.tpl');
}
} else {
$p2 = popen("php test3.php 100", "r");
while (1) include 'test.tpl';
}
?>
I tested that quickly and cannot get it to produce a bus error.
Or how else should that stupid example be written so that it won't
crash? It might help me find something in my production
code to fix and get rid of the problem.
But I think that include/require should have a warning box saying
something about including/requiring files that you might overwrite
from another instance, as I can't be the only one expecting PHP to handle
this kind of situation.
I understood you correctly using temp file and then rename should fix
that? Like this?
file_put_contents('test.tpl.tmp', "<?php #".str_repeat('A',
mt_rand(4000, 5000))." ?>\n", LOCK_EX);
rename('test.tpl.tmp','test.tpl');
Exactly!
You could also do it like this:
$tmpname = 'test.tpl.tmp.'.posix_getpid();
file_put_contents($tmpname, '....');
rename($tmpname, 'test.tpl');
That way - adding the process ID to the temporary filename - does not run into
the danger of two processes changing the file at the same time. The LOCK_EX
should handle that case, too, but I usually go for the "appended PID" approach
and don't worry about file locking at all.
best regards
Patrick
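The write-then-rename approach from this thread can be wrapped in a small helper. The following is a sketch under stated assumptions, not code from PHP itself: atomicWrite() is a hypothetical name, tempnam() is swapped in for the PID suffix to get a unique temporary name, the target path under sys_get_temp_dir() is a placeholder, and the 0644 mode is just one plausible choice.

```php
<?php
// Hypothetical helper (not part of PHP): write $data to $target so that
// readers only ever see the complete old file or the complete new file.
function atomicWrite($target, $data)
{
    // Create the temporary file next to the target so that rename()
    // stays on one filesystem; tempnam() also picks a unique name.
    $tmp = tempnam(dirname($target), basename($target) . '.');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $data) !== strlen($data)) {
        unlink($tmp);
        return false;
    }
    // tempnam() creates the file with mode 0600; 0644 is an assumption
    // about who needs to read the result.
    chmod($tmp, 0644);
    // rename() puts a new inode under the target name; a reader that has
    // already opened the old file keeps the old, untruncated contents.
    if (!rename($tmp, $target)) {
        unlink($tmp);
        return false;
    }
    return true;
}

// Placeholder target path for the demonstration.
$target = sys_get_temp_dir() . '/test.tpl';
$body = "<?php #" . str_repeat('A', mt_rand(4000, 5000)) . " ?>\n";

$ok = atomicWrite($target, $body);
var_dump($ok && file_get_contents($target) === $body);
unlink($target);
```

Because the target is never truncated in place, a concurrent include should see either the previous complete template or the new one. The helper does not serialize writers; if two processes write at once, the last rename() wins.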