We have a bit of a performance disconnect between 4.4 and 5.1 still. I
was doing some benchmarking today just as a sanity check on some APC
work I have been doing lately and came up with this:
http://lerdorf.com/php/bm.html
You can ignore the apc/eaccelerator stuff. Those numbers are not
surprising. The surprising number to me is how much faster 4.4 still is.
The graph labels are slightly off. The 0, 5 and 10 includes should
really be 1, 6 and 11. The actual benchmark code is here:
http://www.php.net/~rasmus/bm.tar.gz
Tested on a Linux 2.6 Ubuntu box on an AMD chip (syscalls are cheap
there) with current PHP_4_4 and PHP_5_1 checkouts. Was also testing
5.1.2 to see the effect of getting rid of that uncached realpath call.
As far as I can tell auto_globals_jit isn't working at all, but I
eliminated that by doing variables_order = GP for these benchmarks.
Even so, the request_startup is significantly more expensive in 5.1.
Here are callgrind dumps for each. Load them up with kcachegrind and
browse around:
PHP 4.4 http://www.php.net/~rasmus/callgrind.out.1528.gz
PHP 5.1 http://www.php.net/~rasmus/callgrind.out.1488.gz
Each of these is 1000 requests against the top.php and 4top.php scripts
from bm.tar.gz. If you start at the
The script is trivial and looks like this:
<html>
<?php
$base_dir = '/var/www/bm/';
include $base_dir . 'config.inc';
function top_func($arg) {
    $b = $arg.$arg;
    echo $b;
}
class top_class {
    private $prop;
    function __construct($arg) {
        $this->prop = $arg;
    }
    function getProp() {
        return $this->prop;
    }
    function setProp($arg) {
        $this->prop = strtolower($arg);
    }
}
top_func('foo');
$a = new top_class('bar');
echo $a->getProp();
$a->setProp("AbCdEfG");
echo $a->getProp();
echo <<<EOB
The database is {$config['db']}
and the user is {$config['db_user']}
EOB;
?>
</html>
and config.inc is:
<?php
$config = array(
    'db' => 'mysql',
    'db_user' => 'www',
    'db_pwd' => 'foobar',
    'config1' => 123,
    'config2' => 456,
    'config3' => 789,
    'sub1' => array(1,2,3,4,5,6,7,8,9,10),
    'sub2' => array("abc","def","ghi","jkl","mno","pqr","stu","vwx","yz")
);
?>
4top.php is identical except that the class definition is PHP 4-style:
no private and a PHP 4 constructor.
I have some ideas for things we can speed up in 5.1. Like, for example,
we should add the ap_add_common_vars() and ap_add_cgi_vars() to the jit
mechanism. There isn't much point filling these in unless the script
tries to get them. The ap_add_common_vars() call is extremely expensive
since it does a qsort with a comparison function that uses strcasecmp.
Of course, this same optimization can be done in 4.4.
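The idea of filling in these variables only on demand can be sketched in C. This is a minimal illustration, not PHP's actual auto_globals_jit code: `lazy_env`, `populate_env()` and `lazy_env_get()` are hypothetical names, and the fill step merely stands in for the ap_add_common_vars() and ap_add_cgi_vars() calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a request's server-variable table. */
struct lazy_env {
    int populated;          /* 0 until the first lookup */
    int populate_calls;     /* how many times the expensive fill ran */
    const char *keys[2];
    const char *values[2];
};

/* Stand-in for the expensive fill (ap_add_common_vars() and friends). */
static void populate_env(struct lazy_env *env)
{
    env->keys[0] = "SERVER_NAME";
    env->values[0] = "localhost";
    env->keys[1] = "REQUEST_METHOD";
    env->values[1] = "GET";
    env->populated = 1;
    env->populate_calls++;
}

/* Look up a variable, paying the fill cost only on first access. */
static const char *lazy_env_get(struct lazy_env *env, const char *key)
{
    size_t i;

    if (!env->populated) {
        populate_env(env);
    }
    for (i = 0; i < 2; i++) {
        if (strcmp(env->keys[i], key) == 0) {
            return env->values[i];
        }
    }
    return NULL;
}
```

A request that never calls lazy_env_get() never pays for populate_env(), which is the saving being proposed for scripts that do not touch these variables.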
If you know your way around kcachegrind, load up the two callgrind files
and see what stands out for you. As far as I can tell, while we can do
some tricks to speed up various helper bits, the slowdown is coming from
the executor thrashing its cache lines.
-Rasmus
What are the results you're getting on an empty script? I'm just
curious whether it's execution speed or startup speed where you are
seeing the big hit. There were changes in both which might have
slowed things down. Another reason to be more careful re: bloat :)
Andi
At 08:34 PM 3/12/2006, Rasmus Lerdorf wrote:
[...]
With an empty.php file 0 bytes long I get:
PHP 5.1.3-dev (no opcode cache, variables_order=GP) 1168-1225 req/sec
over 5 runs of 10000 requests each.
PHP 4.4 same config 1897-1951 req/sec
Just to make sure, since in this case an extra header would make a big
difference, the raw headers that came back on both:
HTTP/1.1 200 OK
Date: Mon, 13 Mar 2006 05:31:51 GMT
Server: Apache/1.3.33 (Debian GNU/Linux) PHP/4.4.3-dev
X-Powered-By: PHP/4.4.3-dev
Connection: close
Content-Type: text/html; charset=iso-8859-1
HTTP/1.1 200 OK
Date: Mon, 13 Mar 2006 05:35:49 GMT
Server: Apache/1.3.33 (Debian GNU/Linux) PHP/5.1.3RC2-dev
X-Powered-By: PHP/5.1.3RC2-dev
Connection: close
Content-Type: text/html; charset=iso-8859-1
So yes, there are a few characters more, but I checked 5.1.2 as well,
which actually has fewer, and it is about the same speed.
And here are some pretty pictures from kcachegrind since I realize many
don't have the ability to run it. These show an overview of 1000
requests for a 0 byte php file. I start at the whole request and zoom
in on request_startup and then hash_environment on each:
http://www.php.net/~rasmus/bm/php51_empty.png
http://www.php.net/~rasmus/bm/php51_empty_startup.png
http://www.php.net/~rasmus/bm/php51_empty_hash_env.png
http://www.php.net/~rasmus/bm/php44_empty.png
http://www.php.net/~rasmus/bm/php44_empty_startup.png
http://www.php.net/~rasmus/bm/php44_empty_hash_env.png
You will notice things missing from each. Basically kcachegrind is only
showing things that took a significant portion of the execution time, so
for PHP44 you will see much more of the Apache stuff showing up since
the PHP parts were so fast. Here you can see the crazy qsort called
from ap_add_common_vars() if you look at the right side of the
php44_empty.png picture.
The corresponding callgrind raw files are:
http://www.php.net/~rasmus/bm/callgrind.out.2464 PHP 5.1
http://www.php.net/~rasmus/bm/callgrind.out.2548 PHP 4.4
You can load these up in kcachegrind and zoom in on other stuff.
-Rasmus
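To put the empty-script numbers in per-request terms, throughput can be converted to time per request. The sketch below is only arithmetic on the figures quoted above; taking the midpoint of each range and assuming requests are served one at a time are both assumptions, and the function names are made up for illustration.

```c
/* Midpoint of a measured requests-per-second range. */
static double midpoint(double lo, double hi)
{
    return (lo + hi) / 2.0;
}

/* Average time per request in milliseconds, assuming serial requests. */
static double ms_per_request(double req_per_sec)
{
    return 1000.0 / req_per_sec;
}

/* Extra milliseconds per request spent by the slower build. */
static double overhead_ms(double slow_rps, double fast_rps)
{
    return ms_per_request(slow_rps) - ms_per_request(fast_rps);
}
```

With midpoints of 1196.5 req/sec for 5.1 and 1924 req/sec for 4.4, this works out to roughly 0.84 ms versus 0.52 ms per request, a gap of about 0.3 ms on an empty script.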
Hey,
Thanks for posting this info. It definitely sounds like we should
concentrate on the 0 length script at this point. I saw Dmitry
already made some good improvements.
It'd be helpful if others also run such an empty benchmark because it
seems like the two trees are on par now and that it depends more on
your hardware architecture than the PHP versions.... But if we can
get some more results that would give us a clue. In any case, we can
try and optimize further especially the startup/shutdown routines...
Andi
At 10:15 PM 3/12/2006, Rasmus Lerdorf wrote:
[...]
Andi Gutmans wrote:
Thanks for posting this info. It definitely sounds like we should
concentrate on the 0 length script at this point. I saw Dmitry already
made some good improvements.
Yup, that patch helped. And I guess on some architectures 5.1 is faster
now, but there is still a bit of a gap, at least on this Linux box I am
testing on:
http://www.php.net/~rasmus/bm2.png
The 5.1.3m version there is one where I got rid of the
ap_add_common_vars() and ap_add_cgi_vars() calls in
sapi/apache_mod_php5.c just to see how much it would help. On the empty
case which is all about start/shutdown, it had the expected effect, but
as we do more in the request the effect disappears.
-Rasmus
Hello Rasmus,
Not a thing for 5.1 or 4.4, but in 5.2 we could change to a
case-insensitive comparison function. That would allow us to change
nearly all of strcasecmp to memcmp. And in many cases it means one less
allocation. It also means a lot less code. The case-insensitive
comparison is pretty easy because we can use a 256-byte translation
table that can be provided as a static const table. On x86 systems
we could also provide that in assembler. That would give us the
possibility to use the XLAT instruction that would otherwise not be
used (maybe newer optimizing compilers know about it, though).
Maybe it is worth trying to bring it in and checking how much slower
we are getting with it. If the numbers look promising we could go for
the change then.
A word on HEAD/Unicode: if we use case-insensitivity semantics where
only the normal ASCII characters are lowercased, we can use the same
table-based approach.
Anyway, from looking at the output, the best optimization was already
brought up by you: using jit for the ap_* stuff.
best regards
marcus
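The table-based comparison Marcus describes could look roughly like the following. This is a sketch under stated assumptions, not code from the PHP tree: `lower_table`, `init_lower_table()` and `ascii_casecmp()` are hypothetical names, the table is built at first use for brevity rather than written out as a static const table, and only ASCII 'A' to 'Z' are folded (which also assumes an ASCII-compatible character set).

```c
#include <stddef.h>

/* 256-entry translation table: maps 'A'..'Z' to 'a'..'z', all else to itself. */
static unsigned char lower_table[256];
static int lower_table_ready = 0;

static void init_lower_table(void)
{
    int i;

    for (i = 0; i < 256; i++) {
        lower_table[i] = (unsigned char)i;
    }
    for (i = 'A'; i <= 'Z'; i++) {
        lower_table[i] = (unsigned char)(i - 'A' + 'a');
    }
    lower_table_ready = 1;
}

/* Compare len bytes of a and b, ignoring ASCII case.
 * Returns <0, 0 or >0, like memcmp(). */
static int ascii_casecmp(const char *a, const char *b, size_t len)
{
    size_t i;

    if (!lower_table_ready) {
        init_lower_table();
    }
    for (i = 0; i < len; i++) {
        unsigned char ca = lower_table[(unsigned char)a[i]];
        unsigned char cb = lower_table[(unsigned char)b[i]];
        if (ca != cb) {
            return (int)ca - (int)cb;
        }
    }
    return 0;
}
```

Each byte costs one table lookup, which is the operation the XLAT instruction performs on x86.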
Marcus Boerger wrote:
Hello Rasmus,
Not a thing for 5.1 or 4.4, but in 5.2 we could change to a
case-insensitive comparison function. That would allow us to change
nearly all of strcasecmp to memcmp. And in many cases it means one less
allocation. It also means a lot less code. The case-insensitive
comparison is pretty easy because we can use a 256-byte translation
table that can be provided as a static const table. On x86 systems
we could also provide that in assembler. That would give us the
possibility to use the XLAT instruction that would otherwise not be
used (maybe newer optimizing compilers know about it, though).
Maybe it is worth trying to bring it in and checking how much slower
we are getting with it. If the numbers look promising we could go for
the change then.
A word on HEAD/Unicode: if we use case-insensitivity semantics where
only the normal ASCII characters are lowercased, we can use the same
table-based approach.
Anyway, from looking at the output, the best optimization was already
brought up by you: using jit for the ap_* stuff.
Well, that doesn't come anywhere near the radar here actually. The
300k+ strcasecmp calls are coming from the qsort in Apache, not from
PHP. Even then, 300k strcasecmp calls is minor here. So I doubt you
could even measure the effect of that change.
-Rasmus
Hi Rasmus,
I made two improvements in 5.1 and ran the same benchmarks on an Intel
Pentium M 1.5GHz with 2M cache.
          top   top5   top10
php-5.1   740   550    430    req/sec
php-4.4   680   440    290    req/sec
Maybe the problem is the AMD chip? :)
Thanks. Dmitry.
-----Original Message-----
From: Rasmus Lerdorf [mailto:rasmus@lerdorf.com]
Sent: Monday, March 13, 2006 7:34 AM
To: internals
Subject: [PHP-DEV] Calling performance geeks
[...]
Hello Dmitry,
if you mean the hash stuff you changed, then you made quite a few mistakes,
because the normal apply functions don't respect the ZEND_HASH_* constants, as
I mailed last week.
marcus
Monday, March 13, 2006, 1:12:01 PM, you wrote:
[...]
--
Best regards,
Marcus
Hi,
If you're even interested in the tiniest of optimizations, you may wanna
read this. I was just going through the php code. I'm not familiar with it
at all, so I don't know which functions are the bottlenecks, so I can't help
in optimizing the big picture. But I had little else to do right now, so I
figured I'd just browse around through the files to see if I could notice
any local speedups. So really, the things I lay out here are probably
futile, but who knows.
I found that for example the function php_stream_memory_seek() in
main/streams/memory.c contains a whole bunch of return statements. I found
that it can be (you can benchmark this) slightly faster to do this:
int func(int p)
{
    int result = 0;
    switch (p)
    {
        case 0: result = 1; break;
        case 1: result = -4; break;
        case 2: result = 15; break;
    }
    return result;
}
instead of this:
int func(int p)
{
    switch (p)
    {
        case 0: return 1;
        case 1: return -4;
        case 2: return 15;
    }
    return 0;
}
This holds with 'gcc foo.c' as well as with 'gcc -O2 foo.c'. The
difference is slight, and if it's too tiny, just ignore this message.
Perhaps some functions that php relies on heavily may benefit from this
though (but I wouldn't know which ones those would be).
Also, I noticed that in php_start_ob_buffer() in main/output.c, and probably
in more functions integers are divided by 2 by doing:
result = intvar / 2;
while it is about 20% faster (even with -O2) to do this:
result = intvar >> 1;
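One caveat worth checking before swapping `/ 2` for `>> 1`: for signed integers the two are not equivalent. Division truncates toward zero (as of C99), while right-shifting a negative value is implementation-defined and on common compilers rounds toward negative infinity. The sketch below uses hypothetical helper names to show where the two agree and where they can differ; whether php_start_ob_buffer() ever sees a negative value is not something this message establishes.

```c
/* Halve by division: truncates toward zero, so -7 / 2 is -3 in C99. */
static int half_div(int v)
{
    return v / 2;
}

/* Halve by shifting: identical for v >= 0; the result for negative v
 * is implementation-defined (commonly an arithmetic shift). */
static int half_shift(int v)
{
    return v >> 1;
}

/* Returns 1 when both forms give the same answer for v. */
static int halves_agree(int v)
{
    return half_div(v) == half_shift(v);
}
```

For unsigned operands, or values known to be non-negative, optimizing compilers generally perform this strength reduction themselves, so any measured gain is worth re-checking at the optimization level PHP is actually built with.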
A minor thing I noticed (nothing to speed up here though) is an unused
variable 'i' in insertionsort() in main/mergesort.c (weird that this never
showed up as a compiler warning). Or does the defined TSRMLS_CC depend on
the existence of an integer called 'i'? Pretty unlikely to me.
Why is CONTEXT_TYPE_IMAGE_GIF in main/logos.h defined as
"Content-Type:  image/gif", with 2 spaces between "Content-Type:" and "image/gif"?
In sapi/apache/mod_php5.c in the function php_apache_log_message(),
Why are these 2 calls:
fprintf(stderr, "%s", message);
fprintf(stderr, "\n");
instead of 1 call:
fprintf(stderr, "%s\n", message);
In sapi/apache/mod_php5.c, in the function php_apache_flag_handler_ex(),
the original:
    if (!strcasecmp(arg2, "On") || (arg2[0] == '1' && arg2[1] == '\0')) {
        bool_val[0] = '1';
    } else {
        bool_val[0] = '0';
    }
is over 5 times slower than:
    if (((arg2[0] == 'O' || arg2[0] == 'o') &&
         (arg2[1] == 'n' || arg2[1] == 'N') && (arg2[2] == '\0')) ||
        (arg2[0] == '1' && arg2[1] == '\0')) {
        bool_val[0] = '1';
    } else {
        bool_val[0] = '0';
    }
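Before comparing speed, it is worth confirming that the two conditions accept exactly the same strings. The sketch below wraps each condition in a helper and runs both over a handful of sample values. The function names and the sample list are assumptions made for this illustration; strcasecmp comes from the POSIX header strings.h.

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* The original condition: case-insensitive "On", or exactly "1". */
int is_on_strcasecmp(const char *arg2)
{
    return !strcasecmp(arg2, "On") || (arg2[0] == '1' && arg2[1] == '\0');
}

/* The hand-unrolled condition. Short-circuit evaluation stops at the
 * first mismatch, so it never reads past the terminating '\0'. */
int is_on_manual(const char *arg2)
{
    return ((arg2[0] == 'O' || arg2[0] == 'o') &&
            (arg2[1] == 'n' || arg2[1] == 'N') &&
            (arg2[2] == '\0')) ||
           (arg2[0] == '1' && arg2[1] == '\0');
}

/* Returns 1 if both checks agree on every sample string, else 0. */
int flag_checks_agree(void)
{
    static const char *samples[] = {
        "On", "on", "ON", "oN", "1", "Off", "off", "0",
        "", "O", "One", "10", "true"
    };
    size_t i;
    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        if (is_on_strcasecmp(samples[i]) != is_on_manual(samples[i])) {
            return 0;
        }
    }
    return 1;
}
```

The unrolled form trades one library call for a longer expression that has to be kept in sync by hand if the accepted values ever change.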
Like I said, these are extremely tiny things, so please ignore it if it's
too futile :) Nonetheless, if this turns out to be appreciated information,
I'll continue the hunt.
Good luck optimizing,
Ron
Hello Ron,
that stuff is only used in edge cases; however, it is more of a fix than an
optimization. Do you have access and want to do the changes yourself?
regards
marcus
Monday, March 13, 2006, 10:08:30 PM, you wrote:
Best regards,
Marcus
Hi,
I don't have access and I don't need to do the changes myself, but if you
prefer it, I will (provided I get access of course).
Ron
"Marcus Boerger" helly@php.net wrote in message
news:1914614308.20060313222029@marcus-boerger.de...
Hello Ron,
that stuff is only used in edge cases; however, it is more of a fix than an
optimization. Do you have access and want to do the changes yourself?
regards
marcus
This one isn't a good idea. I bet it won't affect overall performance,
but it makes the code much less maintainable.
The others look OK (I just took a quick glance).
At 01:20 PM 3/13/2006, Marcus Boerger wrote:
In sapi/apache/mod_php5.c, in the function php_apache_flag_handler_ex(),
the original:
    if (!strcasecmp(arg2, "On") || (arg2[0] == '1' && arg2[1] == '\0')) {
        bool_val[0] = '1';
    } else {
        bool_val[0] = '0';
    }
is over 5 times slower than:
    if (((arg2[0] == 'O' || arg2[0] == 'o') &&
         (arg2[1] == 'n' || arg2[1] == 'N') && (arg2[2] == '\0')) ||
        (arg2[0] == '1' && arg2[1] == '\0')) {
        bool_val[0] = '1';
    } else {
        bool_val[0] = '0';
    }
Hello Ron,
just as a clarification, you looked at unchanged 4.4 code that has long
since been fixed in 5.1/HEAD. Please always look at 5.1/HEAD first, since 4.4
will only get real fixes but no code beautifying. Also, we always modify
HEAD first and MFH (merge from HEAD) from there. Doing it the other way
round just costs unnecessary development time.
Best regards,
Marcus
Hi Marcus,
Actually, I was looking at 5.1.2, not 4.4. And 5.1.2 can't differ that
much from 5.1/HEAD, right? I'm not sure if I'm ready for CVS access, since I
don't know enough of the architecture of the system as a whole. I wouldn't
wanna break anything while trying to make things better. If I wanna
collaborate, what's the protocol for applying for a CVS account, and is there
info on quality assurance? Are all CVS changes reviewed? I'd like to know
how this works, but don't quite know where to begin. Can you provide me with
some basic info to get me informed and possibly get me started?
Thanks,
Ron
"Marcus Boerger" helly@php.net wrote in message
news:1531526751.20060313233621@marcus-boerger.de...
Hello Ron,
just as a clarification, you looked at unchanged 4.4 code that has long
since been fixed in 5.1/HEAD. Please always look at 5.1/HEAD first, since 4.4
will only get real fixes but no code beautifying. Also, we always modify
HEAD first and MFH (merge from HEAD) from there. Doing it the other way
round just costs unnecessary development time.
Hello Ron,
I was under the impression you were already contributing to one of the
extensions. And 5.1.2 obviously differs quite a bit from 5.1.3 right
now, especially in that file. To apply for a CVS account you'd first send
in some patches, mostly for a certain extension or
area. After a few reviews you'd go to the web page, fill out a form and
get an approval. That is what we all did.
best regards
marcus
Monday, March 13, 2006, 11:52:14 PM, you wrote:
Hi Marcus,
Actually, I was looking at the 5.1.2, not 4.4. And 5.1.2 can't differ that
much from 5.1/HEAD, right? I'm not sure if I'm ready for CVS access, since I
don't know enough of the architecture of the system as a whole. I wouldn't
wanna break anything while trying to make things better. If I wanna
collaborate, what's the protocol of applying for a CVS account and is there
info on quality assurance? Are all CVS changes reviewed? I'd like to know
how this works, but don't quite know where to begin. Can you provide me some
basic info to get me informed and possibly get me started?
Thanks,
Ron
"Marcus Boerger" helly@php.net wrote in message
news:1531526751.20060313233621@marcus-boerger.de...Hello Ron,
just as a clarification, you looked at unchanged 4.4 code that is fixed
since long in 5.1/HEAD. Please always first look into 5.1/HEAD since 4.4
will only get real fixes but no code beautifying. Also we always start to
modify HEAD first and MFH from there. Doing it the otherway round just
costs unneccessary development time.Monday, March 13, 2006, 10:08:30 PM, you wrote:
Hi,
If you're even interested in the tinyest of optimizations, you may wanna
read this. I was just going through the php code. I'm not familiar with
it
at all, so I don't know which functions are the bottlenecks, so I can't
help
in optimizing the big picture. But I had little else to do right now, so
I
figured I'd just browse around through the files to see if I could
notice
any local speedups. So really, the things I lay out here are probably
futile, but who knows.
I found that for example the function php_stream_memory_seek() in
main/streams/memory.c contains a whole bunch of return statements. I
found
that it can be (you can benchmark this) slightly faster to do this:int func(int p)
{
int result = 0;switch (p) { case 0: result = 1; break; case 1: result = -4; break; case 2: result = 15; break; } return result;
}
instead of this:
int func(int p)
{
switch (p)
{
case 0: return 1;
case 1: return -4;
case 2: return 15;
}
return 0;
}This is correct with 'gcc foo.c' as well as with 'gcc -O2 foo.c'. The
difference is slight, and if it's too tiny, just ignore it this message.Perhaps some functions that php relies on heavily may benefit from this
though (but I wouldn't know which ones those would be).
Also, I noticed that in php_start_ob_buffer() in main/output.c, and
probably
in more functions integers are divided by 2 by doing:
result = intvar / 2;
while it is about 20% faster (even with -O2) to do this:
result = intvar >> 1;
A minor thing I noticed (nothing to speed up here though) is an unused
variable 'i' in insertionsort() in main/mergesort.c (weird that this
never
showed up as a compiler warning). Or does the defined TSRMLS_CC depend
on
the existance of an integer called 'i'? Pretty unlikely to me.
Why is CONTEXT_TYPE_IMAGE_GIF in main/logos.h defined as "Content-Type:
image/gif" with 2 spaces between "Content-Type" and "image/gif"?
In sapi/apache/mod_php5.c in the function php_apache_log_message(),
why are there these 2 calls:
fprintf(stderr, "%s", message);
fprintf(stderr, "\n");
instead of 1 call:
fprintf(stderr, "%s\n", message);
In sapi/apache/mod_php5.c in the function php_apache_flag_handler_ex(),
the original:
if (!strcasecmp(arg2, "On") || (arg2[0] == '1' && arg2[1] == '\0')) {
bool_val[0] = '1';
} else {
bool_val[0] = '0';
}
is over 5 times slower than:
if (((arg2[0] == 'O' || arg2[0] == 'o') && (arg2[1] == 'n' || arg2[1] == 'N')
&& (arg2[2] == '\0')) || (arg2[0] == '1' && arg2[1] == '\0')) {
bool_val[0] = '1';
} else {
bool_val[0] = '0';
}
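Before swapping one form for the other it is worth confirming that they accept exactly the same inputs. Here is a minimal sketch that wraps both conditions in functions so they can be compared directly. The function names flag_is_on_strcasecmp and flag_is_on_manual are invented for this example and are not in mod_php5.c. It assumes a POSIX system, where strcasecmp is declared in <strings.h>.

```c
#include <strings.h>  /* strcasecmp (POSIX) */

/* Original form: library call for "On", manual check for "1". */
int flag_is_on_strcasecmp(const char *arg)
{
    return !strcasecmp(arg, "On") || (arg[0] == '1' && arg[1] == '\0');
}

/* Hand-rolled form: compares character by character. The && chain
 * short-circuits, so it never reads past the terminating '\0'. */
int flag_is_on_manual(const char *arg)
{
    return ((arg[0] == 'O' || arg[0] == 'o') &&
            (arg[1] == 'n' || arg[1] == 'N') &&
            arg[2] == '\0')
        || (arg[0] == '1' && arg[1] == '\0');
}
```

Both return 1 for "On", "on", "ON", "oN" and "1", and 0 for anything else, including the empty string. Since the handler only runs when a configuration flag is parsed, the speedup is unlikely to show up in per-request numbers.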
Like I said, these are extremely tiny things, so please ignore this if it's
too futile :) Nonetheless, if this turns out to be appreciated information,
I'll continue the hunt.
Good luck optimizing,
Ron
Best regards,
Marcus