I ran the Zend/bench.php script with PHP 5.0 and PHP 5.1 compiled with
GCC 3.3 and GCC 3.4 using different optimizations on my Intel Pentium-M
laptop:

HEAD (PHP 5.1.0-dev)

  GCC 3.3.4 (-march=pentium3)
    -O0: 48.602
    -Os: 29.920
    -O1: 31.349
    -O2: 29.029
    -O3: 29.644

  GCC 3.4.2 (-march=pentium-m -mtune=pentium-m)
    -O0: 48.966
    -Os: 26.286
    -O1: 29.253
    -O2: 29.100
    -O3: 27.767
    -fprofile-{generate|use} -Os: 24.216
    -fprofile-{generate|use} -O1: 26.575
    -fprofile-{generate|use} -O2: 26.339
    -fprofile-{generate|use} -O3: 25.537

PHP_5_0 (PHP 5.0.3-dev)

  GCC 3.3.4 (-march=pentium3)
    -O0: 58.394
    -Os: 42.570
    -O1: 43.454
    -O2: 42.092
    -O3: 42.066

  GCC 3.4.2 (-march=pentium-m -mtune=pentium-m)
    -O0: 59.272
    -Os: 35.853
    -O1: 38.275
    -O2: 37.989
    -O3: 41.020
    -fprofile-{generate|use} -Os: 33.926
    -fprofile-{generate|use} -O1: 36.853
    -fprofile-{generate|use} -O2: 35.335
    -fprofile-{generate|use} -O3: 38.897

For the -fprofile-{generate|use} builds I also used the Zend/bench.php
script to generate the profile information. If we were to add a
Makefile target to the PHP build that makes use of
-fprofile-{generate|use} (as GCC itself does with "make
profiledbootstrap"), it would make sense to use "make test" to generate
the profiling information.
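
To put the profile-guided numbers in perspective, here is a small sketch in Python (not part of the original measurements) that computes how much each -fprofile-{generate|use} build improves on the plain GCC 3.4.2 build at the same optimization level. The timings are copied from the results above; the dictionary layout and the helper name pgo_gain are made up for illustration.

```python
# bench.php totals for GCC 3.4.2, copied from the results above.
# Keys are (branch, flag); the layout is illustrative only.
PLAIN = {
    ("HEAD", "-Os"): 26.286, ("HEAD", "-O1"): 29.253,
    ("HEAD", "-O2"): 29.100, ("HEAD", "-O3"): 27.767,
    ("PHP_5_0", "-Os"): 35.853, ("PHP_5_0", "-O1"): 38.275,
    ("PHP_5_0", "-O2"): 37.989, ("PHP_5_0", "-O3"): 41.020,
}
PROFILED = {
    ("HEAD", "-Os"): 24.216, ("HEAD", "-O1"): 26.575,
    ("HEAD", "-O2"): 26.339, ("HEAD", "-O3"): 25.537,
    ("PHP_5_0", "-Os"): 33.926, ("PHP_5_0", "-O1"): 36.853,
    ("PHP_5_0", "-O2"): 35.335, ("PHP_5_0", "-O3"): 38.897,
}


def pgo_gain(branch, flag):
    """Percentage reduction in the bench.php total from profile feedback."""
    plain = PLAIN[(branch, flag)]
    profiled = PROFILED[(branch, flag)]
    return 100.0 * (plain - profiled) / plain


if __name__ == "__main__":
    for branch, flag in sorted(PLAIN):
        gain = pgo_gain(branch, flag)
        print(f"{branch:8s} {flag}: {gain:4.1f}% less time with profile feedback")
```

On these figures every profiled build is faster than its plain counterpart, with gains between roughly 4% and 9.5%. This says nothing about other workloads, since the profile was generated with the same script that was then benchmarked.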

When I find the time I will repeat the benchmark on my AMD Athlon64 box
because I think the small L1/L2 caches of the laptop CPU affect the
performance of the bigger code generated through optimizations. With
GCC 3.4.2 you can clearly see that code optimized for size (-Os) is
faster than the more aggressive -O1, -O2, and -O3 optimization levels.
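
The comparison between -Os and the other levels can be checked directly against the numbers. The sketch below (Python; the data layout and the function names fastest_flag and os_gap_percent are assumptions made for illustration) picks the fastest non-profiled optimization level per branch and compiler and reports how far -Os is from it.

```python
# bench.php totals copied from the results above (non-profiled builds only).
RESULTS = {
    ("HEAD", "GCC 3.3.4"): {
        "-O0": 48.602, "-Os": 29.920, "-O1": 31.349, "-O2": 29.029, "-O3": 29.644,
    },
    ("HEAD", "GCC 3.4.2"): {
        "-O0": 48.966, "-Os": 26.286, "-O1": 29.253, "-O2": 29.100, "-O3": 27.767,
    },
    ("PHP_5_0", "GCC 3.3.4"): {
        "-O0": 58.394, "-Os": 42.570, "-O1": 43.454, "-O2": 42.092, "-O3": 42.066,
    },
    ("PHP_5_0", "GCC 3.4.2"): {
        "-O0": 59.272, "-Os": 35.853, "-O1": 38.275, "-O2": 37.989, "-O3": 41.020,
    },
}


def fastest_flag(timings):
    """Return the flag with the lowest bench.php total."""
    return min(timings, key=timings.get)


def os_gap_percent(timings):
    """How much slower -Os is than the fastest flag, in percent (0 if -Os wins)."""
    best = timings[fastest_flag(timings)]
    return 100.0 * (timings["-Os"] - best) / best


if __name__ == "__main__":
    for (branch, compiler), timings in sorted(RESULTS.items()):
        flag = fastest_flag(timings)
        gap = os_gap_percent(timings)
        print(f"{branch:8s} {compiler}: fastest {flag}, -Os is {gap:.1f}% behind")
```

With these figures, -Os is the fastest level for both branches under GCC 3.4.2. Under GCC 3.3.4 it trails -O2 (HEAD) by about 3% and -O3 (PHP_5_0) by about 1%, so the size-optimization effect is clearest with the newer compiler.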
--
Sebastian Bergmann http://www.sebastian-bergmann.de/
GnuPG Key: 0xB85B5D69 / 27A7 2B14 09E4 98CD 6277 0E5B 6867 C514 B85B 5D69
Hello Sebastian,

could you please repeat the tests for PHP 4.3 to see the interesting
difference? Also, how about providing a spreadsheet for a better overview?
How did you do the timings? I assume the time command? And what CPU did you
use, especially how much cache are we talking about?

best regards
marcus
Monday, October 4, 2004, 8:30:47 PM, you wrote:
> I ran the Zend/bench.php script with PHP 5.0 and PHP 5.1 compiled with
> GCC 3.3 and GCC 3.4 using different optimizations on my Intel Pentium-M
> laptop:
> [...]
--
Best regards,
Marcus mailto:helly@php.net
Marcus Boerger wrote:
> could you please repeat the tests for PHP 4.3 to see the interesting
> difference?

Yes, I can do that.

> How did you do the timings? I assume the time command?

I used the "total" output of Zend/bench.php.

> And what CPU did you use, especially how much cache are we talking about?

Intel Pentium-M 1500 MHz; frequency scaling was disabled for the
benchmark run.

L1 I cache: 32K, L1 D cache: 32K, L2 cache: 1024K
--
Sebastian Bergmann http://www.sebastian-bergmann.de/
Marcus Boerger wrote:
> Also, how about providing a spreadsheet for a better overview?

http://www.sebastian-bergmann.de/stuff/gcc-cflags.png
--
Sebastian Bergmann http://www.sebastian-bergmann.de/
Hello Sebastian,
Monday, October 4, 2004, 8:55:07 PM, you wrote:
> Marcus Boerger wrote:
>> Also, how about providing a spreadsheet for a better overview?

Looks very nice so far, but I am eagerly awaiting the 4.3 results now :-)
--
Best regards,
Marcus mailto:helly@php.net
Marcus Boerger wrote:
> very nice so far, but I am eagerly awaiting the 4.3 results now :-)

You'll have to wait at least until tomorrow.
--
Sebastian Bergmann http://www.sebastian-bergmann.de/
Marcus Boerger wrote:
> very nice so far, but I am eagerly awaiting the 4.3 results now :-)

I have uploaded the results for PHP 4.3.10-dev, PHP 5.0.3-dev, and
PHP 5.1.0-dev to
http://www.sebastian-bergmann.de/stuff/gcc-cflags.pdf
The machine used was
http://www.sebastian-bergmann.de/gallery/wopr
--
Sebastian Bergmann http://www.sebastian-bergmann.de/
Very interesting numbers; I'd like to second Marcus' request for a 4.3
benchmark.

I was somewhat surprised by O2 and O1 being slower than Os, while O3 may
in some cases end up over-optimizing, which would explain its poor
showing. However, it could be that it makes simple situations slower,
while more complex operations that are generally more CPU-intensive
in fact become faster. If you don't mind, could you please also include
data for "time make test", as it seems to cover a much greater quantity
of code.

Ilia
Thies and I found the same thing when we did our patch. It relates to
the size of the executor loop that is generated: if you have too much
inlining, you end up blowing your instruction cache.

-Sterling
Hello Sterling, hello Sebastian,
Monday, October 4, 2004, 9:18:47 PM, you wrote:
> Thies and I found the same thing when we did our patch. It relates to
> the size of the executor loop that is generated: if you have too much
> inlining, you end up blowing your instruction cache.

That means not using all opcodes could result in faster execution with
the function-based executor. Maybe Sebastian could try that too?
--
Best regards,
Marcus mailto:helly@php.net
Marcus Boerger wrote:
> That means not using all opcodes could result in faster execution with
> the function-based executor. Maybe Sebastian could try that too?

How would I do that?
--
Sebastian Bergmann http://www.sebastian-bergmann.de/
Hello Sebastian,
Monday, October 4, 2004, 10:01:36 PM, you wrote:
> Marcus Boerger wrote:
>> That means not using all opcodes could result in faster execution with
>> the function-based executor. Maybe Sebastian could try that too?
> How would I do that?

PHP 3, PHP 4: switch only
PHP 5.0: call only
PHP 5.1: ./configure --with-zend-vm=[CALL|SWITCH|GOTO]

regards
marcus
Marcus Boerger wrote:
> 5.1: ./configure --with-zend-vm=[CALL|SWITCH|GOTO]

I do not seem to have that configure switch (current HEAD):
sb@wopr-mobile php-5.1 % ./configure --help|grep vm
--enable-sysvmsg Enable sysvmsg support
--
Sebastian Bergmann http://www.sebastian-bergmann.de/
Ilia Alshanetsky wrote:
> I was somewhat surprised by O2 and O1 being slower than Os

This is because the L1/L2 caches are very small. I will re-run the
benchmark tomorrow (if I find the time) on my Athlon64 box.
--
Sebastian Bergmann http://www.sebastian-bergmann.de/
I've done lots of testing with these kinds of compiler flags. It's really a
tough call, because you might change two lines in the PHP code and then your
results might differ completely because of CPU cache hits/misses. A lot also
depends on the CPU.
But in any case, interesting results.
Andi
At 08:30 PM 10/4/2004 +0200, Sebastian Bergmann wrote:
> I ran the Zend/bench.php script with PHP 5.0 and PHP 5.1 compiled with
> GCC 3.3 and GCC 3.4 using different optimizations on my Intel Pentium-M
> laptop:
> [...]