Hi,
Recently I took a look into jemalloc and tcmalloc internals and tried to
borrow some ideas. You may check the result at
https://github.com/dstogov/php-src/tree/xx_malloc. It's a dirty
proof-of-concept implementation of a new Memory Manager for PHP. It was
tested only on Linux, with a release, non-ZTS build. It doesn't support
debug mode and ZTS yet. The main advantage is a small but consistent speed
improvement on real-life applications.
I would appreciate it if you benchmarked it against vanilla PHP-5.6 on your
applications, reviewed the code from the performance and security points of
view, and came back with comments, ideas and criticism. (For example: maybe
someone could suggest how to avoid the USE_ZEND_ALLOC=0 check, which allows
system malloc() usage, on each emalloc() call? How to reduce the cost of
statistics collection?)
Currently, I'm not sure whether a 5% speed improvement is worth the effort.
The results of my benchmarks follow.
Thanks. Dmitry.
PHP-5.6 32-bit  zend_alloc  xx_malloc  Improvement
blog                 105.6      109.7        3.88%
drupal              1625.0     1667.6        2.62%
fw                   231.6      286.4       23.66%
hello              12048.4    11865.9       -1.51%
qdig                 464.4      495.3        6.65%
typo3                563.8      584.9        3.74%
wordpress            188.9      196.8        4.19%
xoops                132.7      140.0        5.50%
scrum                181.6      192.7        6.11%
ZF1 Hello           1153.2     1228.4        6.52%
ZF2 Test             263.0      275.5        4.75%

PHP-5.6 64-bit  zend_alloc  xx_malloc  Improvement
blog                  99.0      102.3        3.33%
drupal              1531.1     1604.2        4.78%
fw                   197.6      206.5        4.50%
hello              11702.0    11875.2        1.48%
qdig                 451.1      476.8        5.70%
typo3                541.8      568.7        4.96%
wordpress            176.5      185.8        5.27%
xoops                126.1      134.5        6.66%
scrum                169.3      174.9        3.31%
ZF1 Hello           1042.9     1119.6        7.35%
ZF2 Test             238.4      242.5        1.72%
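The Improvement column appears to be the relative change of the xx_malloc figure against the zend_alloc figure. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the units of the raw figures are not stated in the post.

```c
/* Relative change of `candidate` against `baseline`, in percent.
 * Assumes higher raw figures are better, as the table suggests. */
double improvement_percent(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}
```

For instance, the 32-bit "blog" row gives (109.7 - 105.6) / 105.6 * 100, which rounds to 3.88.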
Great, Dmitry!
We worked on something with Joe a few months ago, mainly adding a new
ZendMM handler which binds jemalloc().
Anyway, I'm gonna try your code and run it against several Symfony2
applications.
I'll come back to you at the end of the week with some results ;-)
Julien
Of course I tried to plug in jemalloc and tcmalloc, but they cause a slowdown
instead of a speedup, mainly because zend_alloc was designed especially for
PHP and also because they suffer from multi-threading support overhead. On
the other hand, profiling PHP with oprofile I saw a lot of cache misses in
zend_alloc.c, especially because of linked list handling. So I tried to
combine the best from all approaches and then spent a couple of weeks tuning
it.
Thanks. Dmitry.
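To illustrate why size-segregated bins can avoid the linked-list walking mentioned above, here is a minimal sketch of the general technique that jemalloc and tcmalloc popularized. It is not the xx_malloc code; every name, the bin count and the 8-byte step are assumptions made for the example.

```c
#include <stddef.h>
#include <stdlib.h>

#define BIN_STEP   8                      /* hypothetical size-class granularity */
#define NUM_BINS   8                      /* classes of 8, 16, ..., 64 bytes */
#define MAX_SMALL  (NUM_BINS * BIN_STEP)

typedef struct free_slot { struct free_slot *next; } free_slot;

static free_slot *bins[NUM_BINS];         /* one LIFO free list per size class */

/* Map a request size in 1..MAX_SMALL to its bin index. */
size_t bin_index(size_t size)
{
    return (size + BIN_STEP - 1) / BIN_STEP - 1;
}

/* Allocate: pop from the matching bin in O(1), no search over free blocks. */
void *small_alloc(size_t size)
{
    if (size == 0) size = 1;
    if (size > MAX_SMALL) return malloc(size);   /* large request: system allocator */

    size_t i = bin_index(size);
    free_slot *slot = bins[i];
    if (slot != NULL) {
        bins[i] = slot->next;
        return slot;
    }
    return malloc((i + 1) * BIN_STEP);           /* bin empty: get a full-class block */
}

/* Free: push onto the bin for the size the block was requested with. */
void small_free(void *ptr, size_t size)
{
    if (ptr == NULL) return;
    if (size == 0) size = 1;
    if (size > MAX_SMALL) { free(ptr); return; }

    free_slot *slot = ptr;
    size_t i = bin_index(size);
    slot->next = bins[i];
    bins[i] = slot;
}
```

The sketch passes the size to small_free() to keep it short; a real allocator would recover the size class from the block's address or from page metadata instead.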
Yes, I was reading the great work you've done so far!
Looking forward to testing this myself and, why not, fixing bugs or giving
some more ideas :-)
Anyway, the different pool sizes are nice.
We already had an idea like this in ZendMM with the "small free block"
vs. "free block" linked lists, but the implementation you've done so
far is a pretty nice evolution.
I think we can improve things by studying more accurate caches for
frequently used C objects such as zvals or zend_object structures.
So many ideas, glad to see you're having fun with them ;-)
Julien.P
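A dedicated cache for one frequently used structure could look roughly like the sketch below: a chunk is carved into fixed-size slots and recycled through a free list. The my_value struct is a stand-in and not the real zval layout; all names and the chunk size are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a hot engine structure; NOT the real zval definition. */
typedef struct my_value {
    long payload;
    unsigned refcount;
    unsigned char type;
} my_value;

typedef union cache_slot {
    union cache_slot *next;   /* meaningful while the slot is free */
    my_value value;           /* meaningful while the slot is in use */
} cache_slot;

#define SLOTS_PER_CHUNK 64    /* hypothetical chunk size */

static cache_slot *free_slots;   /* LIFO list of free slots */

/* Carve one malloc'ed chunk into slots and push them on the free list.
 * Chunks are never returned to the system in this sketch. */
static int cache_refill(void)
{
    cache_slot *chunk = malloc(SLOTS_PER_CHUNK * sizeof(cache_slot));
    if (chunk == NULL) return 0;
    for (size_t i = 0; i < SLOTS_PER_CHUNK; i++) {
        chunk[i].next = free_slots;
        free_slots = &chunk[i];
    }
    return 1;
}

/* Pop a slot; refill from a new chunk when the cache is empty. */
my_value *value_alloc(void)
{
    if (free_slots == NULL && !cache_refill()) return NULL;
    cache_slot *slot = free_slots;
    free_slots = slot->next;
    return &slot->value;
}

/* Push the slot back; the most recently freed slot is reused first. */
void value_free(my_value *v)
{
    cache_slot *slot = (cache_slot *)v;   /* a union member shares the union's address */
    slot->next = free_slots;
    free_slots = slot;
}
```

Reusing the most recently freed slot first keeps the working set small, which is the property such a cache is meant to exploit.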
One of the most common macro uses for both emalloc and efree primitives
is in the CTOR and DTOR of zval structures. This is HOT code, yet the
code involved is split across three main modules totalling less than 10K
lines. As a comparison, zend_vm_execute.h is over 40K lines to allow the
CC optimizer to optimize across the entire source code.
Wouldn't it make a lot more sense to combine zend_variables.c,
zend_hash.c and zend_alloc.c into a single module (do it by #include
directive, as zend_execute.c incorporates zend_vm_execute.h) so that the
CC optimizer can properly optimise these CTORs and DTORs? The DTOR
for a ZVAL which is itself a simple ZVAL hierarchy, such as an array,
should be executed as a dense code sequence that can comfortably run out
of the L1 instruction cache, with minimal cache misses and BLT failure stalls.
Regards Terry
Hi Terry,
Maybe I misunderstood you.
Macros are inlined at compile time anyway.
Inlining the "slow paths" of zval_copy_ctor/zval_dtor would cause a "code
explosion" and increase cache misses.
However, on Linux it should be possible to put "hot" functions in one code
section and reduce cache misses.
Anyway, it's unrelated to the Memory Manager.
Thanks. Dmitry.
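One way to get both effects with GCC-compatible compilers is sketched below: the tiny fast path is inlined, while the larger slow path stays out of line and carries the `hot` attribute, which on many targets places such functions together in a dedicated text subsection. The struct and function names are hypothetical stand-ins, not the engine's zval code.

```c
#include <stddef.h>

#if defined(__GNUC__)
# define HOT_FN __attribute__((hot, noinline))   /* group with other hot code, never inline */
#else
# define HOT_FN                                  /* other compilers: no-op */
#endif

/* Stand-in for a refcounted engine value. */
typedef struct counted {
    unsigned refcount;
    int destroyed;
} counted;

/* Slow path: one out-of-line copy, so call sites do not grow. */
HOT_FN void counted_destroy(counted *c)
{
    c->destroyed = 1;   /* real code would free owned resources here */
}

/* Fast path: a decrement and a test, cheap to inline everywhere. */
static inline void counted_release(counted *c)
{
    if (--c->refcount == 0) {
        counted_destroy(c);
    }
}
```

Whether the grouping actually reduces instruction-cache misses depends on the target and linker, so it would have to be measured.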
On Wed, Jan 15, 2014 at 3:47 AM, Terry Ellison <ellison.terry@gmail.com> wrote:
> Wouldn't it make a lot more sense to combine zend_variables.c, zend_hash.c
> and zend_alloc.c into a single module (do it by #include directive as
> zend_execute.c incorporates zend_vm_execute.h) so that the CC optimizer can
> properly optimise these CTORs and DTORs?
Hi there,
> (For example: may be someone would
> suggest how to avoid check for USE_ZEND_ALLOC=0 to allow system malloc()
> usage on each emalloc() call?
Rename emalloc() -> real_emalloc(), then:
.h:
    extern void *(*emalloc)(size_t n);
.c:
    void *(*emalloc)(size_t n) = real_emalloc;
startup code:
    /* USE_ZEND_ALLOC is read from the environment */
    const char *tmp = getenv("USE_ZEND_ALLOC");
    if (tmp && atoi(tmp) == 0) {
        emalloc = malloc;
    }
Probably some adjustments are needed (esp. for potentially different calling
conventions, so maybe malloc() will need a small wrapper); this is just
off the top of my head.
Note: this breaks ABI compatibility on most archs (because it changes
the symbol type). But it could be done independently of anything else.
Note 2: Depending on system architecture, each call to the function may
incur an additional (small) penalty. No idea how this compares to the
penalty of the if() at the start of emalloc().
Regards,
Christian
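A slightly fuller sketch of this function-pointer idea, including the small malloc() wrapper mentioned above, might look as follows. The body of real_emalloc() is only a placeholder, and emalloc_fn, system_emalloc, select_allocator and using_system_malloc are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Placeholder for the engine allocator; the real emalloc() is not shown here. */
static void *real_emalloc(size_t n)
{
    return malloc(n);
}

/* Wrapper so malloc() has exactly the pointer's signature and calling convention. */
static void *system_emalloc(size_t n)
{
    return malloc(n);
}

/* All allocation sites call through this pointer. */
void *(*emalloc_fn)(size_t n) = real_emalloc;

/* Call once at startup with the value of getenv("USE_ZEND_ALLOC").
 * A value that parses to 0 selects the system allocator; NULL keeps the default. */
void select_allocator(const char *setting)
{
    if (setting != NULL && atoi(setting) == 0) {
        emalloc_fn = system_emalloc;
    } else {
        emalloc_fn = real_emalloc;
    }
}

/* Reports which allocator is active (useful for diagnostics and tests). */
int using_system_malloc(void)
{
    return emalloc_fn == system_emalloc;
}
```

At startup this would be invoked as `select_allocator(getenv("USE_ZEND_ALLOC"));`, after which no allocation call needs to test the setting again.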
Hi Christian,
It's a clear solution, but an indirect call may cause a lot of branch
mispredictions, so it needs to be tested to see whether it improves performance.
Binary compatibility is going to be broken anyway, so it's not a problem.
Thanks. Dmitry.
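A rough way to compare the two variants in isolation is a micro-benchmark like the sketch below. All names are hypothetical, engine_alloc() is only a placeholder, and a loop this simple trains the branch predictor perfectly, so the application benchmarks remain the real test.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

int use_system_malloc = 0;       /* hypothetical flag, normally set once at startup */
static void *volatile sink;      /* keeps the compiler from discarding the allocations */

/* Placeholder for the engine allocator. */
static void *engine_alloc(size_t n)
{
    return malloc(n);
}

/* Variant 1: test the flag at the start of every call. */
static void *alloc_with_check(size_t n)
{
    if (use_system_malloc) {
        return malloc(n);
    }
    return engine_alloc(n);
}

/* Variant 2: indirect call through a pointer selected once at startup. */
void *(*alloc_indirect)(size_t n) = engine_alloc;

/* CPU seconds for `iterations` alloc/free pairs using the flag check. */
double time_check_variant(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        sink = alloc_with_check(32);
        free(sink);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds for `iterations` alloc/free pairs using the indirect call. */
double time_indirect_variant(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        sink = alloc_indirect(32);
        free(sink);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running both with the same iteration count several times and comparing the medians gives a first impression; hardware counters (for example via oprofile, as used earlier in the thread) would show the misprediction rate directly.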
Hi Dmitry,
I've just given it a try on Windows; the compilation breaks with this error:
zend\xx_malloc.c(41) : fatal error C1083: Cannot open include file:
'sys/mman.h': No such file or directory
Google gave me this link:
https://code.google.com/p/mman-win32/source/browse/trunk/ . I can go for a
fix using this lib, or maybe look for another equivalent, after the
str_size_and_int64 RFC finishes.
Regards
Anatol
Hi Anatol,
It's a proof-of-concept implementation.
I published it to decide whether it makes sense to integrate it into PHP and
implement support for the missing pieces, or just forget it.
As I wrote, at this moment it supports only Linux, non-ZTS, release builds.
Thanks. Dmitry.
Hi Dmitry,
Yep, I was aware of it. For exactly this reason I took the five-minute risk of
git clone and make :) Now we at least know where it'll need some more
hacking.
Regards
anatol
I could not see any improvement on a basic hello world under Symfony2.
I will try more complex apps ;-)
Julien
I tested your patch on some parsing code (which is very heavy on object
creation for syntax trees) and saw a ~12% performance improvement and a ~15%
memory usage improvement there. So it looks like the new allocator works
particularly well if a lot of object allocation is involved.
Nikita
Hi Nikita,
A 12% improvement on a real task looks amazing :)
Was it on 32-bit or 64-bit PHP?
Thanks. Dmitry.