Hello, everyone. My name is David Wang and I am one of the students
participating in Google Summer of Code this year. As you may remember,
my project is to implement a garbage collector for circular references
in PHP. As the midterm for Summer of Code is coming up, my mentor,
Derick Rethans, thought it would be a good idea if I shared my
progress with the community.
As you know, PHP uses reference counting as a simple garbage
collection system. One of the primary weaknesses of reference counting
systems is that objects that refer indirectly or directly to
themselves, i.e. reference cycles, are not collected. The accumulation
of unreferenced memory that is only deallocated at the end of a
request can be prohibitively expensive. The cycle collector I am
implementing directly addresses this problem.
The cycle collector is, of necessity, a type of tracing garbage
collector. However, it uses reference count information to accelerate
the collection process. Essentially, it works by removing the internal
reference counts (that is, references from an object within the cycle
to another object within the cycles )from candidate cycles. In a cycle
that can be collected, after all internal reference counts have been
removed, the reference count of all objects within the cycle should be
- The particular algorithm that I am using is the synchronous cycle
collector described by David Bacon and V. T. Rajan in Concurrent Cycle
Collection in Reference Counted Systems (see the article for further
details). Various optimizations allow the cycle collector to be run
relatively infrequently and execute speedily when it does run.
Note that because PHP is designed to be single-threaded, a synchronous
algorithm is used which does pause program execution for cycle
collection when cycle collection becomes necessary. If a program has
no reference cycles, the cycle collector does not run at all.
The initial phase of implementation is complete, and I am currently
profiling, optimizing and trying the modified version of PHP on
various test programs in an effort to find bugs. Meanwhile, here are
some recent benchmarks.
Some aspects of the eZ Components (http://ez.no/ezcomponents,
available from SVN with the instructions here:
http://ez.no/community/articles/an_introduction_to_ez_components/installation)
testing suite use reference cycles heavily. The Template test uses
circular references the most frequently, the Graph test also uses
circular references, albeit less heavily.
On the Graph test, maximum memory usage with unmodified PHP was 133.9
MB with an execution time of 8 seconds.
On the Graph test, maximum memory usage with gc was 51.6 MB with an
execution time of 9 seconds.
On the Template test, maxmium memory usage with unmodified PHP was 1.5
GB with an execution time of 30 seconds.
On the Template test, maxmium memory usage with gc was 67.3 MB with an
execution time of 1 minute.
On the whole suite of tests (which includes the Graph and Template
tests), execution time with unmodified PHP was 12:03. With cycle
collection, it was 12:43.
These tests were conducted on my dual core AMD X2 4400+ desktop with
./configure --with-gd --with-jpeg-dir --with-zlib. As you can see,
there is the classic time vs. memory trade-off.
My project is currently being hosted on the xdebug CVS. You can get
the latest version with the following commands:
cvs -d :pserver:srmread@cvs.xdebug.org:/repository login
CVS password: srmread
cvs -d :pserver:srmread@cvs.xdebug.org:/repository co circular
Note that I'm implementing my project on top of a CVS version of PHP
5.2 that is a couple of weeks old. However, the cycle collector can be
ported into other versions of PHP fairly easily as it does not affect
the existing Zend engine code too much.
If anyone has any questions, I'll be more than happy to answer them!
Regards,
Yiduo (David) Wang
Hello, everyone. My name is David Wang and I am one of the students
participating in Google Summer of Code this year. As you may remember,
my project is to implement a garbage collector for circular references
in PHP. As the midterm for Summer of Code is coming up, my mentor,
Derick Rethans, thought it would be a good idea if I shared my
progress with the community.As you know, PHP uses reference counting as a simple garbage
collection system. One of the primary weaknesses of reference counting
systems is that objects that refer indirectly or directly to
themselves, i.e. reference cycles, are not collected. The accumulation
of unreferenced memory that is only deallocated at the end of a
request can be prohibitively expensive. The cycle collector I am
implementing directly addresses this problem.The cycle collector is, of necessity, a type of tracing garbage
collector. However, it uses reference count information to accelerate
the collection process. Essentially, it works by removing the internal
reference counts (that is, references from an object within the cycle
to another object within the cycles )from candidate cycles. In a cycle
that can be collected, after all internal reference counts have been
removed, the reference count of all objects within the cycle should be
- The particular algorithm that I am using is the synchronous cycle
collector described by David Bacon and V. T. Rajan in Concurrent Cycle
Collection in Reference Counted Systems (see the article for further
details). Various optimizations allow the cycle collector to be run
relatively infrequently and execute speedily when it does run.Note that because PHP is designed to be single-threaded, a synchronous
algorithm is used which does pause program execution for cycle
collection when cycle collection becomes necessary. If a program has
no reference cycles, the cycle collector does not run at all.The initial phase of implementation is complete, and I am currently
profiling, optimizing and trying the modified version of PHP on
various test programs in an effort to find bugs. Meanwhile, here are
some recent benchmarks.Some aspects of the eZ Components (http://ez.no/ezcomponents,
available from SVN with the instructions here:
http://ez.no/community/articles/an_introduction_to_ez_components/installation)
testing suite use reference cycles heavily. The Template test uses
circular references the most frequently, the Graph test also uses
circular references, albeit less heavily.On the Graph test, maximum memory usage with unmodified PHP was 133.9
MB with an execution time of 8 seconds.
On the Graph test, maximum memory usage with gc was 51.6 MB with an
execution time of 9 seconds.
On the Template test, maxmium memory usage with unmodified PHP was 1.5
GB with an execution time of 30 seconds.
On the Template test, maxmium memory usage with gc was 67.3 MB with an
execution time of 1 minute.
On the whole suite of tests (which includes the Graph and Template
tests), execution time with unmodified PHP was 12:03. With cycle
collection, it was 12:43.
These tests were conducted on my dual core AMD X2 4400+ desktop with
./configure --with-gd --with-jpeg-dir --with-zlib. As you can see,
there is the classic time vs. memory trade-off.My project is currently being hosted on the xdebug CVS. You can get
the latest version with the following commands:cvs -d :pserver:srmread@cvs.xdebug.org:/repository login
CVS password: srmread
cvs -d :pserver:srmread@cvs.xdebug.org:/repository co circular
Note that I'm implementing my project on top of a CVS version of PHP
5.2 that is a couple of weeks old. However, the cycle collector can be
ported into other versions of PHP fairly easily as it does not affect
the existing Zend engine code too much.If anyone has any questions, I'll be more than happy to answer them!
Regards,
Yiduo (David) Wang
KUDOS!--
/////////////////////////////////////////////////////
Service provided by hitOmeter.NET internet messaging!
.
David Wang schrieb:
Hello, everyone. My name is David Wang and I am one of the students
participating in Google Summer of Code this year. As you may remember,
my project is to implement a garbage collector for circular references
in PHP. As the midterm for Summer of Code is coming up, my mentor,
Derick Rethans, thought it would be a good idea if I shared my
progress with the community.
Hello David,
On the Graph test, maximum memory usage with unmodified PHP was 133.9
MB with an execution time of 8 seconds.
On the Graph test, maximum memory usage with gc was 51.6 MB with an
execution time of 9 seconds.
On the Template test, maxmium memory usage with unmodified PHP was 1.5
GB with an execution time of 30 seconds.
On the Template test, maxmium memory usage with gc was 67.3 MB with an
execution time of 1 minute.
On the whole suite of tests (which includes the Graph and Template
tests), execution time with unmodified PHP was 12:03. With cycle
collection, it was 12:43.
these numbers are quite impressive.
My project is currently being hosted on the xdebug CVS. You can get
the latest version with the following commands:cvs -d :pserver:srmread@cvs.xdebug.org:/repository login
CVS password: srmread
cvs -d :pserver:srmread@cvs.xdebug.org:/repository co circular
Note that I'm implementing my project on top of a CVS version of PHP
5.2 that is a couple of weeks old. However, the cycle collector can be
ported into other versions of PHP fairly easily as it does not affect
the existing Zend engine code too much.
As a first step, I think the code changes that convert direct accesses
to the reference counters to the proper macros should be merged. These
changes should be done in any case and it minimizes your patch and makes
it more readable.
--
Sebastian Bergmann http://sebastian-bergmann.de/
GnuPG Key: 0xB85B5D69 / 27A7 2B14 09E4 98CD 6277 0E5B 6867 C514 B85B 5D69
On the whole suite of tests (which includes the Graph and Template
tests), execution time with unmodified PHP was 12:03. With cycle
collection, it was 12:43.these numbers are quite impressive.
Indeed, memory usage reduction is very impressive - though I'm somewhat
surprised so much memory is wasted, is this some special very circular
case? However, twofold speed reduction can be bad - but maybe it can be
optimized.
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
Hi all,
-----Original Message-----
From: news [mailto:news@sea.gmane.org] On Behalf Of Sebastian Bergmann
Sent: Thursday, July 12, 2007 2:52 AM
As a first step, I think the code changes that convert direct accesses
to the reference counters to the proper macros should be merged. These
changes should be done in any case and it minimizes your patch and
makes
it more readable.
I'll get on that. Should I create a patch for 5.2 only, or should one for
HEAD be created as well?
-----Original Message-----
From: Stanislav Malyshev [mailto:stas@zend.com]
Sent: Thursday, July 12, 2007 2:28 PM
Indeed, memory usage reduction is very impressive - though I'm somewhat
surprised so much memory is wasted, is this some special very circular
case?
The Template test does indeed involve a lot of cycles, which is why it was
recommended to me as a good test case. However any program that generates a
large amount of objects that are self-referential can do the same thing.
See: http://bugs.php.net/bug.php?id=33595
However, twofold speed reduction can be bad - but maybe it can be
optimized.
Yes, that's very true. Although, anything that involves checking and freeing
over a gigabyte worth of objects is going to take a long time. I'll keep
working on it.
My current priority for optimization is reducing overhead for acyclic
programs. Three bytes are added to the zval struct with the current
implementation and that struct is copied around a lot. Most notably, this
causes an annoying 5% slow down in bench.php. I'm not certain what real
world impact this has, but data with the php.ini option zend.enable_gc set
to 0 would be very helpful.
David
Hi,
David wrote:
Hi all,
-----Original Message-----
From: news [mailto:news@sea.gmane.org] On Behalf Of Sebastian Bergmann
Sent: Thursday, July 12, 2007 2:52 AMAs a first step, I think the code changes that convert direct accesses
to the reference counters to the proper macros should be merged. These
changes should be done in any case and it minimizes your patch and
makes
it more readable.I'll get on that. Should I create a patch for 5.2 only, or should one for
HEAD be created as well?-----Original Message-----
From: Stanislav Malyshev [mailto:stas@zend.com]
Sent: Thursday, July 12, 2007 2:28 PMIndeed, memory usage reduction is very impressive - though I'm somewhat
surprised so much memory is wasted, is this some special very circular
case?The Template test does indeed involve a lot of cycles, which is why it was
recommended to me as a good test case. However any program that generates a
large amount of objects that are self-referential can do the same thing.
See: http://bugs.php.net/bug.php?id=33595However, twofold speed reduction can be bad - but maybe it can be
optimized.Yes, that's very true. Although, anything that involves checking and freeing
over a gigabyte worth of objects is going to take a long time. I'll keep
working on it.My current priority for optimization is reducing overhead for acyclic
programs. Three bytes are added to the zval struct with the current
implementation and that struct is copied around a lot. Most notably, this
causes an annoying 5% slow down in bench.php. I'm not certain what real
world impact this has, but data with the php.ini option zend.enable_gc set
to 0 would be very helpful.
Could the garabage collector be started / stopped during script
executing from a userland function? zend_gc_enable/disable() in addition
to an ini option to enable it at script startup?
It would give developers the opportunity to start the checking mechanism
if they knew they were working with acyclic objects.
David
Scott
Could the garabage collector be started / stopped during script
executing from a userland function? zend_gc_enable/disable() in
addition
to an ini option to enable it at script startup?
Yes, that functionality can be added fairly easily. Some GC code will still
have to be executed (such as the if branches that currently check whether
the GC is enabled or not :-)) but performance may increase slightly (since
there will no longer be a function call for the GC to note down possible
cycles it detects). However, a drastic performance increase will probably
not be seen. It's very unlikely for the main garbage collection cycle to be
run if there are no cycles present. (I can go into some technical details if
you'd like).
David
Hi David,
Isn't finding cycles an NP-complete problem? :) How can you know there
are no cycles without actually looking for them?
It would definitely be nice to be able to turn this feature on/off at
runtime. The majority of pages in PHP don't require such a feature but
it's a good thing for us to have (I've looked into this quite a bit in
the past).
As for the additional if() statements you can try and use branch
prediction hinting to the compiler in order to keep the non-gc version
fast.
Btw, one of the challenges I found in the past is knowing which hash
tables to actually traverse because some variables/hash tables in PHP
are saved outside the symbol tables. Are you really successful in
traversing all of them? Are you saving their addresses during
allocation? If so, how do you know they are intact when you traverse? Do
you start traversal only when we're supposedly in a stable state?
(re-entrancy problems?)
Also the additional size of zval can be quite an issue. In fact, with
the whole 16bit->32bit transition we made for refcount we reduced
performance. Maybe we can figure out a creative way to keep this down?
Do you mind posting a patch against your baseline to make it easier for
us to look at your work?
This can end up being a good thing if we manage to make it work well.
The problem with this stuff though is that the last 5-10% is 95% of the
work :)
Thanks for the effort!
Andi
-----Original Message-----
From: David [mailto:planetbeing@gmail.com]
Sent: Thursday, July 12, 2007 6:35 PM
To: 'Scott MacVicar'
Cc: Stas Malyshev; 'Sebastian Bergmann'; internals@lists.php.net
Subject: RE: [PHP-DEV] Mid-term Update for Cycle Collector
(Google Summer of Code Project)Could the garabage collector be started / stopped during script
executing from a userland function? zend_gc_enable/disable() in
addition to an ini option to enable it at script startup?Yes, that functionality can be added fairly easily. Some GC
code will still have to be executed (such as the if branches
that currently check whether the GC is enabled or not :-))
but performance may increase slightly (since there will no
longer be a function call for the GC to note down possible
cycles it detects). However, a drastic performance increase
will probably not be seen. It's very unlikely for the main
garbage collection cycle to be run if there are no cycles
present. (I can go into some technical details if you'd like).David
--
To
unsubscribe, visit: http://www.php.net/unsub.php
Hi,
-----Original Message-----
From: Andi Gutmans [mailto:andi@zend.com]
Sent: Thursday, July 12, 2007 9:06 PM
To: David; Scott MacVicar
Cc: Stas Malyshev; Sebastian Bergmann; internals@lists.php.net
Subject: RE: [PHP-DEV] Mid-term Update for Cycle Collector (Google
Summer of Code Project)Hi David,
Isn't finding cycles an NP-complete problem? :) How can you know there
are no cycles without actually looking for them?
That's right. =P I do traverse all objects for which it is possible to have
cycles. The time-saving observation is that a cycle that has no external
references to it can only be formed when the reference count for one of the
objects in the cycle is decremented to a non-zero value. I take those
objects and buffer them to look at later (the algorithm used to then mark
and sweep these cycles is more efficient when operated on a large buffer).
Is that what you mean?
As for the additional if() statements you can try and use branch
prediction hinting to the compiler in order to keep the non-gc version
fast.
I haven't heard of that before... Is it some sort of compiler optimization
setting? Could you recommend to me some reference that would describe how
this can be done? It might help a lot.
Btw, one of the challenges I found in the past is knowing which hash
tables to actually traverse because some variables/hash tables in PHP
are saved outside the symbol tables. Are you really successful in
traversing all of them? Are you saving their addresses during
allocation? If so, how do you know they are intact when you traverse?
Those are some tough questions! Here's what I decided to do in my
implementation: I decided to assume the cycles can only be formed with the
following types of nodes: zvals that are arrays, zvals that point to objects
that use the standard handler table, and objects in the objects store that
use the standard handler table. Of course, it's conceivable that some custom
object will form a reference cycle, but for custom objects, there's really
no good way for me to figure out what zvals they reference. I used to try to
use the get_properties handler, but for example, the SimpleXML extension
actually dynamically generates zvals on each invocation of get_properties.
On each collection cycle, I can traverse a node up to three times, and that
obviously would be pretty disgusting. I decided to just stick with standard
Zend objects.
For array nodes, I just traverse its hash table. For zvals that point to
objects, I just skip to the object it points to. For objects, I traverse its
hash table of properties.
How do I know these nodes are intact? I really don't, but I'm assuming if
they're still referenced, they are intact. Unless they are freed with a
reference count above zero (which does not appear to happen), that
assumption is true. I DO know that my starting points (the buffered nodes)
are intact because I remove them from my buffer when they are destroyed.
(Assumption here is that zval_dtor will be called on any zval before its
memory is freed).
Do
you start traversal only when we're supposedly in a stable state?
(re-entrancy problems?)
A bad state to start traversal in would be one in which traversal will make
me wander into addresses that are bad. I only start collection when the user
explicitly invokes it, or when I detect a possible root. I only check for a
possible root when a reference to an object in object store is removed, or a
couple places in zend_execute, when a zval is assigned to (and overwritten)
and zval_ptr_dtor. This should catch all cycles and these places are
controlled places inside the Zend engine which are stable.
Also the additional size of zval can be quite an issue. In fact, with
the whole 16bit->32bit transition we made for refcount we reduced
performance. Maybe we can figure out a creative way to keep this down?
I've been wracking my brain for a solution to this problem. There are two
primary possibilities I can think of right now. One is to have a new struct
called zval_gc. Every allocation (and zval on the stack and in temporary
storage) will actually allocate the zval_gc which will be exactly the same
as zval, only with the two things I need tacked on the end. Then Zend will
only copy the zval part around and leave the gc part alone (that part
shouldn't be copied anyway) and speed will theoretically be the same.
However, this solution has plenty of its own complications I'm trying to
work out.
The second possibility is to stash the gc variables in the things that the
zvals point to. For arrays, it'd be the hash table. For objects, well,
that's the problem. Multiple zvals can point to the same object. Still
thinking about this one.
Ruled out possibility: Storing the gc information in the unused bits of
refcount and/or is_ref. This one SOUNDS great but for some reason, bitwise
operations are REALLY, REALLY SLOW. This still eludes me. For some reason,
(zval->is_ref > 0x80) is much faster than (zval_isref & 0x80). zval->is_ref
= 0 is also much, much faster than zval->is_ref &= 0x7f. I know the second
one requires both a memory read and write, but should it be so slow as to
more than wipe out any difference eliminating three bytes from the zval
struct made? Derick suggested this one and I still don't understand why it
didn't work. It would've been perfect. If I sound a bit frustrated here, I
am. =P
Do you mind posting a patch against your baseline to make it easier for
us to look at your work?
Give me a day or so... I want to work out a patch that macroizes refcount
manipulation for PHP 5.2 in CVS, then I'll post a patch that patches against
THAT.
This can end up being a good thing if we manage to make it work well.
The problem with this stuff though is that the last 5-10% is 95% of the
work :)
Don't I know it!
David
I haven't heard of that before... Is it some sort of compiler optimization
setting? Could you recommend to me some reference that would describe how
this can be done? It might help a lot.
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
look for __builtin_expect.
See also EXPECTED and UNEXPECTED in zend_alloc.c
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
On the Graph test, maximum memory usage with unmodified PHP was 133.9
MB with an execution time of 8 seconds.
On the Graph test, maximum memory usage with gc was 51.6 MB with an
execution time of 9 seconds.
On the Template test, maxmium memory usage with unmodified PHP was 1.5
GB with an execution time of 30 seconds.
On the Template test, maxmium memory usage with gc was 67.3 MB with an
execution time of 1 minute.
On the whole suite of tests (which includes the Graph and Template
tests), execution time with unmodified PHP was 12:03. With cycle
collection, it was 12:43.
These tests were conducted on my dual core AMD X2 4400+ desktop with
./configure --with-gd --with-jpeg-dir --with-zlib. As you can see,
there is the classic time vs. memory trade-off.
My numbers are even slightly better (64bit Intel Core2 Duo @ 1.83Ghz).
The 1000 template tests without GC: 1m35s with 1540Mb
with GC: 1m57s with 73Mb
Derick
On the Template test, maxmium memory usage with unmodified PHP was 1.5
GB with an execution time of 30 seconds.
On the Template test, maxmium memory usage with gc was 67.3 MB with an
execution time of 1 minute.
...
As you can see,
there is the classic time vs. memory trade-off.
Hi David,
Very interesting stuff.
Is it possible that total server throughput with gc could actually be
better than unmodified php by preventing swapping at some higher
level of concurrent requests?
Best Regards,
Jeff
I have a patch adding the GC that's taken against the patch I
submitted earlier (converting accesses to refcount and is_ref to
macros).
Patch: http://web.pdx.edu/~way/frommacros.diff.txt
The two new files, zend_gc.c and zend_gc.h go in the Zend folder:
http://web.pdx.edu/~way/zend_gc.c
http://web.pdx.edu/~way/zend_gc.h
Is it possible that total server throughput with gc could actually be
better than unmodified php by preventing swapping at some higher
level of concurrent requests?
Yes, that's certainly true if it becomes the case that concurrent
requests take up so much memory that they induce swapping! If that
ever happens, the system will basically come to a complete standstill.
No matter what the GC does, it still will be effectively infinitely
faster than saving and retrieving information from the hard drive.
Disclaimer: I don't administrate web servers, but it seems to me that
it's unlikely that any server administrators would allow their
machines to swap on web requests, at least on dedicated machines
serving a lot of people. Performance, I imagine, would simply be
abysmal if the server had to swap on even a fraction of the requests
it gets! If that happens, a server administrator would just get a new
machine, or upgrade the memory on the existing one, or restrict the
maximum number of concurrent requests. Therefore, I imagine gc would
just delay the date of that memory upgrade, or allow more concurrent
clients to be served on one machine.
David
David Wang wrote:
Is it possible that total server throughput with gc could actually be
better than unmodified php by preventing swapping at some higher
level of concurrent requests?Yes, that's certainly true if it becomes the case that concurrent
requests take up so much memory that they induce swapping! If that
ever happens, the system will basically come to a complete standstill.
No matter what the GC does, it still will be effectively infinitely
faster than saving and retrieving information from the hard drive.
It is still extremely rare for code to have cyclic references. So while
GC could prevent swapping in the case of a malicious user, or in the
case of a coding mistake, I don't think the general case of typical code
running under normal circumstances would consume less memory with GC
enabled.
-Rasmus
David,
It is still extremely rare for code to have cyclic references. So
while
GC could prevent swapping in the case of a malicious user, or in the
case of a coding mistake, I don't think the general case of typical
code
running under normal circumstances would consume less memory with GC
enabled.
This is a good point.
On the whole suite of tests (which includes the Graph and Template
tests), execution time with unmodified PHP was 12:03. With cycle
collection, it was 12:43.
Since test suites repeatedly build up and tear down data structures,
it seems to me that these would be great stress tests for a garbage
collector, but not very representative of the patterns of an average
php application. Do you have any performance/memory usage numbers
for something more typical like a WordPress index page request or
some such thing?
Best Regards,
Jeff
David Wang wrote:
Is it possible that total server throughput with gc could actually be
better than unmodified php by preventing swapping at some higher
level of concurrent requests?Yes, that's certainly true if it becomes the case that concurrent
requests take up so much memory that they induce swapping! If that
ever happens, the system will basically come to a complete standstill.
No matter what the GC does, it still will be effectively infinitely
faster than saving and retrieving information from the hard drive.It is still extremely rare for code to have cyclic references. So while
GC could prevent swapping in the case of a malicious user, or in the
case of a coding mistake, I don't think the general case of typical code
running under normal circumstances would consume less memory with GC
enabled.
It all depends on the code of course - things that do parsing usually
tend to have cyclic structures (to keep a link to the "parent").
Derick
--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org