Hi,
I wrote some daemon scripts for a web crawler in PHP. Now I have spent one
day working around the growing memory consumption of these scripts.
Since I use a MySQL connection, Syslog and many classes, I wanted to let
the script run a while before restarting. So the scripts had lines like:
$i = 1000;
while( --$i )
{
    while( run() ); // run() returns false if there are no pending jobs
    gc_collect_cycles();
    // echo memory_get_usage();
    sleep( 20 );
}
I thought that most memory would be cleaned up after each return of the
run() function, but the reported memory usage grew rapidly.
- Was it a mistake to use PHP for such scripts? What language should I have
chosen instead?
- --enable-debug did report some small leaks, but much less than the
consumption growth.
- My suspicion is that either pdo_mysql or dom is not freeing its used
memory during a request. Is that possible?
- It would help me a lot if I could easily get an overview of what is
consuming my memory. Which zvals are known? Which extension / line
allocated how much memory? (See the sketch below.)
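For a first overview from inside the script, PHP's own counters already
show when the growth happens. A minimal sketch of per-iteration logging;
the helper name and the log format are my own, not from the original
scripts:

<?php
// Hedged sketch: log what PHP reports as used and peak memory once per
// loop iteration, so growth can be tied to a particular phase of the run.
function log_memory($label)
{
    fprintf(STDERR, "%s: in use %d bytes, peak %d bytes\n",
        $label,
        memory_get_usage(),
        memory_get_peak_usage());
}

Calling log_memory() before and after the inner while( run() ); pass
narrows down whether the growth happens inside run() or between
iterations.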
I'm using http://libslack.org/daemon now to control the script
execution. This gave me the idea for a special kind of PHP binary
"php-daemon":
- php-daemon is an executable that restarts a given php script in a loop
  (a rough sketch of such a loop follows below)
- php-daemon can be combined with apc/xcache to store the bytecode
Unfortunately I'm still too much of a newbie to write this myself.
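A plain php-cli supervisor can already approximate the restart-in-a-loop
part; a minimal sketch, where the worker script name, the sleep interval
and the error handling are my assumptions:

<?php
// Hedged sketch of the "restart a given php script in a loop" idea.
// Every run gets a fresh process, so all memory goes back to the OS
// when the worker exits. Script name and interval are invented here.
$script = isset($argv[1]) ? $argv[1] : 'worker.php';

while (true) {
    passthru('php ' . escapeshellarg($script), $exitCode);

    if ($exitCode !== 0) {
        fwrite(STDERR, "worker exited with code $exitCode\n");
    }
    sleep(20); // pause between restarts, as in the original loop
}

The bytecode-cache part would still need apc/xcache configured for the
CLI binary, which this sketch does not attempt.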
Thanks for your time reading this,
Thomas Koch, http://www.koch.ro
YMC AG, http://www.ymc.ch
First, this is not an internals question. But I would bet that not many
people on the general list could help you.
I wrote some daemon scripts for a web crawler in PHP. Now I have spent one
day working around the growing memory consumption of these scripts.
I have PHP daemons running for weeks that don't use up a lot of memory.
You have to be careful what you do in a PHP script. I would guess you
are running Linux. Assuming you are, the Linux memory system never
frees allocated memory back to the OS. Not until the process dies at
least. So, if any part of your code ever needs a lot of memory, that
will be allocated to the process until it ends.
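To see what the OS actually charges to the process (as opposed to what PHP
itself reports), the resident set size can be read on Linux; a rough,
Linux-only sketch, where the parsing of /proc is my assumption:

<?php
// Hedged, Linux-only sketch: read the resident set size (RSS) of the
// current process from /proc. Returns -1 where /proc is not available.
function process_rss_kb()
{
    $status = @file_get_contents('/proc/self/status');
    if ($status !== false && preg_match('/^VmRSS:\s+(\d+)\s+kB/m', $status, $m)) {
        return (int) $m[1];
    }
    return -1;
}

echo "RSS: " . process_rss_kb() . " kB\n";

Comparing this number with memory_get_usage() shows how much the process
holds beyond what the script itself is currently using.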
- Was it a mistake to use PHP for such scripts? What language should I have
chosen instead?
Any language has its caveats. And you can waste (leak is a strong word)
memory with any language.
- My suspicion is that either pdo_mysql or dom is not freeing its
used memory during a request. Is that possible?
Are you telling pdo to free its results? If not, that is bad
programming. You have to unset vars and free db result sets yourself to
ensure they are not building up. PHP uses a lazy garbage collector that
is optimized for short-lived web scripts. You have to overcome that
when working with PHP in a non-web environment.
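Something along these lines is what that means in practice; a minimal
sketch, where the DSN, credentials, table and column names are all
invented for the example:

<?php
// Hedged sketch: free PDO result sets and drop references explicitly so
// nothing accumulates across iterations. All names below are made up.
$db = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'secret');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $db->prepare('SELECT id, url FROM jobs WHERE done = 0 LIMIT 100');
$stmt->execute();

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // ... fetch and process $row['url'] here ...
    unset($row);          // drop the reference as soon as it is unused
}

$stmt->closeCursor();     // release the result set on the driver side
unset($stmt);             // let the statement object be destroyed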
- I'm using http://libslack.org/daemon now to control the script
execution. This gave me the idea for a special kind of PHP binary
"php-daemon":
- php-daemon is an executable that restarts a given php script in a loop
- php-daemon can be combined with apc/xcache to store the bytecode
Unfortunately I'm still too much of a newbie to write this myself.
My standard way to handle scripts that need a lot of memory is to use
the pcntl functions to fork children that run and do the work. They can
end after a certain time or memory usage. Again, you have to make sure
the main, parent script is well written and does not waste file
descriptors, connections, etc.
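A stripped-down version of that fork pattern might look like the sketch
below; the memory ceiling, the exit conditions and the reuse of run()
from the original post are assumptions, not Brian's actual code:

<?php
// Hedged sketch of the fork-a-worker pattern: the parent stays small and
// long-lived, each child does a bounded amount of work and then exits,
// returning all of its memory to the OS when it dies.
// (Open database connections and other resources inside the child,
// not before the fork.)
while (true) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    }
    if ($pid === 0) {
        // Child: work until a soft memory ceiling is reached, then exit.
        while (run()) {                       // run() as in the original post
            if (memory_get_usage(true) > 64 * 1024 * 1024) {
                exit(0);
            }
        }
        exit(0);                              // no pending jobs left
    }
    // Parent: wait for the child, then pause before forking the next one.
    pcntl_waitpid($pid, $status);
    sleep(20);
}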
Hmm, maybe I will write a blog post about this.
--
Brian Moon
Senior Web Engineer
When you care enough to spend the very least.
http://dealnews.com/
Since I use a MySQL connection, Syslog and many classes, I wanted to let
the script run a while before restarting. So the scripts had lines like:
$i = 1000;
while( --$i )
{
    while( run() ); // run() returns false if there are no pending jobs
    gc_collect_cycles();
    // echo memory_get_usage();
    sleep( 20 );
}
- Was it a mistake to use PHP for such scripts? What language should I have
chosen instead?
Almost.
Try to compile your very own version of php, using the configure argument
--disable-zend-memory-manager
Also, disable all php extensions that your scripts do not depend on.
- --enable-debug did report some small leaks, but much less than the
consumption growth.
The zend memory manager won't release memory to the system. Memory held by
ZMM quickly becomes fragmented into chunks smaller than needed, so it takes
more and more "fresh" memory from the system. This approach is definitely
not for your situation (php daemons).
-jv
Almost.
Try to compile your very own version of php, using the configure argument
--disable-zend-memory-manager
There has been no such configure option for ages.
Zend MM can be disabled by setting the USE_ZEND_ALLOC env var to 0, but only
when --enable-debug is used (since nobody is supposed to do that except for
debugging purposes).
also, disable all php extensions that your scripts do not depend on.
That won't affect memory usage very much.
- --enable-debug did report some small leaks, but much less then the
consumption grow.
zend memory manager won't release memory to the system.
That's plain wrong.
Of course it does free() memory whenever it thinks the memory should be free()-ed.
Though that doesn't guarantee that the OS is able to reuse this memory.
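One way to watch this from user land is to compare the two modes of
memory_get_usage(); a small sketch, where the interpretation in the
comments is mine:

<?php
// Hedged sketch: memory_get_usage(false) is what the script currently
// uses, memory_get_usage(true) is what the allocator has actually taken
// from the system. A large, growing gap between the two is consistent
// with the caching/fragmentation behaviour discussed in this thread.
$used      = memory_get_usage(false);
$allocated = memory_get_usage(true);
printf("used: %d bytes, allocated from system: %d bytes, gap: %d bytes\n",
    $used, $allocated, $allocated - $used);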
--
Wbr,
Antony Dovgal
Almost.
Try to compile your very own version of php, using the configure argument
--disable-zend-memory-manager
There has been no such configure option for ages.
Seems you're flying too high.
RHEL5 still ships php 5.1.6, and it has this option and it undefines
ZEND_USE_ZEND_ALLOC.
For the newer versions (5.2.x) it can be set in the environment, which is
not that good, btw.
That won't affect memory usage very much.
It gives approx. 2MB per php-cli instance. Sometimes it's good enough, if for
example you need to run 200 instances.
- --enable-debug did report some small leaks, but much less than the
consumption growth.
zend memory manager won't release memory to the system.
That's plain wrong.
Of course it does free() memory whenever it thinks the memory should be
free()-ed.
Though that doesn't guarantee that the OS is able to reuse this memory.
Antony, I think you're wrong with this. ZMM never releases memory to the
system. Instead it puts free chunks into its own free memory chain.
See how _zend_mm_free_int() works in 5.2.5; it's the only function that
is called from, for example, _efree, at least when ZMM is enabled.
-regards,
jv.
ZMM never releases memory to the system.
My apologies, "never" is too strong a word for that. I'd use "almost
never".
Indeed ZMM does release a segment to the OS if all the memory blocks in it
are freed.
I'd only add that with an average script, it won't happen.
jv
Almost.
Try to compile your very own version of php, using the configure argument
--disable-zend-memory-manager
There has been no such configure option for ages.
Seems you're flying too high.
RHEL5 still ships php 5.1.6, and it has this option and it undefines
ZEND_USE_ZEND_ALLOC.
PHP 5.2.0 was released more than 2 (two) years ago, which should be
enough time for a distro to catch up, don't you think so?
For the newer versions (5.2.x) it can be set in the environment, which is
not that good, btw.
Please elaborate.
That won't affect memory usage very much.
It gives approx. 2MB per php-cli instance. Sometimes it's good enough, if for
example you need to run 200 instances.
Shared libs are, well, shared among the processes, so I don't think it's "per instance".
--
Wbr,
Antony Dovgal
Seems you're flying too high.
RHEL5 still ships php 5.1.6, and it has this option and it undefines
ZEND_USE_ZEND_ALLOC.
PHP 5.2.0 was released more than 2 (two) years ago, which should be
enough time for a distro to catch up, don't you think so?
Perhaps. But what they ship with RHEL5 is up to Redhat.
For the newer versions (5.2.x) it can be set in the environment, which is
not that good, btw.
Please elaborate.
Ok. With the default memory allocation strategy I can run relatively short
scripts very efficiently, and that's why I'd not change it for the web site.
On the other hand, for long-running scripts that I run using php-cli, I'd
always prefer resource-saving priorities.
In particular, I'd prefer ZMM not to "cache" memory blocks.
With only one system-wide setting, I have no choice: either all of php uses
ZMM or none of it does.
That won't affect memory usage very much.
It gives approx. 2MB per php-cli instance. Sometimes it's good enough, if for
example you need to run 200 instances.
Shared libs are, well, shared among the processes, so I don't think it's
"per instance".
That's true. Code segments will be shared. But even with 2MB of code, the
CPU uses 512 entries in the TLB for each process, which is clearly wasted.
In the case of a hundred instances running simultaneously, it may save some
time. Also, please add the time for the initialization of the modules that
won't be used.
Anyway, if php is to be recompiled from source in order to build
long-running daemons, the configure options should be carefully tuned.
-jv
Hi,
albeit not as a daemon, we've successfully developed a crawler in PHP
within our company. It can run for hours without a leak; if I remember
correctly its peak memory consumption is below 64MB. However, we're
crawling only a small number of URLs, just around 10,000.
As Brian mentioned: free your database resources, unset unused
variables. We've had one major rewrite which, besides re-architecting
the whole thing for plugins/modularity, involved auditing every step to
make sure resources are properly freed. Usually a PHP developer doesn't
have to pay much attention to this because of the widely used process-fork
model (but I guess I don't need to tell you that :).
But you'll often get beaten by PHP itself:
it has quite some leaks, and finding/tracking them down costs time,
sometimes requires skill at the C level of PHP to properly
understand/diagnose things, and if you were (unfortunately) successful in
identifying a PHP problem you have to report a bug, preferably providing
a patch/workaround.
For example, we've had to fight http://bugs.php.net/bug.php?id=43450 .
Tracking down this PHP problem was quite time consuming, involving multiple
developers, etc. Luckily we could work around it, but it was pretty
annoying.
We actually planned to release this as open source, donate it to Zend,
whatever. Legally it's settled within the company; just no one had the time
for the publishing process, going over things, etc. :/
As a side note: we've hit the current limit of our crawler implementation
in PHP itself: we can't do parallel fetching/processing of URLs in an
efficient manner. You can get things running quickly in PHP, but doing
things with style and a serious architecture hits its limits. We've gone
to Java for such cases, which made sense for us anyway as we had to move
away from Zend_Search_Lucene, as it had performance problems with our index
whereas Lucene/Solr was still mostly bored. Will be interesting to see
if http://code.google.com/p/marjory/ can handle this. Oops, off-topic.
HTH,
- Markus