Hi guys,
I'm looking for the maintainer of the preg_match
function in PHP. There appears to be a nasty leak in
its most basic functionality, and being a very
fundamental function of PHP (not to mention that my
long running scripts use it a lot ;) I thought I'd try
and go the direct route. Sorry, I don't know C or I'd
try to fix it myself! Considering its nature I
thought it best to at least post to the list
rather than let it get lost in the bugs pile (the original,
much less refined, report from someone else had been
sitting there a long time).
<?php
// Every iteration builds a pattern that has never been used before.
while (1) {
    $body = "any string";
    $rand = "any different strings" . mt_rand(0, mt_getrandmax());
    $pattern = "/$rand/";
    preg_match($pattern, $body, $match);
}
?>
http://bugs.php.net/bug.php?id=28513
This leaks 50MB per second on my PHP 5.0.0 and 5.0.1. It
is probably similar to the bug reported against PHP 4. If you
have any suggestions, please let me know!
Sincere regards,
Jason.
This is not a bug, but rather expected behavior. The PCRE extension caches
compiled regular expressions so that subsequent runs of the same regex do not
need to repeat the compilation step. In your example you are generating a new
regex on every pass through an unterminated loop, so it's no surprise that PHP
eventually exhausts the available memory and terminates.
Ilia
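Given the caching behavior described above, one way to keep the example loop from growing the cache is to not build a new pattern per iteration at all. A minimal sketch, assuming the thing being searched for is a plain literal string rather than a real regular expression; `containsLiteral` is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper: literal substring test that never compiles a regex,
// so nothing is added to the PCRE cache.
function containsLiteral($haystack, $needle)
{
    // strpos() returns false when the needle is absent; 0 is a valid
    // match position, so compare strictly against false.
    return strpos($haystack, $needle) !== false;
}

$body = "any string";
for ($i = 0; $i < 1000; $i++) {
    $needle = "any different strings" . mt_rand(0, mt_getrandmax());
    containsLiteral($body, $needle); // no pattern is compiled or cached here
}
```

This only applies when the varying part is literal text; a script that genuinely needs many distinct regular expressions would still fill the cache.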
Shouldn't you be able to disable that cache though?
John
Ilia Alshanetsky <ilia@prohost.org> writes:
This is not a bug, but rather expected behavior. PCRE extension caches
compiled regular expressions so that subsequent runs of the same regex do
not need to perform the compilation step. In your example you are generating
new regex in an unterminated loop, so it's no surprise that PHP eventually
exhausts the available memory and terminates.
Since PHP never knows what the user might do or how long the application might
run for, perhaps the cache, a useful feature in this case, should have a
maximum cache size. If the maximum cache size is exceeded, the oldest
(ideally) cached compiled regexp would be deleted from the cache.
It's probably reasonable to keep only a very small number of compiled regular
expressions in cache. Intuition, at least, tells me that if a regular
expression isn't reused "soon" the compile time is likely not a big deal.
I'm guessing that the regular expressions are maintained in such an order that
the requested one can be found quickly (via a hash? binary search?). Given
my earlier assumption that only a small number really need be cached, they
could instead be kept in FIFO order, with a simple linear search of the (small)
list to see whether the requested regexp is cached. When it's not found, the
one at the tail of the queue (assuming the queue is full) would be deleted to
make room for the new one, which would be pushed onto the head of the queue.
Since my assumption is based purely on intuition, is there any indication from
"real life" that in fact, keeping many regexps in the cache is truly
beneficial?
Cheers,
Derrell
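The bounded FIFO cache proposed above can be sketched in userland PHP. This is an illustrative model only: the real cache lives in the PCRE extension's C code, and the class name `FifoRegexCache`, its methods, and the `$compile` callback (a stand-in for the expensive compilation step) are all hypothetical.

```php
<?php
// Illustrative model of a small, bounded FIFO cache with linear lookup.
class FifoRegexCache
{
    private $max;
    private $entries = array(); // each entry: array(pattern, compiled); head is index 0

    public function __construct($max)
    {
        $this->max = max(1, (int) $max); // keep at least one slot
    }

    public function get($pattern, $compile)
    {
        // Linear search: acceptable because the list is deliberately small.
        foreach ($this->entries as $entry) {
            if ($entry[0] === $pattern) {
                return $entry[1]; // hit: order is left unchanged (pure FIFO)
            }
        }
        // Miss: if the queue is full, delete the entry at the tail...
        if (count($this->entries) >= $this->max) {
            array_pop($this->entries);
        }
        // ...then push the new entry onto the head.
        $compiled = call_user_func($compile, $pattern);
        array_unshift($this->entries, array($pattern, $compiled));
        return $compiled;
    }

    // Cached patterns, head first (for inspection only).
    public function patterns()
    {
        $out = array();
        foreach ($this->entries as $entry) {
            $out[] = $entry[0];
        }
        return $out;
    }
}

$fifo = new FifoRegexCache(2);
$fifo->get('/a/', 'strtoupper'); // strtoupper stands in for "compile"
$fifo->get('/b/', 'strtoupper');
$fifo->get('/a/', 'strtoupper'); // hit, nothing is recompiled
$fifo->get('/c/', 'strtoupper'); // '/a/' is evicted even though it was just used
```

With a pure FIFO policy, memory stays bounded by the queue length no matter how many distinct patterns the script generates, at the cost of occasionally evicting a pattern that is still in active use.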
FIFO, with the addition that an entry that is re-used gets moved back to the
beginning of the list (in effect, least-recently-used eviction), would (I
think) greatly benefit the cache hit rate.
Just my $0.02
Ron
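The move-to-front refinement Ron suggests turns the FIFO queue into a least-recently-used list. A minimal sketch of that variant, again as a userland PHP model rather than the extension's actual C code; `MoveToFrontRegexCache` and the `$compile` callback are hypothetical names:

```php
<?php
// Illustrative model: bounded list where a re-used entry moves back to the head.
class MoveToFrontRegexCache
{
    private $max;
    private $entries = array(); // each entry: array(pattern, compiled); head is index 0

    public function __construct($max)
    {
        $this->max = max(1, (int) $max); // keep at least one slot
    }

    public function get($pattern, $compile)
    {
        foreach ($this->entries as $i => $entry) {
            if ($entry[0] === $pattern) {
                // Hit: unlink the entry and re-insert it at the head.
                array_splice($this->entries, $i, 1);
                array_unshift($this->entries, $entry);
                return $entry[1];
            }
        }
        // Miss: evict the tail, which is now the least recently used entry.
        if (count($this->entries) >= $this->max) {
            array_pop($this->entries);
        }
        $compiled = call_user_func($compile, $pattern);
        array_unshift($this->entries, array($pattern, $compiled));
        return $compiled;
    }

    // Cached patterns, most recently used first (for inspection only).
    public function patterns()
    {
        $out = array();
        foreach ($this->entries as $entry) {
            $out[] = $entry[0];
        }
        return $out;
    }
}

$lru = new MoveToFrontRegexCache(2);
$lru->get('/a/', 'strtoupper'); // strtoupper stands in for "compile"
$lru->get('/b/', 'strtoupper');
$lru->get('/a/', 'strtoupper'); // hit: '/a/' moves back to the head
$lru->get('/c/', 'strtoupper'); // '/b/' is evicted; the recently used '/a/' survives
```

The only difference from plain FIFO is the reordering on a hit, which is what keeps frequently used patterns from being pushed out by a burst of one-off patterns.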