Newsgroups: php.internals
From: r.korving@xit.nl ("Ron Korving")
To: internals@lists.php.net
Subject: Re: [PHP-DEV] preg leak
Date: Tue, 24 Aug 2004 23:01:34 +0200
Message-ID: <20040824210137.81249.qmail@pb1.pair.com>
References: <20040824122859.1826.qmail@web11002.mail.yahoo.com><200408241153.56659.ilia@prohost.org> <8yc4a2u3.fsf@random.internal>

FIFO, with the addition that an entry that gets re-used is moved to the
beginning of the list, would (I think) greatly benefit the cache hit-rate.

Just my $0.02

Ron

"Derrell Lipman" wrote in message news:8yc4a2u3.fsf@random.internal...
> Ilia Alshanetsky writes:
>
> > This is not a bug, but rather expected behavior. PCRE extension caches
> > compiled regular expressions so that subsequent runs of the same regex do
> > not need to perform the compilation step. In your example you are generating
> > new regex in an unterminated loop, so it's no surprise that PHP eventually
> > exhausts the available memory and terminates.
>
> Since PHP never knows what the user might do or how long the application might
> run for, perhaps the cache, a useful feature in this case, should have a
> maximum cache size. If the maximum cache size is exceeded, the oldest
> (ideally) cached compiled regexp would be deleted from the cache.
>
> It's probably reasonable to keep only a very small number of compiled regular
> expressions in cache. Intuition, at least, tells me that if a regular
> expression isn't reused "soon" the compile time is likely not a big deal.
>
> I'm guessing that the regular expressions are maintained in such an order that
> the requested one can be found quickly (via a hash? binary search?). Given
> my earlier assumption that only a small number really need be cached, they
> could instead be kept in FIFO order, and a simple linear search of the (small)
> list done to see if the requested regexp is cached. When it's not found, the
> one at tail of the queue (assuming the queue is full) would be deleted to make
> room for a new one which would be pushed onto the head of the queue.
>
> Since my assumption is based purely on intuition, is there any indication from
> "real life" that in fact, keeping many regexps in the cache is truly
> beneficial?
>
> Cheers,
>
> Derrell
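
For illustration only, here is a rough sketch of the bounded cache discussed in this thread: new entries are pushed onto the head, the tail entry is dropped when the cache is full, and a re-used entry is moved back to the head. It is written in Python rather than against the PCRE extension's C source, and everything in it is an assumption made for the example: the class name `RegexCache`, the `max_entries` parameter and its default of 4, and the hit/miss counters. It also uses an `OrderedDict` (hash lookup) instead of the linear search over a small list suggested above, purely to keep the sketch short.

```python
import re
from collections import OrderedDict


class RegexCache:
    """Bounded cache of compiled patterns (hypothetical sketch).

    The most recently inserted or re-used pattern sits at the head;
    the entry at the tail is evicted when the cache is full.
    """

    def __init__(self, max_entries=4):  # the default size is an arbitrary assumption
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, pattern):
        compiled = self._entries.get(pattern)
        if compiled is not None:
            # Cache hit: move the re-used entry to the head of the list.
            self.hits += 1
            self._entries.move_to_end(pattern, last=False)
            return compiled

        # Cache miss: compile, evict the tail if full, push onto the head.
        self.misses += 1
        compiled = re.compile(pattern)
        if len(self._entries) >= self.max_entries:
            self._entries.popitem(last=True)
        self._entries[pattern] = compiled
        self._entries.move_to_end(pattern, last=False)
        return compiled

    def keys(self):
        """Cached pattern strings, head (most recently used) first."""
        return list(self._entries)


cache = RegexCache(max_entries=2)
cache.get(r"a+")
cache.get(r"b+")
cache.get(r"a+")     # hit: "a+" moves back to the head
cache.get(r"c+")     # cache is full: "b+" at the tail is evicted
print(cache.keys())  # ['c+', 'a+']
```

With plain FIFO the last call would have evicted "a+" (the oldest insertion) even though it had just been used; with the move-to-head step the less recently used "b+" goes instead.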