Newsgroups: php.internals
Subject: Re: [PHP-DEV] preg leak
From: Derrell.Lipman@UnwiredUniverse.com
To: ilia@prohost.org
Cc: internals@lists.php.net, Jason
Date: Tue, 24 Aug 2004 13:00:20 -0400

Ilia Alshanetsky writes:

> This is not a bug, but rather expected behavior. PCRE extension caches
> compiled regular expressions so that subsequent runs of the same regex do
> not need to perform the compilation step. In your example you are generating
> new regex in an unterminated loop, so it's no surprise that PHP eventually
> exhausts the available memory and terminates.

Since PHP never knows what the user might do or how long the application might run, perhaps the cache, a useful feature in this case, should have a maximum size. If the maximum is exceeded, the oldest (ideally) cached compiled regexp would be deleted from the cache.

It's probably reasonable to keep only a very small number of compiled regular expressions in the cache. Intuition, at least, tells me that if a regular expression isn't reused "soon", the compile time is likely not a big deal.

I'm guessing that the regular expressions are maintained in such an order that the requested one can be found quickly (via a hash? binary search?). Given my earlier assumption that only a small number really need be cached, they could instead be kept in FIFO order, with a simple linear search of the (small) list to see whether the requested regexp is cached. When it's not found, the one at the tail of the queue (assuming the queue is full) would be deleted to make room for the new one, which would be pushed onto the head of the queue.

Since my assumption is based purely on intuition, is there any indication from "real life" that keeping many regexps in the cache is in fact truly beneficial?

Cheers,

Derrell
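
P.S. For concreteness, here is a rough sketch of the FIFO scheme described above. It is written in Python purely for brevity and is not how the PCRE extension is implemented. The class name FifoRegexCache, the get() method, and the default size of 8 are all hypothetical, chosen only for illustration.

```python
import re
from collections import deque


class FifoRegexCache:
    """Bounded cache of compiled regexps, evicted in FIFO order (illustrative only)."""

    def __init__(self, max_size=8):  # 8 is an arbitrary "small" size, not a measured value
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        # (pattern, compiled) pairs; newest at the head (left), oldest at the tail (right).
        self._entries = deque()

    def get(self, pattern):
        # Simple linear search; acceptable only because the list is kept small.
        for cached_pattern, compiled in self._entries:
            if cached_pattern == pattern:
                return compiled
        # Not cached: compile, evict from the tail if the queue is full, push onto the head.
        compiled = re.compile(pattern)
        if len(self._entries) >= self.max_size:
            self._entries.pop()
        self._entries.appendleft((pattern, compiled))
        return compiled

    def __len__(self):
        return len(self._entries)


if __name__ == "__main__":
    cache = FifoRegexCache(max_size=4)
    # Generating a new regex on every iteration no longer grows memory without bound.
    for i in range(10000):
        cache.get("^item%d$" % i)
    print(len(cache))
```

Note that this is strict FIFO: a cache hit does not move the entry back to the head, so a frequently used regexp is still evicted once enough new ones have arrived. Moving hits to the head would turn it into LRU, which is a different trade-off.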