Newsgroups: php.internals
From: rasmus@mindplay.dk (Rasmus Schultz)
To: Nikita Popov
Cc: PHP internals
Date: Sun, 5 Mar 2017 14:31:59 +0100
Subject: Re: [PHP-DEV] PCRE caching

Thanks for clearing this up, Nikita :-)

> Compiled expressions are cached between requests. However, they are not
> shared between processes

That sounds good. Sharing them between processes would likely be very
difficult (if not impossible) and would probably make only a marginal
performance difference. All good, I think.

> The cache invalidation strategy is FIFO

Okay, so, do cache entries move "to the front of the line" when there's a
cache hit?

I'm thinking that, otherwise, a large number of dynamic expressions (which
benefit little or not at all from caching) might actually push static
expressions (which do benefit) out of the cache.

Just wondering. If static expressions do get evicted in that case, it
likely doesn't impact most apps. I don't think dynamic expressions are very
common in most apps, though I have used them occasionally for things like
dictionary search...
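To make the question concrete, here is a rough toy model of the behaviour described above. It is a sketch only, written in Python rather than taken from PHP's C source; the class name, the tiny capacity of 8, and the promote_on_hit switch are all hypothetical. It compares plain FIFO (evict the oldest 1/8 when full) with a variant where a hit moves the entry away from eviction.

```python
from collections import OrderedDict


class PatternCache:
    """Toy compiled-pattern cache keyed by the pattern string.

    capacity and promote_on_hit are hypothetical knobs for this sketch;
    they are not PHP settings.
    """

    def __init__(self, capacity=8, promote_on_hit=False):
        self.capacity = capacity
        self.promote_on_hit = promote_on_hit
        self.entries = OrderedDict()  # oldest entry first
        self.compiles = 0             # number of cache misses so far

    def get(self, pattern):
        if pattern in self.entries:
            if self.promote_on_hit:
                # Hypothetical variant: a hit protects the entry from eviction.
                self.entries.move_to_end(pattern)
            return self.entries[pattern]
        if len(self.entries) >= self.capacity:
            # Cache is full: discard the oldest 1/8 of the entries.
            for _ in range(max(1, self.capacity // 8)):
                self.entries.popitem(last=False)
        self.compiles += 1
        # Stand-in for real compilation work.
        self.entries[pattern] = "compiled:" + pattern
        return self.entries[pattern]


def static_recompiles(promote_on_hit, rounds=100):
    """Count compilations of one static pattern mixed with dynamic ones."""
    cache = PatternCache(capacity=8, promote_on_hit=promote_on_hit)
    recompiles = 0
    for i in range(rounds):
        before = cache.compiles
        cache.get("/^static$/")
        recompiles += cache.compiles - before
        cache.get("/dynamic%d/" % i)  # a dynamic pattern that never repeats
    return recompiles
```

With these toy settings, static_recompiles(False) compiles the static pattern repeatedly because the stream of one-off dynamic patterns keeps pushing it out, while static_recompiles(True) compiles it only once. A realistic cache would be far larger, so the effect would only appear with a correspondingly large number of distinct dynamic expressions.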
On Fri, Mar 3, 2017 at 10:28 PM, Nikita Popov wrote:

> On Wed, Mar 1, 2017 at 4:35 PM, Rasmus Schultz wrote:
>
>> Hey internals,
>>
>> I was wondering whether or how PCRE regular expressions get parsed and
>> cached, and I found this answer on Stack Overflow:
>>
>> http://stackoverflow.com/questions/209906/compile-regex-in-php
>>
>> Do I understand this correctly, that:
>>
>> 1. All regular expressions are hashed and the compiled expression is
>> cached internally between calls.
>
> Correct.
>
>> 2. The /S modifier applies more optimizations during compile, but caching
>> works the same way.
>
> Yes. Additionally, if PCRE JIT is enabled (which it usually is on PHP 7)
> we always study, independently of whether /S was specified.
>
>> 3. Compiled expressions are not cached between requests.
>
> Compiled expressions are cached between requests. However, they are not
> shared between processes (I'm not even sure if that's possible.)
>
> The cache invalidation strategy is FIFO. More specifically, whenever the
> cache fills up, we discard the first 1/8 cached regular expressions.
>
> Nikita
>
>> If so, this seems far from optimal.
>>
>> Every unique regular expression needs to be compiled during every request,
>> right?
>>
>> So with FPM, or with long-running apps, we're missing an opportunity to
>> optimize by caching between requests.
>>
>> And with long-running apps, we're caching every dynamic regular
>> expression, which could harm (memory overhead) more than help.
>>
>> Ideally, shouldn't we have (like some engines/languages) a switch to
>> enable caching?
>>
>> The run-time can't know if a given regular expression is dynamic or
>> static, can it? It's just a string either way - so without a switch,
>> you're either committing compiled dynamic expressions to the cache
>> unnecessarily, and/or missing an opportunity to cache between requests in
>> long-running apps or under FPM.
>>
>> I think most apps use quite a lot of regular expressions for validation
>> etc. so maybe there's a missed optimization opportunity here?
>>
>> Cheers,
>> Rasmus Schultz