Hey internals,
I was wondering whether or how PCRE regular expressions get parsed and
cached, and I found this answer on Stack Overflow:
http://stackoverflow.com/questions/209906/compile-regex-in-php
Do I understand this correctly, that:
- All regular expressions are hashed and the compiled expression is cached
  internally between calls.
- The /S modifier applies more optimizations during compile, but caching
  works the same way.
- Compiled expressions are not cached between requests.
If so, this seems far from optimal.
Every unique regular expression needs to be compiled during every request,
right?
So with FPM, or with long-running apps, we're missing an opportunity to
optimize by caching between requests.
And with long-running apps, we're caching every dynamic regular expression,
which could harm (memory overhead) more than help.
Ideally, shouldn't we have (like some engines/languages) a switch to enable
caching?
The run-time can't know if a given regular expression is dynamic or static,
can it? It's just a string either way - so without a switch, you're either
committing compiled dynamic expressions to the cache unnecessarily, and/or
missing an opportunity to cache between requests in long-running apps or
under FPM.
I think most apps use quite a lot of regular expressions for validation etc.
so maybe there's a missed optimization opportunity here?
Cheers,
Rasmus Schultz
Hello,
We could also use an LRU caching strategy with a pre-defined (or
user-defined) number of expressions to keep in the cache. It would
also be a good idea to track the number of times a regular expression is
used. If this number reaches a certain threshold, then we could
automatically re-compile it with the S modifier.
I agree there is a missed optimisation here. I share your feeling.
Regards.
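For illustration, the LRU-plus-threshold idea above might look roughly like
the sketch below. It is written in Python rather than in PHP's C internals,
and the names LruRegexCache, capacity and study_threshold are hypothetical.
Python's re module has no equivalent of the S modifier, so the "re-compile
with S" step is only recorded as a flag.

```python
import re
from collections import OrderedDict


class LruRegexCache:
    """LRU cache of compiled patterns that also counts uses per pattern."""

    def __init__(self, capacity=4, study_threshold=3):
        self.capacity = capacity                # assumed, user-definable limit
        self.study_threshold = study_threshold  # assumed promotion threshold
        # pattern string -> [compiled pattern, use count, studied flag]
        self.entries = OrderedDict()

    def get(self, pattern):
        entry = self.entries.get(pattern)
        if entry is None:
            if len(self.entries) >= self.capacity:
                # Evict the least recently used entry (front of the dict).
                self.entries.popitem(last=False)
            entry = [re.compile(pattern), 0, False]
            self.entries[pattern] = entry
        else:
            # A hit makes this entry the most recently used one.
            self.entries.move_to_end(pattern)
        entry[1] += 1
        if not entry[2] and entry[1] >= self.study_threshold:
            # Stand-in for "re-compile with the S modifier": we only
            # record that the pattern crossed the threshold.
            entry[2] = True
        return entry[0]
```

A pattern that is requested study_threshold times gets flagged, and a
pattern that keeps being hit stays in the cache while rarely used ones are
evicted first.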
> Do I understand this correctly, that:
> - All regular expressions are hashed and the compiled expression is cached
>   internally between calls.
Correct.
> - The /S modifier applies more optimizations during compile, but caching
>   works the same way.
Yes. Additionally, if PCRE JIT is enabled (which it usually is on PHP 7) we
always study, independently of whether /S was specified.
> - Compiled expressions are not cached between requests.
Compiled expressions are cached between requests. However, they are not
shared between processes (I'm not even sure if that's possible.)
The cache invalidation strategy is FIFO. More specifically, whenever the
cache fills up, we discard the first 1/8 of the cached regular expressions.
Nikita
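As a rough illustration of the eviction rule described above (not the actual
php-src code), a per-process cache keyed by the pattern string could be
sketched in Python as follows. The class name FifoRegexCache, the capacity of
16 and the compile_count field are assumptions made for the example. That a
cache hit leaves the entry's position unchanged is also an assumption,
following the "FIFO" description.

```python
import re


class FifoRegexCache:
    """FIFO cache that drops the oldest 1/8 of its entries when full."""

    def __init__(self, capacity=16):
        self.capacity = capacity  # assumed size; the thread gives no number
        self.entries = {}         # dicts preserve insertion order
        self.compile_count = 0    # how many times we really compiled

    def get(self, pattern):
        compiled = self.entries.get(pattern)
        if compiled is not None:
            # Assumption: a hit does not move the entry (plain FIFO).
            return compiled
        if len(self.entries) >= self.capacity:
            # Cache is full: discard the first (oldest) eighth in one go.
            oldest = list(self.entries)[: max(1, self.capacity // 8)]
            for key in oldest:
                del self.entries[key]
        compiled = re.compile(pattern)
        self.compile_count += 1
        self.entries[pattern] = compiled
        return compiled
```

Because the object lives as long as the process does, compiled patterns
survive across requests handled by that process, but a second process would
start with an empty cache of its own.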
Thanks for clearing this up, Nikita :-)
> Compiled expressions are cached between requests. However, they are not
> shared between processes
That sounds good - sharing between processes would likely be very difficult
(if not impossible) and would likely make only a marginal performance
difference. All good, I think.
> The cache invalidation strategy is FIFO
Okay, so, do cache entries move "to the front of the line" when there's a
cache hit?
I'm thinking, otherwise, a large number of dynamic expressions (which
don't, or only minimally, benefit from caching) might actually push static
expressions (which do benefit) out of the cache.
Just wondering. If invalidation of static expressions does happen in that
case, it likely doesn't impact most apps - I don't think dynamic
expressions are very common in most apps, though I have used them
occasionally for things like dictionary search...
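To make the concern above concrete, here is a small Python simulation. It is
a toy model, not a measurement of PHP. It assumes a cache of 16 entries that
evicts its oldest eighth when full, and the function name
count_static_recompiles and its parameters are made up for the example. One
static pattern is looked up in every round, alongside a dynamic pattern that
never repeats.

```python
from collections import OrderedDict


def count_static_recompiles(move_to_front_on_hit, capacity=16, rounds=200):
    """Count how often the one static pattern has to be compiled while a
    stream of one-off dynamic patterns passes through the same cache."""
    cache = OrderedDict()  # pattern -> placeholder for the compiled form
    static_misses = 0

    def lookup(pattern):
        if pattern in cache:
            if move_to_front_on_hit:
                # The hit refreshes the entry, so it is evicted last.
                cache.move_to_end(pattern)
            return True
        if len(cache) >= capacity:
            # Evict the oldest eighth of the entries in one batch.
            for _ in range(max(1, capacity // 8)):
                cache.popitem(last=False)
        cache[pattern] = object()
        return False

    for i in range(rounds):
        if not lookup("/^static$/"):
            static_misses += 1
        lookup("/dynamic%d/" % i)  # a new, never-repeated pattern
    return static_misses


fifo = count_static_recompiles(move_to_front_on_hit=False)
refreshed = count_static_recompiles(move_to_front_on_hit=True)
print("plain FIFO:", fifo, "refresh on hit:", refreshed)
```

With these toy parameters the refresh-on-hit variant compiles the static
pattern only once, while the plain FIFO variant has to compile it again each
time the dynamic patterns push it out.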