Hi!
PHP7 supports PCRE's JIT compilation of patterns by default, which
mostly works fine. However, there are issues when the matching exceeds
the JIT stack limit, see bug #70110[1].
I'm not sure how to solve this best. Basically, I see two possible
solutions: either we fall back to non JIT matching, if pcre_exec() fails
with PCRE_ERROR_JIT_STACKLIMIT, or we use a custom JIT stack and make
its size a configurable ini setting (similar to pcre.backtrack_limit),
and raise E_WARNING
if the matching fails due to limited stack size.
Thoughts?
[1] https://bugs.php.net/bug.php?id=70110
--
Christoph M. Becker
Hi Christoph,
There are ways to dymacally increase the stack. Apache's modphp can use the
apache config. Fpm, fcgi or cli can change it on windows (afair it is in
the doc). It is possible too on Linux with ulimit.
An alternative (not a big fan) would be to use setrlimit with an ini
setting.
Cheers,
Pierre
Hi!
PHP7 supports PCRE's JIT compilation of patterns by default, which
mostly works fine. However, there are issues when the matching exceeds
the JIT stack limit, see bug #70110[1].I'm not sure how to solve this best. Basically, I see two possible
solutions: either we fall back to non JIT matching, if pcre_exec() fails
with PCRE_ERROR_JIT_STACKLIMIT, or we use a custom JIT stack and make
its size a configurable ini setting (similar to pcre.backtrack_limit),
and raiseE_WARNING
if the matching fails due to limited stack size.Thoughts?
[1] https://bugs.php.net/bug.php?id=70110
--
Christoph M. Becker
Hi Pierre,
Pierre Joye wrote:
Hi Christoph,
There are ways to dymacally increase the stack. Apache's modphp can use the
apache config. Fpm, fcgi or cli can change it on windows (afair it is in
the doc). It is possible too on Linux with ulimit.An alternative (not a big fan) would be to use setrlimit with an ini
setting.
Ah, I should have explained better that libpcre's JIT support is
implemented as own virtual machine, and libpcre has functions to use a
custom stack which is allocated on the heap. If these functions are not
used (as it's now in ext/pcre) a fixed 32K on the machine stack are used.[1]
Anyhow, it is not possible to change the stack (size) during a call to
pcre_exec() (unless, maybe, libpcre would be modified), and it is not
possible (to my knowledge) to calculate the required stack size in
advance. The required stack size depends on pattern and subject.
So basically, we have to call pcre_exec(), and if it fails due to
limited stack size, we either fail as well, or we try again. In the
latter case we could either use a bigger stack or do without JIT. In
the former case we should at least give users a setting to choose the
desired stack size, which is quite comparable to pcre.backtrack_limit
and pcre.recursion_limit.
Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.
[1] http://www.pcre.org/original/doc/html/pcrejit.html#SEC8
--
Christoph M. Becker
Hi Pierre,
Pierre Joye wrote:
Hi Christoph,
There are ways to dymacally increase the stack. Apache's modphp can use
the
apache config. Fpm, fcgi or cli can change it on windows (afair it is in
the doc). It is possible too on Linux with ulimit.An alternative (not a big fan) would be to use setrlimit with an ini
setting.Ah, I should have explained better that libpcre's JIT support is
implemented as own virtual machine, and libpcre has functions to use a
custom stack which is allocated on the heap. If these functions are not
used (as it's now in ext/pcre) a fixed 32K on the machine stack are
used.[1]Anyhow, it is not possible to change the stack (size) during a call to
pcre_exec() (unless, maybe, libpcre would be modified), and it is not
possible (to my knowledge) to calculate the required stack size in
advance. The required stack size depends on pattern and subject.So basically, we have to call pcre_exec(), and if it fails due to
limited stack size, we either fail as well, or we try again. In the
latter case we could either use a bigger stack or do without JIT. In
the former case we should at least give users a setting to choose the
desired stack size, which is quite comparable to pcre.backtrack_limit
and pcre.recursion_limit.Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.
Only to be sure, ou mean it is a different issue than the common one (dozen
of bugs for pcre about it)? Solved by increasing the stack size of the
process or binary?
Also an ini setting may not be portable. That's something to double check :)
Cheers,
Pierre
--
Christoph M. Becker
Pierre Joye wrote:
Ah, I should have explained better that libpcre's JIT support is
implemented as own virtual machine, and libpcre has functions to use a
custom stack which is allocated on the heap. If these functions are not
used (as it's now in ext/pcre) a fixed 32K on the machine stack are
used.[1]Anyhow, it is not possible to change the stack (size) during a call to
pcre_exec() (unless, maybe, libpcre would be modified), and it is not
possible (to my knowledge) to calculate the required stack size in
advance. The required stack size depends on pattern and subject.So basically, we have to call pcre_exec(), and if it fails due to
limited stack size, we either fail as well, or we try again. In the
latter case we could either use a bigger stack or do without JIT. In
the former case we should at least give users a setting to choose the
desired stack size, which is quite comparable to pcre.backtrack_limit
and pcre.recursion_limit.Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.Only to be sure, ou mean it is a different issue than the common one (dozen
of bugs for pcre about it)? Solved by increasing the stack size of the
process or binary?
I'm pretty sure that is another issue. The JIT functionality of libpcre
is available only for PHP 7, and the issue reported in #70110 only
happens when pcre.jit is enabled. And to my knowledge #70110 is the
first reported bug that's caused by the JIT.
Also an ini setting may not be portable. That's something to double check :)
ACK. However, AFAICT pcre_jit_stack_alloc() and pcre_assign_jit_stack()
are supposed to be portable for systems where JIT support is available.
--
Christoph M. Becker
<snip great explanation, thanks>
Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.
I'm not a big fan of that, although it's at least in the spirit of
what configuration settings are meant to be used for.
What if we added the PCRE_ERROR_JIT_STACKLIMIT error constant to those
exposed to userland so that it's more easily noticed via
preg_last_error()
, and adding a modifier that can be used to disable
the JIT on a per-pattern basis (by setting PCRE_NO_START_OPTIMIZE,
which admittedly disables other stuff too, but at least the regex will
run)? At least then users could check the error when the regex fails
and re-run the regex without the JIT if they chose to.
How likely is the average user to hit this, do you think?
Adam
<snip great explanation, thanks>
Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.I'm not a big fan of that, although it's at least in the spirit of
what configuration settings are meant to be used for.What if we added the PCRE_ERROR_JIT_STACKLIMIT error constant to those
exposed to userland so that it's more easily noticed via
preg_last_error()
, and adding a modifier that can be used to disable
the JIT on a per-pattern basis (by setting PCRE_NO_START_OPTIMIZE,
which admittedly disables other stuff too, but at least the regex will
run)? At least then users could check the error when the regex fails
and re-run the regex without the JIT if they chose to.How likely is the average user to hit this, do you think?
I am not sure we have accurate numbers or can have. It really depends on
the expressions. A simple one can exhaust the stack as much as a apparently
more complex expressions.
I have to say that the default stack we use now on windows has reduced the
amount of bug reports. I do not think we can eliminate it and increase it
too much may bring some bad side effects as well.
Adam Harvey wrote:
<snip great explanation, thanks>
Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.I'm not a big fan of that, although it's at least in the spirit of
what configuration settings are meant to be used for.What if we added the PCRE_ERROR_JIT_STACKLIMIT error constant to those
exposed to userland so that it's more easily noticed via
preg_last_error()
, and adding a modifier that can be used to disable
the JIT on a per-pattern basis (by setting PCRE_NO_START_OPTIMIZE,
which admittedly disables other stuff too, but at least the regex will
run)? At least then users could check the error when the regex fails
and re-run the regex without the JIT if they chose to.
Yes, that would be an option. I'm not sure if we need to touch
PCRE_NO_START_OPTIMIZE (or if it would even help); simply not calling
pcre_study() respectively not passing the studied extra data (second
argument) to pcre_exec() should be sufficient.
How likely is the average user to hit this, do you think?
I don't know. The libpcre man pages say:
| JIT uses far less memory for recursion than the interpretive code,
| and a maximum stack size of 512K to 1M should be more than enough for
| any pattern.
Of course, a hard-coded 512K or even more would be rather much. The
default stack size (32K) for the regex in the test script of #70110
would be sufficient, if the subject is repeated only about 2000 times,
but that doesn't say much (the preg_match()
is rather contrived).
Anyway, that shows that it's not only the pattern that's relevant for
the needed JIT stack space, but also the subject string.
--
Christoph M. Becker
Christoph Becker wrote:
Adam Harvey wrote:
<snip great explanation, thanks>
Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.I'm not a big fan of that, although it's at least in the spirit of
what configuration settings are meant to be used for.What if we added the PCRE_ERROR_JIT_STACKLIMIT error constant to those
exposed to userland so that it's more easily noticed via
preg_last_error()
, and adding a modifier that can be used to disable
the JIT on a per-pattern basis (by setting PCRE_NO_START_OPTIMIZE,
which admittedly disables other stuff too, but at least the regex will
run)? At least then users could check the error when the regex fails
and re-run the regex without the JIT if they chose to.Yes, that would be an option. I'm not sure if we need to touch
PCRE_NO_START_OPTIMIZE (or if it would even help); simply not calling
pcre_study() respectively not passing the studied extra data (second
argument) to pcre_exec() should be sufficient.
Correction: we simply should not pass PCRE_STUDY_JIT_COMPILE to
pcre_study() if we want to disable JIT support for individual regexps.[1]
How likely is the average user to hit this, do you think?
I don't know. The libpcre man pages say:
| JIT uses far less memory for recursion than the interpretive code,
| and a maximum stack size of 512K to 1M should be more than enough for
| any pattern.Of course, a hard-coded 512K or even more would be rather much. The
default stack size (32K) for the regex in the test script of #70110
would be sufficient, if the subject is repeated only about 2000 times,
but that doesn't say much (thepreg_match()
is rather contrived).Anyway, that shows that it's not only the pattern that's relevant for
the needed JIT stack space, but also the subject string.
[1]
https://github.com/php/php-src/blob/php-7.0.0beta2/ext/pcre/php_pcre.c#L436-L440
--
Christoph M. Becker
<snip great explanation, thanks>
Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.I'm not a big fan of that, although it's at least in the spirit of
what configuration settings are meant to be used for.What if we added the PCRE_ERROR_JIT_STACKLIMIT error constant to those
exposed to userland so that it's more easily noticed via
preg_last_error()
, and adding a modifier that can be used to disable
the JIT on a per-pattern basis (by setting PCRE_NO_START_OPTIMIZE,
which admittedly disables other stuff too, but at least the regex will
run)? At least then users could check the error when the regex fails
and re-run the regex without the JIT if they chose to.
But this might mean that patterns which previously worked, because no JIT was used, suddenly fail in existing code with a new error constant. Which I guess is a BC break.
<snip great explanation, thanks>
Therefore I tend to prefer a new ini setting (say,
pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.I'm not a big fan of that, although it's at least in the spirit of
what configuration settings are meant to be used for.What if we added the PCRE_ERROR_JIT_STACKLIMIT error constant to those
exposed to userland so that it's more easily noticed via
preg_last_error()
, and adding a modifier that can be used to disable
the JIT on a per-pattern basis (by setting PCRE_NO_START_OPTIMIZE,
which admittedly disables other stuff too, but at least the regex will
run)? At least then users could check the error when the regex fails
and re-run the regex without the JIT if they chose to.But this might mean that patterns which previously worked, because no JIT
was used, suddenly fail in existing code with a new error constant. Which I
guess is a BC break.
Such things happened in the past with normal updates, both ways. Simple
code changes, fixes etc may have led to changes in the stack usage. I can
remember issues with some specific updates.
<snip great explanation, thanks>
Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which
there are already so many.I'm not a big fan of that, although it's at least in the spirit of
what configuration settings are meant to be used for.What if we added the PCRE_ERROR_JIT_STACKLIMIT error constant to those
exposed to userland so that it's more easily noticed via
preg_last_error()
, and adding a modifier that can be used to disable
the JIT on a per-pattern basis (by setting PCRE_NO_START_OPTIMIZE,
which admittedly disables other stuff too, but at least the regex will
run)? At least then users could check the error when the regex fails
and re-run the regex without the JIT if they chose to.But this might mean that patterns which previously worked, because no JIT was used, suddenly fail in existing code with a new error constant. Which I guess is a BC break.
Maybe it's best to change the default of pcre.jit to "0"?
--
Christoph M. Becker
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Thursday, July 23, 2015 8:47 PM
To: Pierre Joye pierre.php@gmail.com
Cc: PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitHi Pierre,
Pierre Joye wrote:
Hi Christoph,
There are ways to dymacally increase the stack. Apache's modphp can
use the apache config. Fpm, fcgi or cli can change it on windows
(afair it is in the doc). It is possible too on Linux with ulimit.An alternative (not a big fan) would be to use setrlimit with an ini
setting.Ah, I should have explained better that libpcre's JIT support is implemented as
own virtual machine, and libpcre has functions to use a custom stack which is
allocated on the heap. If these functions are not used (as it's now in ext/pcre) a
fixed 32K on the machine stack are used.[1]Anyhow, it is not possible to change the stack (size) during a call to
pcre_exec() (unless, maybe, libpcre would be modified), and it is not possible (to
my knowledge) to calculate the required stack size in advance. The required
stack size depends on pattern and subject.So basically, we have to call pcre_exec(), and if it fails due to limited stack size,
we either fail as well, or we try again. In the latter case we could either use a
bigger stack or do without JIT. In the former case we should at least give users a
setting to choose the desired stack size, which is quite comparable to
pcre.backtrack_limit and pcre.recursion_limit.Therefore I tend to prefer a new ini setting (say, pcre.jitstack_limit).
That would mean, however, to add yet another ini setting, of which there are
already so many.
This looks like an extremely fragile topic because it depends on how much stack is available to an executable. A custom JIT stack can behave more stable but cannot be resized. And the main issue is that the JIT stack size, machine stack size and ext/pcre cache size are completely unrelated terms. For example, a binary can have not enough stack, but the custom JIT stack using mmap/VirtualAlloc could even succeed, but then pcre_exec will be executed and overflow the machine stack. We can never know which one is exhausted first - the one for the JIT compilation or the other one for the execution, or vice versa.
Generally, moving the JIT compilation away from the machine stack and increasing the PCRE cache size should be more stable against this . However it's an edge case. IMHO we should not do it just to fix some crazy usage. Users who need it might just turn off JIT. Normal usage seems not to be affected, say loading some sane functional script, which FE is done by any benchmark with WP, Symfony, etc. But moving JIT compilation away from the machine stack wil lpossibly affect it.
Regards
Anatol
Hi Anatol,
Anatol Belski wrote:
This looks like an extremely fragile topic because it depends on how
much stack is available to an executable. A custom JIT stack can
behave more stable but cannot be resized. And the main issue is that
the JIT stack size, machine stack size and ext/pcre cache size are
completely unrelated terms. For example, a binary can have not enough
stack, but the custom JIT stack using mmap/VirtualAlloc could even
succeed, but then pcre_exec will be executed and overflow the machine
stack. We can never know which one is exhausted first - the one for
the JIT compilation or the other one for the execution, or vice
versa.Generally, moving the JIT compilation away from the machine stack and
increasing the PCRE cache size should be more stable against this .
However it's an edge case. IMHO we should not do it just to fix some
crazy usage. Users who need it might just turn off JIT. Normal usage
seems not to be affected, say loading some sane functional script,
which FE is done by any benchmark with WP, Symfony, etc. But moving
JIT compilation away from the machine stack wil lpossibly affect it.
Beforehand, I'm not suggesting to change anything regarding our PCRE
cache (PCRE_G(pcre_cache)); this seems to be fine as it is, and is
indeed not related to this topic.
Now please consider the following simple expression:
preg_match('/^(foo)+$/', str_repeat('foo', $n))
This will fail (i.e. yield FALSE) independently of pcre.jit for large
enough $n. However, a user can change pcre.recursion_limit what will
affect the $n limit (the expression will fail for smaller or larger $n),
if pcre.jit=0. If pcre.jit=1 the user can't influence this boundary in
any way, currently.
And maybe even worse, with pcre.jit=0 the boundary is 50,000, but with
pcre.jit=1 it is only 1,366. Of course, one can argue that this is a
contrived example, and that such usage is crazy, but why do we have a
default pcre.recursion_limit of 100,000 then? A recursion_limit of
2,734 would be sufficient to have a boundary of $n == 1,366.
All in all, as this example already suggests, classic execution of
matching is done by recursive calls (using normal stack frames), while
JIT execution of matching is iterative, using a special JIT stack.[1] I
don't think it is justified to give users a setting to adjust for the
former, but not for the latter (except to disable JIT, albeit JIT might
bring quite some boost especially for such cases).
As we're pretty late in the game for PHP 7.0, it might be best to
postpone a new ini setting or other changes to PHP 7.1, but at the very
least I would introduce a new error constant, say
PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a more meaningful result
when calling preg_last_error()
than PHP_PCRE_INTERNAL_ERROR. And it
seems to be appropriate to add a note to UPGRADING that pcre.jit=1 may
cause some preg_*() to fail which would work with pcre.jit=0.
[1] http://www.pcre.org/original/doc/html/pcrestack.html
[2]
https://github.com/cmb69/php-src/commit/1546e3025403bb7ebed233c71fbc299fd584c9e3
--
Christoph M. Becker
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 12:09 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitHi Anatol,
Anatol Belski wrote:
This looks like an extremely fragile topic because it depends on how
much stack is available to an executable. A custom JIT stack can
behave more stable but cannot be resized. And the main issue is that
the JIT stack size, machine stack size and ext/pcre cache size are
completely unrelated terms. For example, a binary can have not enough
stack, but the custom JIT stack using mmap/VirtualAlloc could even
succeed, but then pcre_exec will be executed and overflow the machine
stack. We can never know which one is exhausted first - the one for
the JIT compilation or the other one for the execution, or vice versa.Generally, moving the JIT compilation away from the machine stack and
increasing the PCRE cache size should be more stable against this .
However it's an edge case. IMHO we should not do it just to fix some
crazy usage. Users who need it might just turn off JIT. Normal usage
seems not to be affected, say loading some sane functional script,
which FE is done by any benchmark with WP, Symfony, etc. But moving
JIT compilation away from the machine stack wil lpossibly affect it.Beforehand, I'm not suggesting to change anything regarding our PCRE cache
(PCRE_G(pcre_cache)); this seems to be fine as it is, and is indeed not related to
this topic.Now please consider the following simple expression:
preg_match('/^(foo)+$/', str_repeat('foo', $n))
This will fail (i.e. yield FALSE) independently of pcre.jit for large enough $n.
However, a user can change pcre.recursion_limit what will affect the $n limit
(the expression will fail for smaller or larger $n), if pcre.jit=0. If pcre.jit=1 the
user can't influence this boundary in any way, currently.And maybe even worse, with pcre.jit=0 the boundary is 50,000, but with
pcre.jit=1 it is only 1,366. Of course, one can argue that this is a contrived
example, and that such usage is crazy, but why do we have a default
pcre.recursion_limit of 100,000 then? A recursion_limit of
2,734 would be sufficient to have a boundary of $n == 1,366.
The 100000 is an empirical value by my guess. It regards to the default stack sizes on different platforms and to an average pattern. There is no prediction that PCRE will not exhaust the stack available to the binary.
All in all, as this example already suggests, classic execution of matching is done
by recursive calls (using normal stack frames), while JIT execution of matching is
iterative, using a special JIT stack.[1] I don't think it is justified to give users a
setting to adjust for the former, but not for the latter (except to disable JIT,
albeit JIT might bring quite some boost especially for such cases).
JIT without custom stack will use "32k of machine stack", by the doc. We currently don't really give users a choice to choose between iterative and recursive PCRE execution, it's a compile time decision. And this is again because an iterative execution will be safer, but will affect an average case with unnecessary thrift. In this JIT case, one could give this choice, you're right, but we should evaluate it carefully. Fe what happens to the PHP pattern cache if JIT stack is exhausted?
What I was more precisely talking about is like
If (false === preg_match(",evil pattern,", ...)) {
Ini_set("pcre.jit", 0);
// retry
}
So received error - no JIT. And no additional logic/overhead in ext/pcre.
Maybe custom JIT were eligible in this case, but according to the PCRE doc it can easily bring issues as the global JIT memory can't be resized/migrated just by one's finger click. Say one would be forced to either use the custom JIT stack or default JIT stack from the start on. This can end up with the over complication using a custom JIT stack for a particular pattern.
As we're pretty late in the game for PHP 7.0, it might be best to postpone a new
ini setting or other changes to PHP 7.1, but at the very least I would introduce a
new error constant, say PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a
more meaningful result when callingpreg_last_error()
than
PHP_PCRE_INTERNAL_ERROR. And it seems to be appropriate to add a note to
UPGRADING that pcre.jit=1 may cause some preg_*() to fail which would work
with pcre.jit=0.
Yes, instead of returning false one could return an explicit error, or indicate in any other ways. Thanks for bringing it up and maybe we'll have more info after your further investigation. But documentation is appropriate in any cases.
Regards
Anatol
Hi Anatol,
Anatol Belski wrote:
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 12:09 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitNow please consider the following simple expression:
preg_match('/^(foo)+$/', str_repeat('foo', $n))
This will fail (i.e. yield FALSE) independently of pcre.jit for large enough $n.
However, a user can change pcre.recursion_limit what will affect the $n limit
(the expression will fail for smaller or larger $n), if pcre.jit=0. If pcre.jit=1 the
user can't influence this boundary in any way, currently.And maybe even worse, with pcre.jit=0 the boundary is 50,000, but with
pcre.jit=1 it is only 1,366. Of course, one can argue that this is a contrived
example, and that such usage is crazy, but why do we have a default
pcre.recursion_limit of 100,000 then? A recursion_limit of
2,734 would be sufficient to have a boundary of $n == 1,366.The 100000 is an empirical value by my guess. It regards to the default stack sizes on different platforms and to an average pattern. There is no prediction that PCRE will not exhaust the stack available to the binary.
ACK. However, the user is able to tune this value according to the
environment. And yes, I'm aware that pcre.recursion_limit may be
necessary to prevent a stack overflow, what can't happen with JIT
compilation (if the JIT stack is exhausted, pcre_exec() returns
gracefully), but nonetheless giving users some control over the size of
the JIT stack seems to be appropriate (at least in the long run, aka.
PHP 7.1).
All in all, as this example already suggests, classic execution of matching is done
by recursive calls (using normal stack frames), while JIT execution of matching is
iterative, using a special JIT stack.[1] I don't think it is justified to give users a
setting to adjust for the former, but not for the latter (except to disable JIT,
albeit JIT might bring quite some boost especially for such cases).JIT without custom stack will use "32k of machine stack", by the doc. We currently don't really give users a choice to choose between iterative and recursive PCRE execution, it's a compile time decision.
Well, one can enforce non JIT execution by setting pcre.jit=0.
pcre.jit=1 indeed may not enforce JIT execution, depending on the
platform[1] and the libpcre version.
And this is again because an iterative execution will be safer, but will affect an average case with unnecessary thrift. In this JIT case, one could give this choice, you're right, but we should evaluate it carefully. Fe what happens to the PHP pattern cache if JIT stack is exhausted?
That is unrelated. It is possible to use a single custom PCRE JIT stack
for all patterns (actually, pcre_exec() calls). Only if multi-threading
comes into play, each thread should have its own stack (that's not
absolutely necessary, but it simplifies things a lot).[2] Therefore
having a single PCRE_G(jit_stack) would be sufficient. I've committed a
naive draft to my php-src fork[3].
What I was more precisely talking about is like
If (false === preg_match(",evil pattern,", ...)) {
Ini_set("pcre.jit", 0);
// retry
}So received error - no JIT. And no additional logic/overhead in ext/pcre.
Yes, that is a viable way for userland (even if I would suggest to test
against pcre_last_error(); preg_match()
may return FALSE
for several
error conditions, such as PHP_PCRE_BAD_UTF8_ERROR, and then retrying
with pcre.jit=0 wouldn't help). We could as well put the same logic
into ext/pcre (but I do not suggest to do so, even though I mentioned
this as option in the OP).
Maybe custom JIT were eligible in this case, but according to the PCRE doc it can easily bring issues as the global JIT memory can't be resized/migrated just by one's finger click. Say one would be forced to either use the custom JIT stack or default JIT stack from the start on. This can end up with the over complication using a custom JIT stack for a particular pattern.
I would use the same JIT stack for all patterns (whether that is the
default JIT stack, or a custom one). The libpcre manual states[2]:
| You may safely use the same JIT stack for more than one pattern
| (either by assigning directly or by callback), as long as the
| patterns are all matched sequentially in the same thread.
To my knowledge in ext/pcre all patterns are matched sequentially, but
I'll double-check, and I'll inquire on the libpcre mailing list.
As we're pretty late in the game for PHP 7.0, it might be best to postpone a new
ini setting or other changes to PHP 7.1, but at the very least I would introduce a
new error constant, say PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a
more meaningful result when callingpreg_last_error()
than
PHP_PCRE_INTERNAL_ERROR. And it seems to be appropriate to add a note to
UPGRADING that pcre.jit=1 may cause some preg_*() to fail which would work
with pcre.jit=0.Yes, instead of returning false one could return an explicit error, or indicate in any other ways.
I would not suggest to change the return value of preg_match()
and
friends. FALSE
seems to be appropriate here. Instead I suggest to add
a new error constant which would be returned from preg_last_error()
.
Thanks for bringing it up and maybe we'll have more info after your further investigation. But documentation is appropriate in any cases.
I'll make a respective PR after having checked back with the libpcre devs.
[1] http://www.pcre.org/original/doc/html/pcrejit.html#SEC3
[2] http://www.pcre.org/original/doc/html/pcrejit.html#SEC8
[3]
https://github.com/cmb69/php-src/commit/78ad15d589c25b5d10aa68f5b419333f4040c16a
--
Christoph M. Becker
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 2:25 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitHi Anatol,
Anatol Belski wrote:
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 12:09 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitNow please consider the following simple expression:
preg_match('/^(foo)+$/', str_repeat('foo', $n))
This will fail (i.e. yield FALSE) independently of pcre.jit for large enough $n.
However, a user can change pcre.recursion_limit what will affect the
$n limit (the expression will fail for smaller or larger $n), if
pcre.jit=0. If pcre.jit=1 the user can't influence this boundary in any way,
currently.And maybe even worse, with pcre.jit=0 the boundary is 50,000, but
with
pcre.jit=1 it is only 1,366. Of course, one can argue that this is a
contrived example, and that such usage is crazy, but why do we have a
default pcre.recursion_limit of 100,000 then? A recursion_limit of
2,734 would be sufficient to have a boundary of $n == 1,366.The 100000 is an empirical value by my guess. It regards to the default stack
sizes on different platforms and to an average pattern. There is no prediction
that PCRE will not exhaust the stack available to the binary.ACK. However, the user is able to tune this value according to the environment.
And yes, I'm aware that pcre.recursion_limit may be necessary to prevent a
stack overflow, what can't happen with JIT compilation (if the JIT stack is
exhausted, pcre_exec() returns gracefully), but nonetheless giving users some
control over the size of the JIT stack seems to be appropriate (at least in the
long run, aka.
PHP 7.1).
Yep, I see your point about the config option. That would only work at MINIT stage. Though we also don't give many other options to the user land, like the pattern cache size. Which is IMHO fine because users would normally not need such nternal detail.
All in all, as this example already suggests, classic execution of
matching is done by recursive calls (using normal stack frames),
while JIT execution of matching is iterative, using a special JIT
stack.[1] I don't think it is justified to give users a setting to
adjust for the former, but not for the latter (except to disable JIT, albeit JIT
might bring quite some boost especially for such cases).JIT without custom stack will use "32k of machine stack", by the doc. We
currently don't really give users a choice to choose between iterative and
recursive PCRE execution, it's a compile time decision.Well, one can enforce non JIT execution by setting pcre.jit=0.
pcre.jit=1 indeed may not enforce JIT execution, depending on the platform[1]
and the libpcre version.
But we're discussing only the cases where JIT is available, that's what it is about. No need to care about it otherwise.
And this is again because an iterative execution will be safer, but will affect an
average case with unnecessary thrift. In this JIT case, one could give this choice,
you're right, but we should evaluate it carefully. Fe what happens to the PHP
pattern cache if JIT stack is exhausted?That is unrelated. It is possible to use a single custom PCRE JIT stack for all
patterns (actually, pcre_exec() calls). Only if multi-threading comes into play,
each thread should have its own stack (that's not absolutely necessary, but it
simplifies things a lot).[2] Therefore having a single PCRE_G(jit_stack) would be
sufficient. I've committed a naive draft to my php-src fork[3].
If it's going to happen, of course it should be a single JIT stack for all patterns per thread. PCRE_G is required, otherwise some locking has to be implemented manually. But what I merely wanted to illustrate is exactly what you say - there are unrelated things which can go wrong but possibly affect/cause each other.
I would use the same JIT stack for all patterns (whether that is the default JIT
stack, or a custom one). The libpcre manual states[2]:| You may safely use the same JIT stack for more than one pattern
| (either by assigning directly or by callback), as long as the patterns
| are all matched sequentially in the same thread.To my knowledge in ext/pcre all patterns are matched sequentially, but I'll
double-check, and I'll inquire on the libpcre mailing list.
Sure, just after MINIT it cannot be reassigned/resized/changed. Thus, given something like pcre.jit_stack_size INI were implemented, if it's requested under 32kb - nothing should be done, machine stack should be used. If it were over it - then custom JIT stack based on mmap/VirtualAlloc. But frankly I don't see an essential difference to the handling in the user space. Yes, it might be an advantage to still use JIT, but it moves away from the machine stack.
And IMHO as it's an edge case, fixing it were fine but should not be done on the cost of much over complication and affecting normal usage. From what is to see right now - sane usage is not affected while improved by JIT. If you're in the mood to add a note to upgrading, it'll be probably just file for now - disabling JIT retains the full PHP5 compatible behavior.
As we're pretty late in the game for PHP 7.0, it might be best to
postpone a new ini setting or other changes to PHP 7.1, but at the
very least I would introduce a new error constant, say
PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a more meaningful
result when callingpreg_last_error()
than PHP_PCRE_INTERNAL_ERROR.
And it seems to be appropriate to add a note to UPGRADING that
pcre.jit=1 may cause some preg_*() to fail which would work with pcre.jit=0.Yes, instead of returning false one could return an explicit error, or indicate in
any other ways.I would not suggest to change the return value of
preg_match()
and friends.
FALSE
seems to be appropriate here. Instead I suggest to add a new error
constant which would be returned frompreg_last_error()
.
Yep, sounds feasible. At least users can be informed about this kind of error, seems a constant is feasible for 7.0.
Btw I've just noticed that the snippet I've posted earlier won't currently work.
If (false === preg_match(",evil pattern,", ...)) {
If (OUT_OF_JIT_STACK == pcre_last_error()) {
ini_set("pcre.jit", 0);
// retry
}
}
The retry would fail even when JIT is disabled on demand. This is actually rather unexpected, maybe something in PCRE JIT state cannot be recovered once JIT compilation has failed. But actually, even without running into debugging, it is also a bad sign telling that the behavior can get complicated (
Running for the INI option and some custom JIT stack functionality needs more brainstorming and test. Please ping when you have an initial implementation, I'd be glad to help testing, etc. anyway.
Regards
Anatol
Hi Anatol,
Anatol Belski wrote:
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 2:25 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI would not suggest to change the return value of
preg_match()
and friends.
FALSE
seems to be appropriate here. Instead I suggest to add a new error
constant which would be returned frompreg_last_error()
.Yep, sounds feasible. At least users can be informed about this kind of error, seems a constant is feasible for 7.0.
That would be nice. I'll make a respective PR.
Btw I've just noticed that the snippet I've posted earlier won't currently work.
If (false === preg_match(",evil pattern,", ...)) {
If (OUT_OF_JIT_STACK == pcre_last_error()) {
ini_set("pcre.jit", 0);
// retry
}
}The retry would fail even when JIT is disabled on demand. This is actually rather unexpected, maybe something in PCRE JIT state cannot be recovered once JIT compilation has failed. But actually, even without running into debugging, it is also a bad sign telling that the behavior can get complicated (
Indeed! That's caused by our PCRE cache. The cached regex is alread
studied with JIT info, as all that happens in
pcre_get_compiled-regex_cache[1]. So switching pcre.jit at runtime may
not have the desired effect. I'm not sure if that has to be regarded as
bug, and whether we should do something about it. One solution might be
to clear the regex cache whenever the ini setting changes. I'll have a
closer look.
Running for the INI option and some custom JIT stack functionality needs more brainstorming and test. Please ping when you have an initial implementation, I'd be glad to help testing, etc. anyway.
All in all it seems that we probably should have an RFC for that. I'll
consider to write one, but there's no need to hurry – too late for 7.0
anyway. :)
[1]
https://github.com/php/php-src/blob/php-7.0.0beta2/ext/pcre/php_pcre.c#L257
--
Christoph M. Becker
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 6:02 PM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitHi Anatol,
Anatol Belski wrote:
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 2:25 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI would not suggest to change the return value of
preg_match()
and friends.
FALSE
seems to be appropriate here. Instead I suggest to add a new
error constant which would be returned frompreg_last_error()
.Yep, sounds feasible. At least users can be informed about this kind of error,
seems a constant is feasible for 7.0.That would be nice. I'll make a respective PR.
Thanks.
Btw I've just noticed that the snippet I've posted earlier won't currently work.
If (false === preg_match(",evil pattern,", ...)) {
If (OUT_OF_JIT_STACK == pcre_last_error()) {
ini_set("pcre.jit", 0);
// retry
}
}The retry would fail even when JIT is disabled on demand. This is
actually rather unexpected, maybe something in PCRE JIT state cannot
be recovered once JIT compilation has failed. But actually, even
without running into debugging, it is also a bad sign telling that the
behavior can get complicated (Indeed! That's caused by our PCRE cache. The cached regex is alread studied
with JIT info, as all that happens in pcre_get_compiled-regex_cache[1]. So
switching pcre.jit at runtime may not have the desired effect. I'm not sure if that
has to be regarded as bug, and whether we should do something about it. One
solution might be to clear the regex cache whenever the ini setting changes. I'll
have a closer look.
Good catch. Maybe another option could be implementing the PHP pattern above
In C. Say
- check what pcre_exec delivered
- if it’s the JIT stack error - look for the pattern in the cache
- if it's found - remove it from the cache
- recompile and cache
- exec a non JIT pattern version
However this would silently ignore JIT, not transparent for user. Maybe another
option could be introducing a user space function to clear a particular pattern from
the cache. When expunging the full cache, it'll for sure have a negative performance
effect when the cache isn't empty. But it's actually logic as one can base it on the idea -
either JIT is enabled for everything, or it is not (for everything).
Running for the INI option and some custom JIT stack functionality needs
more brainstorming and test. Please ping when you have an initial
implementation, I'd be glad to help testing, etc. anyway.All in all it seems that we probably should have an RFC for that. I'll consider to
write one, but there's no need to hurry – too late for 7.0 anyway. :)
Yep, the whole needs a good consideration about how to do it best without a big
impact on the user side. Also probably it could be worth it to check how PCRE2
behaves in this cases, maybe it were worthwhile to upgrade to PCRE2 for 7.1.
Regards
Anatol
Hi Anatol,
Anatol Belski wrote:
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 6:02 PM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitHi Anatol,
Anatol Belski wrote:
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Saturday, July 25, 2015 2:25 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI would not suggest to change the return value of
preg_match()
and friends.
FALSE
seems to be appropriate here. Instead I suggest to add a new
error constant which would be returned frompreg_last_error()
.Yep, sounds feasible. At least users can be informed about this kind of error,
seems a constant is feasible for 7.0.That would be nice. I'll make a respective PR.
Thanks.Btw I've just noticed that the snippet I've posted earlier won't currently work.
If (false === preg_match(",evil pattern,", ...)) {
If (OUT_OF_JIT_STACK == pcre_last_error()) {
ini_set("pcre.jit", 0);
// retry
}
}The retry would fail even when JIT is disabled on demand. This is
actually rather unexpected, maybe something in PCRE JIT state cannot
be recovered once JIT compilation has failed. But actually, even
without running into debugging, it is also a bad sign telling that the
behavior can get complicated (Indeed! That's caused by our PCRE cache. The cached regex is alread studied
with JIT info, as all that happens in pcre_get_compiled-regex_cache[1]. So
switching pcre.jit at runtime may not have the desired effect. I'm not sure if that
has to be regarded as bug, and whether we should do something about it. One
solution might be to clear the regex cache whenever the ini setting changes. I'll
have a closer look.Good catch. Maybe another option could be implementing the PHP pattern above
In C. Say
- check what pcre_exec delivered
- if it’s the JIT stack error - look for the pattern in the cache
- if it's found - remove it from the cache
- recompile and cache
- exec a non JIT pattern version
However this would silently ignore JIT, not transparent for user. Maybe another
option could be introducing a user space function to clear a particular pattern from
the cache. When expunging the full cache, it'll for sure have a negative performance
effect when the cache isn't empty. But it's actually logic as one can base it on the idea -
either JIT is enabled for everything, or it is not (for everything).
Definitely interesting ideas. However, currently I'm leaning towards a
full-fledged solution in PHP 7.1 (whatever that may be), and to only add
the error constant for PHP 7.0. Users can set pcre.jit=0 in php.ini and
not change the setting during runtime, if necessary.
FWIW, some hours ago I've implemented the clean whole cache solution:
https://github.com/cmb69/php-src/commit/c41d78046da48db2fa8f5bd82fd6ac58099288f8.
Running for the INI option and some custom JIT stack functionality needs
more brainstorming and test. Please ping when you have an initial
implementation, I'd be glad to help testing, etc. anyway.All in all it seems that we probably should have an RFC for that. I'll consider to
write one, but there's no need to hurry – too late for 7.0 anyway. :)Yep, the whole needs a good consideration about how to do it best without a big
impact on the user side. Also probably it could be worth it to check how PCRE2
behaves in this cases, maybe it were worthwhile to upgrade to PCRE2 for 7.1.
Indeed, offering support for PCRE2 should be considered for 7.1. And
also support of the JIT fast path API:
http://www.pcre.org/original/doc/html/pcrejit.html#SEC11.
Regards
--
Christoph M. Becker
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Monday, July 27, 2015 12:06 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitDefinitely interesting ideas. However, currently I'm leaning towards a full-
fledged solution in PHP 7.1 (whatever that may be), and to only add the error
constant for PHP 7.0. Users can set pcre.jit=0 in php.ini and not change the
setting during runtime, if necessary.FWIW, some hours ago I've implemented the clean whole cache solution:
<https://github.com/cmb69/php-
src/commit/c41d78046da48db2fa8f5bd82fd6ac58099288f8>.
I wrote this script to test it https://gist.github.com/weltling/cd7f8127378989cca87c , but as expected - cache reset make a significant impact when the cache is full. Though, maybe as it's an exceptional situation, it were the simplest solution. Because it does not affect the normal usage, maybe it's ok for 7.0. Maybe we should give this flexibility, still in doubt. But the constant IMHO should be done anyway.
Running for the INI option and some custom JIT stack functionality
needs
more brainstorming and test. Please ping when you have an initial
implementation, I'd be glad to help testing, etc. anyway.All in all it seems that we probably should have an RFC for that.
I'll consider to write one, but there's no need to hurry – too late
for 7.0 anyway. :)Yep, the whole needs a good consideration about how to do it best
without a big impact on the user side. Also probably it could be worth
it to check how PCRE2 behaves in this cases, maybe it were worthwhile to
upgrade to PCRE2 for 7.1.Indeed, offering support for PCRE2 should be considered for 7.1. And also
support of the JIT fast path API:
http://www.pcre.org/original/doc/html/pcrejit.html#SEC11.
Yes, maybe we'll have to support both for a while because PCRE2 is quite new so it'll need quite some time distros catch up.
Thanks
Anatol
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Monday, July 27, 2015 12:06 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitDefinitely interesting ideas. However, currently I'm leaning towards a full-
fledged solution in PHP 7.1 (whatever that may be), and to only add the error
constant for PHP 7.0. Users can set pcre.jit=0 in php.ini and not change the
setting during runtime, if necessary.FWIW, some hours ago I've implemented the clean whole cache solution:
<https://github.com/cmb69/php-
src/commit/c41d78046da48db2fa8f5bd82fd6ac58099288f8>.I wrote this script to test it https://gist.github.com/weltling/cd7f8127378989cca87c , but as expected - cache reset make a significant impact when the cache is full. Though, maybe as it's an exceptional situation, it were the simplest solution. Because it does not affect the normal usage, maybe it's ok for 7.0. Maybe we should give this flexibility, still in doubt. But the constant IMHO should be done anyway.
IMHO we shouldn't introduce the cache clearing. As your test script
shows, the performance might suffer considerably when switching pcre.jit
in a running script. Furthermore cached regexps currently in use (e.g.
via preg_replace_callback) can't be cleared, what might lead to
confusion (in very rare cases, though).
As it is now users can set pcre.jit=0 and so will retain full
compatibility with PHP 5. If they're running with pcre.jit=1, they're
able to get a meaningful error (PREG_JIT_STACKLIMIT_ERROR), if the
default stack is not large enough (what indeed might rarely be the case).
If we leave it as this, I suggest to make pcre.jit PHP_INI_PERDIR
instead of PHP_INI_ALL for PHP 7.0. That still gives users the option
to disable it (ideally even on shared hosting), but avoids issues when
dynamically switching the setting with regard to our regexp caching. If
we find a better solution for PHP 7.1, we still can extend it to
PHP_INI_ALL without a BC break.
For PHP 7.1 it might be worthwhile to consider to store jitted resp.
non-jitted regexps in the cache, and to fetch what fits to the current
setting of pcre.jit (and otherwise to compile again, adding the new
entry to the cache, or maybe replacing it).
--
Christoph M. Becker
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 2:35 AM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limit-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Monday, July 27, 2015 12:06 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitDefinitely interesting ideas. However, currently I'm leaning towards
a full- fledged solution in PHP 7.1 (whatever that may be), and to
only add the error constant for PHP 7.0. Users can set pcre.jit=0 in
php.ini and not change the setting during runtime, if necessary.FWIW, some hours ago I've implemented the clean whole cache solution:
<https://github.com/cmb69/php-
src/commit/c41d78046da48db2fa8f5bd82fd6ac58099288f8>.I wrote this script to test it
https://gist.github.com/weltling/cd7f8127378989cca87c , but as expected -
cache reset make a significant impact when the cache is full. Though, maybe as
it's an exceptional situation, it were the simplest solution. Because it does not
affect the normal usage, maybe it's ok for 7.0. Maybe we should give this
flexibility, still in doubt. But the constant IMHO should be done anyway.IMHO we shouldn't introduce the cache clearing. As your test script shows, the
performance might suffer considerably when switching pcre.jit in a running
script. Furthermore cached regexps currently in use (e.g.
via preg_replace_callback) can't be cleared, what might lead to confusion (in
very rare cases, though).As it is now users can set pcre.jit=0 and so will retain full compatibility with PHP
- If they're running with pcre.jit=1, they're able to get a meaningful error
(PREG_JIT_STACKLIMIT_ERROR), if the default stack is not large enough (what
indeed might rarely be the case).
Same opinion here.
If we leave it as this, I suggest to make pcre.jit PHP_INI_PERDIR instead of
PHP_INI_ALL for PHP 7.0. That still gives users the option to disable it (ideally
even on shared hosting), but avoids issues when dynamically switching the
setting with regard to our regexp caching. If we find a better solution for PHP
7.1, we still can extend it to PHP_INI_ALL without a BC break.
That would disable possibility to change it in the registry which is PHP_INI_USER. That's something about shared hosting, too.
For PHP 7.1 it might be worthwhile to consider to store jitted resp.
non-jitted regexps in the cache, and to fetch what fits to the current setting of
pcre.jit (and otherwise to compile again, adding the new entry to the cache, or
maybe replacing it).
Not sure to understand what you mean. Storing every pattern in both variants studied with and without JIT?
Regards
Anatol
Hi Anatol,
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 2:35 AM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limit-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Monday, July 27, 2015 12:06 AM
To: Anatol Belski anatol.php@belski.net; 'Pierre Joye'
pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitIf we leave it as this, I suggest to make pcre.jit PHP_INI_PERDIR instead of
PHP_INI_ALL for PHP 7.0. That still gives users the option to disable it (ideally
even on shared hosting), but avoids issues when dynamically switching the
setting with regard to our regexp caching. If we find a better solution for PHP
7.1, we still can extend it to PHP_INI_ALL without a BC break.That would disable possibility to change it in the registry which is PHP_INI_USER. That's something about shared hosting, too.
Ah, I see. Then we ought to prominently document that chaning the
setting during script execution might not have the desired effect due to
internal caching.
For PHP 7.1 it might be worthwhile to consider to store jitted resp.
non-jitted regexps in the cache, and to fetch what fits to the current setting of
pcre.jit (and otherwise to compile again, adding the new entry to the cache, or
maybe replacing it).Not sure to understand what you mean. Storing every pattern in both variants studied with and without JIT?
I didn't mean to store all patterns in both variants, but rather as
requested. Say, the PHP script starts with pcre.jit=0, so all patterns
will be stored studied without JIT. Then the script changes to
pcre.jit=1. Following cache lookups check whether an already cached
pattern was studied with JIT. If not, a new cache entry is made with
the pattern studied with JIT (that new entry may replace the old one or
not; not sure what's best). Changing back to pcre.jit=0 will again keep
the old patterns, but ensures that only those which have been studied
without JIT are retrieved from the cache.
--
Christoph M. Becker
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 1:16 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI didn't mean to store all patterns in both variants, but rather as requested. Say,
the PHP script starts with pcre.jit=0, so all patterns will be stored studied without
JIT. Then the script changes to pcre.jit=1. Following cache lookups check
whether an already cached pattern was studied with JIT. If not, a new cache
entry is made with the pattern studied with JIT (that new entry may replace the
old one or not; not sure what's best). Changing back to pcre.jit=0 will again
keep the old patterns, but ensures that only those which have been studied
without JIT are retrieved from the cache.
It needs to replace the old entry, the regex string is used as key. At least per current mechanism. When user changes the JIT config, I'd see it like as an obvious hint. Like pcre.jit=0 - no more JIT study from now on. Probably it would be needed to save the study flags with the cache entry, looks like the pcre_cache_entry struct should have a gap for that. But actually sounds like an approach.
Regards
Anatol
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 1:16 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI didn't mean to store all patterns in both variants, but rather as requested. Say,
the PHP script starts with pcre.jit=0, so all patterns will be stored studied without
JIT. Then the script changes to pcre.jit=1. Following cache lookups check
whether an already cached pattern was studied with JIT. If not, a new cache
entry is made with the pattern studied with JIT (that new entry may replace the
old one or not; not sure what's best). Changing back to pcre.jit=0 will again
keep the old patterns, but ensures that only those which have been studied
without JIT are retrieved from the cache.It needs to replace the old entry, the regex string is used as key. At least per current mechanism. When user changes the JIT config, I'd see it like as an obvious hint. Like pcre.jit=0 - no more JIT study from now on.
Probably you're right that replacement is best. However, the key is the
full regex string (including modifiers), AFAIK, and Adam already came up
with the idea that choosing between JIT and non-JIT might be done via a
new modifier[1]. Something to consider, if it turns out that there is
some real optimization potential for userland to choose between both
variants on a case-by-case basis.
Probably it would be needed to save the study flags with the cache entry, looks like the pcre_cache_entry struct should have a gap for that.
If I'm not mistaken, an additional flag wouldn't be necessary, because
the pcre_extra struct has the executable_jit field, which could be queried.
[1] http://news.php.net/php.internals/87261
--
Christoph M. Becker
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 7:40 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limit-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 1:16 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI didn't mean to store all patterns in both variants, but rather as
requested. Say, the PHP script starts with pcre.jit=0, so all
patterns will be stored studied without JIT. Then the script changes
to pcre.jit=1. Following cache lookups check whether an already
cached pattern was studied with JIT. If not, a new cache entry is
made with the pattern studied with JIT (that new entry may replace
the old one or not; not sure what's best). Changing back to
pcre.jit=0 will again keep the old patterns, but ensures that only those which
have been studied without JIT are retrieved from the cache.It needs to replace the old entry, the regex string is used as key. At least per
current mechanism. When user changes the JIT config, I'd see it like as an
obvious hint. Like pcre.jit=0 - no more JIT study from now on.Probably you're right that replacement is best. However, the key is the full regex
string (including modifiers), AFAIK, and Adam already came up with the idea that
choosing between JIT and non-JIT might be done via a new modifier[1].
Something to consider, if it turns out that there is some real optimization
potential for userland to choose between both variants on a case-by-case basis.
Yeah, that were a nice approach. Should be checked probably, whether there's a corresponding perl pattern modifier (probably there is none, but shouldn't be that bad). And also to evaluate what behavior were produced when say jit is disabled but a modifier is passed.
Probably it would be needed to save the study flags with the cache entry, looks
like the pcre_cache_entry struct should have a gap for that.If I'm not mistaken, an additional flag wouldn't be necessary, because the
pcre_extra struct has the executable_jit field, which could be queried.
Ok, that would be even better, given it's reliable.
Regards
Anatol
Just wanted to resurrect this thread... I just got a JIT stack size limit error (which Composer reports as "unknown error" because it has no logic for the new constant) while running "composer require somepackage" with PHP 7 and JIT enabled.
IMO, a pattern modifier would be good so that applications can disable JIT on demand.
I'd also like to re-start a discussion on changing the default for pcre.jit to 0.
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 7:40 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limit-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 1:16 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI didn't mean to store all patterns in both variants, but rather as
requested. Say, the PHP script starts with pcre.jit=0, so all
patterns will be stored studied without JIT. Then the script changes
to pcre.jit=1. Following cache lookups check whether an already
cached pattern was studied with JIT. If not, a new cache entry is
made with the pattern studied with JIT (that new entry may replace
the old one or not; not sure what's best). Changing back to
pcre.jit=0 will again keep the old patterns, but ensures that only those which
have been studied without JIT are retrieved from the cache.It needs to replace the old entry, the regex string is used as key. At least per
current mechanism. When user changes the JIT config, I'd see it like as an
obvious hint. Like pcre.jit=0 - no more JIT study from now on.Probably you're right that replacement is best. However, the key is the full regex
string (including modifiers), AFAIK, and Adam already came up with the idea that
choosing between JIT and non-JIT might be done via a new modifier[1].
Something to consider, if it turns out that there is some real optimization
potential for userland to choose between both variants on a case-by-case basis.Yeah, that were a nice approach. Should be checked probably, whether there's a corresponding perl pattern modifier (probably there is none, but shouldn't be that bad). And also to evaluate what behavior were produced when say jit is disabled but a modifier is passed.
Probably it would be needed to save the study flags with the cache entry, looks
like the pcre_cache_entry struct should have a gap for that.If I'm not mistaken, an additional flag wouldn't be necessary, because the
pcre_extra struct has the executable_jit field, which could be queried.Ok, that would be even better, given it's reliable.
Regards
Anatol
Hi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Sunday, March 13, 2016 6:09 AM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitJust wanted to resurrect this thread... I just got a JIT stack size limit
error (which
Composer reports as "unknown error" because it has no logic for the new
constant) while running "composer require somepackage" with PHP 7 and JIT
enabled.
Could you please give a worky reproduce scenario? Itself the description
sounds like that some part in composer is not yet PHP7 ported or it's a
special case causing JIT impacts, but it were good to have a reproducer to
evaluate and make a conclusion. Till today, there was no bug reports about
PCRE JIT issues in common PHP frameworks. Except this special case
https://bugs.exim.org/show_bug.cgi?id=1189 which is fixed in PCRE 8.39 that
should be bundled as soon as possible (an eye is being kept on that).
IMO, a pattern modifier would be good so that applications can disable JIT
on
demand.I'd also like to re-start a discussion on changing the default for
pcre.jit to 0.
A pattern modifier could be thinkable and would be doable (sure for master,
or even for later 7.0 versions), but would cause some increased complexity
in logic for pattern cache. From the above - let's have a reproducer and
evaluate the status. Then we'll have a point for further consideration.
Thanks
Anatol
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 7:40 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limit-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 1:16 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI didn't mean to store all patterns in both variants, but rather as
requested. Say, the PHP script starts with pcre.jit=0, so all
patterns will be stored studied without JIT. Then the script
changes to pcre.jit=1. Following cache lookups check whether an
already cached pattern was studied with JIT. If not, a new cache
entry is made with the pattern studied with JIT (that new entry may
replace the old one or not; not sure what's best). Changing back
to
pcre.jit=0 will again keep the old patterns, but ensures that only
those which
have been studied without JIT are retrieved from the cache.It needs to replace the old entry, the regex string is used as key.
At least per
current mechanism. When user changes the JIT config, I'd see it like
as an obvious hint. Like pcre.jit=0 - no more JIT study from now on.Probably you're right that replacement is best. However, the key is
the full regex string (including modifiers), AFAIK, and Adam already
came up with the idea that choosing between JIT and non-JIT might be
done
via a new modifier[1].
Something to consider, if it turns out that there is some real
optimization potential for userland to choose between both variants on
a
case-by-case basis.Yeah, that were a nice approach. Should be checked probably, whether
there's
a corresponding perl pattern modifier (probably there is none, but
shouldn't be
that bad). And also to evaluate what behavior were produced when say jit
is
disabled but a modifier is passed.Probably it would be needed to save the study flags with the cache
entry, looks
like the pcre_cache_entry struct should have a gap for that.If I'm not mistaken, an additional flag wouldn't be necessary,
because the pcre_extra struct has the executable_jit field, which could
be
queried.Ok, that would be even better, given it's reliable.
Regards
Anatol
--
To unsubscribe,
visit: http://www.php.net/unsub.php
Hi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Sunday, March 13, 2016 6:09 AM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitJust wanted to resurrect this thread... I just got a JIT stack size limit
error (which
Composer reports as "unknown error" because it has no logic for the new
constant) while running "composer require somepackage" with PHP 7 and JIT
enabled.
Could you please give a worky reproduce scenario? Itself the description
sounds like that some part in composer is not yet PHP7 ported or it's a
special case causing JIT impacts, but it were good to have a reproducer to
evaluate and make a conclusion. Till today, there was no bug reports about
PCRE JIT issues in common PHP frameworks. Except this special case
https://bugs.exim.org/show_bug.cgi?id=1189 which is fixed in PCRE 8.39 that
should be bundled as soon as possible (an eye is being kept on that).
Sure. So composer creates patterns to parse a file dynamically; I have not figured out why it only happens for some composer.json files (or maybe it happens for all of them and I just have not noticed), but this example here I extracted from a reproducibly failing "composer require" run; it runs into the JIT stack size limit on PHP 7 while it works fine on 5.x:
https://gist.github.com/dzuelke/cc64a630c14416eda3e9
IMO, a pattern modifier would be good so that applications can disable JIT
on
demand.I'd also like to re-start a discussion on changing the default for
pcre.jit to 0.
A pattern modifier could be thinkable and would be doable (sure for master,
or even for later 7.0 versions), but would cause some increased complexity
in logic for pattern cache. From the above - let's have a reproducer and
evaluate the status. Then we'll have a point for further consideration.Thanks
Anatol
Hi Christoph,
-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 7:40 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limit-----Original Message-----
From: Christoph Becker [mailto:cmbecker69@gmx.de]
Sent: Tuesday, August 4, 2015 1:16 PM
To: Anatol Belski anatol.php@belski.net; 'Christoph Becker'
cmbecker69@gmx.de; 'Pierre Joye' pierre.php@gmail.com
Cc: 'PHP internals' internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitI didn't mean to store all patterns in both variants, but rather as
requested. Say, the PHP script starts with pcre.jit=0, so all
patterns will be stored studied without JIT. Then the script
changes to pcre.jit=1. Following cache lookups check whether an
already cached pattern was studied with JIT. If not, a new cache
entry is made with the pattern studied with JIT (that new entry may
replace the old one or not; not sure what's best). Changing back
to
pcre.jit=0 will again keep the old patterns, but ensures that only
those which
have been studied without JIT are retrieved from the cache.
It needs to replace the old entry, the regex string is used as key.
At least per
current mechanism. When user changes the JIT config, I'd see it like
as an obvious hint. Like pcre.jit=0 - no more JIT study from now on.Probably you're right that replacement is best. However, the key is
the full regex string (including modifiers), AFAIK, and Adam already
came up with the idea that choosing between JIT and non-JIT might be
done
via a new modifier[1].
Something to consider, if it turns out that there is some real
optimization potential for userland to choose between both variants on
a
case-by-case basis.
Yeah, that were a nice approach. Should be checked probably, whether
there's
a corresponding perl pattern modifier (probably there is none, but
shouldn't be
that bad). And also to evaluate what behavior were produced when say jit
is
disabled but a modifier is passed.Probably it would be needed to save the study flags with the cache
entry, looks
like the pcre_cache_entry struct should have a gap for that.If I'm not mistaken, an additional flag wouldn't be necessary,
because the pcre_extra struct has the executable_jit field, which could
be
queried.
Ok, that would be even better, given it's reliable.Regards
Anatol
--
To unsubscribe,
visit: http://www.php.net/unsub.php
Hi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Tuesday, March 15, 2016 11:58 PM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitSure. So composer creates patterns to parse a file dynamically; I have not
figured out why it only happens for some composer.json files (or maybe it
happens for all of them and I just have not noticed), but this example here I
extracted from a reproducibly failing "composer require" run; it runs into the JIT
stack size limit on PHP 7 while it works fine on 5.x:
I've just tried this on Debian Jessie and on Windows with the latest 7.0.x builds, in both cases the error is PREG_BACKTRACK_LIMIT_ERROR. I'd be next asking you to check the configuration. Runtime like system or ini, or build config can affect this. Anyway, I don't see an issue with PCRE JIT at the moment.
Thanks.
anatol
Hi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Tuesday, March 15, 2016 11:58 PM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitSure. So composer creates patterns to parse a file dynamically; I have not
figured out why it only happens for some composer.json files (or maybe it
happens for all of them and I just have not noticed), but this example here I
extracted from a reproducibly failing "composer require" run; it runs into the JIT
stack size limit on PHP 7 while it works fine on 5.x:I've just tried this on Debian Jessie and on Windows with the latest 7.0.x builds, in both cases the error is PREG_BACKTRACK_LIMIT_ERROR. I'd be next asking you to check the configuration. Runtime like system or ini, or build config can affect this. Anyway, I don't see an issue with PCRE JIT at the moment.
Sorry, I simplified the example too far. Updated it again to throw a JIT stack limit error again (but works fine with "php -dpcre.jit=0"):
https://gist.github.com/dzuelke/cc64a630c14416eda3e9/de7b7293798ac57b4f75d822dc22a625be0fe9df
David
Hi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Tuesday, March 15, 2016 11:58 PM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitSure. So composer creates patterns to parse a file dynamically; I have not
figured out why it only happens for some composer.json files (or maybe it
happens for all of them and I just have not noticed), but this example here I
extracted from a reproducibly failing "composer require" run; it runs into the JIT
stack size limit on PHP 7 while it works fine on 5.x:I've just tried this on Debian Jessie and on Windows with the latest 7.0.x builds, in both cases the error is PREG_BACKTRACK_LIMIT_ERROR. I'd be next asking you to check the configuration. Runtime like system or ini, or build config can affect this. Anyway, I don't see an issue with PCRE JIT at the moment.
Sorry, I simplified the example too far. Updated it again to throw a JIT stack limit error again (but works fine with "php -dpcre.jit=0"):
https://gist.github.com/dzuelke/cc64a630c14416eda3e9/de7b7293798ac57b4f75d822dc22a625be0fe9df
Were you able to reproduce this? :)
David
Hi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Sunday, March 20, 2016 10:10 PM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitHi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Tuesday, March 15, 2016 11:58 PM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitSure. So composer creates patterns to parse a file dynamically; I
have not figured out why it only happens for some composer.json
files (or maybe it happens for all of them and I just have not
noticed), but this example here I extracted from a reproducibly
failing "composer require" run; it runs into the JIT stack size limit
on PHP 7
while it works fine on 5.x:I've just tried this on Debian Jessie and on Windows with the latest
7.0.x
builds, in both cases the error is PREG_BACKTRACK_LIMIT_ERROR. I'd be next
asking you to check the configuration. Runtime like system or ini, or
build config
can affect this. Anyway, I don't see an issue with PCRE JIT at the moment.Sorry, I simplified the example too far. Updated it again to throw a JIT
stack
limit error again (but works fine with "php -dpcre.jit=0"):https://gist.github.com/dzuelke/cc64a630c14416eda3e9/de7b7293798ac57b4
f75d822dc22a625be0fe9dfWere you able to reproduce this? :)
Yes, it is reproducible. I was just investigating on another issue
https://bugs.exim.org/show_bug.cgi?id=1803 which now seems unrelated. With
the JIT stack - we can try to use a custom stack to increase its size by
allocating a custom jit stack of an increased sizeas a workaround. If that
works, it should be fine. I'll be working on a patch next week, then we can
see this option makes sense. More on the topic is readable here
http://www.pcre.org/original/doc/html/pcrejit.html#stackcontrol .
Regards
Anatol
Hi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Sunday, March 20, 2016 10:10 PM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitHi,
-----Original Message-----
From: David Zuelke [mailto:dz@heroku.com]
Sent: Tuesday, March 15, 2016 11:58 PM
To: Anatol Belski anatol.php@belski.net
Cc: Christoph Becker cmbecker69@gmx.de; Pierre Joye
pierre.php@gmail.com; PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE JIT stack size limitSure. So composer creates patterns to parse a file dynamically; I
have not figured out why it only happens for some composer.json
files (or maybe it happens for all of them and I just have not
noticed), but this example here I extracted from a reproducibly
failing "composer require" run; it runs into the JIT stack size limit
on PHP 7
while it works fine on 5.x:I've just tried this on Debian Jessie and on Windows with the latest
7.0.x
builds, in both cases the error is PREG_BACKTRACK_LIMIT_ERROR. I'd be next
asking you to check the configuration. Runtime like system or ini, or
build config
can affect this. Anyway, I don't see an issue with PCRE JIT at the moment.Sorry, I simplified the example too far. Updated it again to throw a JIT
stack
limit error again (but works fine with "php -dpcre.jit=0"):https://gist.github.com/dzuelke/cc64a630c14416eda3e9/de7b7293798ac57b4
f75d822dc22a625be0fe9dfWere you able to reproduce this? :)
Yes, it is reproducible. I was just investigating on another issue
https://bugs.exim.org/show_bug.cgi?id=1803 which now seems unrelated. With
the JIT stack - we can try to use a custom stack to increase its size by
allocating a custom jit stack of an increased sizeas a workaround. If that
works, it should be fine. I'll be working on a patch next week, then we can
see this option makes sense. More on the topic is readable here
http://www.pcre.org/original/doc/html/pcrejit.html#stackcontrol .
That sounds good. I suspect a significantly larger stack size than the PCRE default of 32k will do the job for most cases. Arguably, what Composer uses there is not a trivial pattern with all the nesting going on, but I was unable to simplify it when I took a few minutes the other day, and while the composer.json where the bug occurs is generated by code, I don't think it qualifies as an extreme case.
I guess though that some day, someone with a more complex pattern and/or subject string will come with the same problem and say "this fails for me, the 256k stack limit you set in PHP is not enough", so maybe an INI option should exist? Although that's not terribly useful if one program needs it for one pattern. Flags don't work well for this (unless it's e.g. a numeric one indicating multiples of 128k or something, but that's also terrible).
Or change the pcre_match (et al) implementations to retry without JIT if the error occurs?
David
PHP7 supports PCRE's JIT compilation of patterns by default, which
mostly works fine. However, there are issues when the matching exceeds
the JIT stack limit, see bug #70110[1].I'm not sure how to solve this best. Basically, I see two possible
solutions: either we fall back to non JIT matching, if pcre_exec() fails
with PCRE_ERROR_JIT_STACKLIMIT, or we use a custom JIT stack and make
its size a configurable ini setting (similar to pcre.backtrack_limit),
and raiseE_WARNING
if the matching fails due to limited stack size.Thoughts?
Zoltán Herczeg just told me that the size of the JIT stack doesn't have
to be fixed, but rather allows to dynamically grow up to a maximum given
size (preallocated by mmap/VirtualAlloc). I'll do some testing and
report back later.
--
Christoph M. Becker