Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:87282 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 37392 invoked from network); 25 Jul 2015 00:25:25 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 25 Jul 2015 00:25:25 -0000 Authentication-Results: pb1.pair.com header.from=cmbecker69@gmx.de; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=cmbecker69@gmx.de; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmx.de designates 212.227.17.22 as permitted sender) X-PHP-List-Original-Sender: cmbecker69@gmx.de X-Host-Fingerprint: 212.227.17.22 mout.gmx.net Received: from [212.227.17.22] ([212.227.17.22:59467] helo=mout.gmx.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id BF/B3-10459-277D2B55 for ; Fri, 24 Jul 2015 20:25:23 -0400 Received: from [192.168.0.100] ([95.89.139.132]) by mail.gmx.com (mrgmx103) with ESMTPSA (Nemesis) id 0M6B6s-1YyEDt1qXK-00y6ZE; Sat, 25 Jul 2015 02:25:19 +0200 Message-ID: <55B2D779.9090409@gmx.de> Date: Sat, 25 Jul 2015 02:25:29 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Anatol Belski , 'Pierre Joye' CC: 'PHP internals' References: <55B0CADD.8000807@gmx.de> <55B136BE.8010006@gmx.de> <046401d0c644$9cf76880$d6e63980$@belski.net> <55B2B78A.4080202@gmx.de> <048001d0c665$ed19ec40$c74dc4c0$@belski.net> In-Reply-To: <048001d0c665$ed19ec40$c74dc4c0$@belski.net> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Provags-ID: V03:K0:9nIgmXnqBS0D2Jjxu3Kf34spgOJinCT9kaF9bNTsqZn6GSZLzS9 h1gn+dRCI6fbnSNZvMF6OAqKzoL9fZtUJMRXTsu4k7L5coCGnS26cca6OwyJ0a4M0F4bAfb aqe6dJRiiAfaYyNBxD8W3y/D8aZcwpLRFplL3Bo0r8J/Lf0CpZC63mMi4I0/WK1rC/AUj5I hshHwct930ho2aGSjeqNg== X-UI-Out-Filterresults: notjunk:1;V01:K0:qm9UHpfJxMk=:A54dM2Nbd48OVqGxQwo3Lj 5aeLhd4vcY8nph9u095SixrxsTE0BdG9CKaebjyWkO0BjqskeZfeiomp8xUkE5E9YxS+Oixj8 ntviPbw2PqCMMwiqeLzzDKh+BFvMODKbrd1d2l1R7Qd1IBL+3qe4u7fxjzvmwyws6GTvc2LGC XrD2BRlWptR/FgTUuLTQfakCWh+zRljurnMtDYoBDM7La/icDBizbQagx6D9MBBH6oLwGoek+ ExxUSVChqqnfiUVd+gJW64EUtMjIHRUqiHxqpb909HId0R6jjc7RvLXiqdEZWNBv7zQJnavm+ 1GBCqCm50BOqqqup0ZaFzD5LSrZrFgwNgwSZNfq4pXrfNkX4F6nVi4BrrbqeTHnFA+dYw4/hx gZ3UDEXVqNLaZBhs20y8hwaHhwp2pHfQDw2rCSOTrwU7lQEZ1U2ryJMLV3sqPSsZDeBW2ZSkK lMXmDd7eJ/AjA0ehTTLT8e9a8FUKR5U4PHZFBWRAbMaJJ49a7rF8sInRMAiAwUbekJyE8NXty /wMYEMeaMZFrS3CCbittXTZeAKDTShaUq79TaL/vLvPym35Opv85JJs3MAxI44oKzTxaR1HEY MYZkgirFRSNE/VI0GRj/LTWNYYOT6a9zz6NgQV0dG1EnR5ot533S/90QC+gDmKQPCwB1AU7os CuVG0CmzyXMSfPBsYnPXBT5JbjiZogoRiB9y0+ILs+Y7jsWvVMpK7NsDyc+X7qnwAkfA= Subject: Re: [PHP-DEV] PCRE JIT stack size limit From: cmbecker69@gmx.de (Christoph Becker) Hi Anatol, Anatol Belski wrote: > Hi Christoph, > >> -----Original Message----- >> From: Christoph Becker [mailto:cmbecker69@gmx.de] >> Sent: Saturday, July 25, 2015 12:09 AM >> To: Anatol Belski ; 'Pierre Joye' >> >> Cc: 'PHP internals' >> Subject: Re: [PHP-DEV] PCRE JIT stack size limit >> >> Now please consider the following simple expression: >> >> preg_match('/^(foo)+$/', str_repeat('foo', $n)) >> >> This will fail (i.e. yield FALSE) independently of pcre.jit for large enough $n. >> However, a user can change pcre.recursion_limit what will affect the $n limit >> (the expression will fail for smaller or larger $n), if pcre.jit=0. If pcre.jit=1 the >> user can't influence this boundary in any way, currently. >> >> And maybe even worse, with pcre.jit=0 the boundary is 50,000, but with >> pcre.jit=1 it is only 1,366. Of course, one can argue that this is a contrived >> example, and that such usage is crazy, but why do we have a default >> pcre.recursion_limit of 100,000 then? A recursion_limit of >> 2,734 would be sufficient to have a boundary of $n == 1,366. > > The 100000 is an empirical value by my guess. It regards to the default stack sizes on different platforms and to an average pattern. There is no prediction that PCRE will not exhaust the stack available to the binary. ACK. However, the user is able to tune this value according to the environment. And yes, I'm aware that pcre.recursion_limit may be necessary to prevent a stack overflow, what can't happen with JIT compilation (if the JIT stack is exhausted, pcre_exec() returns gracefully), but nonetheless giving users some control over the size of the JIT stack seems to be appropriate (at least in the long run, aka. PHP 7.1). >> All in all, as this example already suggests, classic execution of matching is done >> by recursive calls (using normal stack frames), while JIT execution of matching is >> iterative, using a special JIT stack.[1] I don't think it is justified to give users a >> setting to adjust for the former, but not for the latter (except to disable JIT, >> albeit JIT might bring quite some boost especially for such cases). > > JIT without custom stack will use "32k of machine stack", by the doc. We currently don't really give users a choice to choose between iterative and recursive PCRE execution, it's a compile time decision. Well, one can enforce non JIT execution by setting pcre.jit=0. pcre.jit=1 indeed may not enforce JIT execution, depending on the platform[1] and the libpcre version. > And this is again because an iterative execution will be safer, but will affect an average case with unnecessary thrift. In this JIT case, one could give this choice, you're right, but we should evaluate it carefully. Fe what happens to the PHP pattern cache if JIT stack is exhausted? That is unrelated. It is possible to use a single custom PCRE JIT stack for all patterns (actually, pcre_exec() calls). Only if multi-threading comes into play, each thread should have its own stack (that's not absolutely necessary, but it simplifies things a lot).[2] Therefore having a single PCRE_G(jit_stack) would be sufficient. I've committed a naive draft to my php-src fork[3]. > What I was more precisely talking about is like > > If (false === preg_match(",evil pattern,", ...)) { > Ini_set("pcre.jit", 0); > // retry > } > > So received error - no JIT. And no additional logic/overhead in ext/pcre. Yes, that is a viable way for userland (even if I would suggest to test against pcre_last_error(); preg_match() may return FALSE for several error conditions, such as PHP_PCRE_BAD_UTF8_ERROR, and then retrying with pcre.jit=0 wouldn't help). We could as well put the same logic into ext/pcre (but I do not suggest to do so, even though I mentioned this as option in the OP). > Maybe custom JIT were eligible in this case, but according to the PCRE doc it can easily bring issues as the global JIT memory can't be resized/migrated just by one's finger click. Say one would be forced to either use the custom JIT stack or default JIT stack from the start on. This can end up with the over complication using a custom JIT stack for a particular pattern. I would use the same JIT stack for all patterns (whether that is the default JIT stack, or a custom one). The libpcre manual states[2]: | You may safely use the same JIT stack for more than one pattern | (either by assigning directly or by callback), as long as the | patterns are all matched sequentially in the same thread. To my knowledge in ext/pcre all patterns are matched sequentially, but I'll double-check, and I'll inquire on the libpcre mailing list. >> As we're pretty late in the game for PHP 7.0, it might be best to postpone a new >> ini setting or other changes to PHP 7.1, but at the very least I would introduce a >> new error constant, say PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a >> more meaningful result when calling preg_last_error() than >> PHP_PCRE_INTERNAL_ERROR. And it seems to be appropriate to add a note to >> UPGRADING that pcre.jit=1 may cause some preg_*() to fail which would work >> with pcre.jit=0. > > Yes, instead of returning false one could return an explicit error, or indicate in any other ways. I would not suggest to change the return value of preg_match() and friends. FALSE seems to be appropriate here. Instead I suggest to add a new error constant which would be returned from preg_last_error(). > Thanks for bringing it up and maybe we'll have more info after your further investigation. But documentation is appropriate in any cases. I'll make a respective PR after having checked back with the libpcre devs. [1] [2] [3] -- Christoph M. Becker