Newsgroups: php.internals
To: 'Christoph Becker', 'Pierre Joye'
Cc: 'PHP internals'
Date: Sat, 25 Jul 2015 01:10:22 +0200
Subject: RE: [PHP-DEV] PCRE JIT stack size limit
From: anatol.php@belski.net ("Anatol Belski")

Hi Christoph,

> -----Original Message-----
> From: Christoph Becker [mailto:cmbecker69@gmx.de]
> Sent: Saturday, July 25, 2015 12:09 AM
> To: Anatol Belski; 'Pierre Joye'
> Cc: 'PHP internals'
> Subject: Re: [PHP-DEV] PCRE JIT stack size limit
>
> Hi Anatol,
>
> Anatol Belski wrote:
>
> > This looks like an extremely fragile topic because it depends on how
> > much stack is available to an executable. A custom JIT stack can
> > behave more stably but cannot be resized. And the main issue is that
> > the JIT stack size, machine stack size and ext/pcre cache size are
> > completely unrelated terms. For example, a binary can have not enough
> > stack, but the custom JIT stack using mmap/VirtualAlloc could even
> > succeed, but then pcre_exec will be executed and overflow the machine
> > stack. We can never know which one is exhausted first - the one for
> > the JIT compilation or the other one for the execution, or vice versa.
> >
> > Generally, moving the JIT compilation away from the machine stack and
> > increasing the PCRE cache size should be more stable against this.
> > However it's an edge case. IMHO we should not do it just to fix some
> > crazy usage. Users who need it might just turn off JIT. Normal usage
> > seems not to be affected, say loading some sane functional script,
> > which, e.g., is done by any benchmark with WP, Symfony, etc. But moving
> > JIT compilation away from the machine stack will possibly affect it.
>
> Beforehand, I'm not suggesting to change anything regarding our PCRE
> cache (PCRE_G(pcre_cache)); this seems to be fine as it is, and is
> indeed not related to this topic.
>
> Now please consider the following simple expression:
>
>   preg_match('/^(foo)+$/', str_repeat('foo', $n))
>
> This will fail (i.e. yield FALSE) independently of pcre.jit for large
> enough $n. However, a user can change pcre.recursion_limit, which will
> affect the $n limit (the expression will fail for smaller or larger
> $n), if pcre.jit=0. If pcre.jit=1 the user can't influence this
> boundary in any way, currently.
>
> And maybe even worse, with pcre.jit=0 the boundary is 50,000, but with
> pcre.jit=1 it is only 1,366. Of course, one can argue that this is a
> contrived example, and that such usage is crazy, but why do we have a
> default pcre.recursion_limit of 100,000 then? A recursion_limit of
> 2,734 would be sufficient to have a boundary of $n == 1,366.
>

The 100000 is an empirical value, by my guess. It relates to the default
stack sizes on different platforms and to an average pattern. There is
no guarantee that PCRE will not exhaust the stack available to the
binary.

> All in all, as this example already suggests, classic execution of
> matching is done by recursive calls (using normal stack frames), while
> JIT execution of matching is iterative, using a special JIT stack.[1]
> I don't think it is justified to give users a setting to adjust for
> the former, but not for the latter (except to disable JIT, albeit JIT
> might bring quite some boost especially for such cases).
>

JIT without a custom stack will use "32k of machine stack", according to
the doc. We currently don't really give users a choice between iterative
and recursive PCRE execution; it's a compile-time decision. And this is
again because an iterative execution will be safer, but will burden the
average case with unnecessary thrift. In this JIT case, one could give
this choice, you're right, but we should evaluate it carefully. E.g.,
what happens to the PHP pattern cache if the JIT stack is exhausted?

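
The boundary described in the quoted example can be probed with a small
script. This is only a sketch: foo_repeat_matches() is a hypothetical
helper, the $n values are arbitrary, and the actual limits depend on the
PHP build, the PCRE version and the stack available, so no results are
claimed here.

```php
<?php
// Hypothetical helper (not part of ext/pcre): reports whether the
// pattern from the quoted example matches for a given repeat count.
function foo_repeat_matches(int $n): bool
{
    $result = preg_match('/^(foo)+$/', str_repeat('foo', $n));

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return $result === 1;
}

// Arbitrary sample sizes; adjust them to narrow down the boundary.
foreach ([10, 1000, 2000, 60000] as $n) {
    $matched = foo_repeat_matches($n);
    printf(
        "n=%d pcre.jit=%s matched=%s preg_last_error=%d\n",
        $n,
        var_export(ini_get('pcre.jit'), true),
        var_export($matched, true),
        preg_last_error()
    );
}
```

Running it once with pcre.jit=0 and once with pcre.jit=1 (and, for the
former, with different pcre.recursion_limit values) should show where
the match starts to fail on a given setup.
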
What I was more precisely talking about is something like

    if (false === preg_match(",evil pattern,", ...)) {
        ini_set("pcre.jit", 0);
        // retry
    }

So: error received, no JIT. And no additional logic/overhead in
ext/pcre.

Maybe a custom JIT stack would be suitable in this case, but according
to the PCRE doc it can easily bring issues, as the global JIT memory
can't be resized/migrated at the snap of a finger. Say, one would be
forced to use either the custom JIT stack or the default JIT stack from
the start. This can end up in overcomplication when using a custom JIT
stack for a particular pattern.

> As we're pretty late in the game for PHP 7.0, it might be best to
> postpone a new ini setting or other changes to PHP 7.1, but at the
> very least I would introduce a new error constant, say
> PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a more meaningful
> result when calling preg_last_error() than PHP_PCRE_INTERNAL_ERROR.
> And it seems to be appropriate to add a note to UPGRADING that
> pcre.jit=1 may cause some preg_*() to fail which would work with
> pcre.jit=0.
>

Yes, instead of returning false one could return an explicit error, or
indicate it in some other way. Thanks for bringing it up, and maybe
we'll have more info after your further investigation. But documentation
is appropriate in any case.

Regards

Anatol
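
The retry idea from this message, spelled out as a minimal sketch.
preg_match_retry_without_jit() is a hypothetical userland wrapper, not
something ext/pcre provides. It also assumes that switching pcre.jit at
runtime takes effect for a pattern that already sits in the pattern
cache, which is the open question raised in the message and is not
verified here.

```php
<?php
// Hypothetical wrapper, not an ext/pcre API: runs preg_match() and, if
// it reports an error while pcre.jit is on, retries once with JIT off.
function preg_match_retry_without_jit(string $pattern, string $subject): array
{
    $result  = preg_match($pattern, $subject);
    $retried = false;

    // ini_get() yields "1" when JIT is on; "0", "" or false otherwise.
    if ($result === false && ini_get('pcre.jit')) {
        // Assumption: this also applies to already cached patterns.
        ini_set('pcre.jit', '0');
        $retried = true;
        $result  = preg_match($pattern, $subject);
    }

    return [
        'result'  => $result,            // 1, 0 or false
        'retried' => $retried,           // true if the JIT-off path ran
        'error'   => preg_last_error(),  // error code of the last call
    ];
}

$outcome = preg_match_retry_without_jit('/^(foo)+$/', str_repeat('foo', 3));
var_dump($outcome['result'], $outcome['retried']);
```

The sketch leaves pcre.jit switched off after a retry, as in the snippet
in the message; a caller who wants JIT back for later patterns would
have to restore the setting explicitly.
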