Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:87290 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 17165 invoked from network); 25 Jul 2015 14:23:13 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 25 Jul 2015 14:23:13 -0000 Authentication-Results: pb1.pair.com header.from=anatol.php@belski.net; sender-id=unknown Authentication-Results: pb1.pair.com smtp.mail=anatol.php@belski.net; spf=permerror; sender-id=unknown Received-SPF: error (pb1.pair.com: domain belski.net from 85.214.73.107 cause and error) X-PHP-List-Original-Sender: anatol.php@belski.net X-Host-Fingerprint: 85.214.73.107 klapt.com Received: from [85.214.73.107] ([85.214.73.107:39886] helo=h1123647.serverkompetenz.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id FE/B0-12869-ECB93B55 for ; Sat, 25 Jul 2015 10:23:10 -0400 Received: by h1123647.serverkompetenz.net (Postfix, from userid 1006) id 38BBB23D615C; Sat, 25 Jul 2015 16:23:07 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on h1123647.serverkompetenz.net X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.5 tests=ALL_TRUSTED,BAYES_00, URIBL_BLOCKED autolearn=unavailable version=3.3.2 Received: from w530phpdev (pD9FE860F.dip0.t-ipconnect.de [217.254.134.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by h1123647.serverkompetenz.net (Postfix) with ESMTPSA id 69CE723D6003; Sat, 25 Jul 2015 16:23:02 +0200 (CEST) To: "'Christoph Becker'" , "'Pierre Joye'" Cc: "'PHP internals'" References: <55B0CADD.8000807@gmx.de> <55B136BE.8010006@gmx.de> <046401d0c644$9cf76880$d6e63980$@belski.net> <55B2B78A.4080202@gmx.de> <048001d0c665$ed19ec40$c74dc4c0$@belski.net> <55B2D779.9090409@gmx.de> In-Reply-To: <55B2D779.9090409@gmx.de> Date: Sat, 25 Jul 2015 16:23:00 +0200 Message-ID: <04d301d0c6e5$6b28f0c0$417ad240$@belski.net> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 15.0 Thread-Index: AQHD9RJvenw/Ss5kSkuHSfZFOrs2uQBWRtPIAqC4oYQC1yvg/gL/4swBAjr9PxcCDsvXWJ2dB1tg Content-Language: en-us Subject: RE: [PHP-DEV] PCRE JIT stack size limit From: anatol.php@belski.net ("Anatol Belski") Hi Christoph, > -----Original Message----- > From: Christoph Becker [mailto:cmbecker69@gmx.de] > Sent: Saturday, July 25, 2015 2:25 AM > To: Anatol Belski ; 'Pierre Joye' > > Cc: 'PHP internals' > Subject: Re: [PHP-DEV] PCRE JIT stack size limit >=20 > Hi Anatol, >=20 > Anatol Belski wrote: >=20 > > Hi Christoph, > > > >> -----Original Message----- > >> From: Christoph Becker [mailto:cmbecker69@gmx.de] > >> Sent: Saturday, July 25, 2015 12:09 AM > >> To: Anatol Belski ; 'Pierre Joye' > >> > >> Cc: 'PHP internals' > >> Subject: Re: [PHP-DEV] PCRE JIT stack size limit > >> > >> Now please consider the following simple expression: > >> > >> preg_match('/^(foo)+$/', str_repeat('foo', $n)) > >> > >> This will fail (i.e. yield FALSE) independently of pcre.jit for = large enough $n. > >> However, a user can change pcre.recursion_limit what will affect = the > >> $n limit (the expression will fail for smaller or larger $n), if > >> pcre.jit=3D0. If pcre.jit=3D1 the user can't influence this = boundary in any way, > currently. > >> > >> And maybe even worse, with pcre.jit=3D0 the boundary is 50,000, but > >> with > >> pcre.jit=3D1 it is only 1,366. Of course, one can argue that this = is a > >> contrived example, and that such usage is crazy, but why do we have = a > >> default pcre.recursion_limit of 100,000 then? A recursion_limit of > >> 2,734 would be sufficient to have a boundary of $n =3D=3D 1,366. > > > > The 100000 is an empirical value by my guess. It regards to the = default stack > sizes on different platforms and to an average pattern. There is no = prediction > that PCRE will not exhaust the stack available to the binary. >=20 > ACK. However, the user is able to tune this value according to the = environment. > And yes, I'm aware that pcre.recursion_limit may be necessary to = prevent a > stack overflow, what can't happen with JIT compilation (if the JIT = stack is > exhausted, pcre_exec() returns gracefully), but nonetheless giving = users some > control over the size of the JIT stack seems to be appropriate (at = least in the > long run, aka. > PHP 7.1). Yep, I see your point about the config option. That would only work at = MINIT stage. Though we also don't give many other options to the user = land, like the pattern cache size. Which is IMHO fine because users = would normally not need such nternal detail. >=20 > >> All in all, as this example already suggests, classic execution of > >> matching is done by recursive calls (using normal stack frames), > >> while JIT execution of matching is iterative, using a special JIT > >> stack.[1] I don't think it is justified to give users a setting to > >> adjust for the former, but not for the latter (except to disable = JIT, albeit JIT > might bring quite some boost especially for such cases). > > > > JIT without custom stack will use "32k of machine stack", by the = doc. We > currently don't really give users a choice to choose between iterative = and > recursive PCRE execution, it's a compile time decision. >=20 > Well, one can enforce non JIT execution by setting pcre.jit=3D0. > pcre.jit=3D1 indeed may not enforce JIT execution, depending on the = platform[1] > and the libpcre version. But we're discussing only the cases where JIT is available, that's what = it is about. No need to care about it otherwise. >=20 > > And this is again because an iterative execution will be safer, but = will affect an > average case with unnecessary thrift. In this JIT case, one could give = this choice, > you're right, but we should evaluate it carefully. Fe what happens to = the PHP > pattern cache if JIT stack is exhausted? >=20 > That is unrelated. It is possible to use a single custom PCRE JIT = stack for all > patterns (actually, pcre_exec() calls). Only if multi-threading comes = into play, > each thread should have its own stack (that's not absolutely = necessary, but it > simplifies things a lot).[2] Therefore having a single = PCRE_G(jit_stack) would be > sufficient. I've committed a naive draft to my php-src fork[3]. >=20 If it's going to happen, of course it should be a single JIT stack for = all patterns per thread. PCRE_G is required, otherwise some locking has = to be implemented manually. But what I merely wanted to illustrate is = exactly what you say - there are unrelated things which can go wrong but = possibly affect/cause each other.=20 =20 > I would use the same JIT stack for all patterns (whether that is the = default JIT > stack, or a custom one). The libpcre manual states[2]: >=20 > | You may safely use the same JIT stack for more than one pattern > | (either by assigning directly or by callback), as long as the = patterns > | are all matched sequentially in the same thread. >=20 > To my knowledge in ext/pcre all patterns are matched sequentially, but = I'll > double-check, and I'll inquire on the libpcre mailing list. >=20 Sure, just after MINIT it cannot be reassigned/resized/changed. Thus, = given something like pcre.jit_stack_size INI were implemented, if it's = requested under 32kb - nothing should be done, machine stack should be = used. If it were over it - then custom JIT stack based on = mmap/VirtualAlloc. But frankly I don't see an essential difference to = the handling in the user space. Yes, it might be an advantage to still = use JIT, but it moves away from the machine stack. And IMHO as it's an edge case, fixing it were fine but should not be = done on the cost of much over complication and affecting normal usage. = From what is to see right now - sane usage is not affected while = improved by JIT. If you're in the mood to add a note to upgrading, it'll = be probably just file for now - disabling JIT retains the full PHP5 = compatible behavior.=20 > >> As we're pretty late in the game for PHP 7.0, it might be best to > >> postpone a new ini setting or other changes to PHP 7.1, but at the > >> very least I would introduce a new error constant, say > >> PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a more meaningful > >> result when calling preg_last_error() than PHP_PCRE_INTERNAL_ERROR. > >> And it seems to be appropriate to add a note to UPGRADING that > >> pcre.jit=3D1 may cause some preg_*() to fail which would work with = pcre.jit=3D0. > > > > Yes, instead of returning false one could return an explicit error, = or indicate in > any other ways. >=20 > I would not suggest to change the return value of preg_match() and = friends. > FALSE seems to be appropriate here. Instead I suggest to add a new = error > constant which would be returned from preg_last_error(). >=20 Yep, sounds feasible. At least users can be informed about this kind of = error, seems a constant is feasible for 7.0. Btw I've just noticed that the snippet I've posted earlier won't = currently work.=20 If (false =3D=3D=3D preg_match(",evil pattern,", ...)) { If (OUT_OF_JIT_STACK =3D=3D pcre_last_error()) { ini_set("pcre.jit", 0); // retry } } The retry would fail even when JIT is disabled on demand. This is = actually rather unexpected, maybe something in PCRE JIT state cannot be = recovered once JIT compilation has failed. But actually, even without = running into debugging, it is also a bad sign telling that the behavior = can get complicated ( Running for the INI option and some custom JIT stack functionality needs = more brainstorming and test. Please ping when you have an initial = implementation, I'd be glad to help testing, etc. anyway. Regards Anatol