Hi there,
we just ran into a version of the bug "JIT bug with lookbehind assertion":
https://bugs.exim.org/show_bug.cgi?id=1189
To reproduce it you can use
php -n -r 'ini_set("pcre.jit", 0); echo preg_replace("/\b(11|21|41)\b/u", "z", "x°11\n");'
vs.
php -n -r 'ini_set("pcre.jit", 1); echo preg_replace("/\b(11|21|41)\b/u", "z", "x°11\n");'
Since the PCRE bug report dates from 2011-12-27 and is still marked NEW, I wonder whether it would be safer for PHP to disable pcre.jit in the recommended php.ini configuration files.
Also: Does anyone know who might be able/willing to look at the upstream bug?
Regards,
- Chris
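As a cross-check of what the two commands should print, here is a minimal sketch that runs the same pattern through a different regex engine. It assumes Python's `re` module (not PCRE, and not what PHP uses), so it only illustrates the expected Unicode word-boundary behaviour: `°` (U+00B0) is a non-word character, so `\b` matches between it and `11`.

```python
import re

# Same subject and pattern as in the php one-liners above.
subject = "x°11\n"
result = re.sub(r"\b(11|21|41)\b", "z", subject)

# Show the result as UTF-8 bytes, similar to piping the output into a hex dump.
print(result.encode("utf-8").hex(" "))  # 78 c2 b0 7a 0a
```

Any output of the php commands that differs from these bytes (apart from the trailing newline `0a`) would point at the engine rather than at the pattern.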
Just replying here since many people probably missed this since gmail
classified it as spam.
-Rasmus
Hi Christian,
-----Original Message-----
From: Christian Schneider [mailto:cschneid@cschneid.com]
Sent: Wednesday, February 17, 2016 2:07 PM
To: PHP internals internals@lists.php.net
Subject: [PHP-DEV] PCRE jit bug with UTF-8 and lookbehind assertion
Valgrind doesn't seem to detect any issues with this code. I'm using the latest 7.0.4-dev code with PCRE 8.38 and valgrind 3.10.0 on Debian Jessie. Is there something else I could be missing?
Thanks
Anatol
Hi Christian,
-----Original Message-----
From: Anatol Belski [mailto:anatol.php@belski.net]
Sent: Thursday, February 18, 2016 9:16 AM
To: 'Christian Schneider' cschneid@cschneid.com; 'PHP internals'
internals@lists.php.net
Subject: RE: [PHP-DEV] PCRE jit bug with UTF-8 and lookbehind assertion
Could you please write back what the difference in output between those two commands is?
Thanks.
Anatol
Christian,
-----Original Message-----
From: Anatol Belski [mailto:anatol.php@belski.net]
Sent: Friday, February 19, 2016 9:20 AM
To: 'Christian Schneider' cschneid@cschneid.com; 'PHP internals'
internals@lists.php.net
Subject: RE: [PHP-DEV] PCRE jit bug with UTF-8 and lookbehind assertion
One more question: are you using the bundled or an external PCRE 8.38?
Thanks
Anatol
Could you please write back what the difference in output between those
two commands is? Thanks. Anatol
In the first case, it correctly outputs «x°z» (78 c2 b0 7a). With jit
enabled it produces «x�z» (78 c2 7a). That is, it is only outputting the
first byte of the UTF-8 encoding of the U+00B0 character.
Tested on PHP 7.0.3 using the system libpcre 8.38.
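The two byte sequences quoted above can be checked by hand. The following is a minimal sketch in Python (chosen only for illustration, no PHP or PCRE involved). The "shifted" replacement at the end is an assumption about how such bytes could arise, namely a match that starts one byte too early; the thread itself does not confirm that this is what the JIT does.

```python
# Byte sequences quoted in the message above (trailing newline omitted there).
without_jit = bytes.fromhex("78 c2 b0 7a")
with_jit = bytes.fromhex("78 c2 7a")

# U+00B0 (degree sign) is encoded as the two bytes c2 b0 in UTF-8.
assert "\u00b0".encode("utf-8") == b"\xc2\xb0"

# The first sequence is valid UTF-8.
print(without_jit.decode("utf-8"))  # x°z

# The second one is not: c2 is a lead byte that lacks its continuation byte.
try:
    with_jit.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"invalid UTF-8 at byte offset {exc.start}: {exc.reason}")

# Subject from the reproduction commands: 78 c2 b0 31 31 0a
subject = "x°11\n".encode("utf-8")

# Replacing the bytes of "11" (offsets 3..5) gives the correct output.
correct = subject[:3] + b"z" + subject[5:]

# Assumption: a replacement starting one byte early (offset 2) swallows b0.
shifted = subject[:2] + b"z" + subject[5:]

print(correct.hex(" "))
print(shifted.hex(" "))
```

Apart from the trailing newline, `correct` matches the bytes reported without jit and `shifted` matches the bytes reported with jit, which is consistent with the missing byte being the continuation byte b0 rather than the lead byte c2.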
-----Original Message-----
From: Ángel González [mailto:keisial@gmail.com]
Sent: Sunday, February 21, 2016 1:27 AM
To: Anatol Belski anatol.php@belski.net
Cc: 'Christian Schneider' cschneid@cschneid.com; 'PHP internals'
internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE jit bug with UTF-8 and lookbehind assertion
Were you putting the snippets into a file or testing on the console? While testing this on the console, I had an issue where some characters were partially swallowed by the terminal (which was a UTF-8 terminal). When I put the code into a file, the output is the same for both: "x°z". Please see also the continued discussion in the original ticket https://bugs.exim.org/show_bug.cgi?id=1189 . The offsets delivered by PCRE also seem to be correct, and valgrind doesn't find anything. It would be great if you could confirm these findings.
Thanks
Anatol
On 21.02.2016 at 11:42, Anatol Belski anatol.php@belski.net wrote:
Were you putting the snippets into a file or testing on the console?
I can reproduce it in a console and in a file.
PCRE Library Version => 8.38 2015-11-23
I also reproduced it with a C program directly using the system PCRE library, no PHP involved.
I attached the C source to https://bugs.exim.org/show_bug.cgi?id=1189
Regards,
- Chris
Were you putting the snippets into a file or testing on the console?
Thanks Anatol
I was testing on a console and piping into hexdump. Using files showed,
as expected, the same failure.
The pcre-8.39 trunk (r1635), once you enable all the required options,
does fix both the C test case and the php one (loading the new dynamic
library instead of the system one).
So yes, seems fixed in libpcre :)
Hi Ángel,
-----Original Message-----
From: Ángel González [mailto:keisial@gmail.com]
Sent: Thursday, February 25, 2016 9:12 PM
To: Anatol Belski anatol.php@belski.net
Cc: 'Christian Schneider' cschneid@cschneid.com; 'PHP internals'
internals@lists.php.net
Subject: Re: [PHP-DEV] PCRE jit bug with UTF-8 and lookbehind assertion
Thanks for the further check. Yeah, 8.39 looks like it fixes the issue; I'm just waiting for it to be released to upgrade in 7.0.
Regards
Anatol