I chose the simplest example to show the preg_replace behavior; there are
better (and safer) ways to escape slash characters.
Anyway, is this the expected preg_replace behavior?
Martin
<?php
function test($str) {
static $re = '/(^|[^\\\\])\'/';
static $change = '$1\\\'';
echo $str, PHP_EOL,
preg_replace($re, $change, $str), PHP_EOL, PHP_EOL;
}
test("str '' str"); // bug?
test("str \'\' str"); // ok
test("'str'"); // ok
test("\'str\'"); // ok
Expected:
str '' str
str \'\' str

str \'\' str
str \'\' str

'str'
\'str\'

\'str\'
\'str\'
Result:
str '' str
str \'' str

str \'\' str
str \'\' str

'str'
\'str\'

\'str\'
\'str\'
Martin Scotta
Your regex is ...
(^|[^\\\\])'
Options: case insensitive; ^ and $ match at line breaks
Match the regular expression below and capture its match into
backreference number 1 «(^|[^\\\\])»
Match either the regular expression below (attempting the next
alternative only if this one fails) «^»
Assert position at the beginning of a line (at beginning of the
string or after a line break character) «^»
Or match regular expression number 2 below (the entire group fails
if this one fails to match) «[^\\\\]»
Match a single character NOT present in the list below «[^\\\\]»
A \ character «\\»
A \ character «\\»
Match the character “'” literally «'»
I think [^\\\\] is wrong and you want it to be ...
(?!\\\\)
or
(?!\\{2})
With that, the output is ...
str '' str
str \'\' str

str \'\' str
str \\'\\' str

'str'
\'str\'

\'str\'
\\'str\\'
--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
> I chose the simplest example to show the preg_replace behavior,
You've GOT to be kidding. The SIMPLEST?!
How about an example that doesn't require escaping ALL the interesting
characters involved?
Here's a modified version that I think is quite a bit simpler:
<?php
function test($str) {
static $re = '/(^|[^a])b/';
static $change = '$1ab';
echo $str, PHP_EOL; // input
echo preg_replace($re, $change, $str), PHP_EOL, PHP_EOL; // output
}
test("str bb str"); // bug?
test("str abab str"); // ok
test("b str b"); // ok
test("ab str ab"); // ok
?>
The way I interpret it, it should put an 'a' before every 'b' that is
not already preceded by an 'a'.
But the buggy case gives 'str abb str' rather than the expected
'str abab str'.
It does look like a bug to me.
Ben.
Actually, no it doesn't.
The behaviour is correct.
Matches cannot overlap. Since the character preceding 'b' is part of the
match, there is only one match in the string 'str bb str'. The match is
' b'. After that match, the
You actually want an assertion. I think this:
static $re = '/(^|(?<!a))b/';
Ben.
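Ben's correction can be checked directly. The following is a minimal sketch, not part of the original thread, that runs both of his patterns against the 'buggy' subject; the variable names are only illustrative.

```php
<?php
$subject = 'str bb str';

// Consuming version: the character before 'b' is part of the match, so the
// second 'b' has nothing left in front of it once the search resumes.
$consuming = preg_replace('/(^|[^a])b/', '$1ab', $subject);

// Assertion version: the lookbehind only inspects the preceding character
// without consuming it, so every 'b' is examined on its own.
$lookbehind = preg_replace('/(^|(?<!a))b/', '$1ab', $subject);

echo $consuming, PHP_EOL;   // str abb str
echo $lookbehind, PHP_EOL;  // str abab str
```

In the second call group 1 is always empty, so '$1ab' simply inserts the missing 'a'.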
What is more likely to be wrong? Your understanding of a specific
regex pattern (which happens to be full of escapes making it
incredibly hard to read) or the implementation of preg_replace?
~Hannes
Did no one see why the regex was wrong?
RegexBuddy (a Windows app) explains regexes VERY VERY well.
--
Richard Quadling
The important bit (where the problem lies with regard to the regex) is ...
Match a single character NOT present in the list below «[^\\\\]»
A \ character «\\»
A \ character «\\»
The issue is the word single.
--
Richard Quadling
> static $re = '/(^|[^\\\\])\'/';
> Did no one see why the regex was wrong?
I saw what the regex was. Unlike you, I didn't think it was 'wrong'.
Once you unescape the characters in the PHP single-quoted string above
(where two backslashes count as one, and backslash-quote counts as a
quote), the actual pattern that reaches the preg_replace function is:
/(^|[^\\])'/
> RegexBuddy (a Windows app) explains regexes VERY VERY well.
What kind of patterns? Does it support PCRE ones?
> The important bit (where the problem lies with regard to the regex) is ...
> Match a single character NOT present in the list below «[^\\\\]»
> A \ character «\\»
> A \ character «\\»
This is not the case.
- As above, the pattern reaching preg_replace is /(^|[^\\])'/
- PCRE, unlike many other regular expression implementations, allows
  backslash-escaping inside character classes (square brackets). So the
  doubled backslash only actually counts as a single backslash character
  to be excluded from the set of characters the atom will match.
There is no error here. (And even if there were two backslashes being
excluded, of course, it wouldn't hurt anything or change the meaning of
the pattern.)
> The issue is the word single.
I don't think anybody thought otherwise.
The problem was that, to a casual observer, the pattern seems to mean "a
quote which doesn't already have a backslash before it". I believe this
was its intent. (And the replacement added the 'missing' backslash.)
But the pattern doesn't mean that. It actually means "a character which
isn't a backslash, followed by a quote". This is subtly different.
And it's most noticeable when two quotes follow each other in the
subject string. In
str''str
first the pattern matches "r'" (non-backslash followed by quote), and
then it keeps searching from that point, i.e. it searches "'str". Since
this isn't the beginning of the string, and there is no quote following
a non-backslash character, there are no further matches.
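The walk-through above can be reproduced with preg_match_all. This is a small sketch added for illustration (the variable names are not from the thread); it asks PCRE where the original pattern matches in str''str.

```php
<?php
// The pattern from the original post, as a PHP single-quoted literal.
// After PHP unescapes it, PCRE receives: /(^|[^\\])'/
$pattern = '/(^|[^\\\\])\'/';

// PREG_OFFSET_CAPTURE records where each match starts in the subject.
$count = preg_match_all($pattern, "str''str", $matches, PREG_OFFSET_CAPTURE);

$firstText   = $matches[0][0][0];
$firstOffset = $matches[0][0][1];

echo $count, PHP_EOL;        // 1
echo $firstText, PHP_EOL;    // r'
echo $firstOffset, PHP_EOL;  // 2
```

Only one match is reported: the second quote is never matched, because the first quote, which would have to be the non-backslash character in front of it, was already consumed by the first match.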
Now, here is a pattern which actually means "a quote which doesn't
already have a backslash before it" which is achieved by means of a
lookbehind assertion, which, even when searching the string after the
first match, "'str", still 'looks back' on the earlier part of the
string to recognise the second quote is not preceded by a backslash and
match a second time:
/(^|(?<!\\))'/
As a PHP single-quoted string this is:
'/(^|(?<!\\\\))\'/'
Hope this helps,
Ben.
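As a quick check of the lookbehind pattern, here is a minimal sketch using two of the subjects from the original post. The replacement string is written with a doubled backslash, which is how preg_replace expects a literal backslash to be spelled; the variable names are illustrative only.

```php
<?php
// Lookbehind pattern from the message above, as a PHP single-quoted literal.
// After PHP unescapes it, PCRE receives: /(^|(?<!\\))'/
$pattern = '/(^|(?<!\\\\))\'/';

// Replacement as preg_replace receives it: $1\\'
// (group 1, a doubled backslash standing for one literal backslash, the quote).
$replacement = '$1\\\\\'';

// Two adjacent unescaped quotes: both should gain a backslash.
$escaped = preg_replace($pattern, $replacement, "str '' str");

// Quotes that already carry a backslash: nothing should change.
$untouched = preg_replace($pattern, $replacement, "str \\'\\' str");

echo $escaped, PHP_EOL;    // str \'\' str
echo $untouched, PHP_EOL;  // str \'\' str
```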
And I should mention, as Martin did, that this actually isn't a good
idea. There are better/safer ways to escape quotes. In particular,
consider how this subject string
str\\'; delete from users;
will not have the quote escaped, because it is preceded by two
backslashes. To match more carefully, you have to be careful to 'eat
backslashes in pairs'. Someone gave a pattern that attempted to do
something like that in an earlier post, too.
Ben.
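To make the 'eat backslashes in pairs' idea concrete, here is one possible sketch. The pattern in $pairAware is an assumption of this example, not the pattern from the earlier post Ben mentions: it matches a quote preceded by an even-length run of backslashes (possibly empty) and keeps that run in group 1.

```php
<?php
// Subject from the message above: str\\'; delete from users;
$subject = 'str\\\\\'; delete from users;';

// Replacement as preg_replace receives it: $1\\'  (group 1, backslash, quote).
$replacement = '$1\\\\\'';

// Lookbehind pattern from the thread. PCRE receives: /(^|(?<!\\))'/
$lookbehind = '/(^|(?<!\\\\))\'/';

// Hypothetical pair-aware pattern. PCRE receives: /(?<!\\)((?:\\\\)*)'/
$pairAware = '/(?<!\\\\)((?:\\\\\\\\)*)\'/';

$naive   = preg_replace($lookbehind, $replacement, $subject);
$careful = preg_replace($pairAware, $replacement, $subject);

echo $naive, PHP_EOL;    // str\\'; delete from users;   (quote left unescaped)
echo $careful, PHP_EOL;  // str\\\'; delete from users;  (quote now escaped)
```

As both Martin and Ben point out, this only illustrates how the matching works; it is not a recommended way to escape input.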
> [snip]
> Hope this helps,
> Ben.
As an outsider in this discussion, I'd just like to applaud you for one
of the best, in-depth, most patient and most thorough explanations I
have ever seen on a mailing list.
Dave
> What kind of patterns? Does it support PCRE ones?
Yep, and MANY other flavours (C#, C++, Delphi, Groovy, Java,
Javascript, MySQL, ...)
If I say ...
<?php
echo '/(^|[^\\\\])\'/';
?>
I get ...
/(^|[^\\])'/
which is explained as ...
(^|[^\\])'
Options: case insensitive; ^ and $ match at line breaks
Match the regular expression below and capture its match into
backreference number 1 «(^|[^\\])»
Match either the regular expression below (attempting the next
alternative only if this one fails) «^»
Assert position at the beginning of a line (at beginning of the
string or after a line break character) «^»
Or match regular expression number 2 below (the entire group fails
if this one fails to match) «[^\\]»
Match any character that is NOT a \ character «[^\\]»
Match the character “'” literally «'»
And that certainly makes a LOT more sense.
Decoding regexes and handling the escaping needed for the language is
a real headache sometimes.
Just imagine creating regex code for use by client side Javascript using PHP.
8 \ in a row for a single \ wouldn't be impossible.
Sorry for the confusion.
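One way to take a layer out of the escaping problem, offered here as an aside rather than something proposed in the thread, is PHP's nowdoc syntax, which performs no unescaping at all. The sketch below compares it with the single-quoted literal; the variable names and the RE label are arbitrary.

```php
<?php
// Single-quoted literal: PHP turns \\ into \ and \' into ' before PCRE
// ever sees the pattern.
$quoted = '/(^|[^\\\\])\'/';

// Nowdoc literal: the text between the markers is taken verbatim, so this
// is exactly the pattern PCRE receives.
$nowdoc = <<<'RE'
/(^|[^\\])'/
RE;

echo $quoted, PHP_EOL;          // /(^|[^\\])'/
var_dump($quoted === $nowdoc);  // bool(true)
```

The pattern still needs its own PCRE-level escaping, but what is written in the source is what the regex engine reads.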
--
Richard Quadling