Newsgroups: php.internals
Message-ID: <4D7F5E96.8040507@yahoo.com.au>
Date: Tue, 15 Mar 2011 23:41:58 +1100
To: RQuadling@googlemail.com
CC: internals@lists.php.net
Subject: Re: [PHP-DEV] preg_replace does not replace all occurrences
From: mail_ben_schmidt@yahoo.com.au (Ben Schmidt)

>>>> static $re = '/(^|[^\\\\])\'/';
>>
>> Did no one see why the regex was wrong?

I saw what the regex was. Unlike you, I didn't think it was 'wrong'.
Once you unescape the characters in the PHP single-quoted string above (where two backslashes count as one, and backslash-quote counts as a quote), the actual pattern that reaches the preg_replace function is:

/(^|[^\\])'/

>> RegexBuddy (a windows app) explains regexes VERY VERY well.

What kind of patterns? Does it support PCRE ones?

> The important bit (where the problem lies with regard to the regex) is
> ...
>
> Match a single character NOT present in the list below «[^\\\\]»
> A \ character «\\»
> A \ character «\\»

This is not the case.

1. As above, the pattern reaching preg_replace is /(^|[^\\])'/

2. PCRE, unlike many other regular expression implementations, allows backslash-escaping inside character classes (square brackets). So the doubled backslash counts as only a single backslash character to be excluded from the set of characters the atom will match.

There is no error here. (And even if two backslashes were being excluded, of course, it wouldn't hurt anything or change the meaning of the pattern.)

> The issue is the word _single_.

I don't think anybody thought otherwise. The problem was that, to a casual observer, the pattern seems to mean "a quote which doesn't already have a backslash before it". I believe this was its intent. (And the replacement added the 'missing' backslash.)

But the pattern doesn't mean that. It actually means "a character which isn't a backslash, followed by a quote". This is subtly different, and the difference is most noticeable when two quotes follow each other in the subject string. In

str''str

the pattern first matches "r'" (a non-backslash followed by a quote). It then resumes searching from the end of that match, i.e. it searches "'str". This isn't the beginning of the string, and the "r" before the remaining quote has already been consumed by the first match. So no quote in "'str" follows a non-backslash character, and there are no further matches.
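The behaviour described above can be reproduced with a short sketch. It makes three assumptions:

- Python's `re` module stands in for PCRE. For a negated character class and a one-character lookbehind the two behave the same way.
- `php_single_quoted` is a hypothetical helper that mimics PHP's single-quoted string rules.
- The replacement string is an assumption, because the original replacement is not quoted in this message.

The lookbehind pattern is an illustrative minimal form of the idea, not the pattern from the message.

```python
import re


def php_single_quoted(body):
    """Mimic PHP single-quoted literals: a doubled backslash becomes one
    backslash, backslash-quote becomes a quote, other backslashes stay as-is."""
    return re.sub(r"\\([\\'])", r"\1", body)


# Source text between the quotes of: static $re = '/(^|[^\\\\])\'/';
PHP_SOURCE = r"/(^|[^\\\\])\'/"

# What preg_replace actually receives: /(^|[^\\])'/
DELIMITED = php_single_quoted(PHP_SOURCE)

# Python's re takes no delimiters, so drop the surrounding slashes.
# (^|[^\\])' means: start of string, or ONE non-backslash character, then a quote.
NAIVE = DELIMITED[1:-1]

# Hypothetical replacement (the original is not shown): keep the captured
# character and put a backslash in front of the quote.
NAIVE_REPL = r"\1\\'"

# For contrast, a minimal negative lookbehind (illustrative, not the pattern
# from the message). It only checks the preceding character and does not
# consume it, so one match cannot swallow the context of the next quote.
LOOKBEHIND = r"(?<!\\)'"
LOOKBEHIND_REPL = r"\\'"


def escape_quotes(pattern, repl, subject):
    """Return (result, number_of_replacements), like preg_replace() with $count."""
    return re.subn(pattern, repl, subject)


if __name__ == "__main__":
    print(DELIMITED)
    # Only the first of the two quotes gets a backslash.
    print(escape_quotes(NAIVE, NAIVE_REPL, "str''str"))
    # Both quotes get a backslash.
    print(escape_quotes(LOOKBEHIND, LOOKBEHIND_REPL, "str''str"))
```

Running it prints the unescaped pattern. It then shows that the naive pattern escapes only the first quote in str''str, while the lookbehind version escapes both. An already-escaped quote (as in it\'s) is left alone by both.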
Now, here is a pattern which actually means "a quote which doesn't already have a backslash before it". It achieves this by means of a lookbehind assertion: even when the search resumes after the first match, at "'str", the assertion still 'looks back' at the earlier part of the string. It recognises that the second quote is not preceded by a backslash, and so matches a second time: /(^|(?