Newsgroups: php.internals
Date: Fri, 2 Oct 2015 17:02:50 -0400
Message-ID: <560EF0FA.9050300@thefsb.org>
To: Sara Golemon, PHP Internals, Peter Cowburn
Cc: PHP internals
Subject: Re: [PHP-DEV] Strings, invalid escape sequences and parse errors
From: fsb@thefsb.org (Tom Worster)

On 10/2/15 1:04 PM, Sara Golemon wrote:
>> On Fri, Oct 2, 2015 at 4:18 AM, Peter Cowburn
>> wrote:
>>
>>> a) change all other "invalid" escape sequences to be a parse error [that
>>> would mean "\m" would raise a parse error!]
>>>
>>> b) change \u{} to behave like any other escape sequence, by not raising a
>>> parse error and instead keeping the literal characters
>>>
>>> or c) tell me to keep quiet and accept the oddball behaviour, having quirks
>>> is The PHP Way after all.
>>>
>>
>> Well, I think option (a) would break parsed strings containing regex:
>>
> Oh holy hell. I was about to point towards A because I agree with
> Andrea that our invalid escape handling makes no sense, then you throw
> this wrench in the gears.
>
> While I still think that ignoring invalid sequences is bad and a
> recipe for disaster (for example, in a given regex string, you have
> some "escapes" passed to the engine as-is, while others like
> \t\v\f\r\n do get interpolated, which is so inconsistent and entirely
> php it's practically its own meme), I have to be practical about the
> fact that there is a TON of existing regex out there (and no small
> amount of "\u1234" sequences in JSON blobs). A ton of that existing
> regex is also needlessly using double-quotes strings where
> single-quotes would have worked, meaning we can't just bifurcate on
> that (even though allowing invalid sequences through on single-quotes
> makes some sense).
>
> Ugh... No, that's too big of a change to existing scripts. Can't do
> option A, much as I'd like.
>
>> Option (b) sounds reasonable, but there's probably A Solid Reason it was
>> implemented that way
>>
> AIUI, the "solid reason" was because it's dangerous to fail silently
> where you have high confidence that something is wrong. Again, I
> believe in it, but the arguments against option A illustrate why it
> might not be practical. I hate to say this, but in the interest of
> consistency (were 7.0 not in its final stage) I'd vote for B.
>
>> which if so leaves (c.ii): accepting the odd-ball behavior....
>>
> Given that 7.0 is in its final stage, and changing this behaviour is
> probably a non-starter at this point. C seems the most sane^W
> pragmatic. It's not the first inconsistency PHP's picked up, it won't
> be the last.

I agree with Sara all the way except the opinion that it's too late to
fix this bug with option B, which I think is the right one.

I simply don't know if it is too late or not so I suggest Peter enter a
bug report and see what happens. If it's too late for 7.0.0 do it in
.0.1, which is ok because people will expect instability with 7.0.0.
\u{394}semver > 1 is sufficient warning, I think.

Tom