Newsgroups: php.internals
Subject: Re: [PHP-DEV] Strings, invalid escape sequences and parse errors
From: ajf@ajf.me (Andrea Faulds)
To: internals@lists.php.net
Date: Sun, 4 Oct 2015 11:12:00 +0100

Hey Sara,

Sara Golemon wrote:
> On Fri, Oct 2, 2015 at 6:53 AM, Bishop Bettini wrote:
>> Option (b) sounds reasonable, but there's probably A Solid Reason it was
>> implemented that way
>>
> AIUI, the "solid reason" was because it's dangerous to fail silently
> where you have high confidence that something is wrong. Again, I
> believe in it, but the arguments against option A illustrate why it
> might not be practical. I hate to say this, but in the interest of
> consistency (were 7.0 not in its final stage) I'd vote for B.

I made \u work the way it does because I don't like repeating the mistakes of the past. It deliberately uses {} rather than a variable-length sequence of hex digits. Why? Because this way it's always completely clear where the boundary is between it and any following characters, unlike the mess which is octal and hex escapes.

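To make the boundary problem concrete, here is a minimal sketch, assuming PHP 7.0 or later (where \u{...} exists). The string values are illustrative only and the variable names are hypothetical; they are not taken from any real code.

```php
<?php
// Assumes PHP 7.0 or later, where the \u{...} escape is available.

// A hex escape takes at most two hex digits, so the third digit is literal text.
$hex = "\x414";       // "A" followed by the character "4"

// An octal escape takes at most three octal digits; the fourth is literal text.
$octal = "\1014";     // "A" followed by the character "4"

// With braces, the end of the escape is explicit whatever follows it.
$unicode = "\u{41}4"; // "A" followed by the character "4"

var_dump($hex === "A4", $octal === "A4", $unicode === "A4");
```

All three strings come out the same, but only in the \u{...} case can you tell where the escape ends without knowing the digit limit for that particular escape.
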
I also made it an error because, well, why shouldn't it be? People don't like it when something fails and you don't tell them. We don't even produce an E_NOTICE or E_STRICT or something for invalid escapes, we just pretend it's not an escape sequence. That's awful: it means code made for a PHP version which supports these escapes will silently break on other PHP versions. In \u's case, it would mean that if the Unicode standard extended or shrank the range (U+0000–U+10FFFF) in future, we wouldn't be able to change \u's limits to match, because it would silently break PHP code.

I'd love it if we could also make a backslash followed by an unrecognised character always be an error, but alas, sounds like we can't do that. So for now, we can only make new escape sequences behave sensibly.

\u is deliberately inconsistent. You can go replace it with \uXXXX if you really want to, but PHP's users will not thank you.

Thanks.
-- 
Andrea Faulds
http://ajf.me/
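
For reference, the contrast described above between silently ignored escapes and the \u error can be sketched roughly as follows, assuming PHP 7.0 or later. The snippets and variable names are hypothetical, and eval() is used only so that the parse error can be caught at run time rather than aborting the script.

```php
<?php
// Assumes PHP 7.0 or later. eval() is used here only so the parse error
// can be caught at run time instead of stopping the whole script.

// An unrecognised escape such as \q is not an error: the backslash is kept.
$silent = "\q";
var_dump(strlen($silent)); // two bytes: a backslash and the letter q

// An invalid \u{...} escape is rejected when the code is compiled.
$snippets = [
    'return "\u{110000}";', // above U+10FFFF
    'return "\u{xyz}";',    // not hex digits
];

$rejected = [];
foreach ($snippets as $code) {
    try {
        eval($code);
        $rejected[] = false; // compiled without complaint
    } catch (ParseError $e) {
        $rejected[] = true;  // refused at compile time
    }
}

var_dump($rejected);
```

The single-quoted strings keep the backslashes intact, so the escapes are only interpreted when eval() compiles each snippet.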