Newsgroups: php.internals
From: lester@lsces.co.uk (Lester Caine)
Date: Tue, 09 Dec 2014 08:15:31 +0000
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [VOTE][RFC] Unicode Codepoint Escape Syntax
Message-ID: <5486AFA3.3000402@lsces.co.uk>
In-Reply-To: <10EE9A5B-1711-455A-AB6A-6E7EA858D081@ajf.me>

On 09/12/14 02:44, Andrea Faulds wrote:
>> Maybe there should be more elaboration on why PHP itself should go with
>> the \u{xxxx} ECMAScript representation, thus introducing a syntax disparity
>> with our most major string handling extension.
>
> Well, PCRE does what it does probably because of its name:
> *Perl-Compatible* Regular Expressions. Perl has the \x syntax. But PCRE's
> syntax comes from what suits Perl, not PHP, so I don't see why we should
> necessarily match its behaviour. If we add \x{xxxxx} syntax to PHP's string
> literals, then we'll break existing code which uses double-quoted strings
> for regular expressions.
>
> I think \x{xxxx} is misleading anyway - \xXX is always a single
> byte/character, yet Unicode code points can't be represented in PHP strings
> as single bytes when encoded in UTF-8 (unless they're below U+0080, of
> course). If I saw "\x{abcd}" I'd expect it to be the same as "\xab\xcd".
> Plus, while Perl has \x{xxxx} syntax, Ruby and ECMAScript 6 have the
> \u{xxxx} syntax, so \u{xxxx} is already more popular. The 'u' in \u{xxxx}
> also makes it more obviously "Unicode".

If ICU is to be adopted as the base for Unicode support, then surely
everything else should follow its rules? ICU defines \uhhhh and \Uhhhhhhhh
along with \x{hhhhhh}, so does it make sense to add something which is not
part of ICU?
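
For anyone wanting to check the byte-length point in the quoted text, here is
a minimal sketch. It uses Python purely for illustration (it is not PHP and
not part of the RFC), and the helper name utf8_len is hypothetical. It only
shows that a code point such as U+ABCD is not the same thing as the two raw
bytes 0xAB 0xCD once UTF-8 is involved.

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes the code point occupies when encoded as UTF-8."""
    return len(chr(code_point).encode("utf-8"))


# Only code points below U+0080 fit in a single UTF-8 byte.
assert utf8_len(0x7F) == 1
assert utf8_len(0x80) == 2
assert utf8_len(0xFF) == 2
assert utf8_len(0xABCD) == 3

# The code point U+ABCD versus the two raw bytes 0xAB 0xCD.
as_code_point = chr(0xABCD).encode("utf-8")
as_raw_bytes = bytes([0xAB, 0xCD])

print(as_code_point.hex())  # eaaf8d
print(as_raw_bytes.hex())   # abcd
```

So a reader who assumes "\x{abcd}" means the bytes AB CD and one who assumes
it means the code point U+ABCD end up with different strings, which is the
ambiguity being argued about.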
-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk