Newsgroups: php.internals
Date: Tue, 9 Dec 2014 02:44:33 +0000
To: mario@include-once.org
Cc: internals
Message-ID: <10EE9A5B-1711-455A-AB6A-6E7EA858D081@ajf.me>
Subject: 
Re: [PHP-DEV] [VOTE][RFC] Unicode Codepoint Escape Syntax
From: ajf@ajf.me (Andrea Faulds)

Hi!

> On 9 Dec 2014, at 02:14, mario@include-once.org wrote:
>
> 2014-12-09 0:51 GMT+01:00 Andrea Faulds :
>>
>> https://wiki.php.net/rfc/unicode_escape
>
>
> Still leaves unmentioned that there was already an established Unicode
> escape syntax. PCRE provides \x{1F520} for codepoints in conjunction to
> plain \xFF for byte escapes.

Interesting, I was unaware of that until now. Thanks for pointing this out.

> Maybe there should be more elaboration on why PHP itself should go with
> the \u{xxxx} ECMAScript representation, thus introducing a syntax disparity
> with our most major string handling extension.

Well, PCRE probably does what it does because of its name: *Perl-Compatible* Regular Expressions. Perl has the \x syntax. But PCRE’s syntax comes from what suits Perl, not PHP, so I don’t see why we should necessarily match its behaviour. If we add \x{xxxxx} syntax to PHP’s string literals, then we’ll break existing code which uses double-quoted strings for regular expressions.

I think \x{xxxx} is misleading anyway. \xXX is always a single byte, yet Unicode code points can’t be represented in PHP strings as single bytes when encoded in UTF-8 (unless they’re below U+0080, of course). If I saw "\x{abcd}" I'd expect it to be the same as "\xab\xcd". Plus, while Perl has \x{xxxx} syntax, Ruby and ECMAScript 6 have the \u{xxxx} syntax, so \u{xxxx} is already more popular. The ‘u’ in \u{xxxx} also makes it more obviously “Unicode”.

Thanks!
--
Andrea Faulds
http://ajf.me/
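
P.S. A minimal sketch of the byte-versus-code-point distinction argued above, written in Python purely for illustration. The helper name `utf8_len` is made up for this example, and Python's fixed-width `\uXXXX` escape only stands in for the braced `\u{xxxx}` syntax the RFC proposes; it is not the same syntax.

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one Unicode code point."""
    return len(chr(code_point).encode("utf-8"))


# Two byte escapes: exactly two bytes, with no Unicode meaning attached.
two_bytes = b"\xab\xcd"

# One code point escape: U+ABCD becomes three bytes once encoded as UTF-8.
one_code_point = "\uabcd".encode("utf-8")

print(len(two_bytes))        # 2
print(one_code_point.hex())  # eaaf8d

# Byte counts for a few code points, including the PCRE example U+1F520.
for cp in (0x7F, 0x80, 0xFF, 0xABCD, 0x1F520):
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```

So a reader who took "\x{abcd}" to mean the two bytes AB CD would get something quite different from the UTF-8 encoding of U+ABCD, which is the ambiguity a distinct \u prefix avoids.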