Newsgroups: php.internals
Date: Mon, 24 Nov 2014 22:28:30 +0000
Subject: Re: [PHP-DEV] [RFC] Unicode Escape Syntax
To: Sara Golemon
Cc: PHP Internals
From: ajf@ajf.me (Andrea Faulds)

> On 24 Nov 2014, at 22:21, Sara Golemon wrote:
>
> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote:
>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>>
> I'm okay with producing UTF-8 even though our strings are technically
> binary. As you state, UTF-8 is the de-facto encoding, and recognizing
> this is pretty reasonable.

On that note, it strikes me now that we assume an encoding anyway for all escape sequences. If I’m using EBCDIC or UTF-16, “\n” isn’t going to help me much!

> You may want to make it a requirement that strings containing \u
> escapes are denoted as: u"blah blah" We set aside this format
> back in the PHP6 days (note that b"blah" is equivalent to "blah" for
> binary strings).

I’d rather keep u"blah blah" for if/when we add actual Unicode strings.

> On the BMP versus SMP issue of \uXXXX styles, we addressed this in
> PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
> hexit codepoints. e.g. "\u1234" === "\U001234" I'd rather
> follow this style than making \u special and different from hex and
> octal notations by using braces.

That is something I’d thought about. \U takes 8 hex digits in every other language which has it, though.

I suppose we could do this, it resolves the BMP issue, certainly. Still, I think the brace syntax has its advantages because it’s completely unambiguous and it means we only have one syntax for this, not two different ones (less mental overhead). Plus, it’s worth noting that \u would still be different from \ooo and \xXX anyway, as it’d be fixed-length while octal and hex aren’t.

--
Andrea Faulds
http://ajf.me/
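
For illustration only, here is a minimal sketch of the fixed-length point in Python, chosen because it already has both a 4-digit \u and an 8-digit \U escape. It shows how an existing language behaves and does not describe PHP's behaviour or the RFC's implementation; the variable names are made up for the example.

```python
# Python's existing escapes, used only to illustrate fixed-length \u and \U.

bmp_short = "\u1234"       # \u consumes exactly 4 hex digits
bmp_long = "\U00001234"    # \U consumes exactly 8 hex digits (not 6)
print(bmp_short == bmp_long)

# A fifth hex digit is NOT part of the escape: this is U+1234 followed by "5".
ambiguous = "\u12345"
print(len(ambiguous), ambiguous[1])

# A code point outside the BMP needs the longer form in this scheme.
emoji = "\U0001F602"
utf8_bytes = emoji.encode("utf-8")
print(utf8_bytes.hex())
```

With a fixed-length escape the reader has to count digits to see where the escape ends, which is the ambiguity a braced form such as \u{1F602} would avoid.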