Newsgroups: php.internals
Date: Tue, 25 Nov 2014 10:32:39 +0000 (GMT)
From: derick@php.net (Derick Rethans)
To: Sara Golemon
Cc: Andrea Faulds, PHP Internals
Subject: Re: [PHP-DEV] [RFC] Unicode Escape Syntax

On Mon, 24 Nov 2014, Sara Golemon wrote:

> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote:
> > Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>
> I'm okay with producing UTF-8 even though our strings are technically
> binary. As you state, UTF-8 is the de-facto encoding, and recognizing
> this is pretty reasonable.
>
> You may want to make it a requirement that strings containing \u
> escapes are denoted as: u"blah blah" We set aside this format
> back in the PHP6 days (note that b"blah" is equivalent to "blah" for
> binary strings).
>
> On the BMP versus SMP issue of \uXXXX styles, we addressed this in
> PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
> hexit codepoints. e.g. "\u1234" === "\U001234" I'd rather
> follow this style than making \u special and different from hex and
> octal notations by using braces.

I agree with this fully. No need to reinvent a wheel (that we left
behind on the road)...

cheers,
Derick
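
For illustration only, here is a rough Python sketch of the PHP6-style scheme described in the quoted text: \u followed by exactly 4 hex digits, \U followed by exactly 6, both producing UTF-8 bytes. The function name expand_unicode_escapes and the error handling are assumptions made for this sketch; it is not PHP's lexer, and it does not handle escaped backslashes or the braces syntax from the RFC.

```python
import re

# Assumed convention, taken from the quoted description of PHP6:
#   \uXXXX   -> 4 hex digits (BMP code points)
#   \UXXXXXX -> 6 hex digits (any code point up to U+10FFFF)
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{6})")


def expand_unicode_escapes(source: str) -> bytes:
    """Replace \\uXXXX and \\UXXXXXX escapes and return UTF-8 bytes.

    Hypothetical helper for illustration; escaped backslashes
    (e.g. a literal "\\\\u1234") are not treated specially.
    """

    def replace(match: re.Match) -> str:
        # Exactly one of the two groups is set, depending on the form used.
        hex_digits = match.group(1) or match.group(2)
        code_point = int(hex_digits, 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"code point out of range: U+{code_point:06X}")
        if 0xD800 <= code_point <= 0xDFFF:
            # Surrogates have no valid UTF-8 encoding.
            raise ValueError(f"surrogate code point: U+{code_point:04X}")
        return chr(code_point)

    return _ESCAPE.sub(replace, source).encode("utf-8")


if __name__ == "__main__":
    # The two forms name the same code point, so the bytes are identical.
    print(expand_unicode_escapes(r"\u1234") == expand_unicode_escapes(r"\U001234"))
    # A supplementary-plane code point needs the six-digit form.
    print(expand_unicode_escapes(r"\U01F600").hex())
```

With fixed-width escapes the end of each escape is unambiguous without a closing delimiter, which is the property the \u / \U split relies on.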