Newsgroups: php.internals
Subject: Re: [PHP-DEV] [RFC] Unicode Escape Syntax
From: ajf@ajf.me (Andrea Faulds)
Date: Mon, 24 Nov 2014 22:35:23 +0000
To: Adam Harvey
Cc: Sara Golemon, PHP Internals
Message-ID: <13B08117-4BE5-4E0D-A3FF-B6A4D1F9584C@ajf.me>

> On 24 Nov 2014, at 22:30, Adam Harvey wrote:
>
> On 24 November 2014 at 14:21, Sara Golemon wrote:
>> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote:
>>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>>>
>> I'm okay with producing UTF-8 even though our strings are technically
>> binary. As you state, UTF-8 is the de-facto encoding, and recognizing
>> this is pretty reasonable.
>
> I'm also OK with this, although I do wonder if we should be respecting
> the user's default_charset setting instead. (Since default_charset
> defaults to "UTF-8", in practice this isn't a significant difference
> for the average user.)

Ooh, that would be a possibility. That or using whatever encoding the
source file is specified to be with declare(), so it matches the
encoding of other characters in the string.

This’d add significant complexity to it, though (would we have to
require ICU or something? D:), plus the vast majority of Unicode
characters will only be supported by Unicode encodings… and of those,
only UTF-8 is really in much use here anyway.

>> You may want to make it a requirement that strings containing \u
>> escapes are denoted as: u"blah blah" We set aside this format
>> back in the PHP6 days (note that b"blah" is equivalent to "blah" for
>> binary strings).
>
> It seems to me that the point of \u and \U escapes is to embed Unicode
> in potentially non-Unicode strings, so using u"" doesn't feel right.

I don’t really see where you’re coming from, it also makes just as much
sense within Unicode strings. There are plenty of cases (like the U+202E
or mañana examples in the RFC) where you’d want a Unicode escape in a
Unicode string.

--
Andrea Faulds
http://ajf.me/
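
P.S. To make the point about non-Unicode encodings concrete, here is a rough sketch. It is written in Python rather than PHP purely for illustration, and encode_escape is a made-up helper, not anything from the RFC or the engine. It assumes an escape would simply encode one code point into the target charset, and shows that the ñ in mañana fits in ISO-8859-1 while U+202E only has a representation in a Unicode encoding.

```python
# Hypothetical helper (not part of the RFC or of PHP): return the bytes
# that an escape for a single code point would produce in `charset`.
def encode_escape(code_point: int, charset: str = "utf-8") -> bytes:
    return chr(code_point).encode(charset)

# U+00F1, the n-tilde in "mañana", exists in both encodings.
print(encode_escape(0xF1))                # b'\xc3\xb1'
print(encode_escape(0xF1, "iso-8859-1"))  # b'\xf1'

# U+202E (RIGHT-TO-LEFT OVERRIDE) can be encoded as UTF-8...
print(encode_escape(0x202E))              # b'\xe2\x80\xae'

# ...but has no ISO-8859-1 representation, so a charset-aware escape
# would need some error or fallback behaviour here.
try:
    encode_escape(0x202E, "iso-8859-1")
except UnicodeEncodeError:
    print("U+202E is not representable in ISO-8859-1")
```

If the escape followed default_charset or a declare()d encoding, every failing case like the last one would need defined behaviour, which is part of the complexity mentioned above.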