Newsgroups: php.internals
Date: Mon, 24 Nov 2014 14:30:15 -0800
To: Sara Golemon
Cc: Andrea Faulds, PHP Internals
Subject: Re: [PHP-DEV] [RFC] Unicode Escape Syntax
From: aharvey@php.net (Adam Harvey)

On 24 November 2014 at 14:21, Sara Golemon wrote:
> On Mon, Nov 24, 2014 at 2:09 PM, Andrea Faulds wrote:
>> Here’s a new RFC: https://wiki.php.net/rfc/unicode_escape
>>
> I'm okay with producing UTF-8 even though our strings are technically
> binary. As you state, UTF-8 is the de-facto encoding, and recognizing
> this is pretty reasonable.

I'm also OK with this, although I do wonder if we should be respecting
the user's default_charset setting instead. (Since default_charset
defaults to "UTF-8", in practice this isn't a significant difference
for the average user.)

> You may want to make it a requirement that strings containing \u
> escapes are denoted as: u"blah blah" We set aside this format
> back in the PHP6 days (note that b"blah" is equivalent to "blah" for
> binary strings).

It seems to me that the point of \u and \U escapes is to embed Unicode
in potentially non-Unicode strings, so using u"" doesn't feel right.

> On the BMP versus SMP issue of \uXXXX styles, we addressed this in
> PHP6 by making \u denote 4 hexit BMP codepoints, while \U denoted six
> hexit codepoints. e.g. "\u1234" === "\U001234" I'd rather
> follow this style than making \u special and different from hex and
> octal notations by using braces.

I think I prefer the brace style, personally. Non-BMP codepoints have
become more important since PHP 6 (thanks, emoji), and having \u and
\U be case sensitive when \x isn't seems confusing.

Adam
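
To make the two notations discussed above concrete, here is a rough sketch of how each could expand into UTF-8 bytes. It is written in Python purely for illustration. It is not PHP's lexer, the function names are hypothetical, and it assumes the output encoding is always UTF-8 (ignoring the default_charset question raised above).

```python
import re


def utf8(codepoint: int) -> bytes:
    """Encode one Unicode code point as UTF-8 bytes."""
    # chr() raises ValueError above U+10FFFF; encoding a lone surrogate
    # raises UnicodeEncodeError. A real lexer would report a parse error.
    return chr(codepoint).encode("utf-8")


def expand_brace_style(source: bytes) -> bytes:
    r"""Expand \u{H...} escapes (any number of hex digits) in a byte string."""
    return re.sub(
        rb"\\u\{([0-9A-Fa-f]+)\}",
        lambda m: utf8(int(m.group(1), 16)),
        source,
    )


def expand_fixed_width_style(source: bytes) -> bytes:
    r"""Expand \uHHHH (4 hexits) and \UHHHHHH (6 hexits) in a byte string."""
    def repl(m):
        # Exactly one of the two groups matches, depending on the letter case.
        return utf8(int(m.group(1) or m.group(2), 16))

    return re.sub(rb"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{6})", repl, source)


if __name__ == "__main__":
    # A BMP code point looks much the same in either notation.
    print(expand_brace_style(rb"\u{1234}").hex())        # e188b4
    print(expand_fixed_width_style(rb"\u1234").hex())    # e188b4
    print(expand_fixed_width_style(rb"\U001234").hex())  # e188b4

    # A non-BMP code point (an emoji) needs \U in the fixed-width style,
    # while the brace style keeps a single, lowercase escape letter.
    print(expand_brace_style(rb"\u{1F600}").hex())       # f09f9880
    print(expand_fixed_width_style(rb"\U01F600").hex())  # f09f9880
```

In this sketch the fixed-width style needs two escape letters that differ only by case, whereas the brace style handles BMP and non-BMP code points with one.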