Newsgroups: php.internals
From: aharvey@php.net (Adam Harvey)
To: Andrea Faulds
Cc: Sara Golemon, PHP Internals
Date: Mon, 24 Nov 2014 14:41:19 -0800
Subject: Re: [PHP-DEV] [RFC] Unicode Escape Syntax
In-Reply-To: <13B08117-4BE5-4E0D-A3FF-B6A4D1F9584C@ajf.me>

On 24 November 2014 at 14:35, Andrea Faulds wrote:
>
>> On 24 Nov 2014, at 22:30, Adam Harvey wrote:
>> I'm also OK with this, although I do wonder if we should be respecting
>> the user's default_charset setting instead. (Since default_charset
>> defaults to "UTF-8", in practice this isn't a significant difference
>> for the average user.)
>
> Ooh, that would be a possibility. That or using whatever encoding the source file is specified to be with declare(), so it matches the encoding of other characters in the string.
>
> This’d add significant complexity to it, though (would we have to require ICU or something? D:), plus the vast majority of Unicode characters will only be supported by Unicode encodings… and of those, only UTF-8 is really in much use here anyway.

We would have to require ICU, but that might be worthwhile for PHP 7
anyway. Having at least one i18n API that's guaranteed to be available
would be nice.
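
To make the trade-off above concrete, here is a minimal sketch of what honouring a configurable charset for an escape would involve. It is written in Python purely for illustration and is not taken from the RFC or from PHP; the helper name encode_escape and its charset parameter are hypothetical.

```python
def encode_escape(code_point: int, charset: str = "UTF-8") -> bytes:
    """Return the bytes an escape for code_point might yield in charset.

    Hypothetical helper for illustration only. Raises ValueError when the
    target charset has no representation for the code point.
    """
    try:
        return chr(code_point).encode(charset)
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"U+{code_point:04X} is not representable in {charset}"
        ) from exc


# U+00F1 (the n-with-tilde in the RFC's example word) exists in both
# encodings, but as different byte sequences.
utf8_bytes = encode_escape(0x00F1, "UTF-8")
latin1_bytes = encode_escape(0x00F1, "ISO-8859-1")
print(len(utf8_bytes), len(latin1_bytes))

# U+202E (right-to-left override) has no ISO-8859-1 representation,
# so a charset-aware escape would need an error path here.
try:
    encode_escape(0x202E, "ISO-8859-1")
except ValueError as err:
    print(err)
```

The sketch suggests why always emitting UTF-8 is the simpler design: it needs no transcoding library and no failure case for code points the target charset cannot represent.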

>>> You may want to make it a requirement that strings containing \u
>>> escapes are denoted as: u"blah blah" We set aside this format
>>> back in the PHP6 days (note that b"blah" is equivalent to "blah" for
>>> binary strings).
>>
>> It seems to me that the point of \u and \U escapes is to embed Unicode
>> in potentially non-Unicode strings, so using u"" doesn't feel right.
>
> I don’t really see where you’re coming from, it also makes just as much sense within Unicode strings. There are plenty of cases (like the U+202E or mañana examples in the RFC) where you’d want a Unicode escape in a Unicode string.

I probably worded that badly: I just mean that I don't think \u and
\U should be limited to only u"" strings, but should work in normal
strings as well. (In other words, I'm agreeing with what's in your
RFC, not with Sara.)

Adam