Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:79163
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: unknown (pb1.pair.com: domain php.net does not designate 82.113.146.227 as permitted sender)
Date: Tue, 25 Nov 2014 11:48:34 +0000 (GMT)
To: Dmitry Stogov <dmitry@zend.com>
cc: Andrea Faulds <ajf@ajf.me>, PHP Internals <internals@lists.php.net>
In-Reply-To: <CA+9eiLvxt=H8w_VTgL-0sTS19DJzdouX4peBzQXFuWmNefwH+w@mail.gmail.com>
Message-ID: <alpine.DEB.2.11.1411251144110.13538@whisky.home.derickrethans.nl>
References: <C2A085AA-3E3A-405F-954B-4C1F68A46012@ajf.me> <CA+9eiLsNHZMTgV0EZFXxYmAx+sE-qbgPrTYE0rH=bFLpSq2rdg@mail.gmail.com> <24EE758F-BF8F-4AE9-B793-20739CD9875D@ajf.me> <CA+9eiLvxt=H8w_VTgL-0sTS19DJzdouX4peBzQXFuWmNefwH+w@mail.gmail.com>
User-Agent: Alpine 2.11 (DEB 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Subject: Re: [PHP-DEV] [RFC] Unicode Escape Syntax
From: derick@php.net (Derick Rethans)

On Tue, 25 Nov 2014, Dmitry Stogov wrote:

> On Tue, Nov 25, 2014 at 1:00 PM, Andrea Faulds <ajf@ajf.me> wrote:
> 
> >
> > > On 25 Nov 2014, at 08:33, Dmitry Stogov <dmitry@zend.com> wrote:
> > >
> > > Maybe I misunderstood something, but why introduce Unicode escapes
> > if the PHP engine doesn't support Unicode?
> >
> > We don't have Unicode strings which are made of codepoints rather than
> > bytes, sure. But we do usually treat these strings as UTF-8. The idea of
> > doing this in a language without Unicode strings isn't new: C/C++ have the
> > u8"" syntax for making UTF-8 strings.
> >
> 
> u8"string" says that the whole string is UTF-8 encoded.
> Your Unicode escape proposal only assumes UTF-8 for the escaped codepoint,
> but the encoding of the whole string is still undefined.
> 
> 
> >
> > > Always converting such escapes into UTF-8 doesn't make any
> > sense for people who use other encodings for output, databases, etc.
> >
> > If you're using other encodings, why would you want to use Unicode
> > codepoints? Most Unicode codepoints will not be supported by another
> > character set.
> >
> 
> Agreed, these Unicode escapes are not going to be used for anything 
> except UTF-8 encoded strings. I'm not completely against it. It's just 
> an incomplete solution.

I think "incomplete" hits the nail on the head. Without "proper" Unicode 
support in the parser, compiler and string function semantics, having 
these escape codes doesn't really do a lot for us. I now think it would 
fit better as part of the "UString" idea, although I still need to 
write down my reservations and recommendations about that too.
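
To make the "incomplete" point concrete, here is a minimal sketch. It is 
written in Python rather than PHP, with Python's bytes type standing in 
for PHP's byte-oriented strings. The helper names are hypothetical and 
not part of the RFC. It assumes only what was said above: an escape is 
always expanded to UTF-8 bytes, and everything else keeps working on 
bytes.

```python
# Sketch only: Python's bytes type stands in for a PHP byte string.
# All names here are hypothetical and not taken from the RFC.

def unicode_escape_to_utf8(codepoint: int) -> bytes:
    """Expand one escaped codepoint into its UTF-8 byte sequence."""
    return chr(codepoint).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+00E9 (e with acute accent) becomes two bytes in UTF-8.
escaped = unicode_escape_to_utf8(0x00E9)

# A byte-based length function counts bytes, not codepoints.
byte_length = len(escaped)
codepoint_length = len(escaped.decode("utf-8"))

# The same character written in ISO-8859-1 is a single byte. Putting the
# expanded escape next to it gives a string that is not valid UTF-8 as a
# whole, because nothing defines the encoding of the rest of the string.
latin1_text = "caf\u00e9 ".encode("latin-1")
mixed = latin1_text + escaped
```

Here byte_length is 2 while codepoint_length is 1, and 
is_valid_utf8(mixed) is false even though is_valid_utf8(escaped) is 
true. The escape fixes the encoding of one codepoint, but not that of 
the string around it or of the functions that operate on it.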

cheers,
Derick