Newsgroups: php.internals
Date: Tue, 25 Nov 2014 11:48:34 +0000 (GMT)
From: derick@php.net (Derick Rethans)
To: Dmitry Stogov
Cc: Andrea Faulds, PHP Internals
Subject: Re: [PHP-DEV] [RFC] Unicode Escape Syntax

On Tue, 25 Nov 2014, Dmitry Stogov wrote:

> On Tue, Nov 25, 2014 at 1:00 PM, Andrea Faulds wrote:
>
> > > On 25 Nov 2014, at 08:33, Dmitry Stogov wrote:
> > >
> > > May be I misunderstood something, but why to introduce unicode escapes
> > > if PHP engine doesn't support Unicode.
> >
> > We don't have Unicode strings which are made of codepoints rather than
> > bytes, sure. But we do usually treat these strings as UTF-8.
> > The idea of doing this in a language without Unicode strings isn't new,
> > C/C++ have the u8"" syntax for making UTF-8 strings.
>
> u8"string" tells that the whole string is UTF-8 encoded.
> Your escape Unicode proposal assumes just UTF-8 codepoint, but the whole
> string encoding is still undefined.
>
> > > Always converting such escapes into UTF-8 encoding, doesn't make any
> > > sense for people who use other encodings for output, databases, etc.
> >
> > If you're using other encodings, why do you want to use a Unicode
> > codepoints? Most Unicode codepoints will not supported by another character
> > set.
>
> Agree, this Unicode escapes are not going to be used for anything
> except UTF-8 encoded strings. I'm not completely against it. It's just
> an incomplete solution.

I think "incomplete" hits the nail on the head. Without "proper" Unicode
support in the parser, compiler and string function semantics, having
these escape codes doesn't really do a lot for us. I now think it would
fit better just as part of the "UString" idea - although I still need to
write my reservations and recommendations down about that too.

cheers,
Derick
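
A minimal sketch of the "whole string encoding is still undefined" point
quoted above, written in Python rather than PHP only because Python keeps
bytes and text as separate types. It assumes the proposed escape always
emits the UTF-8 bytes of the given code point; the helper names
unicode_escape_bytes and is_valid_utf8 are hypothetical and not part of
the RFC or of PHP.

```python
def unicode_escape_bytes(codepoint: int) -> bytes:
    """Return the UTF-8 bytes for one code point (hypothetical helper)."""
    return chr(codepoint).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes of the literal "café " as stored in a source file saved as ISO-8859-1.
latin1_literal = "café ".encode("latin-1")

# Bytes the escape would contribute for U+20AC EURO SIGN (assumed UTF-8).
euro = unicode_escape_bytes(0x20AC)

# A byte string carries no encoding, so the two parts simply concatenate.
mixed = latin1_literal + euro

# The same literal from a UTF-8 source file combines cleanly with the escape.
consistent = "café ".encode("utf-8") + euro

print(is_valid_utf8(mixed), is_valid_utf8(consistent))
```

Under that assumption the escape only yields a well-formed result when
the rest of the string already happens to be UTF-8, and nothing in a
byte-string model can enforce that.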