Newsgroups: php.internals
From: derick@php.net (Derick Rethans)
Date: Tue, 9 Dec 2014 16:54:29 +0000 (GMT)
To: Andrea Faulds
Cc: mario@include-once.org, internals
In-Reply-To: <10EE9A5B-1711-455A-AB6A-6E7EA858D081@ajf.me>
Subject: Re: [PHP-DEV] [VOTE][RFC] Unicode Codepoint Escape Syntax

On Tue, 9 Dec 2014, Andrea Faulds wrote:

> I think \x{xxxx} is misleading anyway - \xXX is always
> single-byte/character, yet Unicode code points can’t be represented in
> PHP strings as single bytes when encoded in UTF-8 (unless they’re
> below U+0100, of course).

You mean below U+0080 surely? Only the "first 7 bits" can be represented
as a single byte with UTF-8. U+0080 is for example 0xC2 0x80 in UTF-8.

cheers,
Derick
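
For anyone who wants to check the boundary themselves, here is a minimal
sketch in Python (used purely for illustration, not PHP). The helper name
utf8_bytes is hypothetical and not part of the RFC or of PHP. It encodes a
few code points around U+0080 and U+0100 and prints how many bytes each
one takes in UTF-8.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Return the UTF-8 encoding of a single Unicode code point.

    Hypothetical helper for illustration only.
    """
    return chr(code_point).encode("utf-8")


# Code points on either side of the two boundaries mentioned in the thread.
for cp in (0x7F, 0x80, 0xFF, 0x100):
    encoded = utf8_bytes(cp)
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{cp:04X} -> {hex_bytes} ({len(encoded)} byte(s))")
```

U+007F should come out as the single byte 7F, while U+0080 already needs
two bytes (C2 80), as do U+00FF and U+0100.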