Newsgroups: php.internals
From: stefan.walk@gmail.com (Stefan Walk)
Date: Mon, 05 May 2008 10:02:34 +0200
To: Lester Caine
CC: internals@lists.php.net
Subject: Re: [PHP-DEV] Removal of unicode_semantics

Lester Caine wrote:
> That sounds like just the sort of edge case that Derick is suggesting
> needs logging for fixing up.
> unicode_semantics=on is just another bodge
> to make it happen rather than a solution. I think I understand your
> description, and to my eyes it looks like a unicode bug that needs
> addressing?

No, it's a misunderstanding of how things work that has been explained
to Tomas countless times. A Unicode string consists of codepoints, not
of bytes. Having \xXX and \XXX insert bytes instead of codepoints does
not make sense, because (a) it would require a defined Unicode encoding
to be used, and (b) even with a defined encoding, it would allow you to
insert broken data into the Unicode string, so it would no longer be a
Unicode string, which is a no-no.

If you want to do that sort of fiddling with binary details, use binary
strings, not Unicode strings.

Regards,
Stefan
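
The codepoint-versus-byte distinction described above can be sketched in Python 3, whose str/bytes split is used here purely as an analogy. This is not PHP code and says nothing about how PHP implements unicode_semantics; the helper is_valid_utf8 is a hypothetical name introduced for illustration.

```python
# Analogy only: Python 3 str (codepoints) vs bytes (binary string).
# Nothing here reflects PHP's actual implementation.

# In a text string, "\xe9" denotes the codepoint U+00E9, not the byte 0xE9.
text = "caf\xe9"
assert len(text) == 4            # four codepoints
assert ord(text[3]) == 0xE9      # the last one is U+00E9

# Bytes only exist once an encoding is chosen, and they differ per encoding.
utf8 = text.encode("utf-8")        # U+00E9 becomes the two bytes C3 A9
utf16 = text.encode("utf-16-le")   # every codepoint here takes two bytes

# In a binary string, "\xe9" really is the single byte 0xE9.
raw = b"caf\xe9"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: report whether data decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xE9 byte is a truncated multi-byte sequence in UTF-8, which is
# the kind of broken data a byte-level escape could put into a text string.
print(is_valid_utf8(utf8), is_valid_utf8(raw))
```

The same escape therefore means two different things depending on the string type, which is why byte-level fiddling belongs in the binary type.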