Newsgroups: php.internals
Message-ID: <4821DE82.2030903@gravitonic.com>
Date: Wed, 07 May 2008 09:53:22 -0700
From: andrei@gravitonic.com (Andrei Zmievski)
To: Stefan Walk
CC: Lester Caine, internals@lists.php.net
References: <4BD5A050-02F2-46BD-B867-FA8CA12FF1BD@macvicar.net> <48988.78.61.224.253.1209918881.nsm@avilys.eik.lt> <60526.78.61.224.253.1209928511.nsm@avilys.eik.lt> <481EB410.1090804@lsces.co.uk> <481EBF1A.6040406@gmail.com>
In-Reply-To: <481EBF1A.6040406@gmail.com>
Subject: Re: [PHP-DEV] Removal of unicode_semantics

Precisely.

Stefan Walk wrote:
> Lester Caine wrote:
>> That sounds like just the sort of edge case that Derick is suggesting
>> needs logging for fixing up. unicode_semantics=on is just another
>> bodge to make it happen rather than a solution. I think I
>> understand your description, and to my eyes it looks like a unicode
>> bug that needs addressing?
>
> No, it's a misunderstanding of how things work that has been explained
> to Tomas countless times. A unicode string consists of codepoints, not
> of bytes. Having \xXX and \XXX insert bytes instead of codepoints does
> not make sense, because a) that would require a defined unicode
> encoding to be used, and b) even if that is the case, it would allow
> you to insert broken data into the unicode string, so it's not a
> unicode string anymore, which is a no-no. If you want to do that sort
> of fiddling with binary details, use binary strings, not unicode
> strings.
>
> Regards,
> Stefan
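
For anyone following along, here is a rough sketch of the codepoints-versus-bytes point Stefan makes above. It is written in Python purely as an analogy, on the assumption that Python's str/bytes split is close enough to the unicode/binary string split being discussed; it is not PHP code and says nothing about how PHP actually implements this. The helper names describe() and is_valid_utf8() are made up for illustration.

```python
# Analogy only: Python's str is a sequence of code points, bytes is raw binary.

def describe(text: str) -> dict:
    """Return the code points of `text` and its byte form in two encodings."""
    return {
        "codepoints": [f"U+{ord(ch):04X}" for ch in text],
        "utf8": text.encode("utf-8"),
        "utf16le": text.encode("utf-16-le"),
    }

def is_valid_utf8(data: bytes) -> bool:
    """Check whether raw bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# In a text string, "\xe9" names the code point U+00E9, not the byte 0xE9.
# Which bytes represent it depends entirely on the encoding chosen later.
info = describe("\xe9")
print(info["codepoints"])  # ['U+00E9']
print(info["utf8"])        # b'\xc3\xa9'
print(info["utf16le"])     # b'\xe9\x00'

# In a binary string, the same escape is one raw byte. On its own it is
# not well-formed UTF-8, which is the "broken data" case from point b).
raw = b"\xe9"
print(is_valid_utf8(raw))  # False
```

The same escape yields one code point with encoding-dependent bytes in the text string, and one fixed byte in the binary string. If the escape inserted a byte into the text string, the result would depend on an encoding and could be malformed.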