Newsgroups: php.internals
Message-ID: <38743.78.61.224.253.1209976251.nsm@avilys.eik.lt>
In-Reply-To: <481EBF1A.6040406@gmail.com>
Date: Mon, 5 May 2008 11:30:51 +0300 (EEST)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Removal of unicode_semantics
From: tokul@users.sourceforge.net ("Tomas Kuliavas")

> Lester Caine wrote:
>> That sounds like just the sort of edge case that Derick is suggesting
>> needs logging for fixing up. unicode_semantics=on is just another bodge
>> to make it happen rather than a solution. I think I understand your
>> description, and to my eyes it looks like a unicode bug that needs
>> addressing?
>
> No, it's a misunderstanding of how things work that has been explained
> to Tomas countless times. A unicode string consists of codepoints, not
> of bytes. Having \xXX and \XXX insert bytes instead of codepoints does
> not make sense, because a) That would require a defined unicode
> encoding to be used, and even if that is the case b) would allow you to
> insert broken data into the unicode string, so it's not a unicode string
> anymore, which is a no-no. If you want to do that sort of fiddling with
> binary details, use binary strings, not unicode strings.

I agree that it is not a bug, because I deliberately declare an invalid encoding in my scripts to make sure that binary and unicode bytes are equal.

You haven't explained to me how things work. All your explanations ask me to use code that is compatible only with PHP 5.2.1+, to drop code that worked fine in older PHP versions, and to give up control of charset conversions.

I want backwards compatibility with PHP 5.2.0 and PHP4. I want to be able to control charset conversions. Where are the guarantees that charset conversions will work better in PHP6? In current setups it is safer to do charset conversions internally than to rely on PHP to do them. And I can't drop that code entirely, because the Unicode implementation in PHP 5.2.1 is a dummy: it is there only to avoid E_PARSE errors in PHP6-compatible code.

-- 
Tomas
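
For readers following the codepoint-versus-byte argument quoted above, here is a minimal sketch in Python 3. Python is used only because it has a comparable split between text strings and binary strings; this is not PHP 6 code, and the variable names are illustrative assumptions.

```python
# Text string: "\xe9" names the code point U+00E9, not a raw byte.
text = "caf\xe9"
# Binary string: b"\xe9" is the single byte 0xE9.
raw = b"caf\xe9"

# The same code point maps to different bytes depending on the encoding.
utf8 = text.encode("utf-8")      # U+00E9 becomes two bytes: 0xC3 0xA9
latin1 = text.encode("latin-1")  # U+00E9 becomes one byte: 0xE9

print(len(text), len(raw), len(utf8), len(latin1))

# The raw bytes are valid Latin-1 but not valid UTF-8, so byte-level
# escapes only have a meaning once an encoding has been fixed.
try:
    raw.decode("utf-8")
    raw_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_is_valid_utf8 = False

print(raw_is_valid_utf8)
```

In this sketch the byte string equals the text string only under Latin-1. A script that relies on bytes and code points being equal is therefore implicitly assuming one particular single-byte encoding.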