Newsgroups: php.internals
Date: Sun, 20 May 2007 09:13:55 +0300 (EEST)
From: tokul@users.sourceforge.net ("Tomas Kuliavas")
To: "Antony Dovgal"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] PHP Unicode extension in PHP6
Message-ID: <59165.88.118.163.159.1179641635.squirrel@avilys.eik.lt>
In-Reply-To: <464F650B.6090802@zend.com>

>> strlen("\xC4\x85") = 2. strlen((binary)"\xC4\x85") = 4. Not good. It is
>> one character in utf-8.
>
> I'm afraid I don't understand you again..

0xC4 and 0x85 are the hex codes of LATIN SMALL LETTER A WITH OGONEK in utf-8.

ą

If the script is written in utf-8, I expect bool(true) on the var_dump() line. It is bool(false) when unicode.semantics is turned on.

SquirrelMail's internal character set decoding functions write their mapping tables in hexadecimal or octal notation. In some cases they evaluate only a single byte value and not the whole symbol. Multibyte character set decoding can use recode, iconv and mbstring, but most single-byte decoding is written with plain string functions and stores hex-to-HTML mapping tables in associative arrays.

Expected result: ą
Got: ą

The test setup (php6.0-200705190630) uses a trimmed php.ini with only the unicode.semantics=on setting:

unicode.fallback_encoding - no value
unicode.filesystem_encoding - no value
unicode.http_input_encoding - no value
unicode.output_encoding - no value
unicode.runtime_encoding - no value
unicode.script_encoding - no value
unicode.semantics - On
unicode.stream_encoding - UTF-8
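To make the byte-oriented decoding described above concrete, here is a minimal sketch of a table-driven single-byte decoder that maps hex-escaped bytes to HTML entities via an associative array. It assumes a PHP 7 or later interpreter, where string literals are plain byte sequences; it does not reproduce PHP 6's unicode.semantics mode. The function name decode_iso8859_2_sketch and its three-entry ISO-8859-2 table are hypothetical illustrations, not SquirrelMail's actual code. The quoted strlen() results suggest that with unicode.semantics=on, escapes such as "\xB1" or "\xC4\x85" no longer denote raw bytes, so tables keyed this way stop matching.

```php
<?php
// Hypothetical sketch (NOT SquirrelMail's actual code): a table-driven
// single-byte decoder. The map holds only three ISO-8859-2 bytes.
// Assumes PHP 7+, where string literals are byte sequences.
function decode_iso8859_2_sketch(string $input): string
{
    // Keys are single bytes written as hex escapes; values are HTML entities.
    static $map = [
        "\xA1" => '&#260;', // LATIN CAPITAL LETTER A WITH OGONEK
        "\xB1" => '&#261;', // LATIN SMALL LETTER A WITH OGONEK
        "\xE6" => '&#263;', // LATIN SMALL LETTER C WITH ACUTE
    ];
    // strtr() with an array replaces every occurrence of each key;
    // each key here is exactly one byte.
    return strtr($input, $map);
}

// The same letter encoded in UTF-8 is the two-byte sequence 0xC4 0x85.
$utf8 = "\xC4\x85";
var_dump(strlen($utf8));                  // int(2): strlen() counts bytes
var_dump(preg_match_all('/./su', $utf8)); // int(1): one UTF-8 character

echo decode_iso8859_2_sketch("\xB1"), "\n"; // prints &#261;
```

The pattern only works while each table key is one byte. A byte-keyed table like this cannot be reused for multibyte input, which is why multibyte decoding is delegated to recode, iconv or mbstring.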