Newsgroups: php.internals
From: stas@zend.com (Stanislav Malyshev)
Organization: Zend Technologies
To: Moriyoshi Koizumi
CC: php-dev
Date: Mon, 03 Aug 2009 10:47:40 -0700
Subject: Re: [PHP-DEV] Re: Alternative mbstring implementation using ICU
Message-ID: <4A7722BC.1000301@zend.com>
References: <4A6C6496.7060603@mozo.jp> <4A71DA47.8080809@zend.com> <4A731DE2.2060206@zend.com> <4A738624.1@zend.com>

Hi!

> It can be done through conversion error handlers.
> You can append an encoded form of a codepoint for such unassigned
> characters to the buffer within the handler.

OK, if so, we may want to add an implementation of this behavior to our
ICU support.

> And yes, it's worth providing separate conversion system. You might
> not be aware of it, but there are several sets of different character
> sets, each of which is often represented with a specific encoding
> scheme. Shift_JIS is one of those.

I'm not sure I understand. There are tons of character sets, etc., but
as I understand it, the ICU conversion routines handle them, including
Shift_JIS. Isn't that true?

> What I am mainly interested in is 5.4, or something that will come
> before 6. BTW, it would be much better if there had been a sort of
> coordination between the developers of mbstring and intl extension.

I'm not sure what will happen with 5.4 etc., but I'd certainly be glad
to help as much as I can with anything regarding the intl extension. Do
you have some specific things that need to be done?

-- 
Stanislav Malyshev, Zend Software Architect
stas@zend.com   http://www.zend.com/
(408)253-8829   MSN: stas@zend.com