Newsgroups: php.internals
Date: Tue, 4 Aug 2009 03:08:13 +0900
From: mozo@mozo.jp (Moriyoshi Koizumi)
To: Stanislav Malyshev
Cc: php-dev
Subject: Re: [PHP-DEV] Re: Alternative mbstring implementation using ICU

On Tue, Aug 4, 2009 at 2:47 AM, Stanislav Malyshev wrote:
>> And yes, it's worth providing a separate conversion system.  You might
>> not be aware of it, but there are several sets of different character
>> sets, each of which is often represented with a specific encoding
>> scheme.  Shift_JIS is one of those.
>
> I'm not sure I understand. There are tons of character sets, etc. but as I
> understand ICU conversion routines handle them, including Shift_JIS - isn't
> it true?

Coded character sets and character encoding schemes are different
concepts. As for the specific case I mentioned, the character set that is
commonly encoded as Shift_JIS exists in a number of variants, and ICU
doesn't support all of them.

>
>> What I am mainly interested in is 5.4, or something that will come
>> before 6.  BTW, it would be much better if there had been some
>> coordination between the developers of mbstring and the intl extension.
>
> I'm not sure what will happen about 5.4 etc. but sure I'd be glad to help as
> much as I could with anything regarding intl extension. Do you have some
> specific things that need to be done?

This is just one of my ideas, but if the intl extension eventually gains
enough functionality to let one write emulated mbstring functions in
userland, that would be very attractive to me.

Moriyoshi
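To make the point about Shift_JIS variants concrete, here is a minimal sketch. It uses Python's built-in codecs purely as an illustration (this is not ICU, mbstring, or intl): the same byte sequence decodes to different code points, or fails to decode at all, depending on which variant the decoder assumes. The helper name `decode_as_variants` is hypothetical.

```python
# Illustration only: Python's standard codecs stand in for "Shift_JIS variants".
# Not ICU and not mbstring; it just shows that identical bytes are interpreted
# differently depending on which variant of the character set is assumed.

VARIANTS = ["shift_jis", "cp932", "shift_jis_2004"]


def decode_as_variants(data: bytes) -> dict:
    """Decode `data` with each variant; None marks bytes a variant rejects."""
    results = {}
    for codec in VARIANTS:
        try:
            results[codec] = data.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


if __name__ == "__main__":
    samples = {
        "0x8160 (WAVE DASH vs FULLWIDTH TILDE)": b"\x81\x60",
        "0x8740 (NEC special: CIRCLED DIGIT ONE)": b"\x87\x40",
    }
    for label, data in samples.items():
        for codec, text in decode_as_variants(data).items():
            shown = "undefined" if text is None else f"U+{ord(text):04X}"
            print(f"{label:42} {codec:15} {shown}")
```

In CPython, 0x8160 decodes to U+301C under `shift_jis` but to U+FF5E under `cp932` (the well-known "wave dash" mismatch), and 0x8740 is defined only in the vendor-extended variants.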