Newsgroups: php.internals
Subject: Re: [PHP-DEV] Re: Alternative mbstring implementation using ICU
From: mozo@mozo.jp (Moriyoshi Koizumi)
Date: Sat, 1 Aug 2009 07:57:23 +0900
To: Stanislav Malyshev
Cc: php-dev

Hi,

On Sat, Aug 1, 2009 at 1:37 AM, Stanislav Malyshev wrote:
> Hi!
>
>>> mb_str* - shouldn't you in 6 just convert them to unicode and do all
>>> string operations with Unicode strings? Also, in 5 isn't there some
>>> intersection with grapheme_* functions?
>>
>> mb_strwidth() and mb_strimwidth() are not covered.
>
> True. I wonder what this function is useful for?

They calculate the total width of a string based on the "East Asian Width"
property, which is still a valid way to get a rough measurement of the
rendered string.

>
>>> mb_output_handler - shouldn't setting the proper encoding in 6 do the
>>> same job?
>>> mb_convert_encoding - don't we already have a number of functions that do
>>> encoding conversions?
>>
>> I don't think It can gracefully handle characters that have no
>> corresponding entries in the target character set. I'm even thinking
>
> That's a common problem, IIRC PHP 6 converters have configurable error modes
> for that. Don't unicode_set_error_handler() and unicode_set_error_mode() do
> what you want?

I guess it isn't what I want. If my understanding is correct, a handler set
by unicode_set_error_handler() merely deals with the aftermath and cannot
interact with the converter. There are good reasons to support user-supplied
mappings of characters in the PUA (Private Use Area) to a legacy encoding
such as Shift_JIS, rather than just replacing such characters with
placeholders.

In addition, aren't there cases where one has to manipulate Unicode strings
on a per-coded-character basis rather than a per-grapheme basis, just as
substr() does in PHP 6?

Regards,
Moriyoshi
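
P.S. To illustrate the width calculation mentioned above for mb_strwidth(),
here is a rough sketch. It is written in Python with the standard
unicodedata module and is not mbstring's actual implementation. The function
name east_asian_display_width is hypothetical, and counting ambiguous-width
characters as one column is an assumption of this sketch.

```python
import unicodedata


def east_asian_display_width(text: str) -> int:
    """Rough column count based on the East Asian Width property.

    Assumption of this sketch: Wide (W) and Fullwidth (F) code points
    take two columns, everything else takes one, and combining marks
    take none. mbstring's own tables may differ.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the preceding character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width
```

With this sketch, plain ASCII letters count one column each and CJK
ideographs count two, which is the kind of rough measurement of the rendered
string I had in mind.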