Newsgroups: php.internals
Subject: Re: [PHP-DEV] Re: Alternative mbstring implementation using ICU
From: stas@zend.com (Stanislav Malyshev)
Date: Fri, 31 Jul 2009 17:02:44 -0700
Organization: Zend Technologies
To: Moriyoshi Koizumi
CC: php-dev
Message-ID: <4A738624.1@zend.com>
References: <4A6C6496.7060603@mozo.jp> <4A71DA47.8080809@zend.com> <4A731DE2.2060206@zend.com>

Hi!

> They calculate the total width of a string based on "east asian width"
> property, which is still valid to give a rough measurement of the
> rendered string.

OK, I guess if it's some kind of special calculation that doesn't follow from the others, it should be preserved; there are tons of such special functions in PHP.

>> That's a common problem, IIRC PHP 6 converters have configurable error modes
>> for that. Don't unicode_set_error_handler() and unicode_set_error_mode() do
>> what you want?
>
> I guess it isn't what I want. If my understanding is correct, a
> handler set by unicode_set_error_handler() merely deals with the
> aftermath and cannot interact with the converter. There are good

That depends. For some error modes, it tells the converter to replace invalid chars with some other char or to skip them. You can't, however, currently specify custom mappings (I'm not sure ICU allows that, but maybe it can be simulated...). The question here is: is it really worth keeping a whole separate conversion system just for this, or can it be done with the standard conversion, possibly somewhat tweaked?

> In addition to these, shouldn't there be any case where one have to
> manipulate Unicode strings on per-coded-character-basis rather than
> per-grapheme-basis just like substr() in PHP6?

In PHP 6 right now that's actually the only case; the grapheme functions haven't even been ported to PHP 6 yet (I know, not good). But that's what the regular str* functions should be doing, right?

-- 
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
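
For readers who want to see the three ideas from this thread side by side, here is a minimal sketch in Python. It is not the mbstring, ICU, or PHP 6 code under discussion. The function names (east_asian_display_width, decode_with_mode, prefix_keeping_marks) are hypothetical and exist only for illustration, and the "keep combining marks" helper is a simplified stand-in for real grapheme segmentation, not an implementation of it.

```python
import unicodedata


def east_asian_display_width(text: str) -> int:
    """Rough column count based on the East Asian Width property.

    Wide (W) and Fullwidth (F) characters count as 2 columns, combining
    marks as 0, everything else as 1. This is an approximation of the
    rendered width, not an exact measurement.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on top of the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def decode_with_mode(raw: bytes, encoding: str, mode: str) -> str:
    """Decode with a built-in error mode.

    mode="replace" substitutes U+FFFD for invalid input,
    mode="ignore" skips it. Neither allows a custom mapping.
    """
    return raw.decode(encoding, errors=mode)


def prefix_keeping_marks(text: str, n: int) -> str:
    """Take the first n base characters plus their combining marks.

    Simplified assumption: a "cluster" is a base character followed by
    combining marks. Real grapheme segmentation has more rules.
    """
    if n <= 0:
        return ""
    out = []
    bases = 0
    for ch in text:
        if not unicodedata.combining(ch):
            if bases == n:
                break
            bases += 1
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(east_asian_display_width("abc"))
    print(east_asian_display_width("\u65e5\u672c\u8a9e"))
    print(repr(decode_with_mode(b"ab\xffcd", "utf-8", "replace")))
    print(repr(decode_with_mode(b"ab\xffcd", "utf-8", "ignore")))
    accented = "e\u0301x"  # "e" + combining acute accent + "x"
    print(repr(accented[:1]))                        # per code point
    print(repr(prefix_keeping_marks(accented, 1)))   # base plus its mark
```

The last two calls show the distinction raised in the thread: slicing per code point separates the accent from its base letter, while a cluster-aware prefix keeps them together.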