Newsgroups: php.i18n,php.internals
From: andrei@gravitonic.com (Andrei Zmievski)
Date: Thu, 2 Feb 2006 16:45:39 -0800
To: PHP Developers Mailing List
Cc: php-i18n@lists.php.net
Subject: Unicode string iterator performance

You probably saw that I have committed an initial implementation of TextIterator. The impetus for this is that directly indexing Unicode strings with the [] operator is slow, very slow, at least currently. The reason is that [] cannot simply perform random-offset indexing into UChar* strings. It needs to start from the beginning of the string and iterate forward until it reaches the desired offset, because our default unit is a codepoint, which can take up one or two UChars (codepoints outside the BMP are stored as UTF-16 surrogate pairs). So each [] lookup costs time proportional to the offset, and looping over a whole string with [] is quadratic in its length.

Here are some rough numbers on the relative performance of TextIterator vs. [].
The script I used was a simple one (attached after the signature). Each test was 10000 runs over a 500-character string.

  [] operator:   27.16373 s
  TextIterator:   1.89697 s (!)

That is roughly a 14x difference. For comparison, running the same [] operator test on a 500-character binary (old-style) string gives me 9.11334 s, so TextIterator on a Unicode string is almost 5x faster than [] on a binary one. Quite interesting, I'd say.

I am not sure how we can optimize [] to be faster than the iterator approach. Food for thought?

- Andrei
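To illustrate why the gap is so large, here is a minimal Python model. It is not the benchmark script attached to this message, and it is not PHP's or ICU's actual code. Assumptions: a Python list of ints stands in for a UChar* buffer, and every name (to_utf16_units, next_codepoint, codepoint_at, iterate_codepoints, ReadCounter) is hypothetical. []-style access rescans from the start for every index, so visiting all n codepoints costs O(n²) code-unit reads, while an iterator that remembers its current offset reads each unit once.

```python
# Hypothetical model: a list of 16-bit ints stands in for a UChar* buffer.
# This is an illustration only, not PHP's or ICU's implementation.
from collections.abc import Iterator


def to_utf16_units(text: str) -> list[int]:
    """Encode text as a list of UTF-16 code units."""
    raw = text.encode("utf-16-le")  # little-endian, no BOM
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


class ReadCounter:
    """Tallies how many code units an access strategy consumes."""

    def __init__(self) -> None:
        self.units_read = 0


def next_codepoint(units: list[int], pos: int, counter: ReadCounter) -> tuple[int, int]:
    """Decode the codepoint starting at code-unit offset pos.

    Returns (codepoint, offset of the next codepoint). A codepoint occupies
    one code unit, or two when it is a lead/trail surrogate pair.
    """
    lead = units[pos]
    if (0xD800 <= lead <= 0xDBFF and pos + 1 < len(units)
            and 0xDC00 <= units[pos + 1] <= 0xDFFF):
        trail = units[pos + 1]
        counter.units_read += 2
        return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00), pos + 2
    counter.units_read += 1
    return lead, pos + 1  # BMP character (or an unpaired surrogate, passed through)


def codepoint_at(units: list[int], index: int, counter: ReadCounter) -> int:
    """[]-style random access by codepoint index.

    Codepoint offsets don't map directly to code-unit offsets, so this
    walks forward from the start: O(index) reads per lookup.
    """
    pos = 0
    for _ in range(index):
        _, pos = next_codepoint(units, pos, counter)
    if pos >= len(units):
        raise IndexError("codepoint index out of range")
    cp, _ = next_codepoint(units, pos, counter)
    return cp


def iterate_codepoints(units: list[int], counter: ReadCounter) -> Iterator[int]:
    """Iterator-style access: keep the current offset, O(1) reads per step."""
    pos = 0
    while pos < len(units):
        cp, pos = next_codepoint(units, pos, counter)
        yield cp


# 500 codepoints alternating a BMP letter with U+1F600 (a surrogate pair),
# i.e. 750 UTF-16 code units.
text = "a\U0001F600" * 250
units = to_utf16_units(text)

indexed = ReadCounter()
by_index = [codepoint_at(units, i, indexed) for i in range(len(text))]

iterated = ReadCounter()
by_iter = list(iterate_codepoints(units, iterated))

assert by_index == by_iter == [ord(c) for c in text]
print(f"[]-style: {indexed.units_read} units read, iterator: {iterated.units_read}")
# → []-style: 187750 units read, iterator: 750
```

The model only counts code-unit reads; the timings above also include PHP's own per-operation overhead, which it ignores. What it shows is the scaling: the indexed loop reads 187,750 units for a 750-unit string, while the iterator reads each unit exactly once.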