Newsgroups: php.i18n,php.internals
From: andrei@gravitonic.com (Andrei Zmievski)
To: PHP Developers Mailing List
Cc: php-i18n@lists.php.net
Date: Thu, 2 Feb 2006 17:01:46 -0800
Subject: Re: [PHP-DEV] Unicode string iterator performance

For yet another comparison, the [] operator test under PHP 4 gives
7.24410 s.

- Andrei

On Feb 2, 2006, at 4:45 PM, Andrei Zmievski wrote:

> You probably saw that I have committed an initial implementation of
> TextIterator. The impetus for this is that direct indexing of Unicode
> strings via the [] operator is slow, very slow, at least currently. The
> reason is that [] cannot simply perform random-offset indexing into
> UChar* strings.
> It needs to start from the beginning of the string and iterate
> forward until it reaches the desired offset, because our default unit
> is a codepoint, which can take up 1 or 2 UChars.
>
> So here are some (rough) numbers on the relative performance of
> TextIterator vs. []. The script I used was a simple one (attached
> after the signature). Each test was 10000 runs over a 500-character
> string.
>
> [] operator:  27.16373 s
> TextIterator:  1.89697 s (!)
>
> For comparison, running the same [] operator test on a 500-character
> binary (old-style) string gives me 9.11334 s. Quite interesting, I'd
> say.
>
> I am not sure how we can optimize [] to be faster than the iterator
> approach. Food for thought?
>
> - Andrei
>
> <?php
> $a = str_repeat('a\U010201bcß', 100);
> var_dump($a);
>
> /* warm up the engine */
> for ($x = 0; $x < 100; $x++) {
>     foreach (new TextIterator($a) as $c) {
>     }
> }
>
> /* measure [] */
> $start = microtime(true);
> for ($x = 0; $x < 10000; $x++) {
>     $len = strlen($a);
>     for ($i = 0; $i < $len; $i++) {
>         $c = $a[$i];
>     }
> }
> $end = microtime(true);
>
> printf("[] run time: %.5f\n", $end - $start);
>
> /* measure iterator */
> $start = microtime(true);
> for ($x = 0; $x < 10000; $x++) {
>     foreach (new TextIterator($a) as $c) {
>     }
> }
> $end = microtime(true);
>
> printf("iterator run time: %.5f\n", $end - $start);
> ?>