Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:64789 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 19548 invoked from network); 9 Jan 2013 22:45:15 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 9 Jan 2013 22:45:15 -0000 Authentication-Results: pb1.pair.com header.from=glopes@nebm.ist.utl.pt; sender-id=unknown Authentication-Results: pb1.pair.com smtp.mail=glopes@nebm.ist.utl.pt; spf=permerror; sender-id=unknown Received-SPF: error (pb1.pair.com: domain nebm.ist.utl.pt from 193.136.128.21 cause and error) X-PHP-List-Original-Sender: glopes@nebm.ist.utl.pt X-Host-Fingerprint: 193.136.128.21 smtp1.ist.utl.pt Linux 2.6 Received: from [193.136.128.21] ([193.136.128.21:35635] helo=smtp1.ist.utl.pt) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 95/AB-02684-9F2FDE05 for ; Wed, 09 Jan 2013 17:45:15 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp1.ist.utl.pt (Postfix) with ESMTP id B0D617000452 for ; Wed, 9 Jan 2013 22:45:09 +0000 (WET) X-Virus-Scanned: by amavisd-new-2.6.4 (20090625) (Debian) at ist.utl.pt Received: from smtp1.ist.utl.pt ([127.0.0.1]) by localhost (smtp1.ist.utl.pt [127.0.0.1]) (amavisd-new, port 10025) with LMTP id KZ1nkT-IkeWU for ; Wed, 9 Jan 2013 22:45:09 +0000 (WET) Received: from mail2.ist.utl.pt (mail.ist.utl.pt [IPv6:2001:690:2100:1::8]) by smtp1.ist.utl.pt (Postfix) with ESMTP id 727677000423 for ; Wed, 9 Jan 2013 22:45:09 +0000 (WET) Received: from damnation.nl.lo.geleia.net (unknown [IPv6:2001:470:94a2:4:21d:baff:feee:cc0b]) (Authenticated sender: ist155741) by mail2.ist.utl.pt (Postfix) with ESMTPSA id 9E58F200528E for ; Wed, 9 Jan 2013 22:45:08 +0000 (WET) Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: internals@lists.php.net References: Date: Wed, 09 Jan 2013 23:45:03 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Organization: =?utf-8?Q?N=C3=BAcleo_de_Eng=2E_Biom=C3=A9di?= =?utf-8?Q?ca_do_I=2ES=2ET=2E?= Message-ID: In-Reply-To: User-Agent: Opera Mail/12.11 (Linux) Subject: Re: [PHP-DEV] strtr vs. str_replace runtime From: glopes@nebm.ist.utl.pt ("Gustavo Lopes") On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes wrote: > The algorithm behaves very poorly in this case because at each position > of the text, all the substrings starting there and with size between m > and n (where m is the size of the smallest pattern and n is the largest) > are checked, even if there are only two patterns with size m and n. We > could fix this easily by building a set of the pattern sizes found and > try only with those. The hashing of the substrings could also be > improved; we don't have to recalculate everything when we advance in the > text. > Both optimizations (the hash rolling and limiting the substrings hashed on each iteration) worked quite well. But I got much better results with another algorithm [1], so I'm going to merge the branch with it [2] instead. I get these results with a 1.7 MB string and 13 replacement strings, the smallest with 6 characters and 30 iterations (x86-64, gcc -O3): strtr: 0.1387 str_replace: 0.4471 The algorithm doesn't perform as well when the replacement strings are small. Adding a replacement for the pattern '_' (1 character) yields: strtr: 0.6157 str_replace: 0.6230 But even in this case, it works better than my optimized version of the current algorithm. I plan on merging to 5.4 and 5.5; you may want to review it as introducing completely new code carries some risk. [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.2927 [2] https://github.com/cataphract/php-src/compare/strtr_wu94 -- Gustavo Lopes