Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:64822 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 23112 invoked from network); 10 Jan 2013 19:47:49 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 10 Jan 2013 19:47:49 -0000 Authentication-Results: pb1.pair.com smtp.mail=nicolai.scheer@gmail.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=nicolai.scheer@gmail.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 74.125.82.46 as permitted sender) X-PHP-List-Original-Sender: nicolai.scheer@gmail.com X-Host-Fingerprint: 74.125.82.46 mail-wg0-f46.google.com Received: from [74.125.82.46] ([74.125.82.46:40636] helo=mail-wg0-f46.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 49/EC-02684-3EA1FE05 for ; Thu, 10 Jan 2013 14:47:48 -0500 Received: by mail-wg0-f46.google.com with SMTP id dr13so444832wgb.25 for ; Thu, 10 Jan 2013 11:47:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=JdFrR2Uky+gERvqtZLWevwM4FWKJUXeLW9bp7i7ft2g=; b=qGgqU08QRMlJzGeu+vRfTjqi7uLMkoG6DJfJjwFl4ZluzNv0zC0prIozr4LizxUZRm Xfn04eptbH5mWZMXJVIhn2rdnnqcYvAXTtlnXmY6Z8fMEW7MQyJN4+vHSkYNWjanj1+5 tqGfPKLh4nWSHLF07GkjeJtg5NOFpyN4LBcJnoHG28RHgv0oHkxkymYd7d2yEfr8FBBQ wRW3JHed+lL2ekyw1mR+YgbFwH+dV1CtN2U+3xUbIjeNArBTtSLMSBvKTc+nBBOsJwh1 dcC3NtfCTzBxGeqxQeRf7stKVLI750SwnQUnW8Jafi38aIfsPH+oUHUasatYrBqb++gY XD7Q== MIME-Version: 1.0 Received: by 10.194.87.200 with SMTP id ba8mr116715076wjb.22.1357847264925; Thu, 10 Jan 2013 11:47:44 -0800 (PST) Received: by 10.227.170.206 with HTTP; Thu, 10 Jan 2013 11:47:44 -0800 (PST) In-Reply-To: References: Date: Thu, 10 Jan 2013 20:47:44 +0100 Message-ID: To: Gustavo Lopes Cc: PHP internals Content-Type: multipart/alternative; boundary=089e010d8afee2986504d2f47309 Subject: Re: [PHP-DEV] strtr vs. str_replace runtime From: nicolai.scheer@gmail.com (Nicolai Scheer) --089e010d8afee2986504d2f47309 Content-Type: text/plain; charset=UTF-8 Hi! On 9 January 2013 23:45, Gustavo Lopes wrote: > On Thu, 03 Jan 2013 11:40:31 +0100, Gustavo Lopes > wrote: > > The algorithm behaves very poorly in this case because at each position >> of the text, all the substrings starting there and with size between m and >> n (where m is the size of the smallest pattern and n is the largest) are >> checked, even if there are only two patterns with size m and n. We could >> fix this easily by building a set of the pattern sizes found and try only >> with those. The hashing of the substrings could also be improved; we don't >> have to recalculate everything when we advance in the text. >> >> > Both optimizations (the hash rolling and limiting the substrings hashed on > each iteration) worked quite well. > > But I got much better results with another algorithm [1], so I'm going to > merge the branch with it [2] instead. I get these results with a 1.7 MB > string and 13 replacement strings, the smallest with 6 characters and 30 > iterations (x86-64, gcc -O3): > > strtr: 0.1387 > str_replace: 0.4471 > Nice :) > > The algorithm doesn't perform as well when the replacement strings are > small. Adding a replacement for the pattern '_' (1 character) yields: > > strtr: 0.6157 > str_replace: 0.6230 > Even that is way better than before :) > > But even in this case, it works better than my optimized version of the > current algorithm. > > I plan on merging to 5.4 and 5.5; you may want to review it as introducing > completely new code carries some risk. > > [1] http://citeseerx.ist.psu.edu/**viewdoc/summary?doi=10.1.1.13.**2927 > [2] https://github.com/cataphract/php-src/compare/strtr_wu94 Does "merging to 5.4" mean I can grab the head of the 5.4 branch afterwards and try it myself? Thanks! Greetings Nico --089e010d8afee2986504d2f47309--