Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:38205 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 94635 invoked from network); 13 Jun 2008 16:59:54 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 13 Jun 2008 16:59:54 -0000 Authentication-Results: pb1.pair.com smtp.mail=michal.dziemianko@gmail.com; spf=unknown; sender-id=unknown Authentication-Results: pb1.pair.com header.from=michal.dziemianko@gmail.com; sender-id=unknown Received-SPF: unknown (pb1.pair.com: domain gmail.com does not designate 82.132.130.152 as permitted sender) X-PHP-List-Original-Sender: michal.dziemianko@gmail.com X-Host-Fingerprint: 82.132.130.152 sidious.london.02.net Linux 2.6 Received: from [82.132.130.152] ([82.132.130.152:53804] helo=mail.o2.co.uk) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id CB/4F-15165-987A2584 for ; Fri, 13 Jun 2008 12:59:53 -0400 Received: from [192.168.1.64] (78.86.103.249) by mail.o2.co.uk (8.0.013.3) (authenticated as michal_dziemianko@o2.co.uk) id 4851DD95001F443E; Fri, 13 Jun 2008 17:59:50 +0100 Mime-Version: 1.0 (Apple Message framework v753.1) In-Reply-To: <484F910A.3080208@zend.com> References: <7E62CA6E-83F4-4F9C-86FB-75EBE7D489C9@gmail.com> <484D36EB.9080202@macvicar.net> <31fe29920806110114i263a47b4sd66d4849ceb22980@mail.gmail.com> <484F910A.3080208@zend.com> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-ID: <33FE7D54-B113-4C5F-9315-659F8550397E@gmail.com> Content-Transfer-Encoding: 7bit Date: Fri, 13 Jun 2008 17:53:02 +0100 To: internals@lists.php.net, Scott MacVicar X-Mailer: Apple Mail (2.753.1) Subject: Re: [PHP-DEV] Algorithm Optimizations - string search From: michal.dziemianko@gmail.com (Michal Dziemianko) Hello again I have setup small page where I am publishing all the profiling and evaluation results. It is here: http://83.168.205.202/~michal/ standard15/ So far I have put there function usage profile, zend_memnstr analysis, stripos and strrpos analysis including some charts etc. CVS diffs where applicable are also posted along with comparison of original and patched code. Michal On 2008-06-11, at 09:47, Stanislav Malyshev wrote: > Hi! > >> Here are some statistics: >> - average haystack length: 624.2 >> - average needle length: 1.9 ! -> 63% of needles of length 1 >> - avg length of haystacks shorter than avg: 41.0 -> 85% of all >> haystacks >> - avg length of haystacks longer than avg: 5685.11 > > I think it would be interesting to see same excluding 1-char > needles since in this case it should do one-char lookup (btw, if we > don't do it on C level, it might be a good idea to). > > >> Although strpos implements fix for that, some other functions >> don't. My idea >> is than to implement ZEND_MEMNSTR once again in shape: >> if (needle_len = 1) >> here just linear sweep >> else if haystack_len < 5000 (5000 is arbitrary - maybe some more >> tests >> needed to choose good value) >> original implementation (as it is the best one in this case) >> else >> BM/KMP (i think BM will be better in this case, as some people >> suggested) > > I'm not sure very big haystacks really worth the trouble - how many > of them are used? It may be interesting to see medians instead of > averages for that. But len=1 I think worth having special case. > -- > Stanislav Malyshev, Zend Software Architect > stas@zend.com http://www.zend.com/ > (408)253-8829 MSN: stas@zend.com