Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:38130 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 26872 invoked from network); 10 Jun 2008 08:53:29 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 10 Jun 2008 08:53:29 -0000 Authentication-Results: pb1.pair.com smtp.mail=michal.dziemianko@gmail.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=michal.dziemianko@gmail.com; sender-id=pass; domainkeys=bad Received-SPF: pass (pb1.pair.com: domain gmail.com designates 72.14.220.157 as permitted sender) DomainKey-Status: bad X-DomainKeys: Ecelerity dk_validate implementing draft-delany-domainkeys-base-01 X-PHP-List-Original-Sender: michal.dziemianko@gmail.com X-Host-Fingerprint: 72.14.220.157 fg-out-1718.google.com Received: from [72.14.220.157] ([72.14.220.157:50595] helo=fg-out-1718.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 6A/D1-15621-7014E484 for ; Tue, 10 Jun 2008 04:53:28 -0400 Received: by fg-out-1718.google.com with SMTP id 16so1911696fgg.23 for ; Tue, 10 Jun 2008 01:53:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:mime-version:in-reply-to :references:content-type:message-id:content-transfer-encoding:from :subject:date:to:x-mailer; bh=sbK46Y4xUxIWZbfB9tXmQQEZPOQH1HiyFkokTwo0Zsk=; b=F2pUNLlIz9MvNsLyedhE93kai3rBO8PL6lNKoFAHOWX8b0pzIBmYS36xyXTQsDgqcg 7kL2E5h6eodoFVa3Y/MLjX2UmYT0x9GqUEAC8DlPjKOfVd0joshSmssW45SR8QOfKBgJ WiBlZO4XmxYTKyK83JzfEN4O2mHz6fyxOc1OE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:content-type:message-id :content-transfer-encoding:from:subject:date:to:x-mailer; b=O+R0bAxhJp0HVepz9wobIfzSjuw0O1rIHj05doReJX+X1LghdCGfNPFOnF2gM6/r0j MIsLWqMj90dMtGNCAdaTqT/nDlV8jJVWeogR1qLOLX//+K+dQIc8xjl6SSfKG60o412F sg0Ng4U8fA9LZbjGtZq+aZbulmMZHG0LWb5Wo= Received: by 10.103.218.19 with SMTP id v19mr3069370muq.110.1213088004112; Tue, 10 Jun 2008 01:53:24 -0700 (PDT) Received: from ?129.215.48.228? ( [129.215.48.228]) by mx.google.com with ESMTPS id j9sm4864630mue.5.2008.06.10.01.53.21 (version=TLSv1/SSLv3 cipher=RC4-MD5); Tue, 10 Jun 2008 01:53:22 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v753.1) In-Reply-To: <484D36EB.9080202@macvicar.net> References: <7E62CA6E-83F4-4F9C-86FB-75EBE7D489C9@gmail.com> <484D36EB.9080202@macvicar.net> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-ID: Content-Transfer-Encoding: 7bit Date: Tue, 10 Jun 2008 09:46:42 +0100 To: internals@lists.php.net X-Mailer: Apple Mail (2.753.1) Subject: Re: [PHP-DEV] Algorithm Optimizations - string search From: michal.dziemianko@gmail.com (Michal Dziemianko) Hello On 2008-06-09, at 14:58, Scott MacVicar wrote: > There is rabin-karp too but its worse case is O(nm) so that might > not be ideal, perhaps we should try to compare all of them. > OK. I think RK will not be much better than original piece of code which runs in time o(n + m) i most cases (but has worst time o(nm)) > Scott > > Nuno Lopes wrote: >> Hi, >> So some comments: >> - you have some problems with the indentation. We only use tabs, >> so please stick to that. Also, there are some lines that are not >> indented correctly >> - Have you considered the Boyer-Moore algorithm? I think it's a >> little faster than KMP (take a look at e.g. http:// >> www.cs.utexas.edu/users/moore/best-ideas/string-searching/) Yes, but it requires additional space (size of alphabet) and average running time is 3n = o(n), while KMP runs in o(n+m) that makes them actually the same on average (BM is faster in some cases of course). As Scott asked if I can make it running on UNICODE it seems to be quite important (the additional space I mean). >> - please remove the //TUTAJ SKONCZYLEM comment of course >> - revert this change (as well as a few other that are similar): >> - for (r_end = r + Z_STRLEN_P(return_value) - 1; r < >> r_end; ) { >> + for ( r_end = r + Z_STRLEN_P( return_value ) - 1; r < >> r_end; ){ >> (we like small diffs, not long diffs with changes that also break >> our coding standards. e.g. we don't use space after the '(' char. >> Philip wrote a nice article about diffs at http://wiki.php.net/doc/ >> articles/whitespace) that was not intentional i think >> - in strrpos_reverse_kmp() I think you allocate 4 bytes less that >> you want actually the reverse - in zend_memnstr I allocate 4 bytes too much;) >> - I think you've too many comments.. We don't need 1 comment per >> line :) >> After fixing all these points and after running the test suite >> (with valgrind) and make sure there are no regressions, I think >> it's safe for you to commit. Still, I would like to see some >> performance figures comparing the KMP and the Boyer-Moore (or >> point me some papers about the subject). Will do >> Thanks for your work and good luck for the rest of the SoC project :) Thanks, Michal >> Nuno >> ----- Original Message ----- From: "Michal Dziemianko" >> >> To: >> Sent: Monday, June 09, 2008 12:39 PM >> Subject: [PHP-DEV] Algorithm Optimizations - string search >>> Hello, >>> Here: http://212.85.117.53/DIFF.txt is small patch that will speed >>> up following functions: >>> strpos, >>> stripos, >>> strrpos >>> strripos, >>> and probably some others (all that use zend_memnstr/php_memnstr >>> function) >>> >>> The speedup of zend_memnstr is about 8% on average (about 30% in >>> case >>> of artificial strings). >>> Functions strrpos and strripos are about 25% faster on average. >>> >>> The only drawback - it needs some additional space (size of the >>> needle). >>> >>> All functions pass all the tests. >>> >>> If it looks fine than I will apply for cvs account. >>> >>> Cheers, >>> Michal >>>