Newsgroups: php.internals
From: michal.dziemianko@gmail.com ("Michal Dziemianko")
Date: Wed, 11 Jun 2008 09:27:01 +0100
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Algorithm Optimizations - string search

Indeed, it is not meant to work with Unicode; it is an optimization for PHP 5. I tried many examples in ISO-8859-2 and ISO-8859-10, as these are the encodings most interesting to me, and it worked fine. If it really causes problems where the original implementation works, then details would be appreciated.

Michal

On Wed, Jun 11, 2008 at 9:01 AM, Texin, Tex wrote:

> When I looked at the code, I assumed that it wasn't intended for
> international use.
> I'll have to go back and look to give you details, but it doesn't work for
> international use or Unicode.
> It would be OK for 8859-1.
>
> > -----Original Message-----
> > From: Scott MacVicar [mailto:scott@macvicar.net]
> > Sent: Monday, June 09, 2008 6:58 AM
> > To: Nuno Lopes
> > Cc: internals@lists.php.net; Michal Dziemianko
> > Subject: Re: [PHP-DEV] Algorithm Optimizations - string search
> >
> > There is Rabin-Karp too, but its worst case is O(nm), so that
> > might not be ideal. Perhaps we should try to compare all of them.
> >
> > Scott
> >
> > Nuno Lopes wrote:
> > > Hi,
> > >
> > > So some comments:
> > > - You have some problems with the indentation. We only use tabs, so
> > >   please stick to that. Also, there are some lines that are not
> > >   indented correctly.
> > > - Have you considered the Boyer-Moore algorithm? I think it's a little
> > >   faster than KMP (take a look at e.g.
> > >   http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/).
> > > - Please remove the //TUTAJ SKONCZYLEM comment (Polish for "I stopped
> > >   here").
> > > - Revert this change (as well as a few others that are similar):
> > >   - for (r_end = r + Z_STRLEN_P(return_value) - 1; r < r_end; ) {
> > >   + for ( r_end = r + Z_STRLEN_P( return_value ) - 1; r < r_end; ){
> > >   (We like small diffs, not long diffs with changes that also break
> > >   our coding standards, e.g. we don't use a space after the '(' char.
> > >   Philip wrote a nice article about diffs at
> > >   http://wiki.php.net/doc/articles/whitespace)
> > > - In strrpos_reverse_kmp() I think you allocate 4 bytes fewer than you
> > >   want.
> > > - I think you have too many comments. We don't need 1 comment per
> > >   line :)
> > >
> > > After fixing all these points, running the test suite (with
> > > valgrind), and making sure there are no regressions, I think it's safe
> > > for you to commit. Still, I would like to see some performance figures
> > > comparing KMP and Boyer-Moore (or point me to some papers on
> > > the subject).
> > >
> > > Thanks for your work and good luck with the rest of the SoC project :)
> > >
> > > Nuno
> > >
> > >
> > > ----- Original Message ----- From: "Michal Dziemianko"
> > > To:
> > > Sent: Monday, June 09, 2008 12:39 PM
> > > Subject: [PHP-DEV] Algorithm Optimizations - string search
> > >
> > >
> > >> Hello,
> > >> Here: http://212.85.117.53/DIFF.txt is a small patch that will speed
> > >> up the following functions:
> > >> strpos,
> > >> stripos,
> > >> strrpos,
> > >> strripos,
> > >> and probably some others (all that use the zend_memnstr/php_memnstr
> > >> function).
> > >>
> > >> The speedup of zend_memnstr is about 8% on average (about 30% in the
> > >> case of artificial strings).
> > >> Functions strrpos and strripos are about 25% faster on average.
> > >>
> > >> The only drawback: it needs some additional space (the size of the
> > >> needle).
> > >>
> > >> All functions pass all the tests.
> > >>
> > >> If it looks fine, then I will apply for a CVS account.
> > >>
> > >> Cheers,
> > >> Michal
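
The space trade-off mentioned in the original message (extra memory proportional to the needle length) can be illustrated with a minimal Knuth-Morris-Pratt sketch in C. This is not the code from the patch, which is not reproduced in this thread. The names kmp_build_table and kmp_search are hypothetical, and plain malloc is assumed here where PHP code would normally use its own allocator. Lengths are passed explicitly, so the search works on raw bytes and makes no assumption about character encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: table[i] receives the length of the longest proper
 * prefix of needle[0..i] that is also a suffix of needle[0..i]. */
static void kmp_build_table(const char *needle, size_t n, size_t *table)
{
	size_t k = 0; /* length of the prefix matched so far */
	size_t i;

	table[0] = 0;
	for (i = 1; i < n; i++) {
		while (k > 0 && needle[i] != needle[k]) {
			k = table[k - 1]; /* fall back to a shorter border */
		}
		if (needle[i] == needle[k]) {
			k++;
		}
		table[i] = k;
	}
}

/* Hypothetical search routine: returns a pointer to the first occurrence of
 * needle (n bytes) in haystack (h bytes), or NULL if there is no match or
 * the table cannot be allocated. An empty needle matches at the start. */
const char *kmp_search(const char *haystack, size_t h,
                       const char *needle, size_t n)
{
	size_t *table;
	size_t i, k = 0;
	const char *found = NULL;

	assert(haystack != NULL && needle != NULL);
	if (n == 0) {
		return haystack;
	}
	if (n > h) {
		return NULL;
	}

	/* This is the additional space: one table entry per needle byte. */
	table = malloc(n * sizeof(*table));
	if (table == NULL) {
		return NULL;
	}
	kmp_build_table(needle, n, table);

	for (i = 0; i < h; i++) {
		while (k > 0 && haystack[i] != needle[k]) {
			k = table[k - 1]; /* reuse what is already matched */
		}
		if (haystack[i] == needle[k]) {
			k++;
		}
		if (k == n) {
			found = haystack + i + 1 - n;
			break;
		}
	}

	free(table);
	return found;
}
```

In this sketch the haystack index never moves backwards, which gives the search loop its linear bound, and the only extra allocation is the n-entry table. Boyer-Moore and Rabin-Karp, also mentioned in the thread, are not shown.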