Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:38451 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 54270 invoked from network); 20 Jun 2008 09:17:37 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 20 Jun 2008 09:17:37 -0000 Authentication-Results: pb1.pair.com header.from=nlopess@php.net; sender-id=unknown Authentication-Results: pb1.pair.com smtp.mail=nlopess@php.net; spf=unknown; sender-id=unknown Received-SPF: unknown (pb1.pair.com: domain php.net does not designate 212.55.154.24 as permitted sender) X-PHP-List-Original-Sender: nlopess@php.net X-Host-Fingerprint: 212.55.154.24 relay4.ptmail.sapo.pt Linux 2.4/2.6 Received: from [212.55.154.24] ([212.55.154.24:60918] helo=sapo.pt) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id E3/96-16112-FA57B584 for ; Fri, 20 Jun 2008 05:17:37 -0400 Received: (qmail 6718 invoked from network); 20 Jun 2008 09:17:30 -0000 Received: from unknown (HELO sapo.pt) (10.134.37.163) by relay4 with SMTP; 20 Jun 2008 09:17:30 -0000 Received: (qmail 30922 invoked from network); 20 Jun 2008 09:17:30 -0000 X-AntiVirus: PTMail-AV 0.3-0.92.0 X-Virus-Status: Clean (0.01381 seconds) Received: from unknown (HELO pc07654) (nunoplopes@sapo.pt@[85.240.55.19]) (envelope-sender ) by mta13 (qmail-ldap-1.03) with SMTP for ; 20 Jun 2008 09:17:30 -0000 Message-ID: To: "Michal Dziemianko" , References: <7E62CA6E-83F4-4F9C-86FB-75EBE7D489C9@gmail.com> <484D36EB.9080202@macvicar.net> <31fe29920806110114i263a47b4sd66d4849ceb22980@mail.gmail.com> <484F910A.3080208@zend.com> <33FE7D54-B113-4C5F-9315-659F8550397E@gmail.com> <11BACFBA-C2C7-4405-9092-679606B41DB1@gmail.com> In-Reply-To: <11BACFBA-C2C7-4405-9092-679606B41DB1@gmail.com> Date: Fri, 20 Jun 2008 10:17:23 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Windows Mail 6.0.6001.18000 X-MimeOLE: Produced By Microsoft MimeOLE V6.0.6001.18000 Subject: Re: [PHP-DEV] Algorithm Optimizations - string search From: nlopess@php.net ("Nuno Lopes") Hi, Both patches seem ok to me. Please ask for a CVS account and prepare yourself to commit (when your mentor says so) :) In the second patch maybe you could use zend_memrchr() as well, but you should get the same results. Also I think that bundling the backwards KMP would be a nice addition. People use PHP for other things than regular web scripting.. Nuno ----- Original Message ----- From: "Michal Dziemianko" To: Sent: Friday, June 20, 2008 9:36 AM Subject: Re: [PHP-DEV] Algorithm Optimizations - string search > Hello, > Here goes first diff - to keep it simple and avoid confusion I will > post them one-by-one after the previous is accepted/rejected. > Optimization of STRRPOS/STRRIPOS for PHP_5_3. > > 2 things are changed: > * implementation of search loop from theta(NM) to o(N), omega(NM) - > it is now the same as original zend_memnstr - using zend_memrchr > instead of memchr function. Here is comparison of performance: http:// > 212.85.117.53/gsoc/index.php? > option=com_content&view=article&id=48:strrpos-patch- > proposal&catid=36:patches&Itemid=56 > This implementation is faster than original for any reasonable input > (a little slower for "aaab" in "aaaaaaaab", but that is really bad > case), and I think in this case we should not bother with KMP/BM as I > haven't found any call to this functions with haystack longer than > 100 (NOTE: on MY server, over ~1h - this can be different for > wikipedia guys;))). In case somebody needs KMP - I have it > implemented to work "backwards" thus suitable for strrpos. > > * second thing is pre-lowering the haystack in strripos in case of > needle len=1 - why is explained here: http://212.85.117.53/gsoc/ > index.php?option=com_content&view=article&id=49:strripos-possible- > patches&catid=36:patches&Itemid=56 > > Here is the diff: > -------------------------------------------------- > diff -u -d -r1.445.2.14.2.69.2.22 string.c > --- ext/standard/string.c 27 May 2008 10:29:33 -0000 > 1.445.2.14.2.69.2.22 > +++ ext/standard/string.c 20 Jun 2008 08:08:34 -0000 > @@ -1876,18 +1876,19 @@ > > if (needle_len == 1) { > /* Single character search can shortcut memcmps */ > - while (e >= p) { > - if (*e == *needle) { > - RETURN_LONG(e - p + (offset > 0 ? > offset : 0)); > - } > - e--; > - } > + if ((e = (char *)zend_memrchr(p, *needle, e - p + > 1)) != NULL) { > + RETURN_LONG(e - p + (offset > 0 ? offset : 0)); > + } > RETURN_FALSE; > } > > while (e >= p) { > - if (memcmp(e, needle, needle_len) == 0) { > - RETURN_LONG(e - p + (offset > 0 ? offset : 0)); > + if ((e = (char *)zend_memrchr(p, *needle, e - p + > 1)) != NULL) { > + if (!memcmp(needle, e, needle_len)) { > + RETURN_LONG(e - p + (offset > 0 ? > offset : 0)); > + } > + } else { > + RETURN_FALSE; > } > e--; > } > @@ -1925,6 +1926,11 @@ > if ((haystack_len == 0) || (needle_len == 0)) { > RETURN_FALSE; > } > + > + haystack_dup = estrndup(haystack, haystack_len); > + php_strtolower(haystack_dup, haystack_len); > + needle_dup = estrndup(needle, needle_len); > + php_strtolower(needle_dup, needle_len); > > if (needle_len == 1) { > /* Single character search can shortcut memcmps > @@ -1932,34 +1938,36 @@ > if (offset >= 0) { > if (offset > haystack_len) { > php_error_docref(NULL TSRMLS_CC, > E_NOTICE, "Offset is greater than the length of haystack string"); > + efree(needle_dup); > + efree(haystack_dup); > RETURN_FALSE; > } > - p = haystack + offset; > - e = haystack + haystack_len - 1; > + p = haystack_dup + offset; > + e = haystack_dup + haystack_len - 1; > } else { > - p = haystack; > + p = haystack_dup; > if (-offset > haystack_len || offset < - > INT_MAX) { > php_error_docref(NULL TSRMLS_CC, > E_NOTICE, "Offset is greater than the length of haystack string"); > + efree(needle_dup); > + efree(haystack_dup); > RETURN_FALSE; > } > - e = haystack + haystack_len + offset; > + e = haystack_dup + haystack_len + offset; > } > - /* Borrow that ord_needle buffer to avoid repeatedly > tolower()ing needle */ > - *ord_needle = tolower(*needle); > + > while (e >= p) { > - if (tolower(*e) == *ord_needle) { > + if (*e == *needle_dup) { > + efree(needle_dup); > + efree(haystack_dup); > RETURN_LONG(e - p + (offset > 0 ? > offset : 0)); > } > e--; > } > + efree(needle_dup); > + efree(haystack_dup); > RETURN_FALSE; > } > > - needle_dup = estrndup(needle, needle_len); > - php_strtolower(needle_dup, needle_len); > - haystack_dup = estrndup(haystack, haystack_len); > - php_strtolower(haystack_dup, haystack_len); > - > if (offset >= 0) { > if (offset > haystack_len) { > efree(needle_dup); > @@ -1985,11 +1993,13 @@ > } > > while (e >= p) { > - if (memcmp(e, needle_dup, needle_len) == 0) { > - efree(haystack_dup); > - efree(needle_dup); > - RETURN_LONG(e - p + (offset > 0 ? offset : 0)); > - } > + if ((e = (char *)zend_memrchr(p, *needle, e - p + > 1)) != NULL) { > + if (!memcmp(needle, e, needle_len)) { > + efree(haystack_dup); > + efree(needle_dup); > + RETURN_LONG(e - p + (offset > 0 ? > offset : 0)); > + } > + } > e--; > } > ---------------------------------------------- > > Tested and PASS. Valgrind does not show any memory leaks, but if > somebody more competent can confirm it, than I will really appreciate > it. > Cheers, > Michal > > PS If somebody can take a look at that: http://212.85.117.53/gsoc/ > index.php?option=com_content&view=article&id=52:small- > buginconsistence&catid=38:other&Itemid=58 it would be also great. If > it should be the same in both functions than I have patch and test > cases for it prepared.