Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:38449 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 45388 invoked from network); 20 Jun 2008 08:36:38 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 20 Jun 2008 08:36:38 -0000 Authentication-Results: pb1.pair.com header.from=michal.dziemianko@gmail.com; sender-id=unknown Authentication-Results: pb1.pair.com smtp.mail=michal.dziemianko@gmail.com; spf=unknown; sender-id=unknown Received-SPF: unknown (pb1.pair.com: domain gmail.com does not designate 82.132.130.151 as permitted sender) X-PHP-List-Original-Sender: michal.dziemianko@gmail.com X-Host-Fingerprint: 82.132.130.151 yoda.london.02.net Linux 2.6 Received: from [82.132.130.151] ([82.132.130.151:56524] helo=mail.o2.co.uk) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id D4/E4-16112-31C6B584 for ; Fri, 20 Jun 2008 04:36:37 -0400 Received: from [192.168.1.64] (78.86.103.249) by mail.o2.co.uk (8.0.013.3) (authenticated as michal_dziemianko@o2.co.uk) id 4851DD9501559DAF for internals@lists.php.net; Fri, 20 Jun 2008 09:36:32 +0100 In-Reply-To: References: <7E62CA6E-83F4-4F9C-86FB-75EBE7D489C9@gmail.com> <484D36EB.9080202@macvicar.net> <31fe29920806110114i263a47b4sd66d4849ceb22980@mail.gmail.com> <484F910A.3080208@zend.com> <33FE7D54-B113-4C5F-9315-659F8550397E@gmail.com> Mime-Version: 1.0 (Apple Message framework v753.1) X-Priority: 3 Content-Type: multipart/alternative; boundary=Apple-Mail-7--579421230 Message-ID: <11BACFBA-C2C7-4405-9092-679606B41DB1@gmail.com> Date: Fri, 20 Jun 2008 09:36:19 +0100 To: internals@lists.php.net X-Mailer: Apple Mail (2.753.1) Subject: Re: [PHP-DEV] Algorithm Optimizations - string search From: michal.dziemianko@gmail.com (Michal Dziemianko) --Apple-Mail-7--579421230 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Hello, Here goes first diff - to keep it simple and avoid confusion I will post them one-by-one after the previous is accepted/rejected. Optimization of STRRPOS/STRRIPOS for PHP_5_3. 2 things are changed: * implementation of search loop from theta(NM) to o(N), omega(NM) - it is now the same as original zend_memnstr - using zend_memrchr instead of memchr function. Here is comparison of performance: http:// 212.85.117.53/gsoc/index.php? option=com_content&view=article&id=48:strrpos-patch- proposal&catid=36:patches&Itemid=56 This implementation is faster than original for any reasonable input (a little slower for "aaab" in "aaaaaaaab", but that is really bad case), and I think in this case we should not bother with KMP/BM as I haven't found any call to this functions with haystack longer than 100 (NOTE: on MY server, over ~1h - this can be different for wikipedia guys;))). In case somebody needs KMP - I have it implemented to work "backwards" thus suitable for strrpos. * second thing is pre-lowering the haystack in strripos in case of needle len=1 - why is explained here: http://212.85.117.53/gsoc/ index.php?option=com_content&view=article&id=49:strripos-possible- patches&catid=36:patches&Itemid=56 Here is the diff: -------------------------------------------------- diff -u -d -r1.445.2.14.2.69.2.22 string.c --- ext/standard/string.c 27 May 2008 10:29:33 -0000 1.445.2.14.2.69.2.22 +++ ext/standard/string.c 20 Jun 2008 08:08:34 -0000 @@ -1876,18 +1876,19 @@ if (needle_len == 1) { /* Single character search can shortcut memcmps */ - while (e >= p) { - if (*e == *needle) { - RETURN_LONG(e - p + (offset > 0 ? offset : 0)); - } - e--; - } + if ((e = (char *)zend_memrchr(p, *needle, e - p + 1)) != NULL) { + RETURN_LONG(e - p + (offset > 0 ? offset : 0)); + } RETURN_FALSE; } while (e >= p) { - if (memcmp(e, needle, needle_len) == 0) { - RETURN_LONG(e - p + (offset > 0 ? offset : 0)); + if ((e = (char *)zend_memrchr(p, *needle, e - p + 1)) != NULL) { + if (!memcmp(needle, e, needle_len)) { + RETURN_LONG(e - p + (offset > 0 ? offset : 0)); + } + } else { + RETURN_FALSE; } e--; } @@ -1925,6 +1926,11 @@ if ((haystack_len == 0) || (needle_len == 0)) { RETURN_FALSE; } + + haystack_dup = estrndup(haystack, haystack_len); + php_strtolower(haystack_dup, haystack_len); + needle_dup = estrndup(needle, needle_len); + php_strtolower(needle_dup, needle_len); if (needle_len == 1) { /* Single character search can shortcut memcmps @@ -1932,34 +1938,36 @@ if (offset >= 0) { if (offset > haystack_len) { php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Offset is greater than the length of haystack string"); + efree(needle_dup); + efree(haystack_dup); RETURN_FALSE; } - p = haystack + offset; - e = haystack + haystack_len - 1; + p = haystack_dup + offset; + e = haystack_dup + haystack_len - 1; } else { - p = haystack; + p = haystack_dup; if (-offset > haystack_len || offset < - INT_MAX) { php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Offset is greater than the length of haystack string"); + efree(needle_dup); + efree(haystack_dup); RETURN_FALSE; } - e = haystack + haystack_len + offset; + e = haystack_dup + haystack_len + offset; } - /* Borrow that ord_needle buffer to avoid repeatedly tolower()ing needle */ - *ord_needle = tolower(*needle); + while (e >= p) { - if (tolower(*e) == *ord_needle) { + if (*e == *needle_dup) { + efree(needle_dup); + efree(haystack_dup); RETURN_LONG(e - p + (offset > 0 ? offset : 0)); } e--; } + efree(needle_dup); + efree(haystack_dup); RETURN_FALSE; } - needle_dup = estrndup(needle, needle_len); - php_strtolower(needle_dup, needle_len); - haystack_dup = estrndup(haystack, haystack_len); - php_strtolower(haystack_dup, haystack_len); - if (offset >= 0) { if (offset > haystack_len) { efree(needle_dup); @@ -1985,11 +1993,13 @@ } while (e >= p) { - if (memcmp(e, needle_dup, needle_len) == 0) { - efree(haystack_dup); - efree(needle_dup); - RETURN_LONG(e - p + (offset > 0 ? offset : 0)); - } + if ((e = (char *)zend_memrchr(p, *needle, e - p + 1)) != NULL) { + if (!memcmp(needle, e, needle_len)) { + efree(haystack_dup); + efree(needle_dup); + RETURN_LONG(e - p + (offset > 0 ? offset : 0)); + } + } e--; } ---------------------------------------------- Tested and PASS. Valgrind does not show any memory leaks, but if somebody more competent can confirm it, than I will really appreciate it. Cheers, Michal PS If somebody can take a look at that: http://212.85.117.53/gsoc/ index.php?option=com_content&view=article&id=52:small- buginconsistence&catid=38:other&Itemid=58 it would be also great. If it should be the same in both functions than I have patch and test cases for it prepared. --Apple-Mail-7--579421230--