Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:96717 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 48248 invoked from network); 3 Nov 2016 13:33:08 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 3 Nov 2016 13:33:08 -0000 Authentication-Results: pb1.pair.com header.from=ben.coutu@zeyos.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=ben.coutu@zeyos.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain zeyos.com designates 89.163.237.165 as permitted sender) X-PHP-List-Original-Sender: ben.coutu@zeyos.com X-Host-Fingerprint: 89.163.237.165 mx.zeyos.com Received: from [89.163.237.165] ([89.163.237.165:59030] helo=mx.zeyos.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 59/1B-34238-F8C3B185 for ; Thu, 03 Nov 2016 08:33:04 -0500 Received: from mx.zeyos.com (localhost [127.0.0.1]) by mx.zeyos.com (Postfix) with ESMTP id 25B225FAAA for ; Thu, 3 Nov 2016 14:33:01 +0100 (CET) Authentication-Results: mx.zeyos.com (amavisd-new); dkim=pass reason="pass (just generated, assumed good)" header.d=zeyos.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=zeyos.com; h= content-transfer-encoding:content-type:content-type:mime-version :to:subject:subject:from:from:date:date; s=dkim; t=1478179980; x=1479043981; bh=2xPX8OTqm3ZCaPDsK1uFMOJZEKyauE/X92jCKX9caYE=; b= o4DBUYmXbmOyNBO4H8Ih9MetL1sJOCzHKNLD/taXVsF99ZqXT7Nw2IcA73FEIiry vS34gbewE9zN4S/gEAVmBk64z5iD0YER0LZKAPsn1vE/NlUlbn4OMsgStw9IlhCu TPQtcG+4G/Z7ooa6JegPOp9q25X9Vh7xB4E5pgAyOtQ= X-Virus-Scanned: Debian amavisd-new at mx.zeyos.com Received: from mx.zeyos.com ([127.0.0.1]) by mx.zeyos.com (mx.zeyos.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id uQkF5GZRUjzG for ; Thu, 3 Nov 2016 14:33:00 +0100 (CET) Received: from 127.0.0.1 (srv32.dedicated.server-hosting.expert [89.163.135.32]) by mx.zeyos.com (Postfix) with ESMTPSA id D4BB95FA7E; Thu, 3 Nov 2016 14:33:00 +0100 (CET) Date: Thu, 03 Nov 2016 14:33:00 +0100 To: Levi Morrison Cc: PHP Internals , Dmitry Stogov MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID: <20161103133301.25B225FAAA@mx.zeyos.com> Subject: Re: [PHP-DEV] Algorithmic efficiency of zend_memrchr From: ben.coutu@zeyos.com (Benjamin Coutu) The type qualifier is just copy paste relic. We'd of course have to remove = const form the signature (and the function body) for this to work. Quickly = looking at zend_memrchr's call sites I don't see that it would be a huge is= sue to modify that or am I missing something? =0A=0A=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D Original =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0AFrom: Levi Morrison =0ATo: Benjamin Coutu =0ADate: Thu, 03 Nov= 2016 13:23:23 +0100=0ASubject: Re: [PHP-DEV] Algorithmic efficiency of zen= d_memrchr=0A=0A>=20 >=20 > On Thu, Nov 3, 2016 at 5:21 AM, Benjamin Coutu wrot= e: > > Hello everyone, > > > > I think there are a few improvements that we could make to some fundame= ntal algorithms. > > I'd like to point out one particular for now, to see if there is any in= terest in these kind of low level code optimizations from the community. > > > > Please consider the following code snippet extracted from > > > > https://github.com/php/php-src/blob/49412756df244d94a217853395d15e96cb6= 0e18f/Zend/zend_operators.h#L193 > > > > static zend_always_inline const void *zend_memrchr(const void *s, int c= , size_t n) > > { > > const unsigned char *e; > > if (n <=3D 0) { > > return NULL; > > } > > > > for (e =3D (const unsigned char *)s + n - 1; e >=3D (const unsigned c= har *)s; e--) { > > if (*e =3D=3D (const unsigned char)c) { > > return (const void *)e; > > } > > } > > return NULL; > > } > > > > The issue with this is that on each loop we have to do two comparison t= ests, one against the length boundary and one to check for the actual occur= rence of the search character. > > > > We can avoid checking the length boundary if we use a sentinel by setti= ng the last possible element to the character we are searching for, whereby= we guarantee that we will find it and no longer need the test inside the l= oop. After the search we reset that character and then recheck the last ite= ration step to see if we have aborted the loop because of reaching the end = (false positive) or because of an actual positive. > > > > A patch could look like this: > > > > static zend_always_inline const void *zend_memrchr(const void *s, int c= , size_t n) > > { > > const unsigned char *b; > > const unsigned char *e; > > const unsigned char t; > > > > if (n > 0) { > > /* Initialize boundary pointers */ > > b =3D (const unsigned char *)s; > > e =3D b + n - 1; > > > > /* Set sentinel */ > > t =3D *b; > > *b =3D (const unsigned char)c; > > > > /* Search */ > > while (*e !=3D (const unsigned char)c) { > > --e; > > } > > > > /* Restore sentinel */ > > *b =3D t; > > > > /* Recheck */ > > if (e !=3D b || t =3D=3D (const unsigned char)c) { > > return (const void *)e; > > } > > } > > > > return NULL; > > } > > > > For the above example we could go one step further and test an entire l= ongword at a time with alignment on longword boundaries, just like some mod= ern C library implementations do. Which begs the question: Is there a reaso= n we are not using standard memrchr here? > > > > This kind of approach might be applicable to other string handling func= tions as well. Should I look into this further? > > > > Cheers, > > > > Benjamin Coutu > > > > -- > > > > Bejamin Coutu > > ben.coutu@zeyos.com > > > > ZeyOS, Inc. > > http://www.zeyos.com > > > > > > -- > > PHP Internals - PHP Runtime Development Mailing List > > To unsubscribe, visit: http://www.php.net/unsub.php >=20 > An obvious issue with this is that not all strings are writable. The > signature of the function also specifies it is a pointer to a const so > you shouldn't be able to do this anyway.