Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:96715 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 41489 invoked from network); 3 Nov 2016 11:21:11 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 3 Nov 2016 11:21:11 -0000 Authentication-Results: pb1.pair.com smtp.mail=ben.coutu@zeyos.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=ben.coutu@zeyos.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain zeyos.com designates 89.163.237.165 as permitted sender) X-PHP-List-Original-Sender: ben.coutu@zeyos.com X-Host-Fingerprint: 89.163.237.165 mx.zeyos.com Received: from [89.163.237.165] ([89.163.237.165:54538] helo=mx.zeyos.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id C4/4A-34238-0AD1B185 for ; Thu, 03 Nov 2016 06:21:08 -0500 Received: from mx.zeyos.com (localhost [127.0.0.1]) by mx.zeyos.com (Postfix) with ESMTP id C28155FAB4 for ; Thu, 3 Nov 2016 12:21:01 +0100 (CET) Authentication-Results: mx.zeyos.com (amavisd-new); dkim=pass reason="pass (just generated, assumed good)" header.d=zeyos.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=zeyos.com; h= content-transfer-encoding:content-type:content-type:mime-version :to:subject:subject:from:from:date:date; s=dkim; t=1478172061; x=1479036062; bh=DsCUwHWwCj3UcXMtHvDtgKQTknokbkxfGt9fmIZWthc=; b= BnW1e49tgEdc+r+iZe9IJ9ykX+vd00ul0g5cjs+bXO/s/baiz/BgsV1gyEBeK/Nc EACZ99qW4turPZXwgvm1cxCo7zd6CTh2hyQmTYxQIsSwrdTfP45aR0JEakSU5qCv WDPYK3JYxCIYYIEy3OQng1bEfoc7S3HgGHs32vmAbGQ= X-Virus-Scanned: Debian amavisd-new at mx.zeyos.com Received: from mx.zeyos.com ([127.0.0.1]) by mx.zeyos.com (mx.zeyos.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 9-Eze_jJarXS for ; Thu, 3 Nov 2016 12:21:01 +0100 (CET) Received: from 127.0.0.1 (srv32.dedicated.server-hosting.expert [89.163.135.32]) by mx.zeyos.com (Postfix) with ESMTPSA id 5F9095FA98; Thu, 3 Nov 2016 12:21:01 +0100 (CET) Date: Thu, 03 Nov 2016 12:21:01 +0100 To: PHP Internals Cc: Dmitry Stogov MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID: <20161103112101.C28155FAB4@mx.zeyos.com> Subject: [PHP-DEV] Algorithmic efficiency of zend_memrchr From: ben.coutu@zeyos.com (Benjamin Coutu) Hello everyone, I think there are a few improvements that we could make to some fundamental= algorithms. I'd like to point out one particular for now, to see if there is any intere= st in these kind of low level code optimizations from the community. Please consider the following code snippet extracted from https://github.com/php/php-src/blob/49412756df244d94a217853395d15e96cb60e18= f/Zend/zend_operators.h#L193 static zend_always_inline const void *zend_memrchr(const void *s, int c, si= ze_t n) { const unsigned char *e; if (n <=3D 0) { return NULL; } for (e =3D (const unsigned char *)s + n - 1; e >=3D (const unsigned char = *)s; e--) { if (*e =3D=3D (const unsigned char)c) { return (const void *)e; } } return NULL; } The issue with this is that on each loop we have to do two comparison tests= , one against the length boundary and one to check for the actual occurrenc= e of the search character. We can avoid checking the length boundary if we use a sentinel by setting t= he last possible element to the character we are searching for, whereby we = guarantee that we will find it and no longer need the test inside the loop.= After the search we reset that character and then recheck the last iterati= on step to see if we have aborted the loop because of reaching the end (fal= se positive) or because of an actual positive. A patch could look like this: static zend_always_inline const void *zend_memrchr(const void *s, int c, si= ze_t n) { const unsigned char *b; const unsigned char *e; const unsigned char t; =20 if (n > 0) { /* Initialize boundary pointers */ b =3D (const unsigned char *)s; e =3D b + n - 1; =20 /* Set sentinel */ t =3D *b; *b =3D (const unsigned char)c; =20 /* Search */ while (*e !=3D (const unsigned char)c) { --e;=0A } =0A /* Restore sentinel */ =20 *b =3D t;=0A =20 /* Recheck */ if (e !=3D b || t =3D=3D (const unsigned char)c) { return (const void *)e; } } =20 return NULL; } For the above example we could go one step further and test an entire longw= ord at a time with alignment on longword boundaries, just like some modern = C library implementations do. Which begs the question: Is there a reason we= are not using standard memrchr here?=0A This kind of approach might be applicable to other string handling function= s as well. Should I look into this further?=0A Cheers, Benjamin Coutu=0A=0A--=20 Bejamin Coutu ben.coutu@zeyos.com ZeyOS, Inc. http://www.zeyos.com