Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:96716 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 45006 invoked from network); 3 Nov 2016 12:23:30 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 3 Nov 2016 12:23:30 -0000 Authentication-Results: pb1.pair.com header.from=morrison.levi@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=morrison.levi@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.213.174 as permitted sender) X-PHP-List-Original-Sender: morrison.levi@gmail.com X-Host-Fingerprint: 209.85.213.174 mail-yb0-f174.google.com Received: from [209.85.213.174] ([209.85.213.174:34154] helo=mail-yb0-f174.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id C4/BA-34238-F3C2B185 for ; Thu, 03 Nov 2016 07:23:27 -0500 Received: by mail-yb0-f174.google.com with SMTP id f97so15234274ybi.1 for ; Thu, 03 Nov 2016 05:23:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-transfer-encoding; bh=N+HRr23MC26X5QcYN4z6oFAS1SEUQz5JQ63HRm8UAX4=; b=TeVxIs5q/K6t1zDTJAAvllLZXfDI1c/yN2izRMJXD38joIqHnZHyWB5hatr4MvvfLU J0dbjSpYsLqGhD3go+f0mmpnfnGZPrx71Fr0dpjMp2VXGd9RK+xRpBVk9ZzArN2LaxfG foxID7iozQPyRiBxFiLNvHKTsAoFIVtJrGnIjrVO75524lZGNdVfyyZJaWjulSzy1ly5 QwrnSUnE/EOqblbfwvfEwY1ALwwYkYlg3CNPWDa4oSsiALHN3UrBoeXzly2E1U89k1fk IBTZqH5d1KRNjlLeOvkpJqvbncwUGOPIoK9qO9CUX6MMsSbnvB01mooUfITkbBPlPxz1 wRgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc:content-transfer-encoding; bh=N+HRr23MC26X5QcYN4z6oFAS1SEUQz5JQ63HRm8UAX4=; b=ZE0t5lvEFDkvNaOMrny4qomizNgoQmY9pDYzVbfhe6tx95eZySwXQ7+ZH7kH7KGmQ/ uWc/8fmyB5+QenaCdwXj1HgkX1bG4A7+eq6IFeMovfjxovSBIhkgtyfs/R/Gf1KME74X OXny30n8KceyhksU3LkxoQaHG3fRkJYfVxD0+M+HHDy0oUvV5MOKpo34bWRHtnxV8Hr1 6uOKG08YZ4mExWlg4+qIcjoI1Lplp2lRxE2p9bx5Ndw78Si3EE1Wht8vl/tEfzLZvY+s Ka2wenWq7AV3DBaiE3O/smm5KQfhTp6Y6Jm4bc62lT4J90SvjjBlkyEQnQsgrMPL/LQJ DpQQ== X-Gm-Message-State: ABUngved/9cXHHXVkzwCxeiDTMa74mVzmGQ+Jg0hT4ulKqmrvJV9iwp/GmWWJQ4YVK95ziySiRMyz2aGzM0TkQ== X-Received: by 10.37.97.197 with SMTP id v188mr7321341ybb.146.1478175804382; Thu, 03 Nov 2016 05:23:24 -0700 (PDT) MIME-Version: 1.0 Sender: morrison.levi@gmail.com Received: by 10.13.235.81 with HTTP; Thu, 3 Nov 2016 05:23:23 -0700 (PDT) In-Reply-To: <20161103112101.C28155FAB4@mx.zeyos.com> References: <20161103112101.C28155FAB4@mx.zeyos.com> Date: Thu, 3 Nov 2016 06:23:23 -0600 X-Google-Sender-Auth: 8GuKafMgnwJda_UDcjGdIv3TPVo Message-ID: To: Benjamin Coutu Cc: PHP Internals , Dmitry Stogov Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [PHP-DEV] Algorithmic efficiency of zend_memrchr From: levim@php.net (Levi Morrison) On Thu, Nov 3, 2016 at 5:21 AM, Benjamin Coutu wrote: > Hello everyone, > > I think there are a few improvements that we could make to some fundament= al algorithms. > I'd like to point out one particular for now, to see if there is any inte= rest in these kind of low level code optimizations from the community. > > Please consider the following code snippet extracted from > > https://github.com/php/php-src/blob/49412756df244d94a217853395d15e96cb60e= 18f/Zend/zend_operators.h#L193 > > static zend_always_inline const void *zend_memrchr(const void *s, int c, = size_t n) > { > const unsigned char *e; > if (n <=3D 0) { > return NULL; > } > > for (e =3D (const unsigned char *)s + n - 1; e >=3D (const unsigned cha= r *)s; e--) { > if (*e =3D=3D (const unsigned char)c) { > return (const void *)e; > } > } > return NULL; > } > > The issue with this is that on each loop we have to do two comparison tes= ts, one against the length boundary and one to check for the actual occurre= nce of the search character. > > We can avoid checking the length boundary if we use a sentinel by setting= the last possible element to the character we are searching for, whereby w= e guarantee that we will find it and no longer need the test inside the loo= p. After the search we reset that character and then recheck the last itera= tion step to see if we have aborted the loop because of reaching the end (f= alse positive) or because of an actual positive. > > A patch could look like this: > > static zend_always_inline const void *zend_memrchr(const void *s, int c, = size_t n) > { > const unsigned char *b; > const unsigned char *e; > const unsigned char t; > > if (n > 0) { > /* Initialize boundary pointers */ > b =3D (const unsigned char *)s; > e =3D b + n - 1; > > /* Set sentinel */ > t =3D *b; > *b =3D (const unsigned char)c; > > /* Search */ > while (*e !=3D (const unsigned char)c) { > --e; > } > > /* Restore sentinel */ > *b =3D t; > > /* Recheck */ > if (e !=3D b || t =3D=3D (const unsigned char)c) { > return (const void *)e; > } > } > > return NULL; > } > > For the above example we could go one step further and test an entire lon= gword at a time with alignment on longword boundaries, just like some moder= n C library implementations do. Which begs the question: Is there a reason = we are not using standard memrchr here? > > This kind of approach might be applicable to other string handling functi= ons as well. Should I look into this further? > > Cheers, > > Benjamin Coutu > > -- > > Bejamin Coutu > ben.coutu@zeyos.com > > ZeyOS, Inc. > http://www.zeyos.com > > > -- > PHP Internals - PHP Runtime Development Mailing List > To unsubscribe, visit: http://www.php.net/unsub.php An obvious issue with this is that not all strings are writable. The signature of the function also specifies it is a pointer to a const so you shouldn't be able to do this anyway.