Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:92950 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 84978 invoked from network); 29 Apr 2016 16:53:04 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 29 Apr 2016 16:53:04 -0000 Authentication-Results: pb1.pair.com header.from=jared.williams1@ntlworld.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=jared.williams1@ntlworld.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain ntlworld.com designates 80.0.253.70 as permitted sender) X-PHP-List-Original-Sender: jared.williams1@ntlworld.com X-Host-Fingerprint: 80.0.253.70 know-smtprelay-omc-6.server.virginmedia.net Received: from [80.0.253.70] ([80.0.253.70:56962] helo=know-smtprelay-omc-6.server.virginmedia.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id F0/C0-26386-C6193275 for ; Fri, 29 Apr 2016 12:53:02 -0400 Received: from [192.168.1.102] ([81.101.128.161]) by know-smtprelay-6-imp with bizsmtp id oGsx1s0103V4BlM01GsyVE; Fri, 29 Apr 2016 17:52:58 +0100 X-Originating-IP: [81.101.128.161] X-Spam: 0 X-Authority: v=2.1 cv=AchMZCjG c=1 sm=1 tr=0 a=F/DkaBmB3CRA5jdVf8OlYA==:117 a=F/DkaBmB3CRA5jdVf8OlYA==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=IkcTkHD0fZMA:10 a=gu6fZOg2AAAA:8 a=67BIL_jfAAAA:8 a=8pif782wAAAA:8 a=NEAV23lmAAAA:8 a=K7hBvmqnAl7tILOzsYsA:9 a=imx__O5qD8JvYnsU:21 a=-rvpU6T2Tpz5FBgE:21 a=QEXdDO2ut3YA:10 a=-FEs8UIgK8oA:10 a=NWVoK91CQyQA:10 Message-ID: <1461948777.27972.5.camel@ntlworld.com> To: Matt Wilmas , internals@lists.php.net Cc: Dmitry Stogov , Nikita Popov , Xinchen Hui Date: Fri, 29 Apr 2016 17:52:57 +0100 In-Reply-To: <70BE415E8A7D439FBBEB3817F61577D5@pc1> References: <70BE415E8A7D439FBBEB3817F61577D5@pc1> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.18.5.2-0ubuntu1 Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: Re: [PHP-DEV] New HashTable implementation? From: jared.williams1@ntlworld.com (Jared Williams) On Thu, 2016-04-28 at 19:04 -0500, Matt Wilmas wrote: > Hi all, > > Last June, it was briefly mentioned about changing PHP's string hash > function [1] (DJB33 *seems* pretty horrible, after all, as far as > collisions...).  So 8 months ago I tried almost, if not, a half-dozen > of > them (including Murmur3) that were simple enough to quickly toss in. > > The result?  In all cases, I believe, fewer instructions (in > zend_hash_find, > and the hashing function), BUT also a slight-to-small increase in > cache > misses in Wordpress and other scripts... > > And in a test filling an array with a million "string_$i" keys (high > collision pattern for DJB33?), the speed was halved by the *huge* > cache miss > increase. :-/ > > I couldn't quite understand what was happening, where, if there were > fewer > collisions...  Misses all spread out in the hash-array? > > So there didn't seem to be anything useful to gain there. > > > Now, after seeing Bogdan's hash optimization idea last month [2], and > reading Nikita's blog post again, I had some ideas I'd like to try -- > assuming nobody else is planning major changes. :-)  Besides Nikita, > I'm > addressing Dmitry and Xinchen because your names are on some minor > hash > items on the 7.1 ideas wiki [4]. > > I'm thinking of a Robin Hood implementation with Universal hashing > [5] (of > int keys and the string hashes).  I haven't touched any code yet, but > think > I've worked out all the details in my head, and am ready to take a > stab at > it.  I think it's fairly simple to get the basics working; and when I > see > how that goes, I can do the additional optimizations I have in mind > that it > allows (including reduced memory, on 64-bit at least). > > Question: Can I use zval.u1.v.reserved ?  I guess I'll find out > otherwise. > :-O > > > The string hash function itself is really a separate thing [6], but > fasthash > [7] (not to be confused with "superfast") looks like a good one that > I > missed before...  After thinking about things, I think we could even > keep/use a 64-bit hash on 32-bit arch. > > > Well, just wanted to mention it first if anyone has a comment. :- > )  Should > be interesting, but no idea how it'll perform (lookups should be > very, very > fast (upsizing also); but cache factors and inserts/deletes are > wildcards). > Wish me luck!? > > > Thanks, > Matt > > [1] https://marc.info/?l=php-internals&m=143444845304138&w=2 > [2] https://marc.info/?t=145744248100001&r=1&w=2 > [4] https://wiki.php.net/php-7.1-ideas > [5] https://en.wikipedia.org/wiki/Universal_hashing > [6] https://github.com/rurban/smhasher > [7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64 > > Have you taken a look a go's internal hashing function? On platforms supporting AES-NI they use the AESENC instruction to get a fast hash, but with also some protection against collision attacks.  https://github.com/golang/go/blob/0104a31b8fbcbe52728a08867b26415d282c3 5d2/src/runtime/asm_amd64.s#L870 Jared