Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:92927 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 1878 invoked from network); 29 Apr 2016 11:02:04 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 29 Apr 2016 11:02:04 -0000 Authentication-Results: pb1.pair.com smtp.mail=bogdan.andone@intel.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=bogdan.andone@intel.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain intel.com designates 134.134.136.24 as permitted sender) X-PHP-List-Original-Sender: bogdan.andone@intel.com X-Host-Fingerprint: 134.134.136.24 mga09.intel.com Received: from [134.134.136.24] ([134.134.136.24:30032] helo=mga09.intel.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 07/81-26386-72F33275 for ; Fri, 29 Apr 2016 07:02:02 -0400 Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga102.jf.intel.com with ESMTP; 29 Apr 2016 04:01:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,551,1455004800"; d="scan'208";a="794950806" Received: from irsmsx107.ger.corp.intel.com ([163.33.3.99]) by orsmga003.jf.intel.com with ESMTP; 29 Apr 2016 04:01:55 -0700 Received: from irsmsx106.ger.corp.intel.com ([169.254.8.172]) by IRSMSX107.ger.corp.intel.com ([169.254.10.228]) with mapi id 14.03.0248.002; Fri, 29 Apr 2016 12:01:54 +0100 To: Dmitry Stogov , Matt Wilmas , "internals@lists.php.net" CC: Nikita Popov , Xinchen Hui Thread-Topic: New HashTable implementation? Thread-Index: AQHRoew3HR1WRx8eXEyl1J6Pm7lxm5+guXkg Date: Fri, 29 Apr 2016 11:01:53 +0000 Message-ID: <0ABC26E371A76440A370CFC5EB1056CC40F68468@IRSMSX106.ger.corp.intel.com> References: <70BE415E8A7D439FBBEB3817F61577D5@pc1> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: RE: New HashTable implementation? From: bogdan.andone@intel.com ("Andone, Bogdan") > -----Original Message----- > From: Dmitry Stogov [mailto:dmitry@zend.com] > Sent: Friday, April 29, 2016 10:52 AM > To: Matt Wilmas ; internals@lists.php.net > Cc: Nikita Popov ; Xinchen Hui > Subject: [PHP-DEV] Re: New HashTable implementation? >=20 > Hi Matt, >=20 > I also tried different hash functions (including mumur3) and came to simi= lar > result. > Faster function but more hash collisions. > Actually these collisions occurs not because of different hash values, bu= t > because of small hash array size. > Also the way we transform hash value into hash array index (use low bits) > also may matter. > If we decrease hash load-factor, we get less collisions, but use more mem= ory > and this also leads to cache misses increase. >=20 > You are welcome to try Robin-Hood, in case you get positive results we ma= y > switch to it (however I'm sceptical) >=20 > Thanks. Dmitry. Hi Matt, Just two cents on my side after playing around a little bit as well with th= e hash tables and functions :) ... As Dmitry stated with different words you can have a hash function with a g= reat quality but unfortunately you will still have a lot of collisions as o= nly a reduced number of bits of the hash value will be used for the lookup = in small hash tables. Sometimes browsing collisions will be cheaper than ca= lculating gold hash values.=20 But browsing collisions are always cheap, I think. Another experiment I did= based on the observation that a key is looked up 2-3 consecutive times fre= quently, was to have a simple LRU management of colliding bucket lists; the= overhead was quite small (just the list refresh in case of a collision whe= re the searched bucket was not in the first position) but big enough to gen= erate overall worse results. What we see at microarchitecture level of IA when running PHP7 on Wordpress= for ex, is that branch miss-predictions and ICACHE misses are very high; o= verall we can talk about a kind of BPU (branch prediction unit) and ICACHE = saturation; adding complexity to the implementation meaning bigger code and= /or more conditional branches will have negative impacts if there is not an= important gain in having less instructions executed.=20 I write this just with the hope of giving some hints on possible gains/loss= es in your attempts. :-) I will be not surprised if simpler and crappy hash functions together with = cheap collision solver will drive to an overall better speed than other sop= histicated and theoretical better approaches. Hope it helps, Bogdan >=20 > ________________________________________ > From: Matt Wilmas > Sent: Friday, April 29, 2016 3:04:44 AM > To: internals@lists.php.net > Cc: Dmitry Stogov; Nikita Popov; Xinchen Hui > Subject: New HashTable implementation? >=20 > Hi all, >=20 > Last June, it was briefly mentioned about changing PHP's string hash func= tion > [1] (DJB33 *seems* pretty horrible, after all, as far as collisions...). = So 8 > months ago I tried almost, if not, a half-dozen of them (including Murmur= 3) > that were simple enough to quickly toss in. >=20 > The result? In all cases, I believe, fewer instructions (in zend_hash_fi= nd, and > the hashing function), BUT also a slight-to-small increase in cache misse= s in > Wordpress and other scripts... >=20 > And in a test filling an array with a million "string_$i" keys (high coll= ision > pattern for DJB33?), the speed was halved by the *huge* cache miss > increase. :-/ >=20 > I couldn't quite understand what was happening, where, if there were fewe= r > collisions... Misses all spread out in the hash-array? >=20 > So there didn't seem to be anything useful to gain there. >=20 >=20 > Now, after seeing Bogdan's hash optimization idea last month [2], and > reading Nikita's blog post again, I had some ideas I'd like to try -- ass= uming > nobody else is planning major changes. :-) Besides Nikita, I'm addressin= g > Dmitry and Xinchen because your names are on some minor hash items on > the 7.1 ideas wiki [4]. >=20 > I'm thinking of a Robin Hood implementation with Universal hashing [5] (o= f > int keys and the string hashes). I haven't touched any code yet, but thi= nk I've > worked out all the details in my head, and am ready to take a stab at it.= I > think it's fairly simple to get the basics working; and when I see how th= at > goes, I can do the additional optimizations I have in mind that it allows > (including reduced memory, on 64-bit at least). >=20 > Question: Can I use zval.u1.v.reserved ? I guess I'll find out otherwise= . > :-O >=20 >=20 > The string hash function itself is really a separate thing [6], but fasth= ash [7] > (not to be confused with "superfast") looks like a good one that I missed > before... After thinking about things, I think we could even keep/use a = 64-bit > hash on 32-bit arch. >=20 >=20 > Well, just wanted to mention it first if anyone has a comment. :-) Shoul= d be > interesting, but no idea how it'll perform (lookups should be very, very = fast > (upsizing also); but cache factors and inserts/deletes are wildcards). > Wish me luck!? >=20 >=20 > Thanks, > Matt >=20 > [1] https://marc.info/?l=3Dphp-internals&m=3D143444845304138&w=3D2 > [2] https://marc.info/?t=3D145744248100001&r=3D1&w=3D2 > [4] https://wiki.php.net/php-7.1-ideas > [5] https://en.wikipedia.org/wiki/Universal_hashing > [6] https://github.com/rurban/smhasher > [7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64 >=20 >=20 > -- > PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visi= t: > http://www.php.net/unsub.php