Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:92913 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 49362 invoked from network); 29 Apr 2016 00:41:07 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 29 Apr 2016 00:41:07 -0000 Authentication-Results: pb1.pair.com smtp.mail=bobwei9@hotmail.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=bobwei9@hotmail.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain hotmail.com designates 65.55.116.18 as permitted sender) X-PHP-List-Original-Sender: bobwei9@hotmail.com X-Host-Fingerprint: 65.55.116.18 blu004-omc1s7.hotmail.com Received: from [65.55.116.18] ([65.55.116.18:52444] helo=BLU004-OMC1S7.hotmail.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 5F/48-28296-0ADA2275 for ; Thu, 28 Apr 2016 20:41:05 -0400 Received: from BLU436-SMTP222 ([65.55.116.7]) by BLU004-OMC1S7.hotmail.com over TLS secured channel with Microsoft SMTPSVC(7.5.7601.23008); Thu, 28 Apr 2016 17:41:02 -0700 X-TMN: [TzJCOPyMF8dj3MNP56i8curtNle8HAHa] X-Originating-Email: [bobwei9@hotmail.com] Message-ID: MIME-Version: 1.0 (Mac OS X Mail 8.2 \(2070.6\)) Content-Type: multipart/alternative; boundary="Apple-Mail=_29F306CD-94E5-4879-8812-530C253AC48C" X-Priority: 3 In-Reply-To: <70BE415E8A7D439FBBEB3817F61577D5@pc1> Date: Fri, 29 Apr 2016 02:40:57 +0200 CC: internals@lists.php.net, Dmitry Stogov , Nikita Popov , Xinchen Hui References: <70BE415E8A7D439FBBEB3817F61577D5@pc1> To: Matt Wilmas X-Mailer: Apple Mail (2.2070.6) X-OriginalArrivalTime: 29 Apr 2016 00:40:59.0968 (UTC) FILETIME=[CCE9D400:01D1A1AF] Subject: Re: [PHP-DEV] New HashTable implementation? From: bobwei9@hotmail.com (Bob Weinand) --Apple-Mail=_29F306CD-94E5-4879-8812-530C253AC48C Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="iso-8859-1" Hey, > Am 29.4.2016 um 02:04 schrieb Matt Wilmas : >=20 > Hi all, >=20 > Last June, it was briefly mentioned about changing PHP's string hash > function [1] (DJB33 *seems* pretty horrible, after all, as far as > collisions...). So 8 months ago I tried almost, if not, a half-dozen = of > them (including Murmur3) that were simple enough to quickly toss in. >=20 > The result? In all cases, I believe, fewer instructions (in = zend_hash_find, > and the hashing function), BUT also a slight-to-small increase in = cache > misses in Wordpress and other scripts... >=20 > And in a test filling an array with a million "string_$i" keys (high > collision pattern for DJB33?), the speed was halved by the *huge* = cache miss > increase. :-/ >=20 > I couldn't quite understand what was happening, where, if there were = fewer > collisions... Misses all spread out in the hash-array? >=20 > So there didn't seem to be anything useful to gain there. >=20 >=20 > Now, after seeing Bogdan's hash optimization idea last month [2], and > reading Nikita's blog post again, I had some ideas I'd like to try -- > assuming nobody else is planning major changes. :-) Besides Nikita, = I'm > addressing Dmitry and Xinchen because your names are on some minor = hash > items on the 7.1 ideas wiki [4]. >=20 > I'm thinking of a Robin Hood implementation with Universal hashing [5] = (of > int keys and the string hashes). I haven't touched any code yet, but = think > I've worked out all the details in my head, and am ready to take a = stab at > it. I think it's fairly simple to get the basics working; and when I = see > how that goes, I can do the additional optimizations I have in mind = that it > allows (including reduced memory, on 64-bit at least). >=20 > Question: Can I use zval.u1.v.reserved ? I guess I'll find out = otherwise. > :-O No, not really. zval.u1 gets typically overwritten on each type = assignment. (As type assigns typically happen via 32 bit type_info field directly = for performance reasons.) Eventually you may find a way to split up u2 in an useful way? >=20 > The string hash function itself is really a separate thing [6], but = fasthash > [7] (not to be confused with "superfast") looks like a good one that I > missed before... After thinking about things, I think we could even > keep/use a 64-bit hash on 32-bit arch. >=20 >=20 > Well, just wanted to mention it first if anyone has a comment. :-) = Should > be interesting, but no idea how it'll perform (lookups should be very, = very > fast (upsizing also); but cache factors and inserts/deletes are = wildcards). > Wish me luck!? Good luck! :-) >=20 > Thanks, > Matt >=20 > [1] https://marc.info/?l=3Dphp-internals&m=3D143444845304138&w=3D2 > [2] https://marc.info/?t=3D145744248100001&r=3D1&w=3D2 > [4] https://wiki.php.net/php-7.1-ideas > [5] https://en.wikipedia.org/wiki/Universal_hashing > [6] https://github.com/rurban/smhasher > [7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64 Bob= --Apple-Mail=_29F306CD-94E5-4879-8812-530C253AC48C--