Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:92911 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 45405 invoked from network); 29 Apr 2016 00:04:51 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 29 Apr 2016 00:04:51 -0000 Authentication-Results: pb1.pair.com smtp.mail=php_lists@realplain.com; spf=permerror; sender-id=unknown Authentication-Results: pb1.pair.com header.from=php_lists@realplain.com; sender-id=unknown Received-SPF: error (pb1.pair.com: domain realplain.com from 68.114.190.27 cause and error) X-PHP-List-Original-Sender: php_lists@realplain.com X-Host-Fingerprint: 68.114.190.27 mtaout002-public.msg.strl.va.charter.net Received: from [68.114.190.27] ([68.114.190.27:8570] helo=mtaout002-public.msg.strl.va.charter.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id B5/97-28296-025A2275 for ; Thu, 28 Apr 2016 20:04:49 -0400 Received: from impout005 ([68.114.189.20]) by mtaout002.msg.strl.va.charter.net (InterMail vM.9.00.023.01 201-2473-194) with ESMTP id <20160429000445.HKDV25761.mtaout002.msg.strl.va.charter.net@impout005>; Thu, 28 Apr 2016 19:04:45 -0500 Received: from pc1 ([97.87.188.16]) by impout005 with charter.net id o04l1s0020Mfu3D0104lGw; Thu, 28 Apr 2016 19:04:45 -0500 X-Authority-Analysis: v=2.1 cv=f5rGBYCM c=1 sm=1 tr=0 a=Ay788ph35uAhBIV2K373vw==:117 a=Ay788ph35uAhBIV2K373vw==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=8nJEP1OIZ-IA:10 a=gu6fZOg2AAAA:8 a=67BIL_jfAAAA:8 a=8pif782wAAAA:8 a=NEAV23lmAAAA:8 a=sg_CkgYiutMbDQnVxRwA:9 a=9lljJ2mETIdNYCJo:21 a=RS6jc679C9hSAWil:21 a=wPNLvfGTeEIA:10 a=-FEs8UIgK8oA:10 a=NWVoK91CQyQA:10 Message-ID: <70BE415E8A7D439FBBEB3817F61577D5@pc1> To: Cc: "Dmitry Stogov" , "Nikita Popov" , "Xinchen Hui" Date: Thu, 28 Apr 2016 19:04:44 -0500 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Subject: New HashTable implementation? From: php_lists@realplain.com ("Matt Wilmas") Hi all, Last June, it was briefly mentioned about changing PHP's string hash function [1] (DJB33 *seems* pretty horrible, after all, as far as collisions...). So 8 months ago I tried almost, if not, a half-dozen of them (including Murmur3) that were simple enough to quickly toss in. The result? In all cases, I believe, fewer instructions (in zend_hash_find, and the hashing function), BUT also a slight-to-small increase in cache misses in Wordpress and other scripts... And in a test filling an array with a million "string_$i" keys (high collision pattern for DJB33?), the speed was halved by the *huge* cache miss increase. :-/ I couldn't quite understand what was happening, where, if there were fewer collisions... Misses all spread out in the hash-array? So there didn't seem to be anything useful to gain there. Now, after seeing Bogdan's hash optimization idea last month [2], and reading Nikita's blog post again, I had some ideas I'd like to try -- assuming nobody else is planning major changes. :-) Besides Nikita, I'm addressing Dmitry and Xinchen because your names are on some minor hash items on the 7.1 ideas wiki [4]. I'm thinking of a Robin Hood implementation with Universal hashing [5] (of int keys and the string hashes). I haven't touched any code yet, but think I've worked out all the details in my head, and am ready to take a stab at it. I think it's fairly simple to get the basics working; and when I see how that goes, I can do the additional optimizations I have in mind that it allows (including reduced memory, on 64-bit at least). Question: Can I use zval.u1.v.reserved ? I guess I'll find out otherwise. :-O The string hash function itself is really a separate thing [6], but fasthash [7] (not to be confused with "superfast") looks like a good one that I missed before... After thinking about things, I think we could even keep/use a 64-bit hash on 32-bit arch. Well, just wanted to mention it first if anyone has a comment. :-) Should be interesting, but no idea how it'll perform (lookups should be very, very fast (upsizing also); but cache factors and inserts/deletes are wildcards). Wish me luck!? Thanks, Matt [1] https://marc.info/?l=php-internals&m=143444845304138&w=2 [2] https://marc.info/?t=145744248100001&r=1&w=2 [4] https://wiki.php.net/php-7.1-ideas [5] https://en.wikipedia.org/wiki/Universal_hashing [6] https://github.com/rurban/smhasher [7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64