Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:92916 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 66879 invoked from network); 29 Apr 2016 06:35:26 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 29 Apr 2016 06:35:26 -0000 Authentication-Results: pb1.pair.com header.from=yohgaki@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=yohgaki@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.220.193 as permitted sender) X-PHP-List-Original-Sender: yohgaki@gmail.com X-Host-Fingerprint: 209.85.220.193 mail-qk0-f193.google.com Received: from [209.85.220.193] ([209.85.220.193:33834] helo=mail-qk0-f193.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 9D/90-60902-EA003275 for ; Fri, 29 Apr 2016 02:35:26 -0400 Received: by mail-qk0-f193.google.com with SMTP id i7so3866717qkd.1 for ; Thu, 28 Apr 2016 23:35:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=67zo6k/jB7mQDPxP2GwOctARh+HMkF2RZ8GLYYz++QA=; b=cR0nrmhGfEi2ieRclJq38ARNCiH9vTbVVMctU85sZbKM8wcZCxRZjimhqMm9hXPNuu L7tfE+aI7fH/0mr+yIcj3NefTT88pj7QhMedZI1Mn1r4Xrtrq7fb5qGDBDXlZ4+QrPQV ohVTfleaer+NaLsUtdS6GCpPOSlszd/phboMSqQ/VAdvMC25LV1bKVyhOlJukLk4S+Qr A+D3BfmH+vmzjNG8aHOJsip09vxZ3dIVptKibVYh7GtktCMg6Wgjh7LG8XHj7dxCnmuc D0dMOumbZz1MGT5GFahfr8XHWYpmgIdKC0boSUajA9yPpJ6r5lBIFj5knAk3IxKpJEs7 cu4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=67zo6k/jB7mQDPxP2GwOctARh+HMkF2RZ8GLYYz++QA=; b=Ne6zPcTCHjF/S32A3qx1TgcmeRe9KBm05d9rJosaDki8RWQlJJQH0M4uh4ZpiQwM6x qe+CX2LMo0KQaJJKqOBUHuRpZzXJ4wsOn5vqVchqH4EsxhrpTVboWGy422Vx03QP8+19 9EIed8+Qs4EQhf0y/D9So5pcpMlXSrTeKcfidkUex7SiRttnigCnSzGmUq9ZWmucRnkA 8U9R/r31fXy/dQLvsYgt8pOOW8/gNqAHL7uMkHIFZIpONIZJzKsWn2Pz3w4RN4HF3DoU Y9fj0URuvpmf1rnX2ffC9Kkg6+J8FUkI4DaA0Qq+X/GelFM0zs5yBlk6vMh+49gMLhYP xM7Q== X-Gm-Message-State: AOPr4FVKK7PMdT2JPIWHpJ+N031gToZWI+lFgjMevUw4je17cvac5alSk4PGw0L1gvhKZfKq5wwz54gzscHzrg== X-Received: by 10.55.185.67 with SMTP id j64mr18056361qkf.205.1461911722993; Thu, 28 Apr 2016 23:35:22 -0700 (PDT) MIME-Version: 1.0 Sender: yohgaki@gmail.com Received: by 10.140.27.133 with HTTP; Thu, 28 Apr 2016 23:34:43 -0700 (PDT) In-Reply-To: <70BE415E8A7D439FBBEB3817F61577D5@pc1> References: <70BE415E8A7D439FBBEB3817F61577D5@pc1> Date: Fri, 29 Apr 2016 15:34:43 +0900 X-Google-Sender-Auth: j61wiUJE_IwOPI9CXTO4SDMPHpU Message-ID: To: Matt Wilmas Cc: "internals@lists.php.net" , Dmitry Stogov , Nikita Popov , Xinchen Hui Content-Type: text/plain; charset=UTF-8 Subject: Re: [PHP-DEV] New HashTable implementation? From: yohgaki@ohgaki.net (Yasuo Ohgaki) On Fri, Apr 29, 2016 at 9:04 AM, Matt Wilmas wrote: > Last June, it was briefly mentioned about changing PHP's string hash > function [1] (DJB33 *seems* pretty horrible, after all, as far as > collisions...). So 8 months ago I tried almost, if not, a half-dozen of > them (including Murmur3) that were simple enough to quickly toss in. > > The result? In all cases, I believe, fewer instructions (in zend_hash_find, > and the hashing function), BUT also a slight-to-small increase in cache > misses in Wordpress and other scripts... > > And in a test filling an array with a million "string_$i" keys (high > collision pattern for DJB33?), the speed was halved by the *huge* cache miss > increase. :-/ > > I couldn't quite understand what was happening, where, if there were fewer > collisions... Misses all spread out in the hash-array? > > So there didn't seem to be anything useful to gain there. > > > Now, after seeing Bogdan's hash optimization idea last month [2], and > reading Nikita's blog post again, I had some ideas I'd like to try -- > assuming nobody else is planning major changes. :-) Besides Nikita, I'm > addressing Dmitry and Xinchen because your names are on some minor hash > items on the 7.1 ideas wiki [4]. > > I'm thinking of a Robin Hood implementation with Universal hashing [5] (of > int keys and the string hashes). I haven't touched any code yet, but think > I've worked out all the details in my head, and am ready to take a stab at > it. I think it's fairly simple to get the basics working; and when I see > how that goes, I can do the additional optimizations I have in mind that it > allows (including reduced memory, on 64-bit at least). > > Question: Can I use zval.u1.v.reserved ? I guess I'll find out otherwise. > :-O > > > The string hash function itself is really a separate thing [6], but fasthash > [7] (not to be confused with "superfast") looks like a good one that I > missed before... After thinking about things, I think we could even > keep/use a 64-bit hash on 32-bit arch. > > > Well, just wanted to mention it first if anyone has a comment. :-) Should > be interesting, but no idea how it'll perform (lookups should be very, very > fast (upsizing also); but cache factors and inserts/deletes are wildcards). > Wish me luck!? > > > Thanks, > Matt > > [1] https://marc.info/?l=php-internals&m=143444845304138&w=2 > [2] https://marc.info/?t=145744248100001&r=1&w=2 > [4] https://wiki.php.net/php-7.1-ideas > [5] https://en.wikipedia.org/wiki/Universal_hashing > [6] https://github.com/rurban/smhasher > [7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64 Hi Matt, I've benchmarked DJB and other hash before. I don't think any crypto safe hash cannot compete against DJB hash. I fully agree that collisions with DJB hash is too easy. However, collisions will not happen a lot under normal usage. DJB hash is super fast and any hash function will be compromised as time goes by. Therefore, Nikita's proposal that detects and limits number of collisions would be the best choice. IMO. Nikita, any progress on limiting collisions for all arrays? This feature is must have item for 7.1 IMHO. Regards, -- Yasuo Ohgaki yohgaki@ohgaki.net