Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:92918 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 73710 invoked from network); 29 Apr 2016 07:52:12 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 29 Apr 2016 07:52:12 -0000 Authentication-Results: pb1.pair.com header.from=dmitry@zend.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=dmitry@zend.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain zend.com designates 157.56.111.110 as permitted sender) X-PHP-List-Original-Sender: dmitry@zend.com X-Host-Fingerprint: 157.56.111.110 mail-bn1bon0110.outbound.protection.outlook.com Received: from [157.56.111.110] ([157.56.111.110:42889] helo=na01-bn1-obe.outbound.protection.outlook.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 77/81-60902-5A213275 for ; Fri, 29 Apr 2016 03:52:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=RWSoftware.onmicrosoft.com; s=selector1-zend-com; h=From:To:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=n9W4Y+fIgpr0BXno5hm/sKqsM8I3y0BUh9AU5gIcLFo=; b=lpro7gJUODWoMulzRVmLLAqMda5HfRfiggtaiIHxoA+HJhLW7x7nYNc83W0OYd/vs5vz9KICtOMZ1Mwuphjxz+3m9aBcgeWgUQHllZkyCcjXe1L9weChNfwSAoAuxJU38rC0ES6vjMZOnTitECpTrJEkIwJ53FleX7JndqX56Ck= Received: from BY2PR0201MB1784.namprd02.prod.outlook.com (10.163.72.26) by BY2PR0201MB1783.namprd02.prod.outlook.com (10.163.72.25) with Microsoft SMTP Server (TLS) id 15.1.477.8; Fri, 29 Apr 2016 07:51:54 +0000 Received: from BY2PR0201MB1784.namprd02.prod.outlook.com ([10.163.72.26]) by BY2PR0201MB1784.namprd02.prod.outlook.com ([10.163.72.26]) with mapi id 15.01.0477.014; Fri, 29 Apr 2016 07:51:54 +0000 To: Matt Wilmas , "internals@lists.php.net" CC: Nikita Popov , Xinchen Hui Thread-Topic: New HashTable implementation? Thread-Index: AQHRoaq+UID7yar2ukuSm9Qd4neWEp+gkoux Date: Fri, 29 Apr 2016 07:51:53 +0000 Message-ID: References: <70BE415E8A7D439FBBEB3817F61577D5@pc1> In-Reply-To: <70BE415E8A7D439FBBEB3817F61577D5@pc1> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: realplain.com; dkim=none (message not signed) header.d=none;realplain.com; dmarc=none action=none header.from=zend.com; x-originating-ip: [132.245.81.165] x-ms-office365-filtering-correlation-id: 6bd6b6dd-46ac-4b68-1305-08d37003219f x-microsoft-exchange-diagnostics: 1;BY2PR0201MB1783;5:4TY9tU/pnMoeNW97eXuiRVDu7aDWyvUPYghiXFAO46Kwp5sk90s/TQItrnwAtzIEfwWKtpU3L5nZq0PlsByo/jx4nx7xqshevbVXF2NE+eYayrP9C0qbKfug0hA9z+KsD9BS0RijHGQl5oasZSurtQ==;24:uiNy/4UgbHSP463Dxrmz8lSOhFkX4GvIXmfj3xYlCVXc5mVVe2BmWDIlzmGrcN2Y2xh4MDQQZbOmsMOsL/Lohl/eVussuIEn5HNCkWaUHI8=;7:CLwNN6mRI07GAH/l7BxAvYansuiD5nYFjDscXwdQXd1mlOROpT+JtReRD4ISMWw/g2glqAOjsQIHsaY8d6Ovv39EJOjJQnSxLuCezH32NP5lVylpoOj97SPE2K+mWU9ihoNgJHNKdip5nmP1mt0TIoSlVEUD4hEYw4wc76TTsUWDpPX9QbdWBjjwwuLa9SqV x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:BY2PR0201MB1783; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(9101521072)(601004)(2401047)(8121501046)(5005006)(3002001)(10201501046);SRVR:BY2PR0201MB1783;BCL:0;PCL:0;RULEID:;SRVR:BY2PR0201MB1783; x-forefront-prvs: 0927AA37C7 x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(6009001)(51694002)(377454003)(53754006)(102836003)(5004730100002)(2906002)(6116002)(586003)(3846002)(77096005)(1220700001)(1096002)(99286002)(86362001)(106116001)(5008740100001)(87936001)(9686002)(76576001)(3280700002)(3660700001)(33656002)(11100500001)(4326007)(81166005)(15975445007)(74316001)(3480700004)(2900100001)(122556002)(5002640100001)(5003600100002)(2501003)(189998001)(2950100001)(50986999)(66066001)(54356999)(76176999)(5001770100001)(19580405001)(19580395003)(10400500002)(6606295002);DIR:OUT;SFP:1102;SCL:1;SRVR:BY2PR0201MB1783;H:BY2PR0201MB1784.namprd02.prod.outlook.com;FPR:;SPF:None;MLV:sfv;LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: zend.com X-MS-Exchange-CrossTenant-originalarrivaltime: 29 Apr 2016 07:51:53.4720 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 32210298-c08b-4829-8097-6b12c025a892 X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY2PR0201MB1783 Subject: Re: New HashTable implementation? From: dmitry@zend.com (Dmitry Stogov) Hi Matt, I also tried different hash functions (including mumur3) and came to simila= r result. Faster function but more hash collisions. Actually these collisions occurs not because of different hash values, but = because of small hash array size. Also the way we transform hash value into hash array index (use low bits) a= lso may matter. If we decrease hash load-factor, we get less collisions, but use more memor= y and this also leads to cache misses increase. You are welcome to try Robin-Hood, in case you get positive results we may = switch to it (however I'm sceptical) Thanks. Dmitry. ________________________________________ From: Matt Wilmas Sent: Friday, April 29, 2016 3:04:44 AM To: internals@lists.php.net Cc: Dmitry Stogov; Nikita Popov; Xinchen Hui Subject: New HashTable implementation? Hi all, Last June, it was briefly mentioned about changing PHP's string hash function [1] (DJB33 *seems* pretty horrible, after all, as far as collisions...). So 8 months ago I tried almost, if not, a half-dozen of them (including Murmur3) that were simple enough to quickly toss in. The result? In all cases, I believe, fewer instructions (in zend_hash_find= , and the hashing function), BUT also a slight-to-small increase in cache misses in Wordpress and other scripts... And in a test filling an array with a million "string_$i" keys (high collision pattern for DJB33?), the speed was halved by the *huge* cache mis= s increase. :-/ I couldn't quite understand what was happening, where, if there were fewer collisions... Misses all spread out in the hash-array? So there didn't seem to be anything useful to gain there. Now, after seeing Bogdan's hash optimization idea last month [2], and reading Nikita's blog post again, I had some ideas I'd like to try -- assuming nobody else is planning major changes. :-) Besides Nikita, I'm addressing Dmitry and Xinchen because your names are on some minor hash items on the 7.1 ideas wiki [4]. I'm thinking of a Robin Hood implementation with Universal hashing [5] (of int keys and the string hashes). I haven't touched any code yet, but think I've worked out all the details in my head, and am ready to take a stab at it. I think it's fairly simple to get the basics working; and when I see how that goes, I can do the additional optimizations I have in mind that it allows (including reduced memory, on 64-bit at least). Question: Can I use zval.u1.v.reserved ? I guess I'll find out otherwise. :-O The string hash function itself is really a separate thing [6], but fasthas= h [7] (not to be confused with "superfast") looks like a good one that I missed before... After thinking about things, I think we could even keep/use a 64-bit hash on 32-bit arch. Well, just wanted to mention it first if anyone has a comment. :-) Should be interesting, but no idea how it'll perform (lookups should be very, very fast (upsizing also); but cache factors and inserts/deletes are wildcards). Wish me luck!? Thanks, Matt [1] https://marc.info/?l=3Dphp-internals&m=3D143444845304138&w=3D2 [2] https://marc.info/?t=3D145744248100001&r=3D1&w=3D2 [4] https://wiki.php.net/php-7.1-ideas [5] https://en.wikipedia.org/wiki/Universal_hashing [6] https://github.com/rurban/smhasher [7] https://github.com/rurban/smhasher/blob/master/doc/fasthash64