Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:51180 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 7063 invoked from network); 1 Jan 2011 12:30:14 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 1 Jan 2011 12:30:14 -0000 Authentication-Results: pb1.pair.com header.from=marcin.babij@nasza-klasa.pl; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=marcin.babij@nasza-klasa.pl; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain nasza-klasa.pl designates 209.85.215.42 as permitted sender) X-PHP-List-Original-Sender: marcin.babij@nasza-klasa.pl X-Host-Fingerprint: 209.85.215.42 mail-ew0-f42.google.com Received: from [209.85.215.42] ([209.85.215.42:36012] helo=mail-ew0-f42.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 0F/48-30618-45E1F1D4 for ; Sat, 01 Jan 2011 07:30:13 -0500 Received: by ewy1 with SMTP id 1so5309269ewy.29 for ; Sat, 01 Jan 2011 04:30:09 -0800 (PST) Received: by 10.213.9.79 with SMTP id k15mr15540546ebk.27.1293885008508; Sat, 01 Jan 2011 04:30:08 -0800 (PST) Received: from laptop (90-156-89-180.internetia.net.pl [90.156.89.180]) by mx.google.com with ESMTPS id u1sm13064714eeh.10.2011.01.01.04.30.06 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 01 Jan 2011 04:30:07 -0800 (PST) Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: "internals@lists.php.net" References: <4D1E56E1.5090208@sugarcrm.com> Date: Sat, 01 Jan 2011 13:30:03 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Message-ID: In-Reply-To: <4D1E56E1.5090208@sugarcrm.com> User-Agent: Opera Mail/11.00 (Win32) Subject: Re: [PHP-DEV] Re: Zend engine's hashtable performance tweaks From: marcin.babij@nasza-klasa.pl ("Marcin Babij") > Hi! > >> Sorry for no attachments in previous message, I think my attachments >> weren't redirected with message by lists.php.net email confirmation >> system. I send them again, and for sure I attach links to public copy of >> them over HTTP: >> https://gist.github.com/761094 - php-5.3.4-hashtable-optimization.patch >> https://gist.github.com/761096 - apc-3.1.6-hashtable-optimization.patch > > I just took a quick look, so some preliminary notes: > > 1. +#define ZEND_HASH_SMALL_LOAD_FACTOR_INV 1.75 etc. - maybe makes > sense to convert them from floats to couple of integers? I'm not sure if > gcc is smart enough not to use floating point here, which would probably > be slower than int *7/4. My idea here was that *7 could lead to overflow on integer values, and division can't be made first. Way out is casting to int64 or float. And I found specifying one float more human-readable, than 2 integers in relation. > 2. Would it be possible to clean up the patch? There are tons of diffs > like this: > - HASH_UNPROTECT_RECURSION(ht1); > - HASH_UNPROTECT_RECURSION(ht2); > + HASH_UNPROTECT_RECURSION(ht1); > + HASH_UNPROTECT_RECURSION(ht2); > which make no sense and just make it harder to understand what's going > on. Possibly they're some whitespace changes. I'll create bug report, and attach patches with removed such "changes". > > 3. > - ulong h; /* Used for numeric indexing */ > + ulong h; > > why? It should be back, this comment still remains valid, looks like it wasn't at some time of evolution of patch. :) AFAIR it was removed, when it turned out, that with open addressing we can't use subsequent buckets like we used to do with numeric indexes. Then I've changed that by making additional hash->bucket index macro ZEND_HASH_BUCKET that prevents clustering. > > 4. zend_inline_hash_func - could you describe why you changed it? In any > case, it needs some comments, if you deleted the old one, please add > description of the new one any why it is better. I'll provide some comments in patch. Now I can tell that the main idea was to handle long keys (>= 8 bytes long) in new way: take sizeof(ulong) bytes at once, hash them into hash variable, and to add remaining bytes, just take last sizeof(ulong) bytes. This makes much less instructions to execute, yet should keep hash function collision properties. Also, I don't care (why should I?) that if key length isn't multiple of sizeof(ulong) I'll count some bytes twice. I would like to take such approach when nKeyLength < sizeof(ulong), which speeds hashing function a lot for short (most common) keys, but that needs to do some byte-masking and assumes that key's starting address is aligned to sizeof(ulong), otherwise it could read bytes out of page and lead to segmentation fault. Could we take such assumption, maybe just on specified architectures, to speed it up more? > > Will follow up after reading the patch more in depth. Thank you for your comments.