Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:51161 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 59740 invoked from network); 31 Dec 2010 16:25:41 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 31 Dec 2010 16:25:41 -0000 Authentication-Results: pb1.pair.com header.from=marcin.babij@nasza-klasa.pl; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=marcin.babij@nasza-klasa.pl; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain nasza-klasa.pl designates 209.85.215.42 as permitted sender) X-PHP-List-Original-Sender: marcin.babij@nasza-klasa.pl X-Host-Fingerprint: 209.85.215.42 mail-ew0-f42.google.com Received: from [209.85.215.42] ([209.85.215.42:43448] helo=mail-ew0-f42.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 20/29-01063-19DFD1D4 for ; Fri, 31 Dec 2010 10:58:10 -0500 Received: by ewy1 with SMTP id 1so5142137ewy.29 for ; Fri, 31 Dec 2010 07:58:06 -0800 (PST) Received: by 10.213.29.201 with SMTP id r9mr13606212ebc.6.1293811086248; Fri, 31 Dec 2010 07:58:06 -0800 (PST) Received: from laptop (90-156-89-180.internetia.net.pl [90.156.89.180]) by mx.google.com with ESMTPS id t5sm12386161eeh.14.2010.12.31.07.58.03 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 31 Dec 2010 07:58:05 -0800 (PST) Content-Type: multipart/mixed; boundary=----------e9dywymTzrO2UdF1nReKoU To: internals@lists.php.net References: Date: Fri, 31 Dec 2010 16:58:00 +0100 MIME-Version: 1.0 Message-ID: In-Reply-To: User-Agent: Opera Mail/11.00 (Win32) Subject: Re: Zend engine's hashtable performance tweaks From: marcin.babij@nasza-klasa.pl ("Marcin Babij") ------------e9dywymTzrO2UdF1nReKoU Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Sorry for no attachments in previous message, I think my attachments weren't redirected with message by lists.php.net email confirmation system. I send them again, and for sure I attach links to public copy of them over HTTP: https://gist.github.com/761094 - php-5.3.4-hashtable-optimization.patch https://gist.github.com/761096 - apc-3.1.6-hashtable-optimization.patch > Hello, > I work for social network company, where we were running optimization > project. One of it's results is patch to Zend engine's Hashtable, which > we > want to share and ask you for comments and improvements. > > Why we do this? > We run profiling on our production servers and found out that zend_hash_* > functions take 10-20% CPU time of request. So there is some room for easy > improvements. > > What was done? > - Hash function in zend_hash.h was rebuilt and became much faster, > without > losing the most important properties. > - Hashtable implementation was changed from Simple chaining to Open > addressing with linear probing, but with linked bucket, not included in > hash array, which causes: > -- Bucket structure to lose 2 pointers. > -- Searching works similar, but don't have to jump with pointers stored > in > different memory locations, inserting, deleting and rehashing don't need > to update linked list, but must search for first empty bucket, which is > fast, because it scans continuous memory. > -- Load factor decreases from 1.0 to 0.5-0.75 to make less collisions and > faster hashtable, which in turn increases memory footprint a little. > - Open addressing doesn't change significantly performance, but next > thing > was to create new array (arEmpty), which is of size nTableSize bytes, > which keeps track of used/empty buckets and makes inserting and rehashing > much faster. In future it can be tested as bit-array with size of > nTableSize/8 bytes. > - More macros were added to replace repetitive constructs. > - New constants were added to allow: > -- Creating new hashtables of size at least X (where 4 and 8 are > reasonable), which makes no rehashing and reallocing memory while > changing > size to 2 and then to 4. > -- For small tables it's better to extend them by a factor of 4 times, > not > 2, to make rehashing cost smaller for most hashtables, of cost of little > higher memory consumption. > -- For large tables it's better to have other load factor, closer to 1, > while for small tables it's better to use load factor closer to 0.5. > - APC was patched to take changes in Bucket structure into account. > > How was it tested? > It was tested with make test, where one more (comparing to original > sources) test fails, but it's most probably because > http://bugs.php.net/bug.php?id=48858 - IMO test is badly constructed (is > too simple) and any change of hashing function makes it fail. Also it was > tested on our testing environment and production servers against >30mln > requests to our site, with 120requests/s at peak on Xeon @ 2.50GHz with > 8GB RAM running Debian Linux. > > What is the gain? > After tests CPU usage dropped by about 4% -6%. > Memory footprint goes up just by few percent. > > What can be done in future? > - Make new constants configurable by php.ini. > - Test if changing arEmpty from byte-array to bit-array helps on > performance. > - Tweak default constants' values using some real-live benchmarks. > - Prove (or modify and prove) hash function to have property, that it has > no collisions if two keys don't differ on no more than 6 bytes, which > will > lead to memcmp omit first (or last) 6 bytes of key. Also simpler thing > may > be proven, that is it has no collisions if two keys are not longer than 6 > bytes, which will make most string keys omit memcpy at all. > > The patch was created and tested against php-5.3.0, apc-3.1.3p1, then > merged with php-5.3.4, apc-3.1.6 without conflicts, and for these last > versions patches are attached. Also, it shouldn't conflict with > http://wiki.php.net/rfc/performanceimprovements . ------------e9dywymTzrO2UdF1nReKoU--