Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:51256 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 6183 invoked from network); 12 Jan 2011 00:22:51 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 12 Jan 2011 00:22:51 -0000 Authentication-Results: pb1.pair.com header.from=christopher.jones@oracle.com; sender-id=unknown Authentication-Results: pb1.pair.com smtp.mail=christopher.jones@oracle.com; spf=permerror; sender-id=unknown Received-SPF: error (pb1.pair.com: domain oracle.com from 148.87.113.121 cause and error) X-PHP-List-Original-Sender: christopher.jones@oracle.com X-Host-Fingerprint: 148.87.113.121 rcsinet10.oracle.com Linux 2.6 Received: from [148.87.113.121] ([148.87.113.121:53281] helo=rcsinet10.oracle.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id BB/A2-20658-854FC2D4 for ; Tue, 11 Jan 2011 19:22:49 -0500 Received: from rcsinet13.oracle.com (rcsinet13.oracle.com [148.87.113.125]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p0C0MiDA010447 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 12 Jan 2011 00:22:46 GMT Received: from acsmt353.oracle.com (acsmt353.oracle.com [141.146.40.153]) by rcsinet13.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id p0BK3dhB029132; Wed, 12 Jan 2011 00:22:44 GMT Received: from abhmt016.oracle.com by acsmt355.oracle.com with ESMTP id 951130061294791727; Tue, 11 Jan 2011 16:22:07 -0800 Received: from [130.35.68.31] (/130.35.68.31) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 11 Jan 2011 16:22:07 -0800 Message-ID: <4D2CF42E.6020209@oracle.com> Date: Tue, 11 Jan 2011 16:22:06 -0800 User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101207 Lightning/1.0b2 Thunderbird/3.1.7 MIME-Version: 1.0 To: Marcin Babij CC: internals@lists.php.net References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [PHP-DEV] Re: Zend engine's hashtable performance tweaks From: christopher.jones@oracle.com (Christopher Jones) On 12/31/2010 10:15 AM, Pierre Joye wrote: > hi, > > Thanks for the patches :) > > Can you open a bug report please (and attach the patches to it)? I'm > sure this patch will be updated a couple of times before it reaches > the repository. > > Cheers, Hi Marcin, Did you log a bug for this? Regards, Chris > > On Fri, Dec 31, 2010 at 4:58 PM, Marcin Babij > wrote: >> Sorry for no attachments in previous message, I think my attachments weren't >> redirected with message by lists.php.net email confirmation system. I send >> them again, and for sure I attach links to public copy of them over HTTP: >> https://gist.github.com/761094 - php-5.3.4-hashtable-optimization.patch >> https://gist.github.com/761096 - apc-3.1.6-hashtable-optimization.patch >> >>> Hello, >>> I work for social network company, where we were running optimization >>> project. One of it's results is patch to Zend engine's Hashtable, which we >>> want to share and ask you for comments and improvements. >>> >>> Why we do this? >>> We run profiling on our production servers and found out that zend_hash_* >>> functions take 10-20% CPU time of request. So there is some room for easy >>> improvements. >>> >>> What was done? >>> - Hash function in zend_hash.h was rebuilt and became much faster, without >>> losing the most important properties. >>> - Hashtable implementation was changed from Simple chaining to Open >>> addressing with linear probing, but with linked bucket, not included in >>> hash array, which causes: >>> -- Bucket structure to lose 2 pointers. >>> -- Searching works similar, but don't have to jump with pointers stored in >>> different memory locations, inserting, deleting and rehashing don't need >>> to update linked list, but must search for first empty bucket, which is >>> fast, because it scans continuous memory. >>> -- Load factor decreases from 1.0 to 0.5-0.75 to make less collisions and >>> faster hashtable, which in turn increases memory footprint a little. >>> - Open addressing doesn't change significantly performance, but next thing >>> was to create new array (arEmpty), which is of size nTableSize bytes, >>> which keeps track of used/empty buckets and makes inserting and rehashing >>> much faster. In future it can be tested as bit-array with size of >>> nTableSize/8 bytes. >>> - More macros were added to replace repetitive constructs. >>> - New constants were added to allow: >>> -- Creating new hashtables of size at least X (where 4 and 8 are >>> reasonable), which makes no rehashing and reallocing memory while changing >>> size to 2 and then to 4. >>> -- For small tables it's better to extend them by a factor of 4 times, not >>> 2, to make rehashing cost smaller for most hashtables, of cost of little >>> higher memory consumption. >>> -- For large tables it's better to have other load factor, closer to 1, >>> while for small tables it's better to use load factor closer to 0.5. >>> - APC was patched to take changes in Bucket structure into account. >>> >>> How was it tested? >>> It was tested with make test, where one more (comparing to original >>> sources) test fails, but it's most probably because >>> http://bugs.php.net/bug.php?id=48858 - IMO test is badly constructed (is >>> too simple) and any change of hashing function makes it fail. Also it was >>> tested on our testing environment and production servers against>30mln >>> requests to our site, with 120requests/s at peak on Xeon @ 2.50GHz with >>> 8GB RAM running Debian Linux. >>> >>> What is the gain? >>> After tests CPU usage dropped by about 4% -6%. >>> Memory footprint goes up just by few percent. >>> >>> What can be done in future? >>> - Make new constants configurable by php.ini. >>> - Test if changing arEmpty from byte-array to bit-array helps on >>> performance. >>> - Tweak default constants' values using some real-live benchmarks. >>> - Prove (or modify and prove) hash function to have property, that it has >>> no collisions if two keys don't differ on no more than 6 bytes, which will >>> lead to memcmp omit first (or last) 6 bytes of key. Also simpler thing may >>> be proven, that is it has no collisions if two keys are not longer than 6 >>> bytes, which will make most string keys omit memcpy at all. >>> >>> The patch was created and tested against php-5.3.0, apc-3.1.3p1, then >>> merged with php-5.3.4, apc-3.1.6 without conflicts, and for these last >>> versions patches are attached. Also, it shouldn't conflict with >>> http://wiki.php.net/rfc/performanceimprovements . >> >> -- >> PHP Internals - PHP Runtime Development Mailing List >> To unsubscribe, visit: http://www.php.net/unsub.php >> > > > -- Email: christopher.jones@oracle.com Tel: +1 650 506 8630 Blog: http://blogs.oracle.com/opal/