Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:25842 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 91614 invoked by uid 1010); 27 Sep 2006 12:34:17 -0000 Delivered-To: ezmlm-scan-internals@lists.php.net Delivered-To: ezmlm-internals@lists.php.net Received: (qmail 91599 invoked from network); 27 Sep 2006 12:34:17 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 27 Sep 2006 12:34:17 -0000 Authentication-Results: pb1.pair.com smtp.mail=php_lists@realplain.com; spf=permerror; sender-id=unknown Authentication-Results: pb1.pair.com header.from=php_lists@realplain.com; sender-id=unknown Received-SPF: error (pb1.pair.com: domain realplain.com from 209.142.136.132 cause and error) X-PHP-List-Original-Sender: php_lists@realplain.com X-Host-Fingerprint: 209.142.136.132 msa2-mx.centurytel.net Linux 2.4/2.6 Received: from [209.142.136.132] ([209.142.136.132:38413] helo=msa2-mx.centurytel.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 56/C2-02907-7CF6A154 for ; Wed, 27 Sep 2006 08:34:17 -0400 Received: from pc1 (d35-225.rt-bras.wnvl.centurytel.net [69.179.162.225]) by msa2-mx.centurytel.net (8.13.6/8.13.6) with SMTP id k8RCYCmR012748 for ; Wed, 27 Sep 2006 07:34:12 -0500 Message-ID: <00dc01c6e231$3d091c80$0201a8c0@pc1> To: Date: Wed, 27 Sep 2006 07:34:12 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1807 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1807 Subject: array/HashTable filling optimization From: php_lists@realplain.com ("Matt W") Hi all, I noticed awhile ago how most every use of zend[_u]_hash_init has nSize as 0. Of course it isn't always known how many elements will be added to the array (nTableSize), but there are places where an accurate value could be used instead of getting the minimum default (8). For anyone who doesn't know, HashTable sizes are a power of 2, and when there are more than that many elements, zend_hash_do_resize realloc's more memory to double the table size, and then zend_hash_rehash basically goes through and "reinserts" all the elements again. I ran some tests to see the speed improvement from specifying the *actual* number of elements in hash_init, thus saving that extra work (_rehash mostly). The "n+1" is with one more element (9, 17, 33...) that triggers another rehash when the actual number wasn't specified (most extreme). (I used add_next_index_zval, so the direct zend_hash* functions would give slightly higher % I guess, being less overhead, right?) Results with 5.2.0-RC4: Elements n | n n+1 --------------------- 8 | 0% 14.2% 16 | 11.0% 20.9% 32 | 13.5% 22.1% 64 | 12.6% 21.3% 128 | 11.7% 18.5% 256 | 9.3% 16.4% 512 | 8.6% 17.0% 1024 | 7.9% 15.8% 2048 | 4.8% 33.3% 4096 | 7.8% 28.3% 8192 | 10.2% 58.4% 16384 | 24.1% 70.5% 32768 | 34.5% 80.4% 65536 | 34.8% 68.6% I haven't looked thoroughly, but the only place I've seen that uses an non-zero nSize in hash_init (besides some in the core engine) is the unserialize() function. :-/ It seems the most simple and obvious place that could be changed is in Zend/zend_variables.c:_zval_copy_ctor_func? Just replace zend[_u]_hash_init(tmp_ht, 0, ... with zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ... Other than that, some of PHP's array functions and array-returning functions (database fetch row functions, etc.) are ones I thought of that could be optimized like this. Maybe create an array_init_size() macro to be used in places where the number of elements can be easily determined? Thoughts? I'd volunteer to look for places that could be updated and modify them. :-) Thanks, Matt P.S. I guess the array stuff applies for objects also? But I don't know much about their internals...