Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:41851 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 78063 invoked from network); 11 Nov 2008 06:36:26 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 11 Nov 2008 06:36:26 -0000 Authentication-Results: pb1.pair.com smtp.mail=ron@Opus1.COM; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=ron@Opus1.COM; sender-id=unknown Received-SPF: pass (pb1.pair.com: domain Opus1.COM designates 192.245.12.8 as permitted sender) X-PHP-List-Original-Sender: ron@Opus1.COM X-Host-Fingerprint: 192.245.12.8 Viola.Opus1.COM OpenVMS 7.2 (Multinet 4.3-4.4 stack) Received: from [192.245.12.8] ([192.245.12.8:4198] helo=Viola.Opus1.COM) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 02/70-07308-8E729194 for ; Tue, 11 Nov 2008 01:36:26 -0500 Received: from [192.168.1.2] ([76.105.139.204]) by Opus1.COM (PMDF V6.2-X27 #9830) with ESMTPSA id <01N1S5DOTNOO8Y7JZW@Opus1.COM> for internals@lists.php.net; Mon, 10 Nov 2008 23:36:20 -0700 (MST) Date: Mon, 10 Nov 2008 22:36:36 -0800 In-reply-to: To: shire Cc: Stan Vassilev | FM , internals@lists.php.net Message-ID: MIME-version: 1.0 X-Mailer: Apple Mail (2.753.1) Content-type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Content-transfer-encoding: 7bit X-Priority: 3 X-Gpgmail-State: !signed References: <7e270cea0810191211w2cb77075y5e0ad78c2f7306f7@mail.gmail.com> <243C7392-C6A0-4EE8-8AAA-34964E78D453@tekrat.com> <4022974B-4A13-4AB0-A282-973503200406@tekrat.com> <38820B1D-6D5F-4B42-9FEE-0CDF23142A57@tekrat.com> <140FB8F69197441FAB5F45A64FDA946E@pc> <753B18C7-AC2F-4C96-B9CF-9FF5A042B96C@tekrat.com> Subject: Re: [PHP-DEV] An optimization idea From: ron@Opus1.COM (Ronald Chmara) On Nov 10, 2008, at 5:24 PM, shire wrote: >>> >>> It sounds like this would only work if the array contents where >>> static though, as you're mapping a constant string to the >>> contents of the hash (or did I misunderstand and you'd be >>> mapping string const. values to hash IDs?). >> My point is, replacing this process: >> $a['foo'] or $a->foo -> compute hash of 'foo' -> find item for >> hash 'foo' -> many items? -> resolve conflict -> return item >> With this process: >> $a[% string_literal_id_5 %] -> lookup item key 5 for array -> >> return item >> Notice we skipped hash generation, and conflict resolution >> altogether. We only have the lookup for the integer id. >> If some additional work is done, even this lookup can be >> eliminated and make this an O(1) process. >> If instead the coder used variable: >> $a[$bar] or $a->$foo (var array lookup and var var object lookup), >> then this optimization can't kick in, and the existing algorithm >> will be used. >> However "static" access is the predominant usage, especially for >> objects, but also for arrays, so this should have significant impact. > > Thanks for the clarification, this is pretty much the same idea as > what I've been interested in working on next. I think I was more > inclined to store an extra hash value within the zvals themselves, > with the hope that this could be expanded to non-constant values. > I believe ruby implements it's lookups this way (noted just for > reference, not as an argument to copy another language ;-) ). Any > thoughts on reasons not to do this (other than increasing the size > of zval struct), it's pretty simple to implement this for static > values I believe, dynamic values are a lot more difficult obviously... Since nobody else has chimed in with the obvious (to me, anyways): I've worked with some code that uses disgustingly huge (>512Mb) arrays, largest implementation was a single 2.5 Gb array (before we took the offending programmer into a room and had a... chat). I'd be interested in seeing some metrics on the needed extra CPU ticks for determining if an array (or array sub-element) is static or dynamic under the scheme, as well as the extra memory for storing an (many?) extra value(s). It sounds like it might be totally livable, if done right... done wrong, we could be looking at millions of CPU hits for checking millions of single element static arrays.... (and yes, storing millions of values as single element arrays is "doing it wrong", but I've learned not to underestimate the creativity of people who write software). Oh, and while we're at it, what about "re-assigning" "static" arrays? The idea sounds good, the corner-cases on mis-implementations are where it always becomes amusing. -Bop