Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:74287 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 48054 invoked from network); 17 May 2014 18:46:27 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 17 May 2014 18:46:27 -0000 Authentication-Results: pb1.pair.com smtp.mail=zeev@zend.com; spf=permerror; sender-id=unknown Authentication-Results: pb1.pair.com header.from=zeev@zend.com; sender-id=unknown Received-SPF: error (pb1.pair.com: domain zend.com from 209.85.128.170 cause and error) X-PHP-List-Original-Sender: zeev@zend.com X-Host-Fingerprint: 209.85.128.170 mail-ve0-f170.google.com Received: from [209.85.128.170] ([209.85.128.170:58486] helo=mail-ve0-f170.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 40/01-43080-28EA7735 for ; Sat, 17 May 2014 14:46:26 -0400 Received: by mail-ve0-f170.google.com with SMTP id db11so4773824veb.1 for ; Sat, 17 May 2014 11:46:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:cc:content-type; bh=4MxL53EaDiY09lJRjL7L2TbTgEEOzlbPcm3jU/ACFtI=; b=cf8YRqgQKbf58v0DWa9RF02gWKeMCSd9BILMnjCElp/41PJSjuyJZBo0YLds8Vwvc7 1nCvkBgzNqtqx9MDhJe5hvkuyxujq6krlEAGSBeLJEgLs0egAws9Amq8TEsDkAg0g4Y/ BQYqSAOJ3PW46DRrG3Jh+p8/D6ASb3ptGhtt2dlo0Y/ECnAuch13O6mI8WuqMOHVtLIy Yuv++y/QLRZ+D0G+O4UgKc+f18VR5W0v/zY/mW/ZHqlnOQDQjFbRAlwmtUg+8vRSwLFa EzAPgemkXOK4YXE4Xtvck0avXB0YDZbw4z1mY7vGnqw0/xeGsMKBTUdjoFwXtVL1qaso /8oA== X-Gm-Message-State: ALoCoQlcNmp0eTTXNoSS4+n/fd9AmiRADAubFgSaJUMZwcvr27RY7t8T+790u7mtbb1ud7mcTnto3c0KSuCQFUt8dtEOBCTW2nyTAnkrrMLct2ljr2MCIQdh14FKxHQLynUFCOj+Gw+1 X-Received: by 10.220.10.2 with SMTP id n2mr4095640vcn.26.1400352383282; Sat, 17 May 2014 11:46:23 -0700 (PDT) References: In-Reply-To: MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQKrV4i1v1hCPJhAXeBdTCLU3GH8uZmNUMqw Date: Sat, 17 May 2014 21:46:21 +0300 Message-ID: <5335468dcf589330012ffbfc44fb68f3@mail.gmail.com> To: Nikita Popov Cc: PHP internals Content-Type: text/plain; charset=UTF-8 Subject: RE: [PHP-DEV] Rethinking 64bit sizes and PHP-NG From: zeev@zend.com (Zeev Suraski) Nikita, I think your email is spot on, except for one thing. Thankfully, there haven't any accusations or name calling - at least none that I've seen. And that's a good thing, even when we have a heated debate. I'll vote Yes on an RFC-translated version of this email in a heartbeat. Zeev > -----Original Message----- > From: Nikita Popov [mailto:nikita.ppv@gmail.com] > Sent: Saturday, May 17, 2014 9:27 PM > To: PHP internals > Subject: [PHP-DEV] Rethinking 64bit sizes and PHP-NG > > Hi internals! > > The discussions around the patch for improved 64bit support have > deteriorated > from a semi-reasonable discussions to pointless accusations and > name-calling. > I'd like to get back to discussing this issue from a technical point of > view and see > which parts of the patch can be salvaged and which can not. > > As the current voting results show, the 64bit changes, if accepted, will > be > integrated into phpng rather than into master. As such I will argue purely > from a > phpng perspective. > > Before going any further, it should be established that two aspects of the > 64 bit patch are, as far as I can see, acceptable to everyone: The > introduction of > 64bit integers on Win64 and other LLP64 architectures and the use of an > unsigned integer type for lengths and sizes. > > The disputed aspect of the patch is the use of 64 bit lengths and sizes. > It has > been argued that the use of 64bit sizes will significantly increase memory > consumption of the phpng branch and may also negatively impact > performance. > > If you take a look at an early patch for porting the 64bit changes for > phpng [1] > you'll find that these are very real concerns, as the changes increase the > sizes of > many commonly used data structures. > > However, this is only the case if 64bit sizes are blindly used in every > possible > place. In the following I'll argue how a more careful choice of the used > size type > in key places can make this patch a lot more realistic. > > First of all it should be noted that the patch introduces size_t usage in > several > places where, as far as I can see, it is not necessary. For example it > changes the > storage type of line numbers from zend_uint to zend_size_t: > > From zend_opline: > - uint lineno; > + zend_size_t lineno; > > From zend_class_entry: > - zend_uint line_start; > - zend_uint line_end; > + zend_size_t line_start; > + zend_size_t line_end; > > I don't think we need to concern ourselves about files with more than 4 > billion > lines. > > Similarly the patch also changes the length of argument names and class > typehints to 64bit: > > From zend_arg_info: > - zend_uint name_len; > + zend_size_t name_len; > const char *class_name; > - zend_uint class_name_len; > + zend_size_t class_name_len; > > This once again seem rather unnecessary and wasteful. Another case are the > number of arguments of a function: > > From zend_op_array: > - zend_uint num_args; > - zend_uint required_num_args; > + zend_size_t num_args; > + zend_size_t required_num_args; > > Again it seems doubtful whether functions with more than 4 billion > arguments > are quite relevant. > > While there are more examples than these, I'll not go into this point any > further, > as the pattern should be clear. Instead we'll be moving on to the > zend_string and > HashTable structures. > > For the HashTable structure, the diff currently looks as follows: > > - zend_uint nTableSize; > - zend_uint nTableMask; > - zend_uint nNumUsed; > - zend_uint nNumOfElements; > - long nNextFreeElement; > + zend_uint_t nTableSize; > + zend_uint_t nTableMask; > + zend_uint_t nNumUsed; > + zend_uint_t nNumOfElements; > + zend_int_t nNextFreeElement; > Bucket *arData; > - zend_uint *arHash; > + zend_uint_t *arHash; > dtor_func_t pDestructor; > zend_uint nInternalPointer; > > This actually misses changing nInternalPointer. If we change that one as > well > and account for differences in alignment, the total increase for the > HashTable > structure is 6*4 = 24 bytes. For a heavily used structure, that's a lot. > However, > even worse are the per-bucket memory increases: > > From the diff above you can already see that the arHash array will be > twice as > large, i.e. you'll get an additional 4 bytes per bucket. However - and > this is > currently not represented in the patch stub - changing hashtables to 64bit > lengths will make the use of Z_NEXT(bucket->val) to store the next > collision > resolution index impossible. As such another 8 bytes per bucket will be > needed. > > This brings us to a total increase of 24 bytes per hashtable and 12 bytes > per > bucket. Especially the latter seems unacceptable to me. > > As such, I would suggest to *not* change sizes for the hashtable structure > and > limit these changes to other parts of the engine (strings in particular). > > To put this into context, consider what it actually means to have an array > with > more than 4 billion elements (which is the point where the different size > type > becomes relevant). With hashtables as implemented in master with > 144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If > we > introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of > memory for > to create an array that can make use of those sizes. > > I hope this visualizes that supporting 64bit sizes for hashtables is not > necessary. > We can easily implement a size restriction in zend_hash_do_resize. As > hashtables go through a centralized API controlling the size limitation is > a lot > more straightforward than for strings. > > Now that we have covered the HashTable changes, we're only left with one > significant change. The diff for the zend_string structure is as follows: > > zend_refcounted gc; > - zend_ulong h; /* hash value */ > - int len; > + zend_uint_t h; /* hash value */ > + zend_size_t len; > char val[1]; > > This means that a zend_string will on average be approximately 4 bytes > longer. > (Approximately because the exact difference is subject to details of the > allocation process and depends on the string length distribution.) > > I think that this difference might be acceptable, if this is the cost of > uniform > string size usage in the codebase. I tested the impact that adding > 4 bytes on the string structure has on one application and the result was > a > change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That > difference is acceptable to me. > > However I don't know how representative that result is and I don't > currently > have time to set up a testing environment for something like wordpress on > a > 64bit vm. I would appreciate it if somebody who already has such a setup > could > run a test for this, to confirm that there is indeed no large difference > in memory > consumption. Just add a "int unused" after the len member and compare > memory usage. > > So, what I'm suggesting here is the following: > > We implement 64bit integers on LLP64, we implement unsigned sizes, we > implement size_t for strings lengths. I honestly still don't fully > understand the > necessity for the latter, but if the impact in phpng is indeed low (in the > 1% > range), then I think I can compromise on that. > > What we don't implement is indiscriminate usage of size_t or other 64bit > types > all over the place (like for linenos, argument counts, argument lengths > etc) and > usage of 64bit sizes for hashtables. > > This would be acceptable for me and I hope that it would also be > acceptable for > others. > > While we're at it, I would also like to quickly touch on the topic of > naming and > renaming. I hope it is clear now that we need both 64bit types for some > things > and 32bit types for others (like HT sizes, linenos, etc). > As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit > platforms of course) and zend_(u)int_t as the 32bit type. This more or > less > matches our current naming and hopefully avoids confusion. > > With that naming convention, we should also continue using IS_LONG and > Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using > Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially > as > they conflict with other zpp changes in phpng. I understand why these > changes > were beneficial in the original 64bit patch, however considering that the > usage > of many APIs changed in phpng and extensions need to be carefully reviewed > anyway, I don't think these renames make a lot of sense anymore. > > So, that's my opinion on how we should proceed with the 64bit patch. I > very > much hope that we can reach some consensus about this. > > Credits: This mail is the result of a discussion between Anthony, Bob and > myself. > > Nikita > > [1]: https://gist.github.com/weltling/a941d8cf6c731640b51f