Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:74284 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 43283 invoked from network); 17 May 2014 18:27:06 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 17 May 2014 18:27:06 -0000 Authentication-Results: pb1.pair.com header.from=nikita.ppv@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=nikita.ppv@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.219.44 as permitted sender) X-PHP-List-Original-Sender: nikita.ppv@gmail.com X-Host-Fingerprint: 209.85.219.44 mail-oa0-f44.google.com Received: from [209.85.219.44] ([209.85.219.44:36266] helo=mail-oa0-f44.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 02/00-43080-9F9A7735 for ; Sat, 17 May 2014 14:27:05 -0400 Received: by mail-oa0-f44.google.com with SMTP id o6so4511923oag.31 for ; Sat, 17 May 2014 11:27:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=u/y43qkGfF5HQdfRpbCcLttC+i1HVGjrlcIEee2S0HU=; b=s0+1TJa79X+oTygZ1lZT8m8jyjIa7ugGhTfOBv5C0suATeSNZJO8AcW8SiQhYIkaYN m/RcsdaH9GFf1wNv4aUxfpsM0z04Nfsz4sG3/e2s5K2XZXAHzXBLkrX65RGi18LupbdS hGpDplLWYOGh2mdbVnZ70szl9VjG1qA2/D9fHx+GJKjWemS/m3kZznQXpZeHnaEQFDCz Gf2ngwpV6A50HCsTVeglX5stI50pe2ZiH0/ORRVpZturjULGS3+B5lGdATTEROepOuxH LQSEXI9Z9oFBCx1XbF2jvWkPgtRknsCNL4zRKqbvWmSJycSYwe7fdlecxAp59qwKrRHK ivMg== MIME-Version: 1.0 X-Received: by 10.182.40.201 with SMTP id z9mr25129158obk.45.1400351222786; Sat, 17 May 2014 11:27:02 -0700 (PDT) Received: by 10.182.165.69 with HTTP; Sat, 17 May 2014 11:27:02 -0700 (PDT) Date: Sat, 17 May 2014 20:27:02 +0200 Message-ID: To: PHP internals Content-Type: multipart/alternative; boundary=001a11c33b0231f06c04f99cad0a Subject: Rethinking 64bit sizes and PHP-NG From: nikita.ppv@gmail.com (Nikita Popov) --001a11c33b0231f06c04f99cad0a Content-Type: text/plain; charset=UTF-8 Hi internals! The discussions around the patch for improved 64bit support have deteriorated from a semi-reasonable discussions to pointless accusations and name-calling. I'd like to get back to discussing this issue from a technical point of view and see which parts of the patch can be salvaged and which can not. As the current voting results show, the 64bit changes, if accepted, will be integrated into phpng rather than into master. As such I will argue purely from a phpng perspective. Before going any further, it should be established that two aspects of the 64 bit patch are, as far as I can see, acceptable to everyone: The introduction of 64bit integers on Win64 and other LLP64 architectures and the use of an unsigned integer type for lengths and sizes. The disputed aspect of the patch is the use of 64 bit lengths and sizes. It has been argued that the use of 64bit sizes will significantly increase memory consumption of the phpng branch and may also negatively impact performance. If you take a look at an early patch for porting the 64bit changes for phpng [1] you'll find that these are very real concerns, as the changes increase the sizes of many commonly used data structures. However, this is only the case if 64bit sizes are blindly used in every possible place. In the following I'll argue how a more careful choice of the used size type in key places can make this patch a lot more realistic. First of all it should be noted that the patch introduces size_t usage in several places where, as far as I can see, it is not necessary. For example it changes the storage type of line numbers from zend_uint to zend_size_t: From zend_opline: - uint lineno; + zend_size_t lineno; From zend_class_entry: - zend_uint line_start; - zend_uint line_end; + zend_size_t line_start; + zend_size_t line_end; I don't think we need to concern ourselves about files with more than 4 billion lines. Similarly the patch also changes the length of argument names and class typehints to 64bit: From zend_arg_info: - zend_uint name_len; + zend_size_t name_len; const char *class_name; - zend_uint class_name_len; + zend_size_t class_name_len; This once again seem rather unnecessary and wasteful. Another case are the number of arguments of a function: From zend_op_array: - zend_uint num_args; - zend_uint required_num_args; + zend_size_t num_args; + zend_size_t required_num_args; Again it seems doubtful whether functions with more than 4 billion arguments are quite relevant. While there are more examples than these, I'll not go into this point any further, as the pattern should be clear. Instead we'll be moving on to the zend_string and HashTable structures. For the HashTable structure, the diff currently looks as follows: - zend_uint nTableSize; - zend_uint nTableMask; - zend_uint nNumUsed; - zend_uint nNumOfElements; - long nNextFreeElement; + zend_uint_t nTableSize; + zend_uint_t nTableMask; + zend_uint_t nNumUsed; + zend_uint_t nNumOfElements; + zend_int_t nNextFreeElement; Bucket *arData; - zend_uint *arHash; + zend_uint_t *arHash; dtor_func_t pDestructor; zend_uint nInternalPointer; This actually misses changing nInternalPointer. If we change that one as well and account for differences in alignment, the total increase for the HashTable structure is 6*4 = 24 bytes. For a heavily used structure, that's a lot. However, even worse are the per-bucket memory increases: From the diff above you can already see that the arHash array will be twice as large, i.e. you'll get an additional 4 bytes per bucket. However - and this is currently not represented in the patch stub - changing hashtables to 64bit lengths will make the use of Z_NEXT(bucket->val) to store the next collision resolution index impossible. As such another 8 bytes per bucket will be needed. This brings us to a total increase of 24 bytes per hashtable and 12 bytes per bucket. Especially the latter seems unacceptable to me. As such, I would suggest to *not* change sizes for the hashtable structure and limit these changes to other parts of the engine (strings in particular). To put this into context, consider what it actually means to have an array with more than 4 billion elements (which is the point where the different size type becomes relevant). With hashtables as implemented in master with 144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If we introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of memory for to create an array that can make use of those sizes. I hope this visualizes that supporting 64bit sizes for hashtables is not necessary. We can easily implement a size restriction in zend_hash_do_resize. As hashtables go through a centralized API controlling the size limitation is a lot more straightforward than for strings. Now that we have covered the HashTable changes, we're only left with one significant change. The diff for the zend_string structure is as follows: zend_refcounted gc; - zend_ulong h; /* hash value */ - int len; + zend_uint_t h; /* hash value */ + zend_size_t len; char val[1]; This means that a zend_string will on average be approximately 4 bytes longer. (Approximately because the exact difference is subject to details of the allocation process and depends on the string length distribution.) I think that this difference might be acceptable, if this is the cost of uniform string size usage in the codebase. I tested the impact that adding 4 bytes on the string structure has on one application and the result was a change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That difference is acceptable to me. However I don't know how representative that result is and I don't currently have time to set up a testing environment for something like wordpress on a 64bit vm. I would appreciate it if somebody who already has such a setup could run a test for this, to confirm that there is indeed no large difference in memory consumption. Just add a "int unused" after the len member and compare memory usage. So, what I'm suggesting here is the following: We implement 64bit integers on LLP64, we implement unsigned sizes, we implement size_t for strings lengths. I honestly still don't fully understand the necessity for the latter, but if the impact in phpng is indeed low (in the 1% range), then I think I can compromise on that. What we don't implement is indiscriminate usage of size_t or other 64bit types all over the place (like for linenos, argument counts, argument lengths etc) and usage of 64bit sizes for hashtables. This would be acceptable for me and I hope that it would also be acceptable for others. While we're at it, I would also like to quickly touch on the topic of naming and renaming. I hope it is clear now that we need both 64bit types for some things and 32bit types for others (like HT sizes, linenos, etc). As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit platforms of course) and zend_(u)int_t as the 32bit type. This more or less matches our current naming and hopefully avoids confusion. With that naming convention, we should also continue using IS_LONG and Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially as they conflict with other zpp changes in phpng. I understand why these changes were beneficial in the original 64bit patch, however considering that the usage of many APIs changed in phpng and extensions need to be carefully reviewed anyway, I don't think these renames make a lot of sense anymore. So, that's my opinion on how we should proceed with the 64bit patch. I very much hope that we can reach some consensus about this. Credits: This mail is the result of a discussion between Anthony, Bob and myself. Nikita [1]: https://gist.github.com/weltling/a941d8cf6c731640b51f --001a11c33b0231f06c04f99cad0a--