Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:74307 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 79860 invoked from network); 17 May 2014 20:38:00 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 17 May 2014 20:38:00 -0000 Authentication-Results: pb1.pair.com smtp.mail=anatol.php@belski.net; spf=permerror; sender-id=unknown Authentication-Results: pb1.pair.com header.from=anatol.php@belski.net; sender-id=unknown Received-SPF: error (pb1.pair.com: domain belski.net from 85.214.73.107 cause and error) X-PHP-List-Original-Sender: anatol.php@belski.net X-Host-Fingerprint: 85.214.73.107 klapt.com Received: from [85.214.73.107] ([85.214.73.107:52138] helo=klapt.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id B2/A5-53190-6A8C7735 for ; Sat, 17 May 2014 16:37:59 -0400 Received: by klapt.com (Postfix, from userid 33) id 3876323D6106; Sat, 17 May 2014 22:37:56 +0200 (CEST) Received: from 88.64.184.125 (SquirrelMail authenticated user anatol@belski.net) by webmail.klapt.com with HTTP; Sat, 17 May 2014 22:37:56 +0200 Message-ID: In-Reply-To: References: Date: Sat, 17 May 2014 22:37:56 +0200 To: "Dmitry Stogov" Cc: "Nikita Popov" , "PHP internals" User-Agent: SquirrelMail/1.5.2 [SVN] MIME-Version: 1.0 Content-Type: text/plain;charset=UTF-8 Content-Transfer-Encoding: 8bit Subject: Re: [PHP-DEV] Rethinking 64bit sizes and PHP-NG From: anatol.php@belski.net ("Anatol Belski") Hi, On Sat, May 17, 2014 22:12, Dmitry Stogov wrote: > Hi Nikita, > > > Thanks for this deep analyzes. > I hope, this will lead us to agreement. > > > I see no problems with accepting IS_LONG part of the patch. > Then we may think about string length. > As you pointed, it makes no sense to use size_t everywhere, and may be > using it only in "right" places won't make serious harm. > > Thanks. Dmitry. > > > > > On Sat, May 17, 2014 at 10:27 PM, Nikita Popov > wrote: > > >> Hi internals! >> >> >> The discussions around the patch for improved 64bit support have >> deteriorated from a semi-reasonable discussions to pointless accusations >> and name-calling. I'd like to get back to discussing this issue from a >> technical point of view and see which parts of the patch can be >> salvaged and which can not. >> >> As the current voting results show, the 64bit changes, if accepted, >> will be integrated into phpng rather than into master. As such I will >> argue purely from a phpng perspective. >> >> Before going any further, it should be established that two aspects of >> the 64 bit patch are, as far as I can see, acceptable to everyone: The >> introduction of 64bit integers on Win64 and other LLP64 architectures >> and the use of an unsigned integer type for lengths and sizes. >> >> The disputed aspect of the patch is the use of 64 bit lengths and >> sizes. It has been argued that the use of 64bit sizes will significantly >> increase memory consumption of the phpng branch and may also negatively >> impact performance. >> >> If you take a look at an early patch for porting the 64bit changes for >> phpng [1] you'll find that these are very real concerns, as the changes >> increase the sizes of many commonly used data structures. >> >> However, this is only the case if 64bit sizes are blindly used in every >> possible place. In the following I'll argue how a more careful choice >> of the used size type in key places can make this patch a lot more >> realistic. >> >> First of all it should be noted that the patch introduces size_t usage >> in several places where, as far as I can see, it is not necessary. For >> example it changes the storage type of line numbers from zend_uint to >> zend_size_t: >> >> >> From zend_opline: >> - uint lineno; >> + zend_size_t lineno; >> >> >> From zend_class_entry: >> - zend_uint line_start; >> - zend_uint line_end; >> + zend_size_t line_start; >> + zend_size_t line_end; >> >> >> I don't think we need to concern ourselves about files with more than 4 >> billion lines. >> >> Similarly the patch also changes the length of argument names and class >> typehints to 64bit: >> >> From zend_arg_info: >> - zend_uint name_len; >> + zend_size_t name_len; >> const char *class_name; - zend_uint class_name_len; >> + zend_size_t class_name_len; >> >> >> This once again seem rather unnecessary and wasteful. Another case are >> the number of arguments of a function: >> >> From zend_op_array: >> - zend_uint num_args; >> - zend_uint required_num_args; >> + zend_size_t num_args; >> + zend_size_t required_num_args; >> >> >> Again it seems doubtful whether functions with more than 4 billion >> arguments are quite relevant. >> >> While there are more examples than these, I'll not go into this point >> any further, as the pattern should be clear. Instead we'll be moving on >> to the zend_string and HashTable structures. >> >> For the HashTable structure, the diff currently looks as follows: >> >> >> - zend_uint nTableSize; >> - zend_uint nTableMask; >> - zend_uint nNumUsed; >> - zend_uint nNumOfElements; >> - long nNextFreeElement; >> + zend_uint_t nTableSize; >> + zend_uint_t nTableMask; >> + zend_uint_t nNumUsed; >> + zend_uint_t nNumOfElements; >> + zend_int_t nNextFreeElement; >> Bucket *arData; >> - zend_uint *arHash; >> + zend_uint_t *arHash; >> dtor_func_t pDestructor; zend_uint nInternalPointer; >> >> This actually misses changing nInternalPointer. If we change that one >> as well and account for differences in alignment, the total increase for >> the HashTable structure is 6*4 = 24 bytes. For a heavily used structure, >> that's a lot. However, even worse are the per-bucket memory increases: >> >> From the diff above you can already see that the arHash array will be >> twice as large, i.e. you'll get an additional 4 bytes per bucket. >> However - and >> this is currently not represented in the patch stub - changing >> hashtables to 64bit lengths will make the use of Z_NEXT(bucket->val) to >> store the next collision resolution index impossible. As such another 8 >> bytes per bucket will be needed. >> >> This brings us to a total increase of 24 bytes per hashtable and 12 >> bytes per bucket. Especially the latter seems unacceptable to me. >> >> As such, I would suggest to *not* change sizes for the hashtable >> structure and limit these changes to other parts of the engine (strings >> in particular). >> >> To put this into context, consider what it actually means to have an >> array with more than 4 billion elements (which is the point where the >> different size type becomes relevant). With hashtables as implemented in >> master with 144 bytes per bucket, this would require 2^32 * 144 = 576 GB >> of memory. If we introduced 64bit hashtable sizes in phpng, you'd still >> need 206 GB of memory for to create an array that can make use of those >> sizes. >> >> I hope this visualizes that supporting 64bit sizes for hashtables is >> not necessary. We can easily implement a size restriction in >> zend_hash_do_resize. As hashtables go through a centralized API >> controlling the size limitation is a lot more straightforward than for >> strings. >> >> Now that we have covered the HashTable changes, we're only left with >> one significant change. The diff for the zend_string structure is as >> follows: >> >> >> zend_refcounted gc; - zend_ulong h; /* hash >> value */ - int len; >> + zend_uint_t h; /* hash value */ >> + zend_size_t len; >> char val[1]; >> >> This means that a zend_string will on average be approximately 4 bytes >> longer. (Approximately because the exact difference is subject to >> details of the allocation process and depends on the string length >> distribution.) >> >> I think that this difference might be acceptable, if this is the cost >> of uniform string size usage in the codebase. I tested the impact that >> adding 4 bytes on the string structure has on one application and the >> result was a change from 38.2 to 38.5 MiB, which translates to a 0.7% >> difference. That difference is acceptable to me. >> >> However I don't know how representative that result is and I don't >> currently have time to set up a testing environment for something like >> wordpress on a 64bit vm. I would appreciate it if somebody who already >> has such a setup could run a test for this, to confirm that there is >> indeed no large difference in memory consumption. Just add a "int >> unused" after the len member and compare memory usage. >> >> So, what I'm suggesting here is the following: >> >> >> We implement 64bit integers on LLP64, we implement unsigned sizes, we >> implement size_t for strings lengths. I honestly still don't fully >> understand the necessity for the latter, but if the impact in phpng is >> indeed low (in the 1% range), then I think I can compromise on that. >> >> What we don't implement is indiscriminate usage of size_t or other >> 64bit >> types all over the place (like for linenos, argument counts, argument >> lengths etc) and usage of 64bit sizes for hashtables. >> >> This would be acceptable for me and I hope that it would also be >> acceptable for others. >> >> While we're at it, I would also like to quickly touch on the topic of >> naming and renaming. I hope it is clear now that we need both 64bit >> types for some things and 32bit types for others (like HT sizes, >> linenos, etc). As such I would suggest to use zend_(u)long_t as the >> 64bit type (on 64bit >> platforms of course) and zend_(u)int_t as the 32bit type. This more or >> less matches our current naming and hopefully avoids confusion. >> >> With that naming convention, we should also continue using IS_LONG and >> Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using >> Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially >> as they conflict with other zpp changes in phpng. I understand why these >> changes were beneficial in the original 64bit patch, however >> considering that the usage of many APIs changed in phpng and extensions >> need to be carefully reviewed anyway, I don't think these renames make a >> lot of sense anymore. >> >> So, that's my opinion on how we should proceed with the 64bit patch. I >> very much hope that we can reach some consensus about this. >> >> Credits: This mail is the result of a discussion between Anthony, Bob >> and myself. >> >> Nikita >> I really appreciate you guys have looked inside the str_size_and_int64 patch. Dmitry even has compiled and tested it. While the Nikitas notes sound pretty meaningful, let me ask you a one simple question before proceeding with any details - how much time you personally would spend on the integration? Thanks Anatol