Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:74310 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 84277 invoked from network); 17 May 2014 20:43:51 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 17 May 2014 20:43:51 -0000 Authentication-Results: pb1.pair.com header.from=dmitry@zend.com; sender-id=unknown Authentication-Results: pb1.pair.com smtp.mail=dmitry@zend.com; spf=permerror; sender-id=unknown Received-SPF: error (pb1.pair.com: domain zend.com from 209.85.220.176 cause and error) X-PHP-List-Original-Sender: dmitry@zend.com X-Host-Fingerprint: 209.85.220.176 mail-vc0-f176.google.com Received: from [209.85.220.176] ([209.85.220.176:49030] helo=mail-vc0-f176.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id CC/A6-53190-60AC7735 for ; Sat, 17 May 2014 16:43:50 -0400 Received: by mail-vc0-f176.google.com with SMTP id lg15so7801631vcb.21 for ; Sat, 17 May 2014 13:43:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=Zn+U76VBpwif7p/CkrBvyJWbWHR3kV78DRfDv1ONRHc=; b=IZzFyAFhNygtKKJ3GhT8xxIN4aoGEUnChGOKMBrGY5YmD8x90juVcySNsGe+TNAEMq mD41WWcqfO1E4UhWfWFU92X6/aPbG/4gQY0kYAB+nzlmCfQ/K+TsEjQsvfBB7ddWcZ0B bAMqi5eNHRrWMw9G0yxknWyKU/UJC6eorjwELEwjeaqdPnFvDcvbfLa0p3/Z7WYaVu64 jwjHyU9UjXK/mypH9Dr0VvS0p3M94wQQxaMttCXa6EN9eYYqrz57pT2ff6E6letxwrMg IbdO52H9sxi0jttpgyBzXucTVZKH1X51Cz7gtGoIzDJplh0Ir0vQlE+oCq/hN7g1OmLn uqzw== X-Gm-Message-State: ALoCoQnG850YD4jGf1o6cKPDbr2pi73ArEYiVrUfW7pFK0amW158KuB1UqOxtRnAvMkACKAIHRZxAydoPlot6aGJs9Dd9ZWXVJ5LIpZq59sbUjfm1e8W/263hIrny8pmQOg4cMUjBuvk MIME-Version: 1.0 X-Received: by 10.58.143.13 with SMTP id sa13mr1976102veb.44.1400359427828; Sat, 17 May 2014 13:43:47 -0700 (PDT) Received: by 10.52.111.71 with HTTP; Sat, 17 May 2014 13:43:47 -0700 (PDT) In-Reply-To: References: Date: Sun, 18 May 2014 00:43:47 +0400 Message-ID: To: Anatol Belski Cc: Nikita Popov , PHP internals Content-Type: multipart/alternative; boundary=047d7b67778641049404f99e9615 Subject: Re: [PHP-DEV] Rethinking 64bit sizes and PHP-NG From: dmitry@zend.com (Dmitry Stogov) --047d7b67778641049404f99e9615 Content-Type: text/plain; charset=UTF-8 I may take a part of the work. It's not a problem. Thanks. Dmitry. On Sun, May 18, 2014 at 12:37 AM, Anatol Belski wrote: > Hi, > > On Sat, May 17, 2014 22:12, Dmitry Stogov wrote: > > Hi Nikita, > > > > > > Thanks for this deep analyzes. > > I hope, this will lead us to agreement. > > > > > > I see no problems with accepting IS_LONG part of the patch. > > Then we may think about string length. > > As you pointed, it makes no sense to use size_t everywhere, and may be > > using it only in "right" places won't make serious harm. > > > > Thanks. Dmitry. > > > > > > > > > > On Sat, May 17, 2014 at 10:27 PM, Nikita Popov > > wrote: > > > > > >> Hi internals! > >> > >> > >> The discussions around the patch for improved 64bit support have > >> deteriorated from a semi-reasonable discussions to pointless accusations > >> and name-calling. I'd like to get back to discussing this issue from a > >> technical point of view and see which parts of the patch can be > >> salvaged and which can not. > >> > >> As the current voting results show, the 64bit changes, if accepted, > >> will be integrated into phpng rather than into master. As such I will > >> argue purely from a phpng perspective. > >> > >> Before going any further, it should be established that two aspects of > >> the 64 bit patch are, as far as I can see, acceptable to everyone: The > >> introduction of 64bit integers on Win64 and other LLP64 architectures > >> and the use of an unsigned integer type for lengths and sizes. > >> > >> The disputed aspect of the patch is the use of 64 bit lengths and > >> sizes. It has been argued that the use of 64bit sizes will significantly > >> increase memory consumption of the phpng branch and may also negatively > >> impact performance. > >> > >> If you take a look at an early patch for porting the 64bit changes for > >> phpng [1] you'll find that these are very real concerns, as the changes > >> increase the sizes of many commonly used data structures. > >> > >> However, this is only the case if 64bit sizes are blindly used in every > >> possible place. In the following I'll argue how a more careful choice > >> of the used size type in key places can make this patch a lot more > >> realistic. > >> > >> First of all it should be noted that the patch introduces size_t usage > >> in several places where, as far as I can see, it is not necessary. For > >> example it changes the storage type of line numbers from zend_uint to > >> zend_size_t: > >> > >> > >> From zend_opline: > >> - uint lineno; > >> + zend_size_t lineno; > >> > >> > >> From zend_class_entry: > >> - zend_uint line_start; > >> - zend_uint line_end; > >> + zend_size_t line_start; > >> + zend_size_t line_end; > >> > >> > >> I don't think we need to concern ourselves about files with more than 4 > >> billion lines. > >> > >> Similarly the patch also changes the length of argument names and class > >> typehints to 64bit: > >> > >> From zend_arg_info: > >> - zend_uint name_len; > >> + zend_size_t name_len; > >> const char *class_name; - zend_uint class_name_len; > >> + zend_size_t class_name_len; > >> > >> > >> This once again seem rather unnecessary and wasteful. Another case are > >> the number of arguments of a function: > >> > >> From zend_op_array: > >> - zend_uint num_args; > >> - zend_uint required_num_args; > >> + zend_size_t num_args; > >> + zend_size_t required_num_args; > >> > >> > >> Again it seems doubtful whether functions with more than 4 billion > >> arguments are quite relevant. > >> > >> While there are more examples than these, I'll not go into this point > >> any further, as the pattern should be clear. Instead we'll be moving on > >> to the zend_string and HashTable structures. > >> > >> For the HashTable structure, the diff currently looks as follows: > >> > >> > >> - zend_uint nTableSize; > >> - zend_uint nTableMask; > >> - zend_uint nNumUsed; > >> - zend_uint nNumOfElements; > >> - long nNextFreeElement; > >> + zend_uint_t nTableSize; > >> + zend_uint_t nTableMask; > >> + zend_uint_t nNumUsed; > >> + zend_uint_t nNumOfElements; > >> + zend_int_t nNextFreeElement; > >> Bucket *arData; > >> - zend_uint *arHash; > >> + zend_uint_t *arHash; > >> dtor_func_t pDestructor; zend_uint nInternalPointer; > >> > >> This actually misses changing nInternalPointer. If we change that one > >> as well and account for differences in alignment, the total increase for > >> the HashTable structure is 6*4 = 24 bytes. For a heavily used structure, > >> that's a lot. However, even worse are the per-bucket memory increases: > >> > >> From the diff above you can already see that the arHash array will be > >> twice as large, i.e. you'll get an additional 4 bytes per bucket. > >> However - and > >> this is currently not represented in the patch stub - changing > >> hashtables to 64bit lengths will make the use of Z_NEXT(bucket->val) to > >> store the next collision resolution index impossible. As such another 8 > >> bytes per bucket will be needed. > >> > >> This brings us to a total increase of 24 bytes per hashtable and 12 > >> bytes per bucket. Especially the latter seems unacceptable to me. > >> > >> As such, I would suggest to *not* change sizes for the hashtable > >> structure and limit these changes to other parts of the engine (strings > >> in particular). > >> > >> To put this into context, consider what it actually means to have an > >> array with more than 4 billion elements (which is the point where the > >> different size type becomes relevant). With hashtables as implemented in > >> master with 144 bytes per bucket, this would require 2^32 * 144 = 576 GB > >> of memory. If we introduced 64bit hashtable sizes in phpng, you'd still > >> need 206 GB of memory for to create an array that can make use of those > >> sizes. > >> > >> I hope this visualizes that supporting 64bit sizes for hashtables is > >> not necessary. We can easily implement a size restriction in > >> zend_hash_do_resize. As hashtables go through a centralized API > >> controlling the size limitation is a lot more straightforward than for > >> strings. > >> > >> Now that we have covered the HashTable changes, we're only left with > >> one significant change. The diff for the zend_string structure is as > >> follows: > >> > >> > >> zend_refcounted gc; - zend_ulong h; /* hash > >> value */ - int len; > >> + zend_uint_t h; /* hash value */ > >> + zend_size_t len; > >> char val[1]; > >> > >> This means that a zend_string will on average be approximately 4 bytes > >> longer. (Approximately because the exact difference is subject to > >> details of the allocation process and depends on the string length > >> distribution.) > >> > >> I think that this difference might be acceptable, if this is the cost > >> of uniform string size usage in the codebase. I tested the impact that > >> adding 4 bytes on the string structure has on one application and the > >> result was a change from 38.2 to 38.5 MiB, which translates to a 0.7% > >> difference. That difference is acceptable to me. > >> > >> However I don't know how representative that result is and I don't > >> currently have time to set up a testing environment for something like > >> wordpress on a 64bit vm. I would appreciate it if somebody who already > >> has such a setup could run a test for this, to confirm that there is > >> indeed no large difference in memory consumption. Just add a "int > >> unused" after the len member and compare memory usage. > >> > >> So, what I'm suggesting here is the following: > >> > >> > >> We implement 64bit integers on LLP64, we implement unsigned sizes, we > >> implement size_t for strings lengths. I honestly still don't fully > >> understand the necessity for the latter, but if the impact in phpng is > >> indeed low (in the 1% range), then I think I can compromise on that. > >> > >> What we don't implement is indiscriminate usage of size_t or other > >> 64bit > >> types all over the place (like for linenos, argument counts, argument > >> lengths etc) and usage of 64bit sizes for hashtables. > >> > >> This would be acceptable for me and I hope that it would also be > >> acceptable for others. > >> > >> While we're at it, I would also like to quickly touch on the topic of > >> naming and renaming. I hope it is clear now that we need both 64bit > >> types for some things and 32bit types for others (like HT sizes, > >> linenos, etc). As such I would suggest to use zend_(u)long_t as the > >> 64bit type (on 64bit > >> platforms of course) and zend_(u)int_t as the 32bit type. This more or > >> less matches our current naming and hopefully avoids confusion. > >> > >> With that naming convention, we should also continue using IS_LONG and > >> Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using > >> Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially > >> as they conflict with other zpp changes in phpng. I understand why these > >> changes were beneficial in the original 64bit patch, however > >> considering that the usage of many APIs changed in phpng and extensions > >> need to be carefully reviewed anyway, I don't think these renames make a > >> lot of sense anymore. > >> > >> So, that's my opinion on how we should proceed with the 64bit patch. I > >> very much hope that we can reach some consensus about this. > >> > >> Credits: This mail is the result of a discussion between Anthony, Bob > >> and myself. > >> > >> Nikita > >> > I really appreciate you guys have looked inside the str_size_and_int64 > patch. Dmitry even has compiled and tested it. > > While the Nikitas notes sound pretty meaningful, let me ask you a one > simple question before proceeding with any details - how much time you > personally would spend on the integration? > > Thanks > > Anatol > > --047d7b67778641049404f99e9615--