Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:74298 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 67156 invoked from network); 17 May 2014 20:12:55 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 17 May 2014 20:12:55 -0000 Authentication-Results: pb1.pair.com smtp.mail=dmitry@zend.com; spf=permerror; sender-id=unknown Authentication-Results: pb1.pair.com header.from=dmitry@zend.com; sender-id=unknown Received-SPF: error (pb1.pair.com: domain zend.com from 209.85.220.178 cause and error) X-PHP-List-Original-Sender: dmitry@zend.com X-Host-Fingerprint: 209.85.220.178 mail-vc0-f178.google.com Received: from [209.85.220.178] ([209.85.220.178:39770] helo=mail-vc0-f178.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 3D/B2-53190-6C2C7735 for ; Sat, 17 May 2014 16:12:54 -0400 Received: by mail-vc0-f178.google.com with SMTP id hq16so7993528vcb.9 for ; Sat, 17 May 2014 13:12:52 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=7e/xx3SQKhdKHWZzW9htXVTkJcMM65GqeL33TgEwk7A=; b=PA2hKaD29mNdqePyGE0Cyi0HQ6VfoYOPr4Lo4BJzxk1Qx2L6yTfj2KjMH9+dtwAZmz 6gQivfI6yD90do0MS3CBhC9ouBS2UyAhQbB2GtosDZRNIKThN82vtLLpoWQIKlpja3t5 Q/chTGBeBadyKBwr9/bPUDoqrmO4vQiKe9UrkGkpEJS91VRCivYBpyKSLnYtDs8xDzv1 bluDUThqq0wx3rWLKwmK6CQjyfQDoQq9aGXWKMMDk1JbzbM29lMnD7NyUyoppJoOg4HO L9Y3HDG7J2Zqu/h+RBycW4/kxkdYaOObccAAo6GM8g8GRrrTEBGpNDPdT1Mz6J3M0CpB g8Gg== X-Gm-Message-State: ALoCoQk9tStEYLq4o+9aiHEIXEVtLSTD14dkizND6DOB8yCOKO7LldWS/jnZqaqHKcbS0R8OZkOcdkF1MwOJsveA4oKbohECzgUXH14mbZflpAx72H7w5QKHcYe9onS6d4Ee993NH4Mr MIME-Version: 1.0 X-Received: by 10.52.37.135 with SMTP id y7mr3647005vdj.38.1400357571931; Sat, 17 May 2014 13:12:51 -0700 (PDT) Received: by 10.52.111.71 with HTTP; Sat, 17 May 2014 13:12:51 -0700 (PDT) In-Reply-To: References: Date: Sun, 18 May 2014 00:12:51 +0400 Message-ID: To: Nikita Popov Cc: PHP internals Content-Type: multipart/alternative; boundary=bcaec51d25aea2480f04f99e27a1 Subject: Re: [PHP-DEV] Rethinking 64bit sizes and PHP-NG From: dmitry@zend.com (Dmitry Stogov) --bcaec51d25aea2480f04f99e27a1 Content-Type: text/plain; charset=UTF-8 Hi Nikita, Thanks for this deep analyzes. I hope, this will lead us to agreement. I see no problems with accepting IS_LONG part of the patch. Then we may think about string length. As you pointed, it makes no sense to use size_t everywhere, and may be using it only in "right" places won't make serious harm. Thanks. Dmitry. On Sat, May 17, 2014 at 10:27 PM, Nikita Popov wrote: > Hi internals! > > The discussions around the patch for improved 64bit support have > deteriorated from a semi-reasonable discussions to pointless accusations > and name-calling. I'd like to get back to discussing this issue from a > technical point of view and see which parts of the patch can be salvaged > and which can not. > > As the current voting results show, the 64bit changes, if accepted, will be > integrated into phpng rather than into master. As such I will argue purely > from a phpng perspective. > > Before going any further, it should be established that two aspects of the > 64 bit patch are, as far as I can see, acceptable to everyone: The > introduction of 64bit integers on Win64 and other LLP64 architectures and > the use of an unsigned integer type for lengths and sizes. > > The disputed aspect of the patch is the use of 64 bit lengths and sizes. It > has been argued that the use of 64bit sizes will significantly increase > memory consumption of the phpng branch and may also negatively impact > performance. > > If you take a look at an early patch for porting the 64bit changes for > phpng [1] you'll find that these are very real concerns, as the changes > increase the sizes of many commonly used data structures. > > However, this is only the case if 64bit sizes are blindly used in every > possible place. In the following I'll argue how a more careful choice of > the used size type in key places can make this patch a lot more realistic. > > First of all it should be noted that the patch introduces size_t usage in > several places where, as far as I can see, it is not necessary. For example > it changes the storage type of line numbers from zend_uint to zend_size_t: > > From zend_opline: > - uint lineno; > + zend_size_t lineno; > > From zend_class_entry: > - zend_uint line_start; > - zend_uint line_end; > + zend_size_t line_start; > + zend_size_t line_end; > > I don't think we need to concern ourselves about files with more than 4 > billion lines. > > Similarly the patch also changes the length of argument names and class > typehints to 64bit: > > From zend_arg_info: > - zend_uint name_len; > + zend_size_t name_len; > const char *class_name; > - zend_uint class_name_len; > + zend_size_t class_name_len; > > This once again seem rather unnecessary and wasteful. Another case are the > number of arguments of a function: > > From zend_op_array: > - zend_uint num_args; > - zend_uint required_num_args; > + zend_size_t num_args; > + zend_size_t required_num_args; > > Again it seems doubtful whether functions with more than 4 billion > arguments are quite relevant. > > While there are more examples than these, I'll not go into this point any > further, as the pattern should be clear. Instead we'll be moving on to the > zend_string and HashTable structures. > > For the HashTable structure, the diff currently looks as follows: > > - zend_uint nTableSize; > - zend_uint nTableMask; > - zend_uint nNumUsed; > - zend_uint nNumOfElements; > - long nNextFreeElement; > + zend_uint_t nTableSize; > + zend_uint_t nTableMask; > + zend_uint_t nNumUsed; > + zend_uint_t nNumOfElements; > + zend_int_t nNextFreeElement; > Bucket *arData; > - zend_uint *arHash; > + zend_uint_t *arHash; > dtor_func_t pDestructor; > zend_uint nInternalPointer; > > This actually misses changing nInternalPointer. If we change that one as > well and account for differences in alignment, the total increase for the > HashTable structure is 6*4 = 24 bytes. For a heavily used structure, that's > a lot. However, even worse are the per-bucket memory increases: > > From the diff above you can already see that the arHash array will be twice > as large, i.e. you'll get an additional 4 bytes per bucket. However - and > this is currently not represented in the patch stub - changing hashtables > to 64bit lengths will make the use of Z_NEXT(bucket->val) to store the next > collision resolution index impossible. As such another 8 bytes per bucket > will be needed. > > This brings us to a total increase of 24 bytes per hashtable and 12 bytes > per bucket. Especially the latter seems unacceptable to me. > > As such, I would suggest to *not* change sizes for the hashtable structure > and limit these changes to other parts of the engine (strings in > particular). > > To put this into context, consider what it actually means to have an array > with more than 4 billion elements (which is the point where the different > size type becomes relevant). With hashtables as implemented in master with > 144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If > we introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of > memory for to create an array that can make use of those sizes. > > I hope this visualizes that supporting 64bit sizes for hashtables is not > necessary. We can easily implement a size restriction in > zend_hash_do_resize. As hashtables go through a centralized API controlling > the size limitation is a lot more straightforward than for strings. > > Now that we have covered the HashTable changes, we're only left with one > significant change. The diff for the zend_string structure is as follows: > > zend_refcounted gc; > - zend_ulong h; /* hash value */ > - int len; > + zend_uint_t h; /* hash value */ > + zend_size_t len; > char val[1]; > > This means that a zend_string will on average be approximately 4 bytes > longer. (Approximately because the exact difference is subject to details > of the allocation process and depends on the string length distribution.) > > I think that this difference might be acceptable, if this is the cost of > uniform string size usage in the codebase. I tested the impact that adding > 4 bytes on the string structure has on one application and the result was a > change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That > difference is acceptable to me. > > However I don't know how representative that result is and I don't > currently have time to set up a testing environment for something like > wordpress on a 64bit vm. I would appreciate it if somebody who already has > such a setup could run a test for this, to confirm that there is indeed no > large difference in memory consumption. Just add a "int unused" after the > len member and compare memory usage. > > So, what I'm suggesting here is the following: > > We implement 64bit integers on LLP64, we implement unsigned sizes, we > implement size_t for strings lengths. I honestly still don't fully > understand the necessity for the latter, but if the impact in phpng is > indeed low (in the 1% range), then I think I can compromise on that. > > What we don't implement is indiscriminate usage of size_t or other 64bit > types all over the place (like for linenos, argument counts, argument > lengths etc) and usage of 64bit sizes for hashtables. > > This would be acceptable for me and I hope that it would also be acceptable > for others. > > While we're at it, I would also like to quickly touch on the topic of > naming and renaming. I hope it is clear now that we need both 64bit types > for some things and 32bit types for others (like HT sizes, linenos, etc). > As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit > platforms of course) and zend_(u)int_t as the 32bit type. This more or less > matches our current naming and hopefully avoids confusion. > > With that naming convention, we should also continue using IS_LONG and > Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using > Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially as > they conflict with other zpp changes in phpng. I understand why these > changes were beneficial in the original 64bit patch, however considering > that the usage of many APIs changed in phpng and extensions need to be > carefully reviewed anyway, I don't think these renames make a lot of sense > anymore. > > So, that's my opinion on how we should proceed with the 64bit patch. I very > much hope that we can reach some consensus about this. > > Credits: This mail is the result of a discussion between Anthony, Bob and > myself. > > Nikita > > [1]: https://gist.github.com/weltling/a941d8cf6c731640b51f > --bcaec51d25aea2480f04f99e27a1--