Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:74293 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 59057 invoked from network); 17 May 2014 19:43:24 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 17 May 2014 19:43:24 -0000 Authentication-Results: pb1.pair.com smtp.mail=tyra3l@gmail.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=tyra3l@gmail.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.216.175 as permitted sender) X-PHP-List-Original-Sender: tyra3l@gmail.com X-Host-Fingerprint: 209.85.216.175 mail-qc0-f175.google.com Received: from [209.85.216.175] ([209.85.216.175:51717] helo=mail-qc0-f175.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 87/01-53190-ADBB7735 for ; Sat, 17 May 2014 15:43:22 -0400 Received: by mail-qc0-f175.google.com with SMTP id w7so6480397qcr.6 for ; Sat, 17 May 2014 12:43:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=7C2Ag3XnpKxyF4Xecx+GY9q2+6Mvsz/9/3y5VQayfBU=; b=haRBiDLlx0EeT6QjPVSuX+wocgDEEbRdN36EwgdbOAd16RGJuspNTYp5QLgeNBqni9 jhjFvhiS7EWfAdh6nnFTCLF6OlnMUiuRBLuQKunNvOJM8cerEn6Tb1QIcRqkF6N3+KVi AbEFqgeduzxbKPvQ1h/w73eqz1TYLCaOvDeAx0+gE1GW+hjmMRoru3S8cp+QbbzBeS1K bM8zVbWukHcFU3+98VNLk0kmeuJ77aHZH2+OSvAR308RjW9BNhF77krbDFaDUV5cGzos XAJLBhfnIY3FlH/WWaaCHNIqzpiIAkd9KIedRlE9hmCn4ogkOrVCeMPjNtUmejlcpn6e 9mhQ== MIME-Version: 1.0 X-Received: by 10.140.90.84 with SMTP id w78mr35285821qgd.52.1400355799974; Sat, 17 May 2014 12:43:19 -0700 (PDT) Received: by 10.140.17.77 with HTTP; Sat, 17 May 2014 12:43:19 -0700 (PDT) In-Reply-To: References: Date: Sat, 17 May 2014 21:43:19 +0200 Message-ID: To: Nikita Popov Cc: PHP internals Content-Type: multipart/alternative; boundary=001a11c117a8044af604f99dbe25 Subject: Re: [PHP-DEV] Rethinking 64bit sizes and PHP-NG From: tyra3l@gmail.com (Ferenc Kovacs) --001a11c117a8044af604f99dbe25 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Sat, May 17, 2014 at 8:27 PM, Nikita Popov wrote: > Hi internals! > > The discussions around the patch for improved 64bit support have > deteriorated from a semi-reasonable discussions to pointless accusations > and name-calling. I'd like to get back to discussing this issue from a > technical point of view and see which parts of the patch can be salvaged > and which can not. > > As the current voting results show, the 64bit changes, if accepted, will = be > integrated into phpng rather than into master. As such I will argue purel= y > from a phpng perspective. > > Before going any further, it should be established that two aspects of th= e > 64 bit patch are, as far as I can see, acceptable to everyone: The > introduction of 64bit integers on Win64 and other LLP64 architectures and > the use of an unsigned integer type for lengths and sizes. > > The disputed aspect of the patch is the use of 64 bit lengths and sizes. = It > has been argued that the use of 64bit sizes will significantly increase > memory consumption of the phpng branch and may also negatively impact > performance. > > If you take a look at an early patch for porting the 64bit changes for > phpng [1] you'll find that these are very real concerns, as the changes > increase the sizes of many commonly used data structures. > > However, this is only the case if 64bit sizes are blindly used in every > possible place. In the following I'll argue how a more careful choice of > the used size type in key places can make this patch a lot more realistic= . > > First of all it should be noted that the patch introduces size_t usage in > several places where, as far as I can see, it is not necessary. For examp= le > it changes the storage type of line numbers from zend_uint to zend_size_t= : > > From zend_opline: > - uint lineno; > + zend_size_t lineno; > > From zend_class_entry: > - zend_uint line_start; > - zend_uint line_end; > + zend_size_t line_start; > + zend_size_t line_end; > > I don't think we need to concern ourselves about files with more than 4 > billion lines. > > Similarly the patch also changes the length of argument names and class > typehints to 64bit: > > From zend_arg_info: > - zend_uint name_len; > + zend_size_t name_len; > const char *class_name; > - zend_uint class_name_len; > + zend_size_t class_name_len; > > This once again seem rather unnecessary and wasteful. Another case are th= e > number of arguments of a function: > > From zend_op_array: > - zend_uint num_args; > - zend_uint required_num_args; > + zend_size_t num_args; > + zend_size_t required_num_args; > > Again it seems doubtful whether functions with more than 4 billion > arguments are quite relevant. > > While there are more examples than these, I'll not go into this point any > further, as the pattern should be clear. Instead we'll be moving on to th= e > zend_string and HashTable structures. > > For the HashTable structure, the diff currently looks as follows: > > - zend_uint nTableSize; > - zend_uint nTableMask; > - zend_uint nNumUsed; > - zend_uint nNumOfElements; > - long nNextFreeElement; > + zend_uint_t nTableSize; > + zend_uint_t nTableMask; > + zend_uint_t nNumUsed; > + zend_uint_t nNumOfElements; > + zend_int_t nNextFreeElement; > Bucket *arData; > - zend_uint *arHash; > + zend_uint_t *arHash; > dtor_func_t pDestructor; > zend_uint nInternalPointer; > > This actually misses changing nInternalPointer. If we change that one as > well and account for differences in alignment, the total increase for the > HashTable structure is 6*4 =3D 24 bytes. For a heavily used structure, th= at's > a lot. However, even worse are the per-bucket memory increases: > > From the diff above you can already see that the arHash array will be twi= ce > as large, i.e. you'll get an additional 4 bytes per bucket. However - and > this is currently not represented in the patch stub - changing hashtables > to 64bit lengths will make the use of Z_NEXT(bucket->val) to store the ne= xt > collision resolution index impossible. As such another 8 bytes per bucket > will be needed. > > This brings us to a total increase of 24 bytes per hashtable and 12 bytes > per bucket. Especially the latter seems unacceptable to me. > > As such, I would suggest to *not* change sizes for the hashtable structur= e > and limit these changes to other parts of the engine (strings in > particular). > > To put this into context, consider what it actually means to have an arra= y > with more than 4 billion elements (which is the point where the different > size type becomes relevant). With hashtables as implemented in master wit= h > 144 bytes per bucket, this would require 2^32 * 144 =3D 576 GB of memory.= If > we introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of > memory for to create an array that can make use of those sizes. > > I hope this visualizes that supporting 64bit sizes for hashtables is not > necessary. We can easily implement a size restriction in > zend_hash_do_resize. As hashtables go through a centralized API controlli= ng > the size limitation is a lot more straightforward than for strings. > > Now that we have covered the HashTable changes, we're only left with one > significant change. The diff for the zend_string structure is as follows: > > zend_refcounted gc; > - zend_ulong h; /* hash value */ > - int len; > + zend_uint_t h; /* hash value */ > + zend_size_t len; > char val[1]; > > This means that a zend_string will on average be approximately 4 bytes > longer. (Approximately because the exact difference is subject to details > of the allocation process and depends on the string length distribution.) > > I think that this difference might be acceptable, if this is the cost of > uniform string size usage in the codebase. I tested the impact that addin= g > 4 bytes on the string structure has on one application and the result was= a > change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That > difference is acceptable to me. > > However I don't know how representative that result is and I don't > currently have time to set up a testing environment for something like > wordpress on a 64bit vm. I would appreciate it if somebody who already ha= s > such a setup could run a test for this, to confirm that there is indeed n= o > large difference in memory consumption. Just add a "int unused" after the > len member and compare memory usage. > > So, what I'm suggesting here is the following: > > We implement 64bit integers on LLP64, we implement unsigned sizes, we > implement size_t for strings lengths. I honestly still don't fully > understand the necessity for the latter, but if the impact in phpng is > indeed low (in the 1% range), then I think I can compromise on that. > > What we don't implement is indiscriminate usage of size_t or other 64bit > types all over the place (like for linenos, argument counts, argument > lengths etc) and usage of 64bit sizes for hashtables. > > This would be acceptable for me and I hope that it would also be acceptab= le > for others. > > While we're at it, I would also like to quickly touch on the topic of > naming and renaming. I hope it is clear now that we need both 64bit types > for some things and 32bit types for others (like HT sizes, linenos, etc). > As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit > platforms of course) and zend_(u)int_t as the 32bit type. This more or le= ss > matches our current naming and hopefully avoids confusion. > > With that naming convention, we should also continue using IS_LONG and > Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using > Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially = as > they conflict with other zpp changes in phpng. I understand why these > changes were beneficial in the original 64bit patch, however considering > that the usage of many APIs changed in phpng and extensions need to be > carefully reviewed anyway, I don't think these renames make a lot of sens= e > anymore. > > So, that's my opinion on how we should proceed with the 64bit patch. I ve= ry > much hope that we can reach some consensus about this. > > Credits: This mail is the result of a discussion between Anthony, Bob and > myself. > > Nikita > > [1]: https://gist.github.com/weltling/a941d8cf6c731640b51f > First of all: thank you for bringing back the discussion to the technical level. I would really like if Pierre and Anatol could reply what do they think about this, as based on the previous discussion, the changes proposed here would be acceptable for most people voting no for the original rfc, and it is only sacrifices parts which was already considered as a "side-effect" not a goal of the rfc anyways. --=20 Ferenc Kov=C3=A1cs @Tyr43l - http://tyrael.hu --001a11c117a8044af604f99dbe25--