Newsgroups: php.internals
Subject: Re: broken inline assembler in zend_operators.h
From: ard.biesheuvel@linaro.org (Ard Biesheuvel)
To: internals@lists.php.net
Date: Fri, 18 Jan 2013 12:43:50 +0100

Replying to self:

The code is not as broken as I thought; apologies for the noise. (I misread zend.h: lval and dval are actually members of a union.)

However, there is still a lot of hard-coding that I think should be cleaned up, and the performance gain is not obvious.

--
Ard.

On 01/18/2013 12:25 PM, Ard Biesheuvel wrote:
> Hello all,
>
> As I indicated yesterday, I am looking into PHP performance when running
> on ARM, and while doing so, I ran into the following code in
> Zend/zend_operators.h.
>
> For 32-bit x86, we have the following assembler code to do a 'fast'
> increment (%0 is a zval*):
>
>     incl (%0)                 @ increment the long at address %0
>     jno 0f                    @ jump to label 0 if no overflow
>     movl $0x0, (%0)           @ else store 0 to the low word at %0
>     movl $0x41e00000, 0x4(%0) @ store the high word of (double)LONG_MAX+1 at %0+4
>     movb $0x2, 0xc(%0)        @ set the type field at %0+0xc to 2
>     0:
>
> This code makes a fair number of assumptions:
> - zval.lval is the first member of the zval struct
> - zval.dval is the second member, at offset 4
> - zval.type lives at offset 0xc
> - IS_DOUBLE == 0x2
>
> For 64-bit x86, we have a similar snippet:
>
>     incq (%0)                 @ increment the long at address %0
>     jno 0f                    @ jump to label 0 if no overflow
>     movl $0x0, (%0)           @ else store 0 to the low word at %0
>     movl $0x43e00000, 0x4(%0) @ store the high word of (double)LONG_MAX+1 at %0+4
>     movb $0x2, 0x14(%0)       @ set the type field at %0+0x14 to 2
>     0:
>
> As expected, (double)LONG_MAX+1.0 encodes to a different value here
> (LONG_MAX on 64-bit differs from its 32-bit value). Note, however, that
> while the hard-coded offset of the type field (0x14) is likely correct,
> the dval offset is 0x4 as in the previous example, even though a long
> is 8 bytes wide on 64-bit. In other words, this code is broken.
>
> Another thing to note is that branch predictors typically predict
> forward branches as not taken, and it is up to the compiler to arrange
> the branches optimally; as we branch on *no* overflow, the code above
> basically amounts to
>
>     if (EXPECT(overflow)) {
>         /* juggle doubles */
>     } else {
>         lval++;
>     }
>
> The fast_add and fast_sub functions also appear to be broken. They fall
> back to double arithmetic if the integer instructions overflow:
>
>     fildl (%1)          @ load operand 1
>     fildl (%2)          @ load operand 2
>     faddp %%st, %%st(1) @ perform the addition
>     movb $0x2, 0xc(%0)  @ set the type of the result to IS_DOUBLE
>     fstpl (%0)          @ store the double result to the lval field!!
>
> Note that this code is only executed once we have already established
> that the integer addition will overflow, so we know lval cannot hold
> the result. Still, the result is stored at the lval offset, and nothing
> is stored to the dval field in this case.
>
> I ran some tests on both x86 and ARM, and as far as I can tell the
> inline assembler does not result in a noticeable speed-up. Adding to
> that the fact that the snippets are all broken for x86, I propose we
> get rid of them and just use the C versions in all cases. (I will not
> be adding ARM versions of the inline assembler, as the performance
> gain is not obvious.)
>
> Regards,
> Ard.