Newsgroups: php.internals
Subject: Re: broken inline assembler in zend_operators.h
From: ard.biesheuvel@linaro.org (Ard Biesheuvel)
To: internals@lists.php.net
Date: Fri, 18 Jan 2013 12:43:50 +0100

Replying to self:

The code is not as broken as I thought; apologies for the noise. (I misread zend.h: lval and dval are actually members of a union.)

However, there is still a lot of hard-coding that I think should be cleaned up, and the performance gain is not obvious.

--
Ard.

On 01/18/2013 12:25 PM, Ard Biesheuvel wrote:
> Hello all,
>
> As I indicated yesterday, I am looking into PHP performance when running
> on ARM, and while doing so, I ran into the following code in
> Zend/zend_operators.h.
>
> For 32-bit x86, we have the following assembler code to do a 'fast'
> increment (%0 is a zval*):
>
>     incl (%0)                 @ increment the long at address %0
>     jno 0f                    @ jump to label 0 if no overflow
>     movl $0x0, (%0)           @ else store 0 to the low word at %0
>     movl $0x41e00000, 0x4(%0) @ store the high word of (double)LONG_MAX+1 at %0+4
>     movb $0x2, 0xc(%0)        @ set the type field at %0+0xc to 2
>     0:
>
> This code makes a fair number of assumptions:
> - zval.lval is the first member of the zval struct
> - zval.dval is the second member, at offset 4
> - zval.type lives at offset 0xc
> - IS_DOUBLE == 0x2
>
> For 64-bit x86, we have a similar snippet:
>
>     incq (%0)                 @ increment the long at address %0
>     jno 0f                    @ jump to label 0 if no overflow
>     movl $0x0, (%0)           @ else store 0 to the low word at %0
>     movl $0x43e00000, 0x4(%0) @ store the high word of (double)LONG_MAX+1 at %0+4
>     movb $0x2, 0x14(%0)       @ set the type field at %0+0x14 to 2
>     0:
>
> As expected, (double)LONG_MAX+1.0 encodes to a different value here
> (LONG_MAX on 64-bit differs from its 32-bit value). Note, however, that
> while the hard-coded offset of the type field (0x14) is likely correct,
> the dval offset is 0x4 as in the previous example, even though a long
> is 8 bytes wide on 64-bit. In other words, this code is broken.
>
> Another thing to note is that branch predictors typically predict
> forward branches as not taken, and it is up to the compiler to arrange
> the branches optimally; as we branch on *no* overflow, the code above
> basically amounts to
>
>     if (EXPECT(overflow)) {
>         /* juggle doubles */
>     } else {
>         lval++;
>     }
>
> The fast_add and fast_sub functions also appear to be broken. They fall
> back to double arithmetic if the integer instructions overflow:
>
>     fildl (%1)          @ load operand 1
>     fildl (%2)          @ load operand 2
>     faddp %%st, %%st(1) @ perform the addition
>     movb $0x2, 0xc(%0)  @ set the type of the result to IS_DOUBLE
>     fstpl (%0)          @ store the double result to the lval field!!
>
> Note that this code is only executed once we have already established
> that the integer addition will overflow, so we know lval cannot hold
> the result. Still, the result is stored at the lval offset, and nothing
> is stored to the dval field in this case.
>
> I ran some tests on both x86 and ARM, and as far as I can tell the
> inline assembler does not result in a noticeable speed-up. Adding to
> that the fact that the snippets are all broken for x86, I propose we
> get rid of them and just use the C versions in all cases. (I will not
> be adding ARM versions of the inline assembler, as the performance
> gain is not obvious.)
>
> Regards,
> Ard.