Newsgroups: php.internals
Date: Mon, 4 Feb 2013 15:50:43 +0300
From: dmitry@zend.com (Dmitry Stogov)
To: Ard Biesheuvel
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] more inline assembler noise
Content-Type: multipart/alternative; boundary=e89a8fb206f28e9b2f04d4e58af3

--e89a8fb206f28e9b2f04d4e58af3
Content-Type: text/plain; charset=UTF-8

I can't remember whether I ran any special benchmarks besides bench.php
when I introduced the fast math functions. At that time, I rearranged the
code to allow inlining of the most probable paths and added assembler
code to catch overflow (C can't do that in an optimal way). As far as I
remember, bench.php showed some visible improvement.

Even increment/decrement saves one CPU instruction on the fast path:

    incl (%ecx)
    jno  FAST_PATH
    ...
FAST_PATH:

instead of

    cmpl $0x7fffffff, (%ecx)
    je   SLOW_PATH
    incl (%ecx)
FAST_PATH:

However, I'm not sure whether this saved instruction makes any visible
speed difference by itself.

Thanks. Dmitry.

On Mon, Feb 4, 2013 at 2:38 PM, Ard Biesheuvel wrote:

> Hi Dmitry,
>
> The main problem I have with this code is that most of it (the double
> handling) is outside the hot path, and that it is riddled with
> hardcoded constants, struct offsets etc. However, if it works then I
> am not necessarily in favour of making changes to it.
>
> So can you explain a little bit which benchmarks you used to prove
> that the inline assembly is faster than C? Especially in the
> increment/decrement cases, there is no real overflow detection
> necessary other than comparing with LONG_MIN/LONG_MAX, so I would
> expect the compiler to generate fairly optimal code in these cases.
>
> I am not trying to challenge these decisions, mind you. I am trying to
> decide whether ARM will require similar handling as x86 to obtain
> optimal performance.
>
> Thanks,
> Ard.
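
For reference, the LONG_MAX comparison discussed above could look roughly
like this in plain C. This is only a sketch under assumptions: the
`number` struct and `increment()` are hypothetical stand-ins, not the
zval layout or the fast_increment_function() from Zend/zend_operators.h.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a zval: holds either a long or a double. */
typedef struct {
    int    is_double;   /* 0: lval is valid, 1: dval is valid */
    long   lval;
    double dval;
} number;

/* Increment in place; fall back to a double when the long would overflow. */
static void increment(number *n)
{
    assert(n != NULL);
    if (n->is_double) {
        n->dval += 1.0;
    } else if (n->lval == LONG_MAX) {
        /* Slow path: LONG_MAX + 1 does not fit in a long. */
        n->dval = (double)LONG_MAX + 1.0;
        n->is_double = 1;
    } else {
        /* Fast path: ordinary integer increment. */
        n->lval++;
    }
}
```

Calling increment() on a value holding LONG_MAX - 1 keeps it a long; a
second call switches it to a double. Whether a compiler turns the compare
into code as tight as the inc/jno pair is exactly the open question in
this thread, so the sketch makes no performance claim.
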
>
> On 4 February 2013 11:32, Dmitry Stogov wrote:
> > Hi Ard,
> >
> > Actually, with your patch fast_increment_function() is going to be
> > compiled into something like this:
> >
> >     incl (%ecx)
> >     seto %al
> >     test %al,%al
> >     jnz  .FLOAT
> > .END:
> >     ...
> > .FLOAT:
> >     movl $0x0, (%ecx)
> >     movl $0x41e00000, 0x4(%ecx)
> >     movb $0x2,0xc(%ecx)
> >     jmp  .END
> >
> > while before the patch it would be
> >
> >     incl (%ecx)
> >     jno  .END
> > .FLOAT:
> >     movl $0x0, (%ecx)
> >     movl $0x41e00000, 0x4(%ecx)
> >     movb $0x2,0xc(%ecx)
> > .END:
> >     ...
> >
> > So the only advantage of your code is that it eliminates a static
> > branch misprediction, at the cost of two additional CPU instructions.
> > However, the CPU branch predictor should make this advantage
> > unimportant.
> >
> > Thanks. Dmitry.
> >
> >
> > On Fri, Jan 18, 2013 at 10:08 PM, Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> >>
> >> Hello,
> >>
> >> Again, apologies for prematurely declaring someone else's code 'crap'.
> >> There are no bugs in the inline x86 assembler in
> >> Zend/zend_operators.h, as far as I can tell, only two kinds of issues
> >> that I still think should be addressed.
> >>
> >> First of all, from a maintenance pov, having field offsets (like the
> >> offset of zval.type) and constants (like $0x2 for IS_DOUBLE) hard
> >> coded inside the instructions is a bad idea.
> >>
> >> The other issue is the branching and the floating point instructions.
> >> The inline assembler addresses the common case, but also adds a bunch
> >> of instructions that address the corner case, and some branches to
> >> jump over them. As I indicated in my previous email, branching is
> >> relatively costly on a modern CPU with deep pipelines and having a
> >> bunch of FPU instructions in there that hardly ever get executed
> >> doesn't help either.
> >>
> >> The primary reason for having inline assembler at all is the ability
> >> to detect overflow. This mainly applies to multiplication, as in that
> >> case, detecting overflow in C code is much harder compared to reading
> >> a condition flag in the CPU (hence the various accelerated
> >> implementations in zend_signed_multiply.h). However, detecting
> >> overflow in addition/subtraction implemented in C is much easier, as
> >> the code in zend_operators.h proves: just a matter of checking the
> >> sign bits, or doing a simple compare with LONG_MIN/LONG_MAX.
> >>
> >> Therefore, I would be interested in finding out which benchmark was
> >> used to make the case for having these accelerated implementations in
> >> the first place. The differences in performance between various
> >> implementations are very small in the tests I have done.
> >>
> >> As for the code style/maintainability, I propose to apply the attached
> >> patch. The performance is on par, as far as I can tell, but it is
> >> arguably better code. I will also hook in the ARM versions once I
> >> manage to prove that the performance is affected favourably by them.
> >>
> >> Regards,
> >> Ard.
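
As an illustration of the sign-bit check mentioned above for addition, a
minimal C sketch follows. add_checked() is a hypothetical helper, not the
code in Zend/zend_operators.h, and it assumes a two's complement long, as
on mainstream platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a + b would overflow a long; otherwise stores the sum
 * in *sum and returns false. */
static bool add_checked(long a, long b, long *sum)
{
    assert(sum != NULL);

    /* Add in unsigned arithmetic, where wraparound is well defined. */
    unsigned long ua = (unsigned long)a;
    unsigned long ub = (unsigned long)b;
    unsigned long ur = ua + ub;

    /* Mask that selects only the top (sign) bit. */
    unsigned long sign = ~(ULONG_MAX >> 1);

    /* Overflow happens exactly when both operands share a sign and the
     * wrapped result has the opposite sign. */
    if (((ua ^ ur) & (ub ^ ur) & sign) != 0) {
        return true;
    }

    *sum = a + b;   /* safe: no overflow was detected */
    return false;
}
```

On overflow a caller would fall back to double arithmetic, as the thread
describes. For multiplication no comparably cheap test on the operands'
sign bits exists, which is the case where reading the CPU overflow flag
is said to pay off.
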
> >>
> >>
> >> Before
> >> ------
> >>
> >> $ time php -r 'for ($i = 0; $i < 0x7fffffff; $i++);'
> >>
> >> real 0m56.910s
> >> user 0m56.876s
> >> sys  0m0.008s
> >>
> >>
> >> $ time php -r 'for ($i = 0x7fffffff; $i >= 0; $i--);'
> >>
> >> real 1m34.576s
> >> user 1m34.518s
> >> sys  0m0.020s
> >>
> >>
> >> $ time php -r 'for ($i = 0; $i < 0x7fffffff; $i += 3);'
> >>
> >> real 0m21.494s
> >> user 0m21.473s
> >> sys  0m0.008s
> >>
> >>
> >> $ time php -r 'for ($i = 0x7fffffff; $i >= 0; $i -= 3);'
> >>
> >> real 0m19.879s
> >> user 0m19.865s
> >> sys  0m0.004s
> >>
> >>
> >> After
> >> -----
> >>
> >> $ time php -r 'for ($i = 0; $i < 0x7fffffff; $i++);'
> >>
> >> real 0m56.687s
> >> user 0m56.656s
> >> sys  0m0.004s
> >>
> >>
> >> $ time php -r 'for ($i = 0x7fffffff; $i >= 0; $i--);'
> >>
> >> real 1m28.124s
> >> user 1m28.082s
> >> sys  0m0.004s
> >>
> >>
> >> $ time php -r 'for ($i = 0; $i < 0x7fffffff; $i += 3);'
> >>
> >> real 0m20.561s
> >> user 0m20.545s
> >> sys  0m0.004s
> >>
> >>
> >> $ time php -r 'for ($i = 0x7fffffff; $i >= 0; $i -= 3);'
> >>
> >> real 0m20.524s
> >> user 0m20.509s
> >> sys  0m0.004s
> >>
> >>
> >> --
> >> PHP Internals - PHP Runtime Development Mailing List
> >> To unsubscribe, visit: http://www.php.net/unsub.php
> >
> >

--e89a8fb206f28e9b2f04d4e58af3--