Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:65626 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 3362 invoked from network); 4 Feb 2013 10:38:15 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 4 Feb 2013 10:38:15 -0000 Authentication-Results: pb1.pair.com smtp.mail=ard.biesheuvel@linaro.org; spf=permerror; sender-id=unknown Authentication-Results: pb1.pair.com header.from=ard.biesheuvel@linaro.org; sender-id=unknown Received-SPF: error (pb1.pair.com: domain linaro.org from 209.85.219.53 cause and error) X-PHP-List-Original-Sender: ard.biesheuvel@linaro.org X-Host-Fingerprint: 209.85.219.53 mail-oa0-f53.google.com Received: from [209.85.219.53] ([209.85.219.53:39832] helo=mail-oa0-f53.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id A8/84-14611-69F8F015 for ; Mon, 04 Feb 2013 05:38:15 -0500 Received: by mail-oa0-f53.google.com with SMTP id m1so5379233oag.40 for ; Mon, 04 Feb 2013 02:38:12 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:x-gm-message-state; bh=wI2YHFUNgAwmMsS3Ul4/7fwLZHkz6daFo1xO6bpjUh4=; b=hDX4NY+eGWGdlKpGRed05eC0J6GU7R0VnBw/K6zOjtfZW2wFV9m51CFWuxeJPz5wey W74/xXcpZVYnXrR/tJNMb91iA/SV/DFdv6U5yPobV1vhbOXCyOY1oIZ1fuKO2iO/88V5 SKd7XKlG4zjb/BHwN7ZcIbgimt7mycjFe/RHc7eTnMyzlX4lnQa/26HVh7coMacSYzfp PduQPijka2pHkL+XBWe90e+o05VVK3ECuFenMDUmbzJYLsv4+awRT2pwIsgYeRsSJSsB vlQVC/dTO7WZzk3YaL30M1AIuDBJbcvEUV7etEDbbPyomR3Q0YgS0ILvalCEi7djKIIw VfnQ== MIME-Version: 1.0 X-Received: by 10.60.7.167 with SMTP id k7mr16728554oea.20.1359974292162; Mon, 04 Feb 2013 02:38:12 -0800 (PST) Received: by 10.76.82.138 with HTTP; Mon, 4 Feb 2013 02:38:12 -0800 (PST) In-Reply-To: References: Date: Mon, 4 Feb 2013 11:38:12 +0100 Message-ID: To: Dmitry Stogov Cc: internals@lists.php.net Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQn9sPm/1xvGBfpLGqVfuPqgxjjnkOpnH9O6z3kI0WZC8AdLMn3O2aUBvXyE2KOzUO4fRP6h Subject: Re: [PHP-DEV] more inline assembler noise From: ard.biesheuvel@linaro.org (Ard Biesheuvel) Hi Dimitry, The main problem I have with this code is that most of it (the double handling) is outside the hot path, and that it is riddled with hardcoded constants, struct offsets etc. However, if it works than I am not necessarily in favour of making changes to it. So can you explain a little bit which benchmarks you used to prove that the inline assembly is faster than C? Especially in the increment/decrement cases, there is no real overflow detection necessary other than comparing with LONG_MIN/LONG_MAX, so I would expect the compiler to generate fairly optimal code in these cases. I am not trying to challenge these decisions, mind you. I am trying to decide whether ARM will require similar handling as x86 to obtain optimal performance. Thanks, Ard. On 4 February 2013 11:32, Dmitry Stogov wrote: > Hi Ard, > > Actually with your patch the fast_increment_function() is going to be > compile into something like this > > incl (%ecx) > seto %al > test %al,%al > jz .FLOAT > .END: > ... > .FLOAT: > movl $0x0, (%ecx) > movl $0x41e00000, 0x4(%ecx) > movb $0x2,0xc(%ecx) > jmp . END > > while before the patch it would > > incl (%ecx) > jno .END > .FLOATL > movl $0x0, (%ecx) > movl $0x41e00000, 0x4(%ecx) > movb $0x2,0xc(%ecx) > .END: > ... > > So the only advantage of your code is eliminated static branch misprediction > in cost of two additional CPU instructions. > However CPU branch predictor should make this advantage unimportant. > > Thanks. Dmitry. > > > On Fri, Jan 18, 2013 at 10:08 PM, Ard Biesheuvel > wrote: >> >> Hello, >> >> Again, apologies for prematurely declaring someone else's code 'crap'. >> There are no bugs in the inline x86 assembler in Zend/zend_operators.h, as >> far as I can tell, only two kinds of issues that I still think should be >> addressed. >> >> First of all, from a maintenance pov, having field offsets (like the >> offset of zval.type) and constants (like $0x2 for IS_DOUBLE) hard coded >> inside the instructions is a bad idea. >> >> The other issue is the branching and the floating point instructions. The >> inline assembler addresses the common case, but also adds a bunch of >> instructions that address the corner case, and some branches to jump over >> them. As I indicated in my previous email, branching is relatively costly on >> a modern CPU with deep pipelines and having a bunch of FPU instructions in >> there that hardly ever get executed doesn't help either. >> >> The primary reason for having inline assembler at all is the ability to >> detect overflow. This mainly applies to multiplication, as in that case, >> detecting overflow in C code is much harder compared to reading a condition >> flag in the CPU (hence the various accelerated implementations in >> zend_signed_multiply.h). However, detecting overflow in addition/subtraction >> implemented in C is much easier, as the code in zend_operators.h proves: >> just a matter of checking the sign bits, or doing a simple compare with >> LONG_MIN/LONG_MAX. >> >> Therefore, I would be interested in finding out which benchmark was used >> to make the case for having these accelerated implementations in the first >> place. The differences in performance between various implementations are >> very small in the tests I have done. >> >> As for the code style/maintainability, I propose to apply the attached >> patch. The performance is on par, as far as I can tell, but it is arguably >> better code. I will also hook in the ARM versions once I manage to prove >> that the performance is affected favourably by them. >> >> Regards, >> Ard. >> >> >> >> Before >> ------- >> >> $ time php -r 'for ($i = 0; $i < 0x7fffffff; $i++);' >> >> real 0m56.910s >> user 0m56.876s >> sys 0m0.008s >> >> >> $ time php -r 'for ($i = 0x7fffffff; $i >= 0; $i--);' >> >> real 1m34.576s >> user 1m34.518s >> sys 0m0.020s >> >> >> $ time php -r 'for ($i = 0; $i < 0x7fffffff; $i += 3);' >> >> real 0m21.494s >> user 0m21.473s >> sys 0m0.008s >> >> >> $ time php -r 'for ($i = 0x7fffffff; $i >= 0; $i -= 3);' >> >> real 0m19.879s >> user 0m19.865s >> sys 0m0.004s >> >> >> After >> ----- >> >> $ time php -r 'for ($i = 0; $i < 0x7fffffff; $i++);' >> >> real 0m56.687s >> user 0m56.656s >> sys 0m0.004s >> >> >> $ time php -r 'for ($i = 0x7fffffff; $i >= 0; $i--);' >> >> real 1m28.124s >> user 1m28.082s >> sys 0m0.004s >> >> >> $ time php -r 'for ($i = 0; $i < 0x7fffffff; $i += 3);' >> >> real 0m20.561s >> user 0m20.545s >> sys 0m0.004s >> >> >> $ time php -r 'for ($i = 0x7fffffff; $i >= 0; $i -= 3);' >> >> real 0m20.524s >> user 0m20.509s >> sys 0m0.004s >> >> >> -- >> PHP Internals - PHP Runtime Development Mailing List >> To unsubscribe, visit: http://www.php.net/unsub.php > >