Subject: Re: [PHP-DEV] Introduction and some opcache SSE related stuff
From: pthreads@pthreads.org (Joe Watkins)
Date: Thu, 30 Jul 2015 13:24:01 +0100
To: "Andone, Bogdan"
Cc: "internals@lists.php.net", Dmitry Stogov

Hi Andone,

I'm not sure why nobody has replied to you yet; we've all looked at the PR
and spent a lot of the day yesterday discussing it.

I've CC'd Dmitry. He doesn't always read internals, so this should get his
attention.

Lastly, very cool ... I look forward to some more cleverness ...

Cheers
Joe

On Wed, Jul 29, 2015 at 3:22 PM, Andone, Bogdan wrote:

> Hi Guys,
>
> My name is Bogdan Andone and I work for Intel in the area of SW
> performance analysis and optimizations.
> We would like to actively contribute to the Zend PHP project and to involve
> ourselves in finding new performance improvement opportunities based on
> available and/or new hardware features.
> I am still in the source code digesting phase, but I had a look at the
> fast_memcpy() implementation in the opcache extension, which uses SSE
> intrinsics:
>
> If I am not wrong, the fast_memcpy() function is not currently used, as I
> didn't find the "-msse4.2" gcc flag in the Makefile. I assume you probably
> didn't see any performance benefit, so you preserved generic memcpy() usage.
>
> I would like to propose a slightly different implementation which uses
> _mm_store_si128() instead of _mm_stream_si128(). This ensures that copied
> memory is preserved in the data cache, which is not bad, as the interpreter
> will start to use this data without the need to go back one more time to
> memory.
> _mm_stream_si128() in the current implementation is intended for stores
> where we want to avoid pulling data into the cache and polluting it; in
> the opcache scenario it seems that preserving the data in cache has a
> positive impact.
>
> Running php-cgi -T10000 on WordPress4.1/index.php I see a ~1% performance
> increase for the new version of fast_memcpy() compared with the generic
> memcpy(). I see the same result in a full load test with http_load on an
> 18-core Haswell EP.
>
> Here is the proposed pull request:
> https://github.com/php/php-src/pull/1446
>
> Regarding the SW prefetching instructions in fast_memcpy(): they are
> not really useful in this place. Their benefit is almost negligible, as the
> address requested for prefetch will be needed at the next iteration (a few
> cycles later), while the time needed to get data from RAM is usually >100
> cycles. Nevertheless, they don't hurt and they still seem to have a very
> small benefit, so I preserved the original instruction and added a new
> prefetch request for the destination pointer.
>
> Hope it helps,
> Bogdan
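
The difference between the two store variants discussed in the quoted
message can be sketched in C roughly as follows. This is only an
illustration, not the code from the pull request: the names copy_cached(),
copy_streaming() and check_args() are hypothetical, and the sketch assumes
an x86 target with SSE2, 16-byte aligned buffers, and a size that is a
multiple of 16 bytes.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_load_si128, _mm_store_si128, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: both copies below assume 16-byte aligned pointers
 * and a size that is a multiple of 16 bytes. */
static void check_args(const void *dst, const void *src, size_t size)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert((size & 15) == 0);
}

/* Regular aligned stores: the destination lines go through the cache
 * hierarchy, so data read again soon after the copy is likely still cached. */
void copy_cached(void *dst, const void *src, size_t size)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t n = size / sizeof(__m128i);

    check_args(dst, src, size);
    for (size_t i = 0; i < n; i++) {
        _mm_store_si128(d + i, _mm_load_si128(s + i));
    }
}

/* Non-temporal stores: a hint to write the destination without keeping it
 * in cache, which suits data that will not be read again soon. */
void copy_streaming(void *dst, const void *src, size_t size)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t n = size / sizeof(__m128i);

    check_args(dst, src, size);
    for (size_t i = 0; i < n; i++) {
        _mm_stream_si128(d + i, _mm_load_si128(s + i));
    }
    _mm_sfence(); /* order the non-temporal stores before later stores */
}
```

Both functions produce the same bytes in the destination; they differ only
in the caching hint. Which one is faster depends on whether the copied data
is used again shortly afterwards, which is the point made in the quoted
message about the opcache case.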