Newsgroups: php.internals
Subject: Re: [PHP-DEV] Introduction and some opcache SSE related stuff
From: dmitry@zend.com (Dmitry Stogov)
Date: Thu, 30 Jul 2015 17:23:05 +0300
To: "Andone, Bogdan"
Cc: "internals@lists.php.net"

Hi Bogdan,

On Wed, Jul 29, 2015 at 5:22 PM, Andone, Bogdan wrote:

> Hi Guys,
>
> My name is Bogdan Andone and I work for Intel in the area of SW
> performance analysis and optimizations.
> We would like to actively contribute to Zend PHP project and to involve
> ourselves in finding new performance improvement opportunities based on
> available and/or new hardware features.
> I am still in the source code digesting phase but I had a look at the
> fast_memcpy() implementation in opcache extension which uses SSE intrinsics:
>
> If I am not wrong fast_memcpy() function is not currently used, as I
> didn't find the "-msse4.2" gcc flag in the Makefile. I assume you probably
> didn't see any performance benefit so you preserved generic memcpy() usage.
>

This is not SSE4.2; it is SSE2. Every x86_64 target implements SSE2, so
it is enabled by default on x86_64 systems (at least on Linux). It can
also be enabled on x86 targets by adding the "-msse2" option.

>
> I would like to propose a slightly different implementation which uses
> _mm_store_si128() instead of _mm_stream_si128(). This ensures that copied
> memory is preserved in data cache, which is not bad as the interpreter will
> start to use this data without the need to go back one more time to memory.
> _mm_stream_si128() in the current implementation is intended to be used for
> stores where we want to avoid reading data into the cache and the cache
> pollution; in opcache scenario it seems that preserving the data in cache
> has a positive impact.
>

_mm_stream_si128() was used on purpose, to avoid CPU cache pollution,
because data copied from SHM to process memory is not necessarily used
before it is evicted. That said, I'm not completely sure. Maybe
_mm_store_si128() gives a better result.

>
> Running php-cgi -T10000 on WordPress4.1/index.php I see ~1% performance
> increase for the new version of fast_memcpy() compared with the generic
> memcpy(). Same result using a full load test with http_load on a Haswell EP
> 18 cores.
>

1% is a really big improvement. I'll only be able to check this next
week (when I'm back from vacation).

>
> Here is the proposed pull request:
> https://github.com/php/php-src/pull/1446
>
> Related to the SW prefetching instructions in fast_memcpy()... they are
> not really useful in this place. Their benefit is almost negligible as the
> address requested for prefetch will be needed at the next iteration (few
> cycles later), while the time needed to get data from RAM is >100 cycles
> usually. Nevertheless... they don't hurt and it seems they still have a
> very small benefit so I preserved the original instruction and I added a
> new prefetch request for the destination pointer.
>

I also didn't see a significant difference from software prefetching.

Thanks. Dmitry.

>
> Hope it helps,
> Bogdan
>
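
For anyone following the thread, here is a minimal sketch of the two store variants discussed above. It is not the opcache fast_memcpy() and not the code from the pull request. The helper name copy_aligned() and its non_temporal parameter are made up for illustration, and the sketch assumes both buffers are 16-byte aligned and the size is a multiple of 16. The software prefetch calls are left out, and a plain memcpy() is used when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (illustration only, not the opcache implementation).
 * Copies `size` bytes from `src` to `dest` in 16-byte blocks.
 * Assumptions: both pointers are 16-byte aligned, size is a multiple of 16.
 *
 * non_temporal != 0: _mm_stream_si128(), a non-temporal store that is
 *                    meant to avoid filling the cache with the copied data.
 * non_temporal == 0: _mm_store_si128(), a regular aligned store that goes
 *                    through the cache like any other write.
 */
static void copy_aligned(void *dest, const void *src, size_t size,
                         int non_temporal)
{
    assert(((uintptr_t)dest & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert((size & 15) == 0);

#ifdef __SSE2__
    __m128i *d = (__m128i *)dest;
    const __m128i *s = (const __m128i *)src;
    size_t blocks = size / 16;

    for (size_t i = 0; i < blocks; i++) {
        __m128i x = _mm_load_si128(s + i);   /* aligned 16-byte load */
        if (non_temporal) {
            _mm_stream_si128(d + i, x);      /* store with non-temporal hint */
        } else {
            _mm_store_si128(d + i, x);       /* regular aligned store */
        }
    }
    if (non_temporal) {
        /* Order the weakly-ordered streaming stores before later stores. */
        _mm_sfence();
    }
#else
    /* No SSE2 at compile time: fall back to the generic copy. */
    (void)non_temporal;
    memcpy(dest, src, size);
#endif
}
```

Both variants produce the same bytes in the destination; only the cache behaviour differs. Calling copy_aligned(dst, src, n, 1) corresponds to the streaming approach, and copy_aligned(dst, src, n, 0) to the proposed store-based one. Which one is faster depends on whether the copied data is read again before it would be evicted, so it has to be measured on the real workload.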