Newsgroups: php.internals
To: "Dmitry Stogov"
Cc: "PHP Internals", "Ferenc Kovacs", "Yasuo Ohgaki"
References: <051CB45EE4B24C7C88C3EABC7E8EE0A2@pc1> <77CE51405C92499BA13CE0913883EE05@pc1>
Date: Wed, 5 Aug 2015 19:06:44 -0500
Subject: Re: [PHP-DEV] Move to Fast ZPP?
From: php_lists@realplain.com ("Matt Wilmas")

Hi Dmitry,

----- Original Message -----
From: "Dmitry Stogov"
Sent: Monday, August 03, 2015

> Hi Matt,
>
>
> On Wed, Jul 22, 2015 at 11:16 PM, Matt Wilmas wrote:
>
>> Hi again Dmitry, all,
>>
>> Hopefully the final update on this, before all is revealed... :-)
>>
>> [...]
>>
>> I tried to rush and finish things up before the weekend *2 weeks ago*, but
>> it took me too long to get the macros sorted out and working right. :-/
>> Sorry for the delay, but more and better goodness should now be included.
>> The extra time allowed me to "relax and take notes" (Notorious B.I.G.),
>> however. :-D
>>
>> So yeah, that was all working 10 days ago. Then I realized more function
>> param data could be packed together which saved another mov instruction --
>> so at the call site, it's just mov+lea+call on 64-bit (since execute_data
>> is already in %rdi). There's nothing else (ignoring checking return
>> value/return on error, etc.), and each &dest variable is filled in even
>> though their address isn't taken (thanks to compiler magic). The only
>> exceptions are FUNC (4 instructions I think) and OBJECT_OF_CLASS and
>> VARIADIC (1 instruction) types.
>>
>> Unfortunately (only because I said "same macro syntax," but no big deal),
>> the syntax had to be changed, from:
>>
>> ZEND_PARSE_PARAMETERS_START[_EX](...)
>>     Z_PARAM_*(...)
>>     Z_PARAM_*(...)
>> ZEND_PARSE_PARAMETERS_END[_EX]
>>
>> to
>>
>> ZEND_PARSE_PARAMETERS_START[_EX](...)( // Parentheses
>>     Z_PARAM_*(...), // Comma-separated
>>     Z_PARAM_*(...)
>> ) ZEND_PARSE_PARAMETERS_END[_EX]
>>
>
> Errors in nested macros might be very difficult to understand :(
> I would prefer not to use nested macros without a significant gain.

Not sure what you mean about errors, unless you're talking about missing a
comma or such... And those macro calls themselves aren't really nested,
just in parentheses of course. They are filling [multiple] structs,
although that was also the case with the version using the EXACT current
syntax. :-)

Anyway though, it doesn't matter much; not sure what you'll want to do with
all the possibilities I have! And a simple script converts occurrences to
the new syntax for testing (instead of bigger patch).

Significant gain? Nope. :-) I only did that in order to use the "static"
storage specifier in one place, for a pointer to the packed rodata, instead
of filling it at runtime. But I think the file size was the same with or
without static, even though it saved instructions. So not a requirement,
just part of my experiments.

Like I said, the BIG neat thing is getting the same optimization (all
except the "static" part) for the *traditional* ZPP. I hadn't touched it
since last message until this week (doing other stuff and too sick ~4 days
to do anything :-/) and wanted to check closer to final code before
replying -- but still looks good with GCC so far!

So depending, there's maybe less interest in my smaller FAST_ZPP
implementation... *shrug*

>> Overall, the *code* size is reduced (vs traditional ZPP), but the file
>> size isn't (static stuff in rodata or whatever), which was a bit
>> surprising, although most of these PHP functions don't have many
>> parameters...
>>
>
> I may just guess, where this static data came from, because I didn't see
> the code yet :)

Just "static const" stuff.
:-) After the very first attempt, I've wanted to pack stuff together.
Function min/max args and any flags (QUIET/THROW, or the new METHOD) are in
a 4 byte int. (GCC doesn't want to pack them together in the latest case,
but easily fixed.) Then a byte for each parameter. So, I tried "static
const" to eliminate the movb instructions, that's all.

Just to give an idea, here's the different instructions for atan2() with GCC
4.8 -O2 (after push %rbx, comments mostly for others):

== Traditional ZPP ==
xor    %eax,%eax            # ??? align padding?
mov    %rsi,%rbx
mov    $0x61c4f3,%esi       # format string ptr
sub    $0x10,%rsp
mov    0x2c(%rdi),%edi      # ZEND_NUM_ARGS()
lea    0x8(%rsp),%rcx       # &num2
mov    %rsp,%rdx            # &num1
callq  595670
cmp    $0xffffffff,%eax
je     4f7f4f
movsd  0x8(%rsp),%xmm1
movsd  (%rsp),%xmm0
callq  419190

== My macros, "static const" version ==
mov    %rsi,%rbx
mov    $0x7709f8,%esi       # packed static info ptr; execute_data in %rdi
sub    $0x20,%rsp           # 16 bytes more; each parameter needs 16 bytes stack
mov    %rsp,%rdx            # &num1 AND &num2, effectively; usually "lea ?,%rdx"
callq  5935d0
test   %eax,%eax            # shorter than cmp comparing with SUCCESS vs FAILURE
jne    4f6f84
movsd  0x10(%rsp),%xmm1
movsd  (%rsp),%xmm0
callq  419330

== Traditional ZPP, **optimized at compile time** ==
mov    $0x2,%eax            # ??? max_args, for below
mov    %rsi,%rbx
sub    $0x30,%rsp
lea    0x10(%rsp),%rdx      # &num1, &num2, ..., effectively
mov    %rsp,%rsi            # packed info, filled by the following
movb   $0x2,(%rsp)          # min_args
mov    %ax,0x1(%rsp)        # max_args
movb   $0x0,0x3(%rsp)       # flags (none)
movb   $0x2,0x4(%rsp)       # 'd' double type: 2
movb   $0x2,0x5(%rsp)       # 'd' double type: 2
callq  5935f0
test   %eax,%eax
jne    4f6fa8
movsd  0x20(%rsp),%xmm1
movsd  0x10(%rsp),%xmm0
callq  419330

That (optimizing traditional string ZPP) will be the *equivalent* of 64KB+
of C code (repetition), all reduced to nothing. :-) And more of that should
(will) be packed together. Hopefully this continues, and with other
compilers, on non-Windows anyway. Don't know about Windows now...
Visual Studio 2008 and 2012 (not much difference) are NOT optimizing away
the code (other times it was GCC with issues). :-/ Not sure why. Of course
they don't support the necessary compound literals anyway, but I was just
testing a manual case... I'll have to try and check 2015 version soon.

Regardless, there will be a fallback function to be called with optimized
runtime string parsing, to be used if compilers don't create optimized code.
I'll be checking more compilers, of course...

Sorry for the delay. Thought I'd have patch for you when you got back! It's
really about "finished" now, but not sure how many more days of final
tweaking and testing till ready for patch. :-)

> Thanks. Dmitry.

- Matt

>>
>> The biggest size savings actually came from the simple initial
>> optimization of zend_parse_params_none(). Down to almost nothing, much
>> faster, and saved 4KB on my --disable-all builds.
>>
>>
>> NEW GOODNESS -- What would of course be nice to have is a big optimization
>> of the traditional zend_parse[_method]_parameters[_ex|_throw] to avoid
>> changing them all. And it seems some people, like Derick, prefer it.
>>
>> Of course the obvious way I first had in mind weeks ago was to simply
>> parse its format string faster (once-ish) at runtime, and then feed it to
>> this new FAST_parse function. Should give at least 2x speedup I figured.
>> But with this latest implementation, where the function should probably now
>> be called parse_parameters_ARRAY instead of fast_parse, it would need a
>> second pass after parsing the string. Not a huge deal, but...
>>
>> What would be *really nice* is to have the compiler parse the format
>> string, at compile time, and use the new system directly. And... that
>> should be possible!! 8-)
>>
>> Last week I figured GCC's "statement expressions" [1] could be used, which
>> most compilers seem to support, except MSVC.
>> But just over the weekend I realized an inline function could be used
>> with a compound literal (for the varargs), which is also supported in the
>> latest MSVC versions. Awesome!
>>
>> And again, fear not, ALL the code can be completely removed by the
>> compiler, leaving only movb instructions instead of lea+mov/push for the
>> traditional ZPP function call. So, better than my initial
>> implementation(s), and nearly the same as my final macro version! I was
>> just testing prototypes of portions with GCC yesterday, which does fine
>> after adjusting to not generate *horribly stupid* code.
>>
>> Now to implement it into PHP ASAP! Then I'll save a few more
>> branches/instructions in the parse function (specialized for common cases;
>> some useless GCC instructions), comment and clean up my experimental mess,
>> and write up some explanation of the changes before sending patch. Oh, and
>> I should verify what Clang does with the code as well...
>>
>> Stay tuned!
>>
>> [1] https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
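
For anyone following along without the patch, here is a rough sketch of the
idea discussed in this thread: an inline function turns a ZPP-style format
string into a small packed descriptor (min/max args and flags in a 4 byte
header, then one byte per parameter), and a macro passes the destination
pointers through a compound literal. It is NOT the actual patch. The names
param_spec, value, make_spec, parse_array, PARSE_PARAMS and
parse_two_doubles are made up for illustration, the field order and widths
are guesses, and the only type code taken from the thread is 'd' (double)
being 2. Real ZPP also coerces argument types, which this sketch does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARAMS 8

/* Hypothetical type codes; only T_DOUBLE == 2 comes from the thread. */
enum { T_NONE = 0, T_LONG = 1, T_DOUBLE = 2 };

/* Hypothetical packed descriptor: 4-byte header, then a byte per parameter. */
typedef struct {
    uint8_t  min_args;
    uint8_t  max_args;
    uint16_t flags;              /* unused here; stands in for QUIET/THROW etc. */
    uint8_t  types[MAX_PARAMS];
} param_spec;

/* Stand-in for a PHP argument; real code would read zvals from execute_data. */
typedef struct {
    uint8_t type;
    union { long l; double d; } u;
} value;

/* Build the descriptor from a format string. With a constant string and
 * optimization enabled, a compiler may fold this whole loop away. */
static inline param_spec make_spec(const char *fmt)
{
    param_spec s = {0};
    uint8_t n = 0;
    int optional = 0;

    for (; *fmt != '\0' && n < MAX_PARAMS; fmt++) {
        switch (*fmt) {
        case '|':                /* everything after this is optional */
            optional = 1;
            break;
        case 'l':
        case 'd':
            s.types[n++] = (*fmt == 'd') ? T_DOUBLE : T_LONG;
            if (!optional) s.min_args = n;
            break;
        default:                 /* other specifiers are ignored in this sketch */
            break;
        }
    }
    s.max_args = n;
    return s;
}

/* Array-based parser: checks the count and types, then fills each dest. */
static int parse_array(param_spec spec, int argc, const value *argv, void **dests)
{
    assert(dests != NULL);
    if (argc < spec.min_args || argc > spec.max_args) return -1;
    for (int i = 0; i < argc; i++) {
        if (argv[i].type != spec.types[i]) return -1;   /* strict, no coercion */
        if (spec.types[i] == T_DOUBLE) *(double *)dests[i] = argv[i].u.d;
        else                           *(long *)dests[i]   = argv[i].u.l;
    }
    return 0;
}

/* The destination pointers travel in a C99 compound literal. */
#define PARSE_PARAMS(argc, argv, fmt, ...) \
    parse_array(make_spec(fmt), (argc), (argv), (void *[]){ __VA_ARGS__ })

/* Usage in the style of atan2(): two required doubles. */
static int parse_two_doubles(int argc, const value *argv, double *y, double *x)
{
    return PARSE_PARAMS(argc, argv, "dd", y, x);
}
```

Whether make_spec() really disappears at compile time depends on the
compiler and optimization level; per the thread, GCC manages it, while
Visual Studio 2008 and 2012 did not in a manual test.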