Newsgroups: php.internals
Message-ID: <004201c787b6$588705f0$0201a8c0@pc1>
References: <00d201c77d07$dc0819f0$0201a8c0@pc1>
Date: Wed, 25 Apr 2007 22:52:43 -0500
Subject: Re: [PHP-DEV] [PATCH] Major optimization for heredocs/interpolated strings
From: php_lists@realplain.com ("Matt Wilmas")

Hi again,

Hmm, not a single reply about this patch...? Did anyone try it out? :-)
Think it can be used after 5.2.2?


Matt


----- Original Message -----
From: "Matt Wilmas"
Sent: Thursday, April 12, 2007
Subject: [PHP-DEV] [PATCH] Major optimization for heredocs/interpolated strings

> Hi all,
>
> I think I first realized that PHP's scanner splits non-constant strings into
> many "pieces" after reading Sara's "How long is a piece of string?" blog
> entry[1] last summer. At the time I didn't know much about the internals
> and didn't know if anything could be done to change it. Then in the fall I
> finally took a look at the scanner ;-) and thought it would be possible to
> only "split" strings at variables. Finally a few months ago, I began
> working out the changes -- it was working almost 2 months ago, but then I
> got sidetracked :-/ from doing some more testing and making a few semantic
> token changes till now.
>
> So anyway, now heredocs and interpolated strings should be pretty much just
> like constant strings and concatenation (except for the extra INIT_STRING
> opcode). They scan/parse/compile faster (with less memory), run faster, and
> there's less to free when destroying opcodes.
>
> With a simple string like "This is $var string" (say $var = 'some'), I found
> the compile/cleanup time to be up to 50% faster, and runtime 55% faster!
> (Note: To test compile time, I eval()'d about 50 of them in an if (0) {...}
> block.) The difference will be *much more* depending on how many "pieces"
> there would've been before (e.g. longer).
>
> The more complex rules increased the size of Flex's tables about 40%.
> However, removing the old heredoc end rule, which used the ^
> beginning-of-line operator, made the YY_RULE_SETUP macro be empty, saving
> some space. The net result was an 8K/12K larger binary in 5.2/HEAD. I was
> surprised at the overall performance increase without the ^ rule.
> Its saving a few operations per match made just about as much difference as
> Flex's -Cfe table compression (was playing with that first :^)) when
> compiling the code from Zend/bench.php (5% I think).
>
> This was with a Windows ZTS build. Running ApacheBench on a few different
> scripts showed pretty nice overall improvements -- 10-15% was common in my
> quick tests.
>
> BTW, removing that ^ rule lifts the requirement that the character before
> the closing heredoc label "must be a newline as defined by your operating
> system," to quote the manual.
>
> Now some of the other changes:
>
> The ST_SINGLE_QUOTE state was removed from 5.2, like in HEAD.
>
> A string like "$$$" is considered constant now, since that's really what it
> is, right?
>
> CG(zend_lineno) wasn't incremented before if a \n or \r newline (not \r\n)
> followed a backslash in a non-constant string. \{ returned T_STRING instead
> of T_BAD_CHARACTER like any other invalid escape sequence. (Note: Of course
> these won't usually match now anyway, but will be part of a longer string.)
>
> I removed HANDLE_NEWLINES() from the code that scans a string's text,
> instead doing the newline check in the escape-checking loop, to prevent
> scanning twice. And I removed the additional boundary check in
> HANDLE_NEWLINES() and elsewhere since I didn't see the need -- AFAIK in all
> cases you'll only hit '\0'.
>
> I removed the one <<EOF>> rule since it was missing some states and it
> wasn't doing anything that the default EOF rule doesn't by calling
> yyterminate().
>
> In zendlex(), the goto target doesn't need to recheck CG(increment_lineno)
> since it hasn't changed, and I simplified the closing tag newline check
> (also looked like it would miss \r ones).
>
> Sorry for the long message! I'll send another if I think of something I
> forgot to mention.
> Here are the patches:
>
> http://realplain.com/php/scanner_optimizations.diff
> http://realplain.com/php/scanner_optimizations_5_2.diff
>
> Appreciate any feedback, or questions about any of it. :-)
>
>
> Thanks,
> Matt
>
> [1]
> http://blog.libssh2.org/index.php?/archives/28-How-long-is-a-piece-of-string.html
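
For anyone skimming the thread, here is a rough sketch of the "split only at variables" idea described in the quoted message. It is not the patch itself and not how the Flex scanner is written: the function names, the simplified variable pattern, and the whitespace-based stand-in for the older "many pieces" behaviour are all assumptions made purely for illustration.

```python
import re

# Simplified pattern for a PHP-style variable name (an assumption for this
# sketch); the real scanner also handles {$...}, ${...}, array offsets,
# property access and non-ASCII names, all of which are omitted here.
VARIABLE = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")


def split_at_variables(text):
    """Return (kind, value) pieces, breaking the text only at variables."""
    pieces = []
    pos = 0
    for match in VARIABLE.finditer(text):
        if match.start() > pos:
            # Everything between two variables stays one literal piece.
            pieces.append(("literal", text[pos:match.start()]))
        pieces.append(("variable", match.group()))
        pos = match.end()
    if pos < len(text):
        pieces.append(("literal", text[pos:]))
    return pieces


def split_fine_grained(text):
    """Hypothetical stand-in for the older behaviour: additionally break
    literal text at every whitespace/non-whitespace boundary."""
    pieces = []
    for kind, value in split_at_variables(text):
        if kind == "variable":
            pieces.append((kind, value))
        else:
            runs = re.findall(r"\s+|\S+", value)
            pieces.extend(("literal", run) for run in runs)
    return pieces
```

Under these assumptions, "This is $var string" comes out as three pieces from split_at_variables() and seven from split_fine_grained(), and "$$$" stays a single literal piece because nothing in it matches the variable pattern.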