Hi all,
This patch... Saves one opcode per interpolated string by moving what was
the first ADD_STRING/ADD_VAR op into INIT_STRING; changes the
ADD_[CHAR|STRING|VAR] op types from (TMP, <whatever>) to (<whatever>,
UNUSED), since that seems to make more sense (see ADD_ARRAY_ELEMENT) and
simplifies things a bit, and saves some znode copying while compiling.
Next, after nowdoc support was added, I noticed it was mostly duplicate
heredoc code in the scanner, so I combined them for the most part, removing
the NOWDOC tokens, etc. Is that OK? It seems like they may have just been
there for the parser... but I updated the parser so that static heredocs
also work like nowdocs. Also removed the ST_START_[HEREDOC|NOWDOC] states
in the scanner, by doing their work (to catch immediate ending label) in the
initial heredoc rule.
While removing the NOWDOC references from zend_highlight.c, I made a little
change that I think improves (err... adds) highlighting of variables in
double-quoted strings, and makes literal text in heredocs/backticks the
correct "highlight_string" color. This makes coloring consistent across all
strings. :-) Before/after example: http://realplain.com/php/highlight.html
Speaking of backticks, while updating their parser grammar, they now behave
more precisely like shell_exec()
, in that cmd
(constant string) or $cmd
(one variable) won't use INIT_STRING and create a temporary variable...
Oh, almost forgot a couple things -- for HEAD: updated the yyless()
definition to match 5_3 (Nuno didn't get to it yet...), and moved
HANDLE_NEWLINES() so nowdoc text is copied first, otherwise the line number
would be off if there's an error in zend_copy_scanner_string. For 5_3:
there are NOWDOC references in tokenizer_data.c that the patch doesn't
remove, as I think it's supposed to be automated, though they're missing in
HEAD.
http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diff
Thanks,
Matt
P.S. (for Marcus?), the ({LABEL}|([']{LABEL}['])|(["]{LABEL}["])) part of
the heredoc rule crashed re2c (Windows) until I added the extra ( ), which
shouldn't be needed AFAIK and was fine with just 2 alternate matches...
Thanks Matt for the nice work :)
Also removed the ST_START_[HEREDOC|NOWDOC] states
in the scanner, by doing their work (to catch immediate ending label) in
the
initial heredoc rule.
This one was in my todo list as well. I think there might be other places
where a similar optimization can be done.
Oh, almost forgot a couple things -- for HEAD: updated the yyless()
definition to match 5_3 (Nuno didn't get to it yet...), and moved
(...)
P.S. (for Marcus?), the ({LABEL}|([']{LABEL}['])|(["]{LABEL}["])) part of
the heredoc rule crashed re2c (Windows) until I added the extra ( ), which
shouldn't be needed AFAIK and was fine with just 2 alternate matches...
Thanks. I didn't commit the yyless() merge yet because it was crashing re2c.
I found it to be related with 2 different bugs in re2c. I've fixed one, but
the other is still there. Maybe I should pass the details to Marcus, as I
don't have time to fix it myself until the end of the month.
Nuno
A bit of a messy patch in that it doesn't have a single top-level
directory, but after hacking it, it applied. For others testing this,
make sure you run ext/tokenizer/tokenizer_data_gen.sh and of course
genfiles with the latest re2c.
I'm still not getting something that works though. Everything compiles
fine, but make test is throwing:
Fatal error: Invalid opcode 56/16/8. in
/Users/rasmus/php53/run-tests.php on line 547
Could we get a cleaner patch to have a look at? One that doesn't hurt
my brain quite as much to get working?
-Rasmus
Matt Wilmas wrote:
Hi all,
This patch... Saves one opcode per interpolated string by moving what was
the first ADD_STRING/ADD_VAR op into INIT_STRING; changes the
ADD_[CHAR|STRING|VAR] op types from (TMP, <whatever>) to (<whatever>,
UNUSED), since that seems to make more sense (see ADD_ARRAY_ELEMENT) and
simplifies things a bit, and saves some znode copying while compiling.Next, after nowdoc support was added, I noticed it was mostly duplicate
heredoc code in the scanner, so I combined them for the most part, removing
the NOWDOC tokens, etc. Is that OK? It seems like they may have just been
there for the parser... but I updated the parser so that static heredocs
also work like nowdocs. Also removed the ST_START_[HEREDOC|NOWDOC] states
in the scanner, by doing their work (to catch immediate ending label) in the
initial heredoc rule.While removing the NOWDOC references from zend_highlight.c, I made a little
change that I think improves (err... adds) highlighting of variables in
double-quoted strings, and makes literal text in heredocs/backticks the
correct "highlight_string" color. This makes coloring consistent across all
strings. :-) Before/after example: http://realplain.com/php/highlight.htmlSpeaking of backticks, while updating their parser grammar, they now behave
more precisely likeshell_exec()
, in thatcmd
(constant string) or$cmd
(one variable) won't use INIT_STRING and create a temporary variable...Oh, almost forgot a couple things -- for HEAD: updated the yyless()
definition to match 5_3 (Nuno didn't get to it yet...), and moved
HANDLE_NEWLINES() so nowdoc text is copied first, otherwise the line number
would be off if there's an error in zend_copy_scanner_string. For 5_3:
there are NOWDOC references in tokenizer_data.c that the patch doesn't
remove, as I think it's supposed to be automated, though they're missing in
HEAD.http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffThanks,
MattP.S. (for Marcus?), the ({LABEL}|([']{LABEL}['])|(["]{LABEL}["])) part of
the heredoc rule crashed re2c (Windows) until I added the extra ( ), which
shouldn't be needed AFAIK and was fine with just 2 alternate matches...
Fatal error: Invalid opcode 56/16/8. in
/Users/rasmus/php53/run-tests.php on line 547Could we get a cleaner patch to have a look at? One that doesn't hurt
my brain quite as much to get working?
Because he doesn't attached the regenerated zend_vm_execute.h in the
patch.
Matt Wilmas wrote:
Hi all,
This patch... Saves one opcode per interpolated string by moving what was
the first ADD_STRING/ADD_VAR op into INIT_STRING; changes the
ADD_[CHAR|STRING|VAR] op types from (TMP, <whatever>) to (<whatever>,
UNUSED), since that seems to make more sense (see ADD_ARRAY_ELEMENT) and
simplifies things a bit, and saves some znode copying while compiling.http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diff
Hi Matt, the nowdoc_015.phpt fails in HEAD.
--
Regards,
Felipe Pena.
Hi Felipe, all,
The patches have been updated to fix the failing nowdoc_015.phpt test.
http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diff
The problem was that after removing the ST_START_NOWDOC stuff, the
NOWDOC_CHARS pattern wasn't matching on:
<<<'EOT'
SingleLineOfTextThatMatchesThe_LABEL_Pattern
EOT;
I updated NOWDOC_CHARS, I think simplifying it a bit and making it more like
HEREDOC_CHARS. (And this made zend_language_scanner.c ~5K smaller.) I
previously tried to change it, and it just hung PHP when scanning a nowdoc
(endless loop being generated somewhere, etc.), so I left it alone. Looking
into it more now, with the new pattern, re2c is messing up (endless loop)
with
<ST_NOWDOC>{NOWDOC_CHARS}*{NEWLINE}+{LABEL}";"?[\n\r]
And I had to change it to
<ST_NOWDOC>({NOWDOC_CHARS}+{NEWLINE}+|{NEWLINE}+){LABEL}";"?[\n\r]
Though the original should be equivalent... Well, hopefully it's all
working correctly now!
- Matt
----- Original Message -----
From: "Felipe Pena"
Sent: Monday, May 05, 2008
[...]
Hi Matt, the nowdoc_015.phpt fails in HEAD.--
Regards,
Felipe Pena.
Hi Matt,
about optimization..., do you have any test for comparison? Why, in my
crazy test (http://rafb.net/p/ZzQQQP97.html), the actual code is more
faster than your patch.
real 0m1.156s
vs
real 0m1.368s
(using time command)
I'm just curious, i'm not against your patch.
Thanks.
Em Ter, 2008-05-06 às 09:05 -0500, Matt Wilmas escreveu:
Hi Felipe, all,
The patches have been updated to fix the failing nowdoc_015.phpt test.
http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffThe problem was that after removing the ST_START_NOWDOC stuff, the
NOWDOC_CHARS pattern wasn't matching on:<<<'EOT'
SingleLineOfTextThatMatchesThe_LABEL_Pattern
EOT;I updated NOWDOC_CHARS, I think simplifying it a bit and making it more like
HEREDOC_CHARS. (And this made zend_language_scanner.c ~5K smaller.) I
previously tried to change it, and it just hung PHP when scanning a nowdoc
(endless loop being generated somewhere, etc.), so I left it alone. Looking
into it more now, with the new pattern, re2c is messing up (endless loop)
with<ST_NOWDOC>{NOWDOC_CHARS}*{NEWLINE}+{LABEL}";"?[\n\r]
And I had to change it to
<ST_NOWDOC>({NOWDOC_CHARS}+{NEWLINE}+|{NEWLINE}+){LABEL}";"?[\n\r]
Though the original should be equivalent... Well, hopefully it's all
working correctly now!
- Matt
----- Original Message -----
From: "Felipe Pena"
Sent: Monday, May 05, 2008[...]
Hi Matt, the nowdoc_015.phpt fails in HEAD.--
Regards,
Felipe Pena.
--
Regards,
Felipe Pena.
Hi Felipe,
I don't know. :-/ That's a pretty big discrepancy... (Did you do a
complete recompile before and after the patch?) With your test, timed with
microtime()
on my old Win2k system (release/non-debug), I get 1.56s before
and 1.54s after -- about what I'd expect after doing my own random testing
previously. It's not expected to be a big optimization, just eliminate the
overhead of the old INIT_STRING only allocating an empty string initially.
(Of course, the more ADD_STRING/ADD_VAR ops for a string, the less relative
difference there is.) And saving an opcode makes creating/destroying op
arrays just a bit faster without a cache.
BTW, after recompiling the latest 5.3 code, after some changes in the last
days (mine had been a few days old), I observed a couple different things
after my changes: the binary size was 8K smaller (no change a few days ago),
and there was a surprising improvement on a lot of the bench.php tests (I
think "ackermann" is the only one that uses interpolated strings). Just
compiler randomness I guess, but going from 27.9s to 27.1s is more than the
fluctuation I've seen before. :-)
- Matt
----- Original Message -----
From: "Felipe Pena"
Sent: Tuesday, May 06, 2008
Hi Matt,
about optimization..., do you have any test for comparison? Why, in my
crazy test (http://rafb.net/p/ZzQQQP97.html), the actual code is more
faster than your patch.real 0m1.156s
vs
real 0m1.368s(using time command)
I'm just curious, i'm not against your patch.
Thanks.
Regards,
Felipe Pena.
Hi Rasmus,
Sorry about the patch format. :-/ It's been updated to correct that, as
well as fix a test failure that Felipe mentioned. And as Felipe said, the
invalid opcode is because the patch doesn't include the zend_vm_execute.h
changes, and zend_vm_gen.php needs to be run first. I also should have
mentioned that!
- Matt
----- Original Message -----
From: "Rasmus Lerdorf"
Sent: Monday, May 05, 2008
A bit of a messy patch in that it doesn't have a single top-level
directory, but after hacking it, it applied. For others testing this,
make sure you run ext/tokenizer/tokenizer_data_gen.sh and of course
genfiles with the latest re2c.I'm still not getting something that works though. Everything compiles
fine, but make test is throwing:Fatal error: Invalid opcode 56/16/8. in
/Users/rasmus/php53/run-tests.php on line 547Could we get a cleaner patch to have a look at? One that doesn't hurt
my brain quite as much to get working?-Rasmus
Hi all,
I had been meaning to update this patch for a while (after conflicting
updates to some files), and finally did. :-) See original message for more
details...
There's been one change for strings -- before, I was moving the first ADD_*
op into INIT_STRING, but wasn't too satisfied with that. So I eliminated
the INIT_STRING opcode after realizing that simply setting the string to
NULL
makes it create a new string in add_string_to_string (the first ADD_*
has op1 set to IS_UNUSED; otherwise it's still IS_TMP_VAR, though it's not
"used" after my changes). Is this OK?
The other prior changes seem fine to me: syntax highlighting consistency,
combining the duplicate heredoc/nowdoc stuff and removing the NOWDOC tokens,
allowing static heredocs to be used in static contexts, etc.?
http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diff
Remember to regenerate the scanner/parser with re2c/bison and run
zend_vm_gen.php! I also included possible NEWS updates in the 5.3 patch
this time. :-P
Thanks for any feedback,
Matt
----- Original Message -----
From: "Matt Wilmas"
Sent: Monday, May 05, 2008
Hi all,
This patch... Saves one opcode per interpolated string [snip, old info]
Next, after nowdoc support was added, I noticed it was mostly duplicate
heredoc code in the scanner, so I combined them for the most part,
removing
the NOWDOC tokens, etc. Is that OK? It seems like they may have just
been
there for the parser... but I updated the parser so that static heredocs
also work like nowdocs. Also removed the ST_START_[HEREDOC|NOWDOC] states
in the scanner, by doing their work (to catch immediate ending label) in
the
initial heredoc rule.While removing the NOWDOC references from zend_highlight.c, I made a
little
change that I think improves (err... adds) highlighting of variables in
double-quoted strings, and makes literal text in heredocs/backticks the
correct "highlight_string" color. This makes coloring consistent across
all
strings. :-) Before/after example:
http://realplain.com/php/highlight.htmlSpeaking of backticks, while updating their parser grammar, they now
behave
more precisely likeshell_exec()
, in thatcmd
(constant string) or
$cmd
(one variable) won't use INIT_STRING and create a temporary variable...Oh, almost forgot a couple things -- for HEAD: [snip] moved
HANDLE_NEWLINES() so nowdoc text is copied first, otherwise the line
number
would be off if there's an error in zend_copy_scanner_string. [snip]http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffThanks,
MattP.S. (for Marcus?), the ({LABEL}|([']{LABEL}['])|(["]{LABEL}["])) part of
the heredoc rule crashed re2c (Windows) until I added the extra ( ), which
shouldn't be needed AFAIK and was fine with just 2 alternate matches...
Patch seems ok to me, although I haven't tested it. But let's see what
Dmitry thinks about it.
I'm not sure about the removal of the nowdoc tokens, though. Somebody may me
relying on them for pretty-printing..
Nuno
----- Original Message -----
From: "Matt Wilmas" php_lists@realplain.com
To: internals@lists.php.net
Cc: "Dmitry Stogov" dmitry@zend.com; "Nuno Lopes" nlopess@php.net
Sent: Thursday, July 10, 2008 1:45 PM
Subject: Re: [PHP-DEV] [PATCH] Some string changes/optimizations
Hi all,
I had been meaning to update this patch for a while (after conflicting
updates to some files), and finally did. :-) See original message for
more
details...There's been one change for strings -- before, I was moving the first
ADD_*
op into INIT_STRING, but wasn't too satisfied with that. So I eliminated
the INIT_STRING opcode after realizing that simply setting the string to
NULL
makes it create a new string in add_string_to_string (the first ADD_*
has op1 set to IS_UNUSED; otherwise it's still IS_TMP_VAR, though it's not
"used" after my changes). Is this OK?The other prior changes seem fine to me: syntax highlighting consistency,
combining the duplicate heredoc/nowdoc stuff and removing the NOWDOC
tokens,
allowing static heredocs to be used in static contexts, etc.?http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffRemember to regenerate the scanner/parser with re2c/bison and run
zend_vm_gen.php! I also included possible NEWS updates in the 5.3 patch
this time. :-PThanks for any feedback,
Matt----- Original Message -----
From: "Matt Wilmas"
Sent: Monday, May 05, 2008Hi all,
This patch... Saves one opcode per interpolated string [snip, old info]
Next, after nowdoc support was added, I noticed it was mostly duplicate
heredoc code in the scanner, so I combined them for the most part,
removing
the NOWDOC tokens, etc. Is that OK? It seems like they may have just
been
there for the parser... but I updated the parser so that static heredocs
also work like nowdocs. Also removed the ST_START_[HEREDOC|NOWDOC]
states
in the scanner, by doing their work (to catch immediate ending label) in
the
initial heredoc rule.While removing the NOWDOC references from zend_highlight.c, I made a
little
change that I think improves (err... adds) highlighting of variables in
double-quoted strings, and makes literal text in heredocs/backticks the
correct "highlight_string" color. This makes coloring consistent across
all
strings. :-) Before/after example:
http://realplain.com/php/highlight.htmlSpeaking of backticks, while updating their parser grammar, they now
behave
more precisely likeshell_exec()
, in thatcmd
(constant string) or
$cmd
(one variable) won't use INIT_STRING and create a temporary variable...Oh, almost forgot a couple things -- for HEAD: [snip] moved
HANDLE_NEWLINES() so nowdoc text is copied first, otherwise the line
number
would be off if there's an error in zend_copy_scanner_string. [snip]http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffThanks,
MattP.S. (for Marcus?), the ({LABEL}|([']{LABEL}['])|(["]{LABEL}["])) part of
the heredoc rule crashed re2c (Windows) until I added the extra ( ),
which
shouldn't be needed AFAIK and was fine with just 2 alternate matches...
There hasn't been an official release with the NOWDOCS token though.
Scott
Nuno Lopes wrote:
Patch seems ok to me, although I haven't tested it. But let's see what
Dmitry thinks about it.
I'm not sure about the removal of the nowdoc tokens, though. Somebody
may me relying on them for pretty-printing..Nuno
----- Original Message ----- From: "Matt Wilmas" php_lists@realplain.com
To: internals@lists.php.net
Cc: "Dmitry Stogov" dmitry@zend.com; "Nuno Lopes" nlopess@php.net
Sent: Thursday, July 10, 2008 1:45 PM
Subject: Re: [PHP-DEV] [PATCH] Some string changes/optimizationsHi all,
I had been meaning to update this patch for a while (after conflicting
updates to some files), and finally did. :-) See original message for
more
details...There's been one change for strings -- before, I was moving the first
ADD_*
op into INIT_STRING, but wasn't too satisfied with that. So I eliminated
the INIT_STRING opcode after realizing that simply setting the string to
NULL
makes it create a new string in add_string_to_string (the first
ADD_*
has op1 set to IS_UNUSED; otherwise it's still IS_TMP_VAR, though it's
not
"used" after my changes). Is this OK?The other prior changes seem fine to me: syntax highlighting consistency,
combining the duplicate heredoc/nowdoc stuff and removing the NOWDOC
tokens,
allowing static heredocs to be used in static contexts, etc.?http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffRemember to regenerate the scanner/parser with re2c/bison and run
zend_vm_gen.php! I also included possible NEWS updates in the 5.3 patch
this time. :-PThanks for any feedback,
Matt----- Original Message -----
From: "Matt Wilmas"
Sent: Monday, May 05, 2008Hi all,
This patch... Saves one opcode per interpolated string [snip, old info]
Next, after nowdoc support was added, I noticed it was mostly duplicate
heredoc code in the scanner, so I combined them for the most part,
removing
the NOWDOC tokens, etc. Is that OK? It seems like they may have just
been
there for the parser... but I updated the parser so that static heredocs
also work like nowdocs. Also removed the ST_START_[HEREDOC|NOWDOC]
states
in the scanner, by doing their work (to catch immediate ending label) in
the
initial heredoc rule.While removing the NOWDOC references from zend_highlight.c, I made a
little
change that I think improves (err... adds) highlighting of variables in
double-quoted strings, and makes literal text in heredocs/backticks the
correct "highlight_string" color. This makes coloring consistent across
all
strings. :-) Before/after example:
http://realplain.com/php/highlight.htmlSpeaking of backticks, while updating their parser grammar, they now
behave
more precisely likeshell_exec()
, in thatcmd
(constant string) or
$cmd
(one variable) won't use INIT_STRING and create a temporary variable...Oh, almost forgot a couple things -- for HEAD: [snip] moved
HANDLE_NEWLINES() so nowdoc text is copied first, otherwise the line
number
would be off if there's an error in zend_copy_scanner_string. [snip]http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffThanks,
MattP.S. (for Marcus?), the ({LABEL}|([']{LABEL}['])|(["]{LABEL}["]))
part of
the heredoc rule crashed re2c (Windows) until I added the extra ( ),
which
shouldn't be needed AFAIK and was fine with just 2 alternate matches...
Ah, nowdocs is php >= 5.3 only. it has been such a long time since last
release that I didn't even remember that..
So, forget my previous comment.
Nuno
----- Original Message -----
There hasn't been an official release with the NOWDOCS token though.
Scott
Nuno Lopes wrote:
Patch seems ok to me, although I haven't tested it. But let's see what
Dmitry thinks about it.
I'm not sure about the removal of the nowdoc tokens, though. Somebody may
me relying on them for pretty-printing..Nuno
----- Original Message ----- From: "Matt Wilmas"
php_lists@realplain.com
To: internals@lists.php.net
Cc: "Dmitry Stogov" dmitry@zend.com; "Nuno Lopes" nlopess@php.net
Sent: Thursday, July 10, 2008 1:45 PM
Subject: Re: [PHP-DEV] [PATCH] Some string changes/optimizationsHi all,
I had been meaning to update this patch for a while (after conflicting
updates to some files), and finally did. :-) See original message for
more
details...There's been one change for strings -- before, I was moving the first
ADD_*
op into INIT_STRING, but wasn't too satisfied with that. So I
eliminated
the INIT_STRING opcode after realizing that simply setting the string to
NULL
makes it create a new string in add_string_to_string (the first
ADD_*
has op1 set to IS_UNUSED; otherwise it's still IS_TMP_VAR, though it's
not
"used" after my changes). Is this OK?The other prior changes seem fine to me: syntax highlighting
consistency,
combining the duplicate heredoc/nowdoc stuff and removing the NOWDOC
tokens,
allowing static heredocs to be used in static contexts, etc.?http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffRemember to regenerate the scanner/parser with re2c/bison and run
zend_vm_gen.php! I also included possible NEWS updates in the 5.3 patch
this time. :-PThanks for any feedback,
Matt----- Original Message -----
From: "Matt Wilmas"
Sent: Monday, May 05, 2008Hi all,
This patch... Saves one opcode per interpolated string [snip, old
info]Next, after nowdoc support was added, I noticed it was mostly duplicate
heredoc code in the scanner, so I combined them for the most part,
removing
the NOWDOC tokens, etc. Is that OK? It seems like they may have just
been
there for the parser... but I updated the parser so that static
heredocs
also work like nowdocs. Also removed the ST_START_[HEREDOC|NOWDOC]
states
in the scanner, by doing their work (to catch immediate ending label)
in
the
initial heredoc rule.While removing the NOWDOC references from zend_highlight.c, I made a
little
change that I think improves (err... adds) highlighting of variables in
double-quoted strings, and makes literal text in heredocs/backticks the
correct "highlight_string" color. This makes coloring consistent
across
all
strings. :-) Before/after example:
http://realplain.com/php/highlight.htmlSpeaking of backticks, while updating their parser grammar, they now
behave
more precisely likeshell_exec()
, in thatcmd
(constant string) or
$cmd
(one variable) won't use INIT_STRING and create a temporary variable...Oh, almost forgot a couple things -- for HEAD: [snip] moved
HANDLE_NEWLINES() so nowdoc text is copied first, otherwise the line
number
would be off if there's an error in zend_copy_scanner_string. [snip]http://realplain.com/php/string_optimizations.diff
http://realplain.com/php/string_optimizations_5_3.diffThanks,
MattP.S. (for Marcus?), the ({LABEL}|([']{LABEL}['])|(["]{LABEL}["])) part
of
the heredoc rule crashed re2c (Windows) until I added the extra ( ),
which
shouldn't be needed AFAIK and was fine with just 2 alternate matches...