Graham and I are having a brief chat about the work he's going to do
on the PECL optimizer. People have asked me to do this on-list (they
may have meant the PECL list, but optimizations on PHP seem more
relevant here), so here goes.
Hi Graham,
So the general gist of what I have to say is that dataflow
optimizations on PHP are very difficult, and nearly impossible at the
function-local level. The same goes for loop-invariant hoisting and
other optimizations that remove redundant expression computation. If
you're planning on working on them, we can go into more detail.
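As one illustration of why function-local dataflow is hard, here is a minimal sketch. The function f and its parameter are hypothetical and are not taken from the optimizer or from phc. It shows a variable-variable write that may or may not overwrite a local, so a purely local constant propagation of $x would be unsound.

```php
<?php
// Hypothetical function, used only to illustrate the problem.
function f($name)
{
    $x = 5;
    // A variable-variable write: if $name is "x", this overwrites $x.
    $$name = 'changed';
    // A local analysis that propagated the constant 5 to this point
    // would be wrong whenever the caller passes "x".
    return $x;
}

var_dump(f('y'));
var_dump(f('x'));
```

The two calls return different types, and nothing inside f alone tells an optimizer which case it will see.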
I guess the biggest thing is that I'm wondering what your plans are
for the PECL optimizer. I've spent about 2 years working on the phc
optimizer (and a bit longer on related things), so I hope that my
advice will be relevant.
I've taken a look through the optimizer a few times over the last
while (and even stolen some ideas from it). Here are my comments on
the current code:
- There is lots of code which reimplements parts of the engine, for
  example: ini_bool_decode, optimizer_acosh and friends, optimize_md5,
  optimize_crc32, optimize_sha1, optimize_class_exists and friends (to a
  lesser extent). There are also lots of constant foldings, like casts
  and "0 == false" (etc) in optimize_code_block. I don't understand why
  there is logic in the code for that, rather than simply executing the
  opcodes, or constructing an eval and executing that.
- is_numeric_result: there has been great effort to figure out numeric
  results from pure functions, when it seems straightforward to
  optimize the results straight in. Maybe that is being done elsewhere?
  If so, there may need to be some care taken to ensure that all
  optimizations terminate.
- File system functions are very iffy. I would be surprised if there
  are people who have content that reads from files repeatedly, where
  the files do not change, and who are willing to use that flag.
- Most of the identity optimizations aren't safe. $x + 0 !== $x,
  unfortunately, due to integer coercions (parallels exist for other
  types/operators).
- I think I saw an optimization converting ("45" + $x) into (45 + $x);
  that's a great idea, which I will steal.
- How does runkit (and other weird extensions) affect optimizations on
  constants, class_exists, etc?
- The optimization "unsafe: optimize out isset()/empty() ops on
  GLOBALS['foo'] into $foo " is not safe, as $GLOBALS['foo'] may not be
  the same variable as $foo ($GLOBALS may be unset, and indeed, there
  may be good reasons to do so).
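The identity point in the list above can be checked with a short script. This is only a sketch: identity_is_safe and literal_fold_is_safe are hypothetical helper names, and the sample values are my own assumptions, not cases taken from the optimizer source.

```php
<?php
// Hypothetical helper: true when rewriting ($x + 0) to ($x) would keep
// both the value and the type of the result.
function identity_is_safe($x): bool
{
    return ($x + 0) === $x;
}

// Hypothetical helper: true when rewriting ("45" + $x) to (45 + $x)
// gives an identical result for this particular $x.
function literal_fold_is_safe($x): bool
{
    return ("45" + $x) === (45 + $x);
}

var_dump(identity_is_safe(5));      // an int stays an int
var_dump(identity_is_safe("5"));    // the string "5" is coerced to int 5
var_dump(identity_is_safe(null));   // null is coerced to int 0
var_dump(identity_is_safe(true));   // true is coerced to int 1

var_dump(literal_fold_is_safe(3));
var_dump(literal_fold_is_safe(1.5));
var_dump(literal_fold_is_safe("2"));
```

Dropping the "+ 0" changes the result type for every non-numeric-typed operand sampled here, whereas replacing the literal "45" with 45 only changes an operand that was going to be coerced anyway.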
I'm also wondering what the optimizations are on fcall? I couldn't make it out.
That's quite a lot, but it's everything I have on the current PECL optimizer :)
Thanks,
Paul
--
Paul Biggar
paul.biggar@gmail.com
Hey,
I always love having input. When you said it was vicious I was expecting more; in fact, I agree completely with you on a lot of things :-)
Anyway, I'm not really sure how much detail you want me to go into (or how much detail people on internals really want me to get into). So, I'll keep it brief for now and can expand on anything.
Why not start off with the big stuff, dataflow. I personally believe that working out good data flow for PHP is key to getting good optimizations. But you are right, it's a very tricky thing to do and in some cases impossible. Ultimately, I would like to move a lot of the optimizer work more in this direction and use the data flow to build a basic platform for code analysis on which optimizations can be done. For now though, pecl/optimizer is "dumb" about data types :-)
The reimplementation of some engine code is messy, and work should probably be done to try to remove this where possible. Also, I might be mistaken, but the is_numeric_result stuff is partly left over from Turck MMCache, which, to my understanding, this version of pecl/optimizer was based on. Some of the stuff I was doing with building a function table (for optimizable and some non-optimizable functions) was to try and get rid of rudimentary data type detection like this. Actually folding in values from function calls is happening over in the optimize_fcr.c file.
I 100% agree with you on the file system functions. They were in there when I started working on the optimizer and I haven't really paid much attention to them. The latest CVS version of pecl/optimizer has them at least removed from being candidates for optimization (the code to actually optimize is still there).
I'm not sure which optimization you are talking about with the GLOBALS stuff, but what you're saying makes sense. (It's been a while since I've looked at the code base myself; I'm just getting back to working on it.)
As far as my future plans for pecl/optimizer, I should really gather up all my ideas and stuff in the next week or so, so that you or anyone else who is interested can give feedback. At the moment, I'm working on getting the current version to a stable state. I'm also still trying to gauge demand for pecl/optimizer to maybe help figure out direction for the project (or whether there is really any interest or use).
From: Paul Biggar [paul.biggar@gmail.com]
Sent: Thursday, June 04, 2009 4:20 PM
To: Graham Kelly
Cc: PHP Internals; Brian Shire
Subject: Optimizer discussion
Hi Graham,
Simple things first:
> I'm not sure which optimization you are talking about with the GLOBALS stuff, but what you're saying makes sense. (It's been a while since I've looked at the code base myself; I'm just getting back to working on it.)
I copied that comment straight from the source, but I can't find it
now that I went looking for it. No matter.
> Why not start off with the big stuff, dataflow. I personally believe that working out good data flow for PHP is key to getting good optimizations. But you are right, it's a very tricky thing to do and in some cases impossible. Ultimately, I would like to move a lot of the optimizer work more in this direction and use the data flow to build a basic platform for code analysis on which optimizations can be done. For now though, pecl/optimizer is "dumb" about data types :-)
And now the hard stuff. To avoid me repeating myself, let me just pimp
my Tech Talk. Have a look at
http://www.youtube.com/watch?v=kKySEUrP7LA from about the 30:45 mark
until just before the 47:00 mark (slides at
https://www.cs.tcd.ie/~pbiggar/paul_biggar_google_18_mar_2009_notes.pdf).
That highlights most of the problems, and vaguely hints at their
solution. We can go into much greater detail on the solutions after.
Thanks,
Paul
--
Paul Biggar
paul.biggar@gmail.com
Hi Graham,
Based on the fact that you want to do dataflow, I wonder if it's a good
idea to think about co-opting the phc optimizer to perform analysis on
bytecode. To my mind this seems much easier than re-implementing from
scratch. As I mentioned before, this incorporates about 2 years of
work (much of it research of course, so it might not take as long to
replicate). This would mean you could go straight to performing
analyses (though there will no doubt be work required on the optimizer
itself).
Technically speaking, this isn't a big problem. We'd probably need to
change the phc MIR to mirror the bytecode (no harm anyway in terms of
correctness), and have a bytecode-reader and -writer (though this
needn't involve serializing - likely a small interface instead).
Politically, I assume it won't be a problem either, since it's in PECL.
Thoughts?
Paul
--
Paul Biggar
paul.biggar@gmail.com
Hi,
I'm happy there's some interest in a PHP optimizer :)
I agree with Paul that PECL's optimizer duplicates way too much stuff from
the Zend engine, which is neither practical nor maintainable. (Compare, for
example, with the simple constant folder I implemented some years ago:
http://web.ist.utl.pt/nuno.lopes/zend_constant_folding.txt).
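To make the "execute it instead of reimplementing it" idea concrete, here is a minimal user-level sketch in PHP. It is not the code of either optimizer: fold_call and the PURE_FUNCTIONS whitelist are hypothetical names, and the choice of which functions count as pure is an assumption.

```php
<?php
// Hypothetical whitelist of functions assumed to be pure
// (same result for the same arguments, no side effects).
const PURE_FUNCTIONS = ['md5', 'crc32', 'sha1', 'acosh', 'strlen'];

// Hypothetical folder: returns [true, value] when the call could be
// folded, or [false, null] when it must be left for run time.
function fold_call(string $name, array $args): array
{
    if (!in_array($name, PURE_FUNCTIONS, true)) {
        return [false, null];
    }
    foreach ($args as $arg) {
        if (!is_scalar($arg)) {
            // Only fold calls whose arguments are literal scalars.
            return [false, null];
        }
    }
    // Let the engine's own implementation compute the result, so no
    // second copy of md5(), acosh(), etc. has to be maintained.
    return [true, $name(...$args)];
}

var_dump(fold_call('strlen', ['hello']));
var_dump(fold_call('file_get_contents', ['/etc/hostname']));
```

The design choice is that the folder only decides whether a call is foldable; the value itself always comes from the real function.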
About runkit & friends, I wouldn't worry much about them. If you're running
them, you probably also don't care about optimizations. If you want to be
able to optimize something, you need to remove as many degrees of freedom as
you can.
Anyway, I don't know how much time you're going to invest in this optimizer,
but I'll certainly be more than happy to discuss your ideas.
Nuno
P.S.: I'll try to meet with Paul at PLDI (in a week) and chat about these
kinds of things. Is anyone else coming who wants to join the discussion?
----- Original Message -----
From: "Graham Kelly" grahamk@facebook.com
To: "Paul Biggar" paul.biggar@gmail.com
Cc: "PHP Internals" internals@lists.php.net; "Brian Shire"
shire@facebook.com
Sent: Friday, June 05, 2009 1:08 AM
Subject: [PHP-DEV] RE: Optimizer discussion
> I'm happy there's some interest in a PHP optimizer :)
> I agree with Paul that PECL's optimizer duplicates way too much stuff from
> the Zend engine, which is neither practical nor maintainable. (Compare, for
> example, with the simple constant folder I implemented some years ago:
> http://web.ist.utl.pt/nuno.lopes/zend_constant_folding.txt).
This is certainly a much better demonstration of how the optimizer should work.
> About runkit & friends, I wouldn't worry much about them. If you're running
> them, you probably also don't care about optimizations. If you want to be
> able to optimize something, you need to remove as many degrees of freedom as
> you can.
This is probably true of runkit. However, I would be careful what you
remove for extra freedom. There is very likely PHP code out there that
relies (possibly by accident) on some edge cases.
> P.S.: I'll try to meet with Paul at PLDI (in a week) and chat about these
> kinds of things. Is anyone else coming who wants to join the discussion?
You should probably mention this is in Dublin.
Some of the IBM Tokyo researchers who work on (or maybe close to)
Project Zero will be there, and might have interesting ideas. They
have a paper on PHP memory usage.
Paul
--
Paul Biggar
paul.biggar@gmail.com
Paul Biggar wrote:
> They have a paper on PHP memory usage.
Link? I am collecting papers that deal with PHP at
http://delicious.com/sebastian_bergmann/academic_paper+php
--
Sebastian Bergmann Co-Founder and Principal Consultant
http://sebastian-bergmann.de/ http://thePHP.cc/
Graham, Paul,
Paul Biggar paul.biggar@gmail.com wrote on 07/06/2009 02:28:48:
>> About runkit & friends, I wouldn't worry much about them. If you're
>> running them, you probably also don't care about optimizations. If you
>> want to be able to optimize something, you need to remove as many
>> degrees of freedom as you can.
> This is probably true of runkit. However, I would be careful what you
> remove for extra freedom. There is very likely PHP code out there that
> relies (possibly by accident) on some edge cases.
Firstly, it's great to see more and more folks experimenting with the
implementation of PHP. I think this will be good for the wider PHP
community as the design of PHP and the possible optimisations become
better understood.
I think you'll find that there are a lot of "edge cases", as Paul
mentions, in PHP that PHP code relies on. I work on IBM's Project Zero
and we have hit quite a lot of them. Just one example to illustrate.
We found that the evaluation order within assignments is not at all
what you might predict, and that existing PHP applications actually
rely on the evaluation order. Consider the following, where foo(),
bar() and baz() have some coupling.
$a[foo()]=$b[bar()][baz()];
Even though the test coverage of the Zend Engine, as measured by line
coverage, is fairly complete, we found that there were missing
testcases to verify this behaviour. We've been following a policy of
writing new tests for any such behaviour that we find, so I would
suggest that you ensure that you can run and pass all the PHPT
testcases under /tests/lang and under /Zend.
For example, the tests for the behaviour I mention above are
tests/lang/engine_assignExecutionOrder_XXX.phpt
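The assignment example above can be probed with a small script along these lines. It is only a sketch and is not one of the PHPT tests mentioned: the trace helper, the function bodies and the array contents are all hypothetical. It prints whatever order the running engine uses rather than assuming one.

```php
<?php
// Hypothetical tracing helper: records each name it is given and returns
// the log so far. The shared log is the coupling between the functions.
function trace(?string $name = null): array
{
    static $log = [];
    if ($name !== null) {
        $log[] = $name;
    }
    return $log;
}

function foo() { trace('foo'); return 'k'; }
function bar() { trace('bar'); return 0; }
function baz() { trace('baz'); return 1; }

$a = [];
$b = [[10, 20]];

$a[foo()] = $b[bar()][baz()];

// Show the order this engine actually used, then the assigned value.
echo implode(', ', trace()), "\n";
var_dump($a);
```

Running the same script with and without an optimizer enabled and comparing the printed order is one cheap way to spot a reordering.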
Then, if you find any more PHP code that does not run the same
optimised as it does unoptimised, it would be great if you could
contribute testcases for it.
Actually, for full disclosure I should say that although most of the
tests we have written are now in cvs, we are still a little behind
with contributing all the engine tests we have written. Hopefully
they'll all be there before you need them.
Rob Nicholson
>> I'm happy there's some interest in a PHP optimizer :)
>> I agree with Paul that PECL's optimizer duplicates way too much
>> stuff from the Zend engine, which is neither practical nor
>> maintainable. (Compare, for example, with the simple constant folder
>> I implemented some years ago:
>> http://web.ist.utl.pt/nuno.lopes/zend_constant_folding.txt).
> This is certainly a much better demonstration of how the optimizer
> should work.
The existing optimizer already does constant folding...
Ilia Alshanetsky