Zend JIT Open Sourced

10 years ago by Joe Watkins

Dmitry,

Thanks for the opportunity to read, that's cool ;)

Cheers
Joe

Hi,

With the recent discussions of JIT/AOT and the good progress we made on
PHP-7, we decided to open up the JIT experiment we've been working on.

https://github.com/zendtech/php-src/tree/zend-jit/ext/opcache/jit

You can just clone or pull the zend-jit branch and configure/compile it according
to the instructions. Don't merge it with master. It may work today but will stop
working tomorrow.

Disclaimers:

It's an experiment, and is not in any way ready for anything.
In the future we may try to implement a JIT quite differently from this PoC.

I'm not planning to invest in it in the near future (PHP-7 takes all
my time).
Consider it available for academic purposes only at this point.

Enjoy!

Thanks. Dmitry.

10 years ago by Anthony Ferrara

Dmitry and Zend,

Thank you for sharing your code. I look forward to playing with it.

Perhaps after 7 stabilizes (and ships) you could write up your
thoughts around it? Why decisions were made and the findings that you
have?

Thanks again

Anthony

10 years ago by Andi Gutmans

Dmitry and Zend,

Thank you for sharing your code. I look forward to playing with it.

Perhaps after 7 stabilizes (and ships) you could write up your
thoughts around it? Why decisions were made and the findings that you
have?

Yes, I think we can definitely do that. It is an interesting experiment, and it also clarified that a JIT was less interesting in the short term, as we can all observe from the fabulous results of the current PHP 7 runtime. But it is absolutely worth discussing post 7, as there surely are interesting opportunities.

Andi

Thanks again

Anthony

10 years ago by Jordi Boggiano

Dmitry and Zend,

Thank you for sharing your code. I look forward to playing with it.

Perhaps after 7 stabilizes (and ships) you could write up your
thoughts around it? Why decisions were made and the findings that you
have?

Yes, I think we can definitely do that. It is an interesting experiment, and it also clarified that a JIT was less interesting in the short term, as we can all observe from the fabulous results of the current PHP 7 runtime. But it is absolutely worth discussing post 7, as there surely are interesting opportunities.

Do you have a one-line summary of why it's useless for real-world
applications? Is it just because they don't do enough number crunching
compared to I/O, or is it a matter of the JIT not kicking in fast enough
to improve things in a single request cycle?

I am biased, but if it improves the bench code so much, it still sounds
like a potentially good thing for specific code like the Composer
dependency solver :)

Cheers

10 years ago by Dmitry Stogov

On Feb 27, 2015, at 7:12 AM, Anthony Ferrara ircmaxell@gmail.com wrote:

Dmitry and Zend,

Thank you for sharing your code. I look forward to playing with it.

Perhaps after 7 stabilizes (and ships) you could write up your
thoughts around it? Why decisions were made and the findings that you
have?

Yes, I think we can definitely do that. It is an interesting experiment,
and it also clarified that a JIT was less interesting in the short term, as
we can all observe from the fabulous results of the current PHP 7 runtime.
But it is absolutely worth discussing post 7, as there surely are
interesting opportunities.

Do you have a one-line summary of why it's useless for real-world
applications? Is it just because they don't do enough number crunching
compared to I/O, or is it a matter of the JIT not kicking in fast enough to
improve things in a single request cycle?

It's not a single request cycle. The JIT is integrated into opcache: it
compiles PHP script(s) on first access and stores the code in shared memory.
On following requests, the precompiled code is executed directly from shared
memory.
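To make the compile-once, reuse-later mechanism concrete, here is a minimal Python sketch. It assumes a plain dictionary can stand in for opcache's shared memory and that compile_fn stands in for the (possibly minutes-long) native compilation step; ScriptCache, run() and compile_fn are hypothetical names, not opcache or zend-jit APIs.

```python
from typing import Callable, Dict


class ScriptCache:
    """Hypothetical model of opcache-style caching: compile a script once, reuse it afterwards.

    The dict stands in for shared memory; compile_fn stands in for native code generation.
    """

    def __init__(self, compile_fn: Callable[[str], str]) -> None:
        self._compile = compile_fn
        self._code: Dict[str, str] = {}
        self.compilations = 0

    def run(self, path: str) -> str:
        code = self._code.get(path)
        if code is None:
            # First access: pay the (possibly very slow) compilation cost once.
            code = self._compile(path)
            self._code[path] = code
            self.compilations += 1
        # Every later request reuses the stored code without recompiling.
        return code


cache = ScriptCache(lambda path: f"<native code for {path}>")
for _ in range(3):  # three "requests" for the same script
    cache.run("index.php")
print(cache.compilations)  # 1
```

The sketch only models the cost structure described here (a slow first request, cheap reuse afterwards); it says nothing about whether the cached code actually runs faster.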

The first request may be extremely slow (a few minutes).
The speed improvement on the following requests may be insignificant or
even negative. It depends very much on the application, but in my experience
only small apps got significant improvements. This may be explained by a huge
increase in instruction-cache (ICACHE) and instruction-TLB (ITLB) misses, but
I'm not 100% sure.

I am biased, but if it improves the bench code so much, it still sounds like
a potentially good thing for specific code like the Composer dependency
solver :)

Yeah. Probably, if we positioned this work as a JIT for hotspots only, or
even enabled it manually for some functions, we might get better results.
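A "JIT for hotspots only" is commonly built on per-function call counters. The Python sketch below assumes a fixed call-count threshold (HOT_THRESHOLD = 3, an arbitrary illustrative value) and uses an identity compile_fn in place of real code generation; HotspotJit and its methods are hypothetical and do not come from the zend-jit branch.

```python
from typing import Any, Callable, Dict

HOT_THRESHOLD = 3  # hypothetical: calls before a function is treated as "hot"


class HotspotJit:
    """Interpret functions until they get hot, then switch to a compiled version."""

    def __init__(self, compile_fn: Callable[[Callable], Callable],
                 threshold: int = HOT_THRESHOLD) -> None:
        self._compile = compile_fn
        self._threshold = threshold
        self._counts: Dict[str, int] = {}
        self._compiled: Dict[str, Callable] = {}

    def is_compiled(self, name: str) -> bool:
        return name in self._compiled

    def call(self, name: str, interp_fn: Callable, *args: Any) -> Any:
        native = self._compiled.get(name)
        if native is not None:
            return native(*args)  # fast path: already compiled
        count = self._counts.get(name, 0) + 1
        self._counts[name] = count
        if count >= self._threshold:
            # Hot enough: compile once; later calls take the fast path.
            self._compiled[name] = self._compile(interp_fn)
        return interp_fn(*args)  # this call still runs in the "interpreter"


jit = HotspotJit(compile_fn=lambda fn: fn)  # identity stands in for code generation
square = lambda x: x * x
for i in range(4):
    jit.call("square", square, i)
print(jit.is_compiled("square"))  # True
```

Cold code never pays the compilation cost in this design, which is the property that might help applications with large amounts of rarely executed code.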

Thanks. Dmitry.

Cheers

10 years ago by Rowan Collins

Dmitry Stogov wrote on 27/02/2015 15:56:

On Feb 27, 2015, at 7:12 AM, Anthony Ferrara ircmaxell@gmail.com wrote:

Dmitry and Zend,

Thank you for sharing your code. I look forward to playing with it.

Perhaps after 7 stabilizes (and ships) you could write up your
thoughts around it? Why decisions were made and the findings that you
have?

Yes, I think we can definitely do that. It is an interesting experiment,
and it also clarified that a JIT was less interesting in the short term, as
we can all observe from the fabulous results of the current PHP 7 runtime.
But it is absolutely worth discussing post 7, as there surely are
interesting opportunities.

Do you have a one-line summary of why it's useless for real-world
applications? Is it just because they don't do enough number crunching
compared to I/O, or is it a matter of the JIT not kicking in fast enough to
improve things in a single request cycle?

It's not a single request cycle. The JIT is integrated into opcache: it
compiles PHP script(s) on first access and stores the code in shared memory.
On following requests, the precompiled code is executed directly from shared
memory.

This reminds me of an idea I had a while ago - with OpCache, and
potentially JIT, relying on shared memory for optimisations,
command-line scripts (e.g. background processing via cron or
supervisord) are getting left behind in terms of performance. So I
wonder if it would be possible to implement a "FastCLI" application
server similar to FastCGI, which could be sent multiple requests
representing POSIX command invocations, and serve them from a threaded
environment. So instead of "php composer.phar install", you'd run
"php-fastcli --port 55555 composer.phar install", which would "attach"
to a running FastCLI server.

Has anyone ever looked at such a thing? It seems like it would be useful
for other languages as well, in exactly the way FastCGI is.

Regards,

Rowan Collins
[IMSoP]

10 years ago by Anthony Ferrara

Dmitry,

It's not a single request cycle. The JIT is integrated into opcache: it
compiles PHP script(s) on first access and stores the code in shared memory.
On following requests, the precompiled code is executed directly from shared
memory.

The first request may be extremely slow (a few minutes).

That sounds more along the lines of AOT like I did with Recki rather
than a true JIT (which compiles when a function is called).

Judging from the jit_init() function, it appears like you're compiling
the entire codebase ahead of time.

Is that correct?

Anthony

10 years ago by Dmitry Stogov

On Fri, Feb 27, 2015 at 7:30 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

It's not a single request cycle. The JIT is integrated into opcache: it
compiles PHP script(s) on first access and stores the code in shared memory.
On following requests, the precompiled code is executed directly from shared
memory.

The first request may be extremely slow (a few minutes).

That sounds more along the lines of AOT like I did with Recki rather
than a true JIT (which compiles when a function is called).

Judging from the jit_init() function, it appears like you're compiling
the entire codebase ahead of time.

Is that correct?

Right now it compiles a whole script (PHP file) at once.
So yes, our JIT uses a kind of AOT approach, but one that is completely
transparent to the rest of PHP.

We also tried a few different approaches to collect information about hot
functions and generate code only for them.
Unfortunately, this didn't change the picture.

Thanks. Dmitry.

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Dmitry Stogov [mailto:dmitry@zend.com]
Sent: Friday, February 27, 2015 7:31 PM
To: Anthony Ferrara
Cc: Jordi Boggiano; PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

On Fri, Feb 27, 2015 at 7:30 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

It's not a single request cycle. The JIT is integrated into opcache: it
compiles PHP script(s) on first access and stores the code in shared
memory.
On following requests, the precompiled code is executed directly from
shared memory.

The first request may be extremely slow (a few minutes).

That sounds more along the lines of AOT like I did with Recki rather
than a true JIT (which compiles when a function is called).

Judging from the jit_init() function, it appears like you're compiling
the entire codebase ahead of time.

Is that correct?

Right now it compiles a whole script (PHP file) at once.
So yes, our JIT uses a kind of AOT approach, but one that is completely
transparent to the rest of PHP.

Just to clarify slightly further: we don't compile the whole codebase at
once. We keep the existing semantics that every file is independent and may
change independently of other files, and include() may end up loading one file
in one flow and another file in another flow. There isn't any cross-file
optimization.
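These per-file semantics can be sketched as a cache keyed by file path plus a version stamp (an mtime, for example), where each file is compiled from its own source only, so changing one file invalidates only that file's code. This is a hypothetical Python illustration; PerFileCache and the integer versions are assumptions, not opcache's actual validation logic.

```python
from typing import Callable, Dict, List, Tuple


class PerFileCache:
    """Each file is compiled and invalidated on its own; nothing is shared across files."""

    def __init__(self, compile_fn: Callable[[str], str]) -> None:
        self._compile = compile_fn
        self._entries: Dict[str, Tuple[int, str]] = {}  # path -> (version, code)
        self.log: List[Tuple[str, int]] = []  # every (path, version) actually compiled

    def load(self, path: str, version: int) -> str:
        entry = self._entries.get(path)
        if entry is not None and entry[0] == version:
            return entry[1]  # unchanged file: reuse its code
        # New or modified file: recompile just this one, seeing only its own source.
        code = self._compile(path)
        self._entries[path] = (version, code)
        self.log.append((path, version))
        return code


cache = PerFileCache(lambda path: f"<code for {path}>")
cache.load("a.php", 1)
cache.load("b.php", 1)
cache.load("a.php", 2)  # only a.php changed
print(cache.log)  # [('a.php', 1), ('b.php', 1), ('a.php', 2)]
```

Because compile_fn sees one file at a time, the compiler cannot assume anything about functions or classes defined elsewhere, which is exactly why there is no cross-file optimization.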

We also tried a few different approaches to collect information about hot
functions and generate code only for them.
Unfortunately, this didn't change the picture.

(again, the picture being no performance gains in common Web apps).

Zeev

10 years ago by Anthony Ferrara

Zeev,

Right now it compiles a whole script (PHP file) at once.
So yes, our JIT uses a kind of AOT approach, but one that is completely
transparent to the rest of PHP.

Just to clarify slightly further: we don't compile the whole codebase at
once. We keep the existing semantics that every file is independent and may
change independently of other files, and include() may end up loading one file
in one flow and another file in another flow. There isn't any cross-file
optimization.

We also tried a few different approaches to collect information about hot
functions and generate code only for them.
Unfortunately, this didn't change the picture.

(again, the picture being no performance gains in common Web apps).

Well, I just want to make one clarification here to your point:
there are no performance gains from this AOT approach for common web apps.

It's not really fair to judge a true JIT implementation based on this
because it lacks crucial runtime information that a real JIT compiler
would have (such as input types, values, etc). So it would be left
generating generic native code instead of specific code. I just want
to point out that the results here aren't really applicable to a JIT
approach. And that should be made clear when discussing it.
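The runtime information mentioned here is what lets a JIT emit type-specialized code protected by a cheap guard, instead of generic code that re-dispatches on operand types every time. A hedged Python sketch of the idea follows; generic_add, specialize and the observed (float, float) profile are illustrative assumptions, and a real JIT would emit machine code rather than Python closures.

```python
from typing import Any, Callable, Dict, Tuple

TypePair = Tuple[type, type]


def generic_add(a: Any, b: Any) -> Any:
    """Generic code: has to inspect operand types on every call."""
    if type(a) is int and type(b) is int:
        return a + b
    if type(a) in (int, float) and type(b) in (int, float):
        return float(a) + float(b)
    raise TypeError(f"unsupported operands: {type(a).__name__}, {type(b).__name__}")


def specialize(observed: TypePair, fast_paths: Dict[TypePair, Callable]) -> Callable:
    """Return a version of add specialized for the operand types seen at run time."""
    fast = fast_paths.get(observed)
    if fast is None:
        return generic_add  # nothing to specialize on: keep the generic code

    def guarded(a: Any, b: Any) -> Any:
        if type(a) is observed[0] and type(b) is observed[1]:  # guard: assumption still holds
            return fast(a, b)
        return generic_add(a, b)  # guard failed: fall back ("deoptimize")

    return guarded


add = specialize((float, float), {(float, float): lambda a, b: a + b})
print(add(1.5, 2.0), add(1, 2))  # 3.5 3
```

An AOT compiler working file by file has no observed profile to pass to something like specialize(), so it is left with the equivalent of generic_add everywhere.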

That's not to say there's anything wrong with this approach, nor that
there isn't a ton we can learn from it. I think it's a fantastic
research effort and plan on digging through it myself. Thank you for
open sourcing it.

Anthony

10 years ago by Dmitry Stogov

On Fri, Feb 27, 2015 at 9:55 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Zeev,

Right now it compiles a whole script (PHP file) at once.
So yes, our JIT uses a kind of AOT approach, but one that is completely
transparent to the rest of PHP.

Just to clarify slightly further: we don't compile the whole codebase at
once. We keep the existing semantics that every file is independent and
may change independently of other files, and include() may end up loading
one file in one flow and another file in another flow. There isn't any
cross-file optimization.

We also tried a few different approaches to collect information about hot
functions and generate code only for them.
Unfortunately, this didn't change the picture.

(again, the picture being no performance gains in common Web apps).

Well, I just want to make one clarification here to your point:
there are no performance gains from this AOT approach for common web apps.

It's not really fair to judge a true JIT implementation based on this
because it lacks crucial runtime information that a real JIT compiler
would have (such as input types, values, etc). So it would be left
generating generic native code instead of specific code. I just want
to point out that the results here aren't really applicable to a JIT
approach. And that should be made clear when discussing it.

Nobody is talking about JIT in general, only about this PoC.
And yes, some other approaches may provide a better gain.
At least I think so.
However, we can only guess until we implement them.

That's not to say there's anything wrong with this approach, nor that
there isn't a ton we can learn from it. I think it's a fantastic
research effort and plan on digging through it myself. Thank you for
open sourcing it.

Thanks for the kind words :)

This work may be adopted for some specific cases.
A 25-30x speedup on Mandelbrot would allow it to be used for numeric
calculations instead of C.

https://gist.github.com/dstogov/12323ad13d3240aee8f1

Anyone may repeat the language battle :)
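For readers who want to repeat the comparison, here is a minimal Python sketch of an escape-time Mandelbrot kernel of the kind such benchmarks use. The bailout (16.0), the 1000-iteration cap and the 78x78 grid are assumptions and are not copied from the gist. render() returns the whole picture as one string, so output can be kept out of any timed region.

```python
BAILOUT = 16.0         # assumed escape threshold on |z|^2
MAX_ITERATIONS = 1000  # assumed iteration cap


def escape_time(cr: float, ci: float) -> int:
    """Iteration at which z = z^2 + c escapes, or 0 if it stays bounded (point in the set)."""
    zr = zi = 0.0
    for i in range(1, MAX_ITERATIONS + 1):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > BAILOUT:
            return i
        zi = 2.0 * zr * zi + ci
        zr = zr2 - zi2 + cr
    return 0


def render(size: int = 39, scale: float = 40.0) -> str:
    """Build the ASCII picture in memory ('*' = inside the set) instead of printing per pixel."""
    rows = []
    for y in range(-size, size):
        rows.append("".join(
            "*" if escape_time(y / scale - 0.5, x / scale) == 0 else " "
            for x in range(-size, size)
        ))
    return "\n".join(rows) + "\n"
```

Calling print(render(), end="") then writes the finished buffer in one go; pure Python will of course be far slower than any of the compiled versions discussed in this thread.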

Thanks. Dmitry.

Anthony

10 years ago by Anthony Ferrara

Dmitry,

That's not to say there's anything wrong with this approach, nor that
there isn't a ton we can learn from it. I think it's a fantastic
research effort and plan on digging through it myself. Thank you for
open sourcing it.

Thanks for the kind words :)

This work may be adopted for some specific cases.
A 25-30x speedup on Mandelbrot would allow it to be used for numeric
calculations instead of C.

https://gist.github.com/dstogov/12323ad13d3240aee8f1

Anyone may repeat the language battle :)

These tests seem really odd. A 15% speed advantage over GCC -O2? Sure,
it's possible. But I don't think it's likely. It really smells to me
like bias in the testing methodology. (And the lack of an -O3 result
is suspicious as well.)

And looking at the code, I can see why. The PHP version is writing to
an internal buffer, while every other version has to write to STDOUT
on every single iteration.

So you are intentionally not benchmarking the output in the PHP
version (you even explicitly call ob_start()) but are benchmarking it
in every other version. So in fact, the PHP code does something
different than the rest of the code.

Sneaky sneaky. Also completely fake. A proper methodology would have
explicitly disabled any buffer so that the tests all tested the same
thing. Or even better, build up an internal buffer in all of the
implementations. That way you can compare the computation and not rely
on STDOUT (terminal) response.
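A hedged sketch of that methodology: time only the computation, have it build its output in an in-memory buffer, write or inspect that buffer after the clock stops, and repeat the run so you get a spread as well as a mean. The workload function is a hypothetical placeholder; any kernel, in any of the compared languages, would be timed the same way.

```python
import statistics
import sys
import time
from typing import Callable, List, Tuple


def bench(fn: Callable[[], str], repeats: int = 5) -> Tuple[float, float, str]:
    """Run fn `repeats` times; return (mean seconds, stdev seconds, last output)."""
    samples: List[float] = []
    out = ""
    for _ in range(repeats):
        start = time.perf_counter()
        out = fn()  # computation only: fn must build its output in memory
        samples.append(time.perf_counter() - start)
    spread = statistics.stdev(samples) if repeats > 1 else 0.0
    return statistics.mean(samples), spread, out


def workload() -> str:
    # Hypothetical stand-in kernel: produces output text without doing any I/O.
    return "".join("*" if (i * i) % 7 < 3 else " " for i in range(100_000))


mean, spread, out = bench(workload, repeats=5)
# Any output happens outside the timed region.
sys.stdout.write(f"{len(out)} bytes of output\n")
print(f"mean={mean:.4f}s stdev={spread:.4f}s")
```

Reporting the standard deviation next to the mean also makes it easy to spot results that are suspiciously stable or suspiciously noisy.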

Anthony

10 years ago by Dmitry Stogov

On Fri, Feb 27, 2015 at 10:36 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

That's not to say there's anything wrong with this approach, nor that
there isn't a ton we can learn from it. I think it's a fantastic
research effort and plan on digging through it myself. Thank you for
open sourcing it.

Thanks for the kind words :)

This work may be adopted for some specific cases.
A 25-30x speedup on Mandelbrot would allow it to be used for numeric
calculations instead of C.

https://gist.github.com/dstogov/12323ad13d3240aee8f1

Anyone may repeat the language battle :)

These tests seem really odd. A 15% speed advantage over GCC -O2? Sure,
it's possible. But I don't think it's likely. It really smells to me
like bias in the testing methodology. (And the lack of an -O3 result
is suspicious as well.)

No, it's true, but of course it's not 100% fair.
gcc compiles files for the x86 or x86_64 platform in general.
When we compile at run time, we can rely on knowledge of our CPU.
In this case LLVM generates AVX instructions, while gcc generates SSE2.
Looking at the assembler code, you can see that PHP didn't even infer the
types of all variables and makes a few unnecessary checks in the loop, but
modern CPUs are so smart that code that looks much worse runs at the same
speed as the gcc -O2 code. Unfortunately, it works in the other direction as well.

gcc -O2 -mavx will outperform us :)
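The underlying idea is that code generated at run time can target the CPU it is actually running on, while a default gcc -O2 build for x86_64 targets the baseline ISA (which includes SSE2 but not AVX). Below is a rough Python sketch of the run-time selection step only. It assumes Linux's /proc/cpuinfo "flags" line as the feature source, and the variant names returned by pick_kernel are hypothetical labels, not real functions.

```python
from pathlib import Path
from typing import Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_kernel(flags: Set[str]) -> str:
    """Choose the widest vector ISA the running CPU reports (hypothetical variant names)."""
    for isa in ("avx2", "avx", "sse2"):
        if isa in flags:
            return isa
    return "generic"


def detect() -> str:
    cpuinfo = Path("/proc/cpuinfo")
    flags = parse_cpu_flags(cpuinfo.read_text()) if cpuinfo.exists() else set()
    return pick_kernel(flags)
```

On systems without /proc/cpuinfo, or without an x86 "flags" line, detect() simply falls back to "generic". Compiling with gcc -mavx or -march=native makes the same choice once, at build time, which is why such a build can win back the advantage.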

And looking at the code, I can see why. The PHP version is writing to
an internal buffer, while every other version has to write to STDOUT
on every single iteration.

So you are intentionally not benchmarking the output in the PHP
version (you even explicitly call ob_start()) but are benchmarking it
in every other version. So in fact, the PHP code does something
different than the rest of the code.

Sneaky sneaky. Also completely fake. A proper methodology would have
explicitly disabled any buffer so that the tests all tested the same
thing. Or even better, build up an internal buffer in all of the
implementations. That way you can compare the computation and not rely
on STDOUT (terminal) response.

This may also make some difference, but I think PHP's stream layer is not as
good as C's.
Just profile it with Linux perf, oprofile, callgrind, ...

perf record <command>

perf report -n

Thanks. Dmitry.

Anthony

10 years ago by Anthony Ferrara

Dmitry,

Sneaky sneaky. Also completely fake.

It's been brought to my attention that some people have taken what I
said completely out of context and insinuated it as a direct insult to
you. I assure you that was not the intent (I called the benchmark
sneaky and fake, which it is).

So if you interpreted it as an insult, I apologize sincerely. I did
not intend to insult you personally at all with that remark. It was
only meant as a comment about the code that was posted.

Anthony

10 years ago by Dmitry Stogov

On Fri, Feb 27, 2015 at 11:21 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

Sneaky sneaky. Also completely fake.

It's been brought to my attention that some people have taken what I
said completely out of context and insinuated it as a direct insult to
you. I assure you that was not the intent (I called the benchmark
sneaky and fake, which it is).

So if you interpreted it as an insult, I apologize sincerely. I did
not intend to insult you personally at all with that remark. It was
only meant as a comment about the code that was posted.

Not a problem. I wouldn't even have noticed it, because my poor knowledge of
English lets me filter this out :)

Dmitry.

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Friday, February 27, 2015 10:21 PM
To: Dmitry Stogov
Cc: Zeev Suraski; Jordi Boggiano; PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Dmitry,

Sneaky sneaky. Also completely fake.

It's been brought to my attention that some people have taken what I said
completely out of context and insinuated it as a direct insult to you.

Anthony,

I'm not sure how calling what Dmitry did sneaky (adj. furtive, stealthy;
deceptive, deceitful) and fake (adj. counterfeit, false) is not an insult.
You could have picked wrong, problematic, inadequate, poor, or a dozen
other adjectives that don't literally claim that Dmitry did it intentionally
to give an unfair advantage to the PHP implementation (which, just in case
anybody's wondering, you also wrote literally, using the word
'intentionally' in the previous sentence).

You're not clairvoyant and you have no idea whether Dmitry did it
intentionally or not, and the adjectives you used mean negative intent.

If you apologize, apologize for real and not with disclaimers that it was
taken out of context. It wasn't.

And I have no idea why I had to bring it to your attention. If somehow you
slipped, you should have fixed it yourself immediately.

Zeev

10 years ago by Anthony Ferrara

Zeev,

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Friday, February 27, 2015 10:21 PM
To: Dmitry Stogov
Cc: Zeev Suraski; Jordi Boggiano; PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Dmitry,

Sneaky sneaky. Also completely fake.

It's been brought to my attention that some people have taken what I said
completely out of context and insinuated it as a direct insult to you.

Anthony,

I'm not sure how calling what Dmitry did sneaky (adj. furtive, stealthy;
deceptive, deceitful) and fake (adj. counterfeit, false) is not an insult.
You could have picked wrong, problematic, inadequate, poor, or a dozen
other adjectives that don't literally claim that Dmitry did it intentionally
to give an unfair advantage to the PHP implementation (which, just in case
anybody's wondering, you also wrote literally, using the word
'intentionally' in the previous sentence).

You're not clairvoyant and you have no idea whether Dmitry did it
intentionally or not, and the adjectives you used mean negative intent.

If you apologize, apologize for real and not with disclaimers that it was
taken out of context. It wasn't.

And I have no idea why I had to bring it to your attention. If somehow you
slipped, you should have fixed it yourself immediately.

I had intended it as a remark about the code. Not about him
personally, not about you, not about Zend. The presence of the
explicit buffering code indicates that it wasn't an accident. Whether
it was intentional for extra speed or not, it's still an intentionally
different codepath between the rest of the implementations. One that
in practice can have non-trivial differences over outputting directly.

If you took that as an insult against him, you or Zend, then I'm
sorry. I still believe the benchmark is very subtly broken, and hence
the results are invalid. I apologized for any insult that may have been
misdirected at the person.

Please, can we talk about code separately from the person? A good
person can produce bad code. That happens. I know, I produce a lot of
it. I don't take insult when people call my code bad. And I hope we
can call code bad. Because if we can't, we can never grow or move on
as people or as a project.

So I do apologize to the person. I don't to the code.

Anthony

PS: Dmitry accepted my apology. Can you please?

10 years ago by Dmitry Stogov

On Fri, Feb 27, 2015 at 11:53 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Zeev,

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Friday, February 27, 2015 10:21 PM
To: Dmitry Stogov
Cc: Zeev Suraski; Jordi Boggiano; PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Dmitry,

Sneaky sneaky. Also completely fake.

It's been brought to my attention that some people have taken what I
said completely out of context and insinuated it as a direct insult to you.

Anthony,

I'm not sure how calling what Dmitry did sneaky (adj. furtive, stealthy;
deceptive, deceitful) and fake (adj. counterfeit, false) is not an
insult.
You could have picked wrong, problematic, inadequate, poor, or a dozen
other adjectives that don't literally claim that Dmitry did it
intentionally to give an unfair advantage to the PHP implementation
(which, just in case anybody's wondering, you also wrote literally,
using the word 'intentionally' in the previous sentence).

You're not clairvoyant and you have no idea whether Dmitry did it
intentionally or not, and the adjectives you used mean negative intent.

If you apologize, apologize for real and not with disclaimers that it was
taken out of context. It wasn't.

And I have no idea why I had to bring it to your attention. If somehow
you slipped, you should have fixed it yourself immediately.

I had intended it as a remark about the code. Not about him
personally, not about you, not about Zend. The presence of the
explicit buffering code indicates that it wasn't an accident. Whether
it was intentional for extra speed or not, it's still an intentionally
different codepath between the rest of the implementations. One that
in practice can have non-trivial differences over outputting directly.

If you took that as an insult against him, you or Zend, then I'm
sorry. I still believe the benchmark is very subtly broken, and hence
the results are invalid. I apologized for any insult that may have been
misdirected at the person.

Please, can we talk about code separately from the person? A good
person can produce bad code. That happens. I know, I produce a lot of
it. I don't take insult when people call my code bad. And I hope we
can call code bad. Because if we can't, we can never grow or move on
as people or as a project.

So I do apologize to the person. I don't to the code.

Anthony

PS: Dmitry accepted my apology. Can you please?

Yes, please. We all have good intentions anyway (even if we can't agree
on some topics).

Dmitry.

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Friday, February 27, 2015 10:54 PM
To: Zeev Suraski
Cc: Dmitry Stogov; PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Zeev,

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Friday, February 27, 2015 10:21 PM
To: Dmitry Stogov
Cc: Zeev Suraski; Jordi Boggiano; PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Dmitry,

Sneaky sneaky. Also completely fake.

It's been brought to my attention that some people have taken what I
said completely out of context and insinuated it as a direct insult to
you.

Anthony,

I'm not sure how calling what Dmitry did sneaky (adj. furtive,
stealthy; deceptive, deceitful) and fake (adj. counterfeit, false) is
not an insult.
You could have picked wrong, problematic, inadequate, poor, or a
dozen other adjectives that don't literally claim that Dmitry did it
intentionally to give an unfair advantage to the PHP implementation
(which, just in case anybody's wondering, you also wrote literally,
using the word 'intentionally' in the previous sentence).

You're not clairvoyant and you have no idea whether Dmitry did it
intentionally or not, and the adjectives you used mean negative intent.

If you apologize, apologize for real and not with disclaimers that it
was taken out of context. It wasn't.

And I have no idea why I had to bring it to your attention. If
somehow you slipped, you should have fixed it yourself immediately.

I had intended it as a remark about the code. Not about him personally

I read the email again and again, and I don't see any possibility to read
what you wrote in any other way other than blaming the code author (Dmitry)
for being sneaky and intentionally faking the test. Sneaky code would
either be code that performs something very different from what one would
expect (that's not our case here), or code that was written by the author
with the intent of doing something sneaky.

, not about you, not about Zend.

I'm not sure why it even needs to be brought up; this has nothing to do with
anybody but Dmitry. Which made it worse in my book, as he is one of the
most honest people I've ever bumped into.

PS: Dmitry accepted my apology. Can you please?

Yes.

Zeev

10 years ago by Zeev Suraski

So I do apologize to the person. I don't to the code.

I wanted to verify whether my gut was correct (minimal amount of output, and
stdout is in fact buffered - output shouldn't move the needle) and asked
Dmitry to rerun the C test on the same system, but this time with the output
code completely commented out:
real 0m0.011s (+- 0.01)
user 0m0.011s (+- 0.01)
sys 0m0.001s

Apologies to the code might be in order :)

The source of the JIT engine's edge is, as Dmitry and Andi said, the
CPU-specific optimizations that gcc -O2 doesn't generate, and therefore it
can actually be faster than a generic native executable in some (I would
guess not all that common) cases.

Zeev

10 years ago by Julien Pauli

So I do apologize to the person. I don't to the code.

I wanted to verify whether my gut was correct (minimal amount of output,
and stdout is in fact buffered - output shouldn't move the needle) and asked
Dmitry to rerun the C test on the same system, but this time with the
output code completely commented out:
real 0m0.011s (+- 0.01)
user 0m0.011s (+- 0.01)
sys 0m0.001s

Apologies to the code might be in order :)

The source of the JIT engine's edge is, as Dmitry and Andi said, the
CPU-specific optimizations that gcc -O2 doesn't generate, and therefore it
can actually be faster than a generic native executable in some (I would
guess not all that common) cases.

That's why one may compile with GCC -march=native when one controls the
hardware. I guess here, no JIT can outperform that.

Julien.P

10 years ago by Zeev Suraski

-----Original Message-----
From: julienpauli@gmail.com [mailto:julienpauli@gmail.com] On Behalf Of
Julien Pauli
Sent: Tuesday, March 03, 2015 5:11 PM
To: Zeev Suraski
Cc: Anthony Ferrara; PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

So I do apologize to the person. I don't to the code.

I wanted to verify whether my gut was correct (minimal amount of
output, and stdout is in fact buffered - output shouldn't move the
needle) and asked Dmitry to rerun the C test on the same system, but
this time with the output code completely commented out:
real 0m0.011s (+- 0.01)
user 0m0.011s (+- 0.01)
sys 0m0.001s

Apologies to the code might be in order :)

The source of the JIT engine's edge is, as Dmitry and Andi said, the
CPU-specific optimizations that gcc -O2 doesn't generate, and
therefore it can actually be faster than a generic native executable
in some (I would guess not all that common) cases.

That's why one may compile with GCC -march=native when one controls the
hardware. I guess here, no JIT can outperform that.

Of course, but the vast majority of people don't build their own binaries.
internals@ audience may not be the best sample to gauge that percentage,
though :)

Zeev

10 years ago by Anthony Ferrara

Zeev,

So I do apologize to the person. I don't to the code.

I wanted to verify whether my gut was correct (minimal amount of output, and
stdout is in fact buffered - output shouldn't move the needle) and asked
Dmitry to rerun the C test on the same system, but this time with the output
code completely commented out:
real 0m0.011s (+- 0.01)
user 0m0.011s (+- 0.01)
sys 0m0.001s

Apologies to the code might be in order :)

The source of the JIT engine's edge is, as Dmitry and Andi said, the
CPU-specific optimizations that gcc -O2 doesn't generate, and therefore it
can actually be faster than a generic native executable in some (I would
guess not all that common) cases.

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not) alongside
PHP 5.5, PHP 7 and GCC -O0 through -O3.

I also turned ob_start on and off (by commenting out the ob_start and
ob_end_flush lines):

https://docs.google.com/spreadsheets/d/1b4yFh0i62haDoQBRf8pOoi63OLrxRbecHSj9sQpD5Nk/edit?usp=sharing

With ob_start, the "JIT" was fastest. Without it, it was more than 2x
slower (slightly faster than -O0).

Raw results (average):

GCC -O0: 0.0258
GCC -O1: 0.0160
GCC -O2: 0.0144
GCC -O3: 0.0140
"JIT" /w ob_start: 0.011
"JIT" /wo ob_start: 0.0238
5.5 /w: 1.273
5.5 /wo: 1.301
7 /w: 1.492
7 /wo: 1.545

I used identical code to what Dmitry posted earlier, with the one
exception that ob_start was commented out for the "/wo" runs.

Now, there's something really interesting in those results. The
numbers given back from the "JIT" are far more stable than anything
else (more than an order of magnitude more stable /wo, and several
orders /w ob_start). Something smells off about it. I'm not sure
what it is offhand, but I'm going to dig further.
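Run-to-run stability like this can be quantified rather than eyeballed. A minimal sketch, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC): time repeated runs of the same work and report the mean and the sample variance of the durations. The function names are illustrative only and are not taken from the benchmark scripts discussed in this thread.

```c
#define _POSIX_C_SOURCE 199309L  /* clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Seconds from a monotonic clock, so wall-clock adjustments cannot skew runs. */
double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs work() `runs` times and records each duration in out[]. */
void time_runs(void (*work)(void), double *out, size_t runs) {
    for (size_t i = 0; i < runs; i++) {
        double t0 = now_seconds();
        work();
        out[i] = now_seconds() - t0;
    }
}

/* Mean and sample variance (n - 1 denominator) of n >= 2 timings.
 * The standard deviation is sqrt(*var). */
void mean_variance(const double *x, size_t n, double *mean, double *var) {
    assert(n >= 2);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    double m = sum / (double)n;
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) ss += (x[i] - m) * (x[i] - m);
    *mean = m;
    *var = ss / (double)(n - 1);
}
```

Comparing the variance relative to the squared mean across the GCC, "JIT" and interpreter runs would show whether one set of numbers is genuinely more stable than the others.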

Now, to the point that "gcc uses output buffering". Yes, it does.
However, PHP (including the "JIT") is compiled with GCC. So it will
use a similar output buffer unless you disable the buffer. The only
place in 7 that we do that is sapi/phpdbg/phpdbg.c:881. So either way,
you're going to be using the same output buffer on the STDOUT stream.

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Tuesday, March 03, 2015 5:44 PM
To: Zeev Suraski
Cc: PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Zeev,

So I do apologize to the person. I don't to the code.

I wanted to verify whether my gut was correct (minimal amount of
output, and stdout is in fact buffered - output shouldn't move the
needle) and asked Dmitry to rerun the C test on the same system, but
this time with the output code completely commented out:
real 0m0.011s (+- 0.01)
user 0m0.011s (+- 0.01)
sys 0m0.001s

Apologies to the code might be in order :)

The source of the JIT engine's edge is, as Dmitry and Andi said, the
CPU-specific optimizations that gcc -O2 doesn't generate, and
therefore it can actually be faster than a generic native executable
in some (I would guess not all that common) cases.

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not) alongside
PHP 5.5, PHP 7 and GCC -O0 through -O3.

I also turned ob_start on and off (by commenting out the ob_start and
ob_end_flush lines):

https://docs.google.com/spreadsheets/d/1b4yFh0i62haDoQBRf8pOoi63OLrxRbecHSj9sQpD5Nk/edit?usp=sharing

With ob_start, the "JIT" was fastest. Without it, it was more than 2x
slower (slightly faster than -O0).

Raw results (average):

GCC -O0: 0.0258
GCC -O1: 0.0160
GCC -O2: 0.0144
GCC -O3: 0.0140
"JIT" /w ob_start: 0.011
"JIT" /wo ob_start: 0.0238
5.5 /w: 1.273
5.5 /wo: 1.301
7 /w: 1.492
7 /wo: 1.545

I used identical code to what Dmitry posted earlier, with the one
exception that ob_start was commented out for the "/wo" runs.

Anthony,

What you demonstrate here is that direct output slows PHP down (at least
php-cli), but not that it's the reason that PHP runs faster.
As we don't really care about the output layers when benchmarking
Mandelbrot - but rather about how fast the algorithm is executed, eliminating
output in both tests is the simplest and most accurate way to benchmark the
raw performance of the execution engine. And the (very experimental) JIT
engine wins here.

Now, there's something really interesting in those results. The
numbers given back from the "JIT" are far more stable than anything
else (more than an order of magnitude more stable /wo, and several
orders /w ob_start). Something smells off about it. I'm not sure
what it is offhand, but I'm going to dig further.

Now, to the point that "gcc uses output buffering".

Not gcc, glibc's stdout.

Yes, it does.
However, PHP (including the "JIT") is compiled with GCC. So it will use a
similar output buffer unless you disable the buffer. The only place in 7
that we do that is sapi/phpdbg/phpdbg.c:881. So either way, you're going to
be using the same output buffer on the STDOUT stream.

Actually no, it doesn't, not if you use CLI. The CLI SAPI uses the
write(1, ...) syscall, which is unbuffered. You would have been correct if
it was using fwrite(..., stdout) - but it doesn't. See
stackoverflow.com/questions/1360021/why-fwrite-libc-function-is-faster-than-write-syscall

So in reality, what Dmitry did by adding the ob_start() call is actually
make the PHP version (more or less) equivalent to the C version, as opposed
to giving it an unfair advantage.
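The distinction here is between stdio calls such as fwrite() and printf(), which copy into a user-space buffer, and the write(2) syscall, which hands bytes to the kernel immediately. A minimal POSIX sketch makes that visible, using a pipe as a stand-in for stdout (an assumption: with glibc, stdout is line-buffered on a terminal and fully buffered otherwise; here full buffering is set explicitly). After fputs() nothing is readable until fflush(), while data sent with write() is readable at once. compare_output_paths() is a hypothetical helper, not PHP or glibc code.

```c
#define _POSIX_C_SOURCE 200809L  /* pipe(), fdopen(), fcntl() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Reads everything currently available on a non-blocking fd and returns the
 * byte count: 0 means nothing has reached the kernel yet, -1 means error. */
static ssize_t drain(int fd) {
    char buf[512];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }
        if (n == 0) return total;                         /* writer closed */
        if (errno == EAGAIN || errno == EWOULDBLOCK) return total;
        return -1;
    }
}

/* Sends msg through stdio (fully buffered) and then through write(2),
 * recording how many bytes a reader could see after each step. */
int compare_output_paths(const char *msg, ssize_t *stdio_before_flush,
                         ssize_t *stdio_after_flush, ssize_t *raw_write) {
    int fds[2];
    assert(msg != NULL);
    if (pipe(fds) != 0) return -1;
    FILE *out = NULL;
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == -1 ||
        (out = fdopen(fds[1], "w")) == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    char iobuf[BUFSIZ];
    setvbuf(out, iobuf, _IOFBF, sizeof iobuf);  /* must precede any output */

    fputs(msg, out);                       /* copied into iobuf, no syscall yet */
    *stdio_before_flush = drain(fds[0]);
    fflush(out);                           /* stdio now issues write(2) */
    *stdio_after_flush = drain(fds[0]);

    ssize_t w = write(fds[1], msg, strlen(msg));  /* straight to the kernel */
    *raw_write = (w < 0) ? -1 : drain(fds[0]);

    fclose(out);                           /* also closes fds[1] */
    close(fds[0]);
    return 0;
}
```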

If you really want to test the raw performance difference, get rid of the
output code altogether in both the PHP and C versions. I haven't tried
that, but as I do believe our output buffering code is a lot more
complicated than that of glibc's streams - my guess is that the gap between
the two implementations would actually grow bigger, and in PHP's favor. We
already know that the PHP version with the output (using the output
buffering layer) runs as fast as the C version with no output at all.

Zeev

10 years ago by Anthony Ferrara

Zeev,

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Tuesday, March 03, 2015 5:44 PM
To: Zeev Suraski
Cc: PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Zeev,

So I do apologize to the person. I don't to the code.

I wanted to verify whether my gut was correct (minimal amount of
output, and stdout is in fact buffered - output shouldn't move the
needle) and asked Dmitry to rerun the C test on the same system, but
this time with the output code completely commented out:
real 0m0.011s (+- 0.01)
user 0m0.011s (+- 0.01)
sys 0m0.001s

Apologies to the code might be in order :)

The source of the JIT engine's edge is, as Dmitry and Andi said, the
CPU-specific optimizations that gcc -O2 doesn't generate, and
therefore it can actually be faster than a generic native executable
in some (I would guess not all that common) cases.

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not) alongside
PHP 5.5, PHP 7 and GCC -O0 through -O3.

I also turned ob_start on and off (by commenting out the ob_start and
ob_end_flush lines):

https://docs.google.com/spreadsheets/d/1b4yFh0i62haDoQBRf8pOoi63OLrxRbecHSj9sQpD5Nk/edit?usp=sharing

With ob_start, the "JIT" was fastest. Without it, it was more than 2x
slower (slightly faster than -O0).

Raw results (average):

GCC -O0: 0.0258
GCC -O1: 0.0160
GCC -O2: 0.0144
GCC -O3: 0.0140
"JIT" /w ob_start: 0.011
"JIT" /wo ob_start: 0.0238
5.5 /w: 1.273
5.5 /wo: 1.301
7 /w: 1.492
7 /wo: 1.545

I used identical code to what Dmitry posted earlier, with the one
exception that ob_start was commented out for the "/wo" runs.

Anthony,

What you demonstrate here is that direct output slows PHP down (at least
php-cli), but not that it's the reason that PHP runs faster.
As we don't really care about the output layers when benchmarking
Mandelbrot - but rather about how fast the algorithm is executed, eliminating
output in both tests is the simplest and most accurate way to benchmark the
raw performance of the execution engine. And the (very experimental) JIT
engine wins here.

It wins on uneven ground. Which was the very initial point that I
made. One writes to interactive output within the loop (even if
buffered, it's flushed 69 times), and one doesn't. Apples-to-oranges.
And the test that I just made proves that.

If you really want to test the raw performance difference, get rid of the
output code altogether in both the PHP and C versions. I haven't tried
that, but as I do believe our output buffering code is a lot more
complicated than that of glibc's streams - my guess is that the gap between
the two implementations would actually grow bigger, and in PHP's favor. We
already know that the PHP version with the output (using the output
buffering layer) runs as fast as the C version with no output at all.

GCC at -O1 and higher will run in 0 seconds. Because the loop will be
dead-code and hence eliminated. It is idempotent (no side effects) and
the result of the program doesn't depend on it, so it's eliminated.

You'd need to make the computation meaningful with a result that you
can return for it to be live code.
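The dead-code point can be illustrated with a sketch in which the Mandelbrot work feeds a value the program actually uses. Because mandel_checksum() returns a sum that the caller prints or otherwise inspects, an optimizing compiler cannot simply drop the loops, as it may when nothing depends on them. The grid bounds, iteration limit and function names are assumptions for illustration; this is not the code from Dmitry's gist.

```c
#include <assert.h>

/* Iterations of z = z*z + c before |z| > 2, or max_iter if it never escapes. */
int escape_iters(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    for (int i = 0; i < max_iter; i++) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0) return i;  /* escaped after i steps */
        zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;
    }
    return max_iter;                    /* treated as inside the set */
}

/* Sums escape counts over a w x h grid covering [-1.5, 0.5] x [-1, 1].
 * The caller must use the returned value (print it, compare it, ...);
 * otherwise the whole computation is dead code an optimizer may delete. */
long mandel_checksum(int w, int h, int max_iter) {
    assert(w > 0 && h > 0 && max_iter > 0);
    long sum = 0;
    for (int y = 0; y < h; y++) {
        double ci = 2.0 * y / h - 1.0;
        for (int x = 0; x < w; x++) {
            double cr = 2.0 * x / w - 1.5;
            sum += escape_iters(cr, ci, max_iter);
        }
    }
    return sum;
}
```

For example, printing mandel_checksum(600, 600, 1000) from main() keeps the result live while producing a single line of output.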

Anthony

10 years ago by Zeev Suraski

It wins on uneven ground.

Without the output buffering code it's a LOT more uneven, as the PHP version
is flushing every byte - approx. 6,000 such flushes.

Which was the very initial point that I
made. One writes to interactive output within the loop (even if
buffered, it's flushed 69 times), and one doesn't. Apples-to-oranges.
And the test that I just made proves that.

I don't see how it proves that. Flushing a few dozen times is negligible on
6K of output. You can add ob_end_flush(); ob_start(); on every line to make
it 100% equivalent (or hack the SAPI write callback to use fwrite() instead
of write(), which would be even more comparable). But I still maintain that
saying it's Apples and Oranges is not a very realistic view of things.

Both versions are buffered. Without the output buffering code - the PHP
version would be completely unbuffered when run in CLI mode, which is why
it was added in the first place. The C version is also buffered - a touch
less than the PHP version - but using a much simpler buffering system (glibc
streams, vs. our much more complex multi-layer output buffering system) and
therefore probably at least slightly faster. Of course, adding 'full'
output buffering to the C version (that would also buffer newlines) is way
beyond the scope of such a simple test. The two versions are very much
comparable, and are the reasonable implementations one would use without
giving one platform or the other an unfair advantage or disadvantage (as
removing the ob_() calls does, given that CLI uses unbuffered write()'s).

It's perhaps the difference between a 200gr apple and a 205gr apple, but
certainly not apples and oranges, and it's absolutely not the reason why
the PHP JIT version was faster.

If you really want to test the raw performance difference, get rid of the
output code altogether in both the PHP and C versions. I haven't tried
that, but as I do believe our output buffering code is a lot more
complicated than that of glibc's streams - my guess is that the gap between
the two implementations would actually grow bigger, and in PHP's favor. We
already know that the PHP version with the output (using the output
buffering layer) runs as fast as the C version with no output at all.

GCC at -O1 and higher will run in 0 seconds. Because the loop will be
dead-code and hence eliminated. It is idempotent (no side effects) and
the result of the program doesn't depend on it, so it's eliminated.

You'd need to make the computation meaningful with a result that you
can return for it to be live code.

So just get rid of the newline prints, but keep everything else. That means
zero flushing, full buffering for the C version. How fast does that run?
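This suggestion amounts to removing newline-triggered flushes. With glibc, a stdout attached to a terminal is line-buffered, so each newline passes a row down to write(2); redirected to a file or pipe it is fully buffered and is written out only when the buffer fills or the stream is flushed or closed. The sketch below counts how often stdio passes data down under each mode. It relies on glibc's fopencookie() as a stand-in for stdout (a glibc-specific assumption, so it will not build on other C libraries), and emit_rows() and struct sink are hypothetical names.

```c
#define _GNU_SOURCE  /* fopencookie() is a glibc extension */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Stand-in for the device behind stdout: counts the chunks stdio hands it. */
struct sink {
    int calls;     /* times stdio passed data down (one write(2) each on a real fd) */
    size_t bytes;  /* total bytes received */
};

static ssize_t sink_write(void *cookie, const char *buf, size_t size) {
    struct sink *s = cookie;
    (void)buf;  /* content is irrelevant here; only the call count matters */
    s->calls++;
    s->bytes += size;
    return (ssize_t)size;
}

/* Prints `rows` six-byte rows ("*****\n") through a stream with the given
 * buffering mode: _IOLBF (terminal-like) or _IOFBF (file/pipe-like). */
struct sink emit_rows(int mode, int rows) {
    struct sink s = {0, 0};
    char iobuf[4096];
    cookie_io_functions_t io = {
        .read = NULL, .write = sink_write, .seek = NULL, .close = NULL
    };
    FILE *fp = fopencookie(&s, "w", io);
    assert(fp != NULL);
    setvbuf(fp, iobuf, mode, sizeof iobuf);  /* must precede any output */
    for (int r = 0; r < rows; r++)
        fputs("*****\n", fp);
    fclose(fp);  /* flushes anything still buffered */
    return s;
}
```

On a real terminal, calling setvbuf(stdout, NULL, _IOFBF, size) before any output would give the C benchmark the same full buffering.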

Zeev

10 years ago by Anthony Ferrara

Zeev,

It wins on uneven ground.

Without the output buffering code it's a LOT more uneven, as the PHP version
is flushing every byte - approx. 6,000 such flushes.

Then the benchmark should suffer because of that. You're testing the
full stack of all of the other languages, yet disabling a significant
part of it for PHP and saying "it's still fair because that part of
the stack is expensive".

Anthony

10 years ago by Zeev Suraski

Zeev,

It wins on uneven ground.

Without the output buffering code it's a LOT more uneven, as the PHP version
is flushing every byte - approx. 6,000 such flushes.

Then the benchmark should suffer because of that. You're testing the
full stack of all of the other languages, yet disabling a significant
part of it for PHP and saying "it's still fair because that part of
the stack is expensive".

Yes, the part that has nothing at all to do with the benchmark at hand, and that changes on a per-SAPI basis and creates nothing but unrelated noise for this benchmark. As I repeatedly said, and as should be obvious to anybody looking at CPU-bound benchmarks like Mandelbrot, the idea is to focus on the algorithm and factor out everything else, which is exactly what the code for both PHP and C (as well as the other langs) does. If you haven't yet, run the new newline-free C version that has full buffering. Spoiler: it has identical performance.

Zeev

10 years ago by Dmitry Stogov

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Tuesday, March 03, 2015 5:44 PM
To: Zeev Suraski
Cc: PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Zeev,

So I do apologize to the person. I don't to the code.

I wanted to verify whether my gut was correct (minimal amount of
output, and stdout is in fact buffered - output shouldn't move the
needle) and asked Dmitry to rerun the C test on the same system, but
this time with the output code completely commented out:
real 0m0.011s (+- 0.01)
user 0m0.011s (+- 0.01)
sys 0m0.001s

Apologies to the code might be in order :)

The source of the JIT engine's edge is, as Dmitry and Andi said, the
CPU-specific optimizations that gcc -O2 doesn't generate, and
therefore it can actually be faster than a generic native executable
in some (I would guess not all that common) cases.

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not) alongside
PHP 5.5, PHP 7 and GCC -O0 through -O3.

I also turned ob_start on and off (by commenting out the ob_start and
ob_end_flush lines):

https://docs.google.com/spreadsheets/d/1b4yFh0i62haDoQBRf8pOoi63OLrxRbecHSj9sQpD5Nk/edit?usp=sharing

With ob_start, the "JIT" was fastest. Without it, it was more than 2x
slower (slightly faster than -O0).

Raw results (average):

GCC -O0: 0.0258
GCC -O1: 0.0160
GCC -O2: 0.0144
GCC -O3: 0.0140
"JIT" /w ob_start: 0.011
"JIT" /wo ob_start: 0.0238
5.5 /w: 1.273
5.5 /wo: 1.301
7 /w: 1.492
7 /wo: 1.545

I used identical code to what Dmitry posted earlier, with the one
exception that ob_start was commented out for the "/wo" runs.

Anthony,

What you demonstrate here is that direct output slows PHP down (at least
php-cli), but not that it's the reason that PHP runs faster.
As we don't really care about the output layers when benchmarking
Mandelbrot - but rather about how fast the algorithm is executed, eliminating
output in both tests is the simplest and most accurate way to benchmark the
raw performance of the execution engine. And the (very experimental) JIT
engine wins here.

Now, there's something really interesting in those results. The
numbers given back from the "JIT" are far more stable than anything
else (more than an order of magnitude more stable /wo, and several
orders /w ob_start). Something smells off about it. I'm not sure
what it is offhand, but I'm going to dig further.

Now, to the point that "gcc uses output buffering".

Not gcc, glibc's stdout.

CLI uses unbuffered write() syscall.

Thanks. Dmitry.

Yes, it does.
However, PHP (including the "JIT") is compiled with GCC. So it will use a
similar output buffer unless you disable the buffer. The only place in 7
that we do that is sapi/phpdbg/phpdbg.c:881. So either way, you're going to
be using the same output buffer on the STDOUT stream.

Actually no, it doesn't, not if you use CLI. The CLI SAPI uses the
write(1, ...) syscall, which is unbuffered. You would have been correct if
it was using fwrite(..., stdout) - but it doesn't. See
stackoverflow.com/questions/1360021/why-fwrite-libc-function-is-faster-than-write-syscall

So in reality, what Dmitry did by adding the ob_start() call is actually
make the PHP version (more or less) equivalent to the C version, as opposed
to giving it an unfair advantage.

If you really want to test the raw performance difference, get rid of the
output code altogether in both the PHP and C versions. I haven't tried
that, but as I do believe our output buffering code is a lot more
complicated than that of glibc's streams - my guess is that the gap between
the two implementations would actually grow bigger, and in PHP's favor. We
already know that the PHP version with the output (using the output
buffering layer) runs as fast as the C version with no output at all.

Zeev

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Tuesday, March 03, 2015 5:44 PM
To: Zeev Suraski
Cc: PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

Now, to the point that "gcc uses output buffering".

Not gcc, glibc's stdout.

CLI uses unbuffered write() syscall.

Oh, I know that. But that part was about the C version and how it gains output buffering implicitly - it's thanks to using glibc's stdout - not thanks to using gcc. Building PHP CLI with gcc therefore doesn't magically give it buffering - because it uses the write() syscall.

Zeev

10 years ago by Dmitry Stogov

Zeev,

So I do apologize to the person. I don't to the code.

I wanted to verify whether my gut was correct (minimal amount of
output, and stdout is in fact buffered - output shouldn't move the
needle) and asked Dmitry to rerun the C test on the same system, but
this time with the output code completely commented out:
real 0m0.011s (+- 0.01)
user 0m0.011s (+- 0.01)
sys 0m0.001s

Apologies to the code might be in order :)

The source of the JIT engine's edge is, as Dmitry and Andi said, the
CPU-specific optimizations that gcc -O2 doesn't generate, and therefore it
can actually be faster than a generic native executable in some (I would
guess not all that common) cases.

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not).

This is JIT!

alongside PHP 5.5, PHP 7 and GCC -O0 through -O3.

I also turned ob_start on and off (by commenting out the ob_start and
ob_end_flush lines):

https://docs.google.com/spreadsheets/d/1b4yFh0i62haDoQBRf8pOoi63OLrxRbecHSj9sQpD5Nk/edit?usp=sharing

With ob_start, the "JIT" was fastest. Without it, it was more than 2x
slower (slightly faster than -O0).

C FILE API is buffering as well.
I hope you knew.
Use write() instead of printf() in C to disable buffering as well.

Raw results (average):

GCC -O0: 0.0258
GCC -O1: 0.0160
GCC -O2: 0.0144
GCC -O3: 0.0140
"JIT" /w ob_start: 0.011
"JIT" /wo ob_start: 0.0238
5.5 /w: 1.273
5.5 /wo: 1.301
7 /w: 1.492
7 /wo: 1.545

I used identical code to what Dmitry posted earlier, with the one
exception that ob_start was commented out for the "/wo" runs.

Now, there's something really interesting in those results. The
numbers given back from the "JIT" are far more stable than anything
else (more than an order of magnitude more stable /wo, and several
orders /w ob_start). Something smells off about it. I'm not sure
what it is offhand, but I'm going to dig further.

php -d opcache.jit_debug=0x100 bench.php

Now, to the point that "gcc uses output buffering". Yes, it does.
However, PHP (including the "JIT") is compiled with GCC. So it will
use a similar output buffer unless you disable the buffer. The only
place in 7 that we do that is sapi/phpdbg/phpdbg.c:881. So either way,
you're going to be using the same output buffer on the STDOUT stream.

Please check sapi/cli/php_cli.c and the setting of PHP_WRITE_STDOUT before
claiming others.

Thanks. Dmitry.

Anthony

10 years ago by Anthony Ferrara

Dmitry,

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not).

This is JIT!

My apologies. I interpreted your reply to an earlier email that you
were doing all of the code generation at compile time, not at runtime.
I should have dug into the code a bit more earlier, but what I looked
at briefly before supported that interpretation.

However after digging through zend_jit_llvm.cpp a bit more I can see
what you're doing now. You're basically AOT compiling from PHP
directly to LLVM bytecode (a file at a time), then using LLVM's VM and
jit compile to compile to native at runtime. Is that the correct
interpretation?

Thanks,

Anthony

10 years ago by Dmitry Stogov

On Tue, Mar 3, 2015 at 10:55 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not).

This is JIT!

My apologies. I interpreted your reply to an earlier email that you
were doing all of the code generation at compile time, not at runtime.
I should have dug into the code a bit more earlier, but what I looked
at briefly before supported that interpretation.

However after digging through zend_jit_llvm.cpp a bit more I can see
what you're doing now. You're basically AOT compiling from PHP
directly to LLVM bytecode (a file at a time), then using LLVM's VM and
jit compile to compile to native at runtime. Is that the correct
interpretation?

More or less right, except that the term AOT is not correct.
We compile a PHP file when it's requested (just in time).
We compile one PHP file at once, similar to what an AOT compiler would do,
but we compile directly to memory and then execute it.
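Compiling when a file is requested and then executing the result directly from memory is what separates this from AOT compilation to an on-disk binary. The toy below shows only that last step, under stated assumptions: it hand-encodes two x86-64 instructions (mov eax, imm32; ret) instead of generating code with LLVM, maps memory with POSIX mmap(), switches it from writable to executable, and calls it. It returns -1 on other architectures or if the OS refuses executable memory, and jit_return_constant() is a hypothetical name, not part of the Zend prototype.

```c
#define _GNU_SOURCE  /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*int_fn)(void);

/* Emits `mov eax, imm32; ret` into fresh memory, makes it executable, calls it
 * and stores the result in *out. Returns 0 on success, -1 if the host is not
 * x86-64 or the OS refuses the mapping. */
int jit_return_constant(int32_t value, int *out) {
    assert(out != NULL);
#if defined(__x86_64__)
    unsigned char code[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };  /* mov eax, imm32 ; ret */
    memcpy(&code[1], &value, sizeof value);               /* x86 is little-endian */

    size_t len = sizeof code;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    memcpy(mem, code, len);
    /* Never writable and executable at once: flip to read+exec before calling. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }
    int_fn fn;
    memcpy(&fn, &mem, sizeof fn);  /* object pointer -> function pointer */
    *out = fn();
    munmap(mem, len);
    return 0;
#else
    (void)value;
    return -1;
#endif
}
```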

Thanks. Dmitry.

Thanks,

Anthony

10 years ago by Xinchen Hui

On Tue, Mar 3, 2015 at 10:55 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not).

This is JIT!

My apologies. I interpreted your reply to an earlier email that you
were doing all of the code generation at compile time, not at runtime.
I should have dug into the code a bit more earlier, but what I looked
at briefly before supported that interpretation.

However after digging through zend_jit_llvm.cpp a bit more I can see
what you're doing now. You're basically AOT compiling from PHP
directly to LLVM bytecode (a file at a time), then using LLVM's VM and
jit compile to compile to native at runtime. Is that the correct
interpretation?

More or less right, except that the term AOT is not correct.
We compile a PHP file when it's requested (just in time).
We compile one PHP file at once, similar to what an AOT compiler would do,
but we compile directly to memory and then execute it,
with the type inference results we get at runtime.

thanks

Thanks. Dmitry.

Thanks,

Anthony

--
Xinchen Hui
@Laruence
http://www.laruence.com/

10 years ago by Joe Watkins

Just-In-Time-At-Once JITAO

It is a bit different to the thing we think of as JIT ... new names are
good if old names don't fit ...

Cheers
Joe

On Tue, Mar 3, 2015 at 10:55 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not).

This is JIT!

My apologies. I interpreted your reply to an earlier email that you
were doing all of the code generation at compile time, not at runtime.
I should have dug into the code a bit more earlier, but what I looked
at briefly before supported that interpretation.

However after digging through zend_jit_llvm.cpp a bit more I can see
what you're doing now. You're basically AOT compiling from PHP
directly to LLVM bytecode (a file at a time), then using LLVM's VM and
jit compile to compile to native at runtime. Is that the correct
interpretation?

More or less right, except that the term AOT is not correct.
We compile a PHP file when it's requested (just in time).
We compile one PHP file at once, similar to what an AOT compiler would do,
but we compile directly to memory and then execute it,
with the type inference results we get at runtime.

thanks

Thanks. Dmitry.

Thanks,

Anthony

--
Xinchen Hui
@Laruence
http://www.laruence.com/

10 years ago by Dmitry Stogov

Just-In-Time-At-Once JITAO

file at once, function at once, trace at once, basic block at once - just
different JIT approaches.
The bigger part we analyze at once the more information we may get for
optimization, but the slower compilation.

It is a bit different to the thing we think of as JIT ... new names are
good if old names don't fit ...

:)

Thanks. Dmitry.

Cheers
Joe

On Tue, Mar 3, 2015 at 10:55 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

So, let's put that to the test, shall we. I compiled and ran the "JIT"
compiler (can we please stop calling it that, it's not).

This is JIT!

My apologies. I interpreted your reply to an earlier email that you
were doing all of the code generation at compile time, not at runtime.
I should have dug into the code a bit more earlier, but what I looked
at briefly before supported that interpretation.

However after digging through zend_jit_llvm.cpp a bit more I can see
what you're doing now. You're basically AOT compiling from PHP
directly to LLVM bytecode (a file at a time), then using LLVM's VM and
jit compile to compile to native at runtime. Is that the correct
interpretation?

More or less right, except that the term AOT is not correct.
We compile a PHP file when it's requested (just in time).
We compile one PHP file at once, similar to what an AOT compiler would do,
but we compile directly to memory and then execute it,
with the type inference results we get at runtime.

thanks

Thanks. Dmitry.

Thanks,

Anthony

--
Xinchen Hui
@Laruence
http://www.laruence.com/

10 years ago by Dmitry Stogov

On Fri, Feb 27, 2015 at 10:36 PM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

That's not to say there's anything wrong with this approach, nor that
there isn't a ton we can learn from it. I think it's a fantastic
research effort and plan on digging through it myself. Thank you for
open sourcing it.

Thanks for good words :)

This work may be adopted for some specific cases.
25-30 times speedup on Mandelbrot allows usage for numeric calculation
instead of C.

https://gist.github.com/dstogov/12323ad13d3240aee8f1

anyone may repeat the language battle :)

These tests seem really odd. A 15% speed advantage over GCC -O2? Sure,
it's possible. But I don't think it's likely. It really smells to me
like bias in the testing methodology. (and the lack of an -O3 result
is suspicious as well).

And looking at the code, I can see why. The PHP version is writing to
an internal buffer, while every other version has to write to STDOUT
on every single iteration.

So you are intentionally not benchmarking the output in the PHP
version (you even explicitly call ob_start()) but are benchmarking it
in every other version. So in fact, the PHP code does something
different than the rest of the code.

Sneaky sneaky. Also completely fake.

Please, be polite. We opened sources, and the sources of benchmarks. Anyone
can repeat this.
Smart people may analyze results themselves before claiming others.
I think you are smart person, and I respect the things you are doing.

Thanks. Dmitry.

A proper methodology would have
explicitly disabled any buffer so that the tests all tested the same
thing. Or even better, build up an internal buffer in all of the
implementations. That way you can compare the computation and not rely
on STDOUT (terminal) response.
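Accumulating output in an internal buffer in every implementation and emitting it once keeps terminal behaviour out of the timed section. A minimal C sketch of such a buffer, loosely analogous to what ob_start() does on the PHP side (an analogy only, not a description of PHP's output layer): rows are appended in memory during the computation and written with write(2) at the end. struct outbuf and its functions are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L  /* write() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Growable in-memory output buffer (hypothetical, for illustration). */
struct outbuf {
    char *data;
    size_t len;  /* bytes used */
    size_t cap;  /* bytes allocated */
};

/* Appends n bytes, doubling the allocation as needed. Returns 0 or -1. */
int outbuf_append(struct outbuf *b, const char *s, size_t n) {
    if (n == 0) return 0;
    if (b->len + n > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n) cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    return 0;
}

/* Writes all buffered bytes to fd, retrying after partial writes, then
 * empties the buffer. Returns 0 or -1. */
int outbuf_flush(struct outbuf *b, int fd) {
    size_t off = 0;
    while (off < b->len) {
        ssize_t w = write(fd, b->data + off, b->len - off);
        if (w < 0) return -1;
        off += (size_t)w;
    }
    b->len = 0;
    return 0;
}

void outbuf_free(struct outbuf *b) {
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

In a benchmark, the timed region would cover only the computation and the outbuf_append() calls; outbuf_flush(&b, STDOUT_FILENO) would run after the clock stops.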

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Friday, February 27, 2015 9:36 PM
To: Dmitry Stogov
Cc: Zeev Suraski; Jordi Boggiano; PHP Internals
Subject: Re: [PHP-DEV] Re: Zend JIT Open Sourced

And looking at the code, I can see why. The PHP version is writing to an
internal buffer, while every other version has to write to STDOUT on every
single iteration.

Except stdout is buffered too (www.turnkeylinux.org/blog/unix-buffering).
Perhaps there's some difference there, but it's not nearly as obvious as you
claim (let alone the other stuff in the other email, but let's leave that
aside).
Personally, my money would be on glibc stdout's buffering being more
efficient than our output buffering layer. But you're welcome to test.

Zeev

10 years ago by Andi Gutmans

Dmitry,

That's not to say there's anything wrong with this approach, nor that
there isn't a ton we can learn from it. I think it's a fantastic
research effort and plan on digging through it myself. Thank you for
open sourcing it.

Thanks for good words :)

This work may be adopted for some specific cases.
25-30 times speedup on Mandelbrot allows usage for numeric calculation
instead of C.

https://gist.github.com/dstogov/12323ad13d3240aee8f1

anyone may repeat the language battle :)

These tests seem really odd. A 15% speed advantage over GCC -O2? Sure,
it's possible. But I don't think it's likely. It really smells to me
like bias in the testing methodology. (and the lack of an -O3 result
is suspicious as well).

And looking at the code, I can see why. The PHP version is writing to
an internal buffer, while every other version has to write to STDOUT
on every single iteration.

So you are intentionally not benchmarking the output in the PHP
version (you even explicitly call ob_start()) but are benchmarking it
in every other version. So in fact, the PHP code does something
different than the rest of the code.

We actually discussed this at the time of the results.
IIRC it really has nothing to do with the output mechanism, etc. The benchmark does so many iterations and so little output that the impact there is negligible (you can test this yourself to see if I am right but I am pretty sure I am).
It is due to the fact that at runtime LLVM can optimize better to the architecture than a static standard gcc build. Constraining gcc with the right architecture-dependent switches upfront will also improve the gcc results. Anyway, still pretty cool to see this although it has very little impact (if any) on real-world apps ala Magento, WordPress, Drupal, ...
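Besides building with architecture-specific switches such as -march=native, a generic binary can pick a CPU-specific code path at runtime, which is one way to recover part of what a JIT gets from knowing the host CPU. A minimal sketch using GCC/Clang's target attribute and __builtin_cpu_supports(), assuming an x86-64 build (elsewhere it falls back to the generic path). Whether the compiler actually vectorizes or fuses the loop in the AVX2/FMA copy is not guaranteed, since reassociating floating-point sums normally requires something like -ffast-math. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Baseline copy: compiled for whatever ISA the build targets (e.g. generic x86-64). */
static double dot_generic(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

#if defined(__GNUC__) && defined(__x86_64__)
#define HAVE_X86_DISPATCH 1
/* Same source, but the compiler may use AVX2/FMA instructions in this copy only. */
__attribute__((target("avx2,fma")))
static double dot_avx2_fma(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}
#endif

/* Picks a variant for the CPU the program is actually running on. */
dot_fn select_dot(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_avx2_fma;
#endif
    return dot_generic;
}
```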

I think the important learning is that faster synthetic benchmarks have very little impact on overall application performance. Sure it can have an impact on specific algorithmic pieces of code but that’s the exception not the norm. No doubt there are other ways to write JIT including tracing JITs etc. but I do think we found that we are more bound by I/O and memory/caches than the quality of the machine code as the engine is already quite tight. And with apps consuming more and more Cloud services the I/O bottleneck issue looks grimmer than ever! :) That also comes across consistently in benchmarks of PHP 7 vs. hhvm on real-world apps - you see a JIT and non-JIT platform pretty much head to head on performance and actually on the complex stuff PHP 7 is often faster.

Anyway, definitely makes sense to continue to look at these kind of opportunities down the road but PHP 7 is such a huge step-up on real world application performance I think getting that out the door is the biggest possible short-term win when it comes to performance. Looking forward to seeing folks dig into the code and have ideas down the road!!

Andi

10 years ago by Dmitry Stogov

On Feb 27, 2015, at 11:36 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:

Dmitry,

That's not to say there's anything wrong with this approach, nor that
there isn't a ton we can learn from it. I think it's a fantastic
research effort and plan on digging through it myself. Thank you for
open sourcing it.

Thanks for good words :)

This work may be adopted for some specific cases.
A 25-30 times speedup on Mandelbrot allows it to be used for numeric
calculations instead of C.

https://gist.github.com/dstogov/12323ad13d3240aee8f1

Anyone may repeat the language battle :)

These tests seem really odd. A 15% speed advantage over GCC -O2? Sure,
it's possible. But I don't think it's likely. It really smells to me
like bias in the testing methodology. (And the lack of an -O3 result
is suspicious as well.)

And looking at the code, I can see why. The PHP version is writing to
an internal buffer, while every other version has to write to STDOUT
on every single iteration.

So you are intentionally not benchmarking the output in the PHP
version (you even explicitly call ob_start()) but are benchmarking it
in every other version. So in fact, the PHP code does something
different than the rest of the code.

We actually discussed this at the time of the results.
IIRC it really has nothing to do with the output mechanism, etc. The
benchmark does enough iterations and so little output that the impact
there is negligible (you can test this yourself to see if I am right, but I
am pretty sure I am).
It is due to the fact that at runtime LLVM can optimize better for the
architecture than a static standard gcc build. Constraining gcc with the
right architecture-dependent switches upfront will also improve the gcc
results. Anyway, it's still pretty cool to see this, although it has very
little impact (if any) on real-world apps à la Magento, WordPress, Drupal, ...
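Andi's "test this yourself" suggestion can be tried by keeping output out of the timed region altogether. The following is a minimal sketch, not the benchmark from the linked gist: it is written in Python only for readability, and the escape-time kernel, grid size, iteration limit, and the names `mandel_iter` and `render` are all hypothetical. The image is built in memory, only the computation is timed, and the result is written to stdout once at the end.

```python
import sys
import time

# Hypothetical parameters chosen for illustration; not taken from the benchmark gist.
WIDTH, HEIGHT, MAX_ITER = 80, 40, 500


def mandel_iter(cr: float, ci: float) -> int:
    """Escape-time iteration for c = cr + ci*i.

    Returns the iteration at which |z| exceeds 2, or MAX_ITER if it never does.
    """
    zr = zi = 0.0
    for i in range(MAX_ITER):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:
            return i
        zi = 2.0 * zr * zi + ci      # uses the old zr, as z^2 + c requires
        zr = zr2 - zi2 + cr
    return MAX_ITER


def render() -> str:
    """Compute the whole image into an in-memory string; no I/O happens here."""
    rows = []
    for y in range(HEIGHT):
        ci = y / HEIGHT * 2.0 - 1.0                  # imaginary axis: [-1, 1)
        row = "".join(
            "*" if mandel_iter(x / WIDTH * 3.0 - 2.0, ci) == MAX_ITER else " "
            for x in range(WIDTH)                    # real axis: [-2, 1)
        )
        rows.append(row)
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    start = time.perf_counter()
    image = render()                                 # timed: computation only
    elapsed = time.perf_counter() - start
    sys.stdout.write(image)                          # untimed: a single write at the end
    print(f"compute: {elapsed:.3f} s", file=sys.stderr)
```

Timing a variant that writes each character to stdout inside the loop against this one would show how much of a measured gap is output cost. The separate gcc question would need the C version rebuilt with architecture-specific flags (for example `-O2` versus `-O3 -march=native`) and timed the same way.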

I think the important learning is that faster synthetic benchmarks have
very little impact on overall application performance. Sure, it can have an
impact on specific algorithmic pieces of code, but that's the exception, not
the norm. No doubt there are other ways to write a JIT, including tracing
JITs etc., but I do think we found that we are more bound by I/O and
memory/caches than by the quality of the machine code, as the engine is
already quite tight. And with apps consuming more and more cloud services,
the I/O bottleneck looks grimmer than ever! :) That also comes across
consistently in benchmarks of PHP 7 vs. HHVM on real-world apps: you see a
JIT and a non-JIT platform pretty much head to head on performance, and on
the complex stuff PHP 7 is actually often faster.

Anyway, it definitely makes sense to continue to look at these kinds of
opportunities down the road, but PHP 7 is such a huge step up in real-world
application performance that I think getting it out the door is the biggest
possible short-term win when it comes to performance. Looking forward to
seeing folks dig into the code and have ideas down the road!!

Completely agree. And I have to say that these experiments with JIT led us
to understand the real PHP-5 bottlenecks, which allowed us to make about a
60% improvement on real-life apps in PHPNG and already nearly 2 times in PHP7.

But LuaJIT without JIT is 4 times faster on Mandelbrot. It's a challenge...

Thanks. Dmitry.

10 years ago by Sebastian Bergmann

On 27.02.2015 at 16:12, Anthony Ferrara wrote:

Thank you for sharing your code. I look forward to playing with it.

Perhaps after 7 stabilizes (and ships) you could write up your
thoughts around it? Why decisions were made and the findings that you
have?

What Joe and Anthony said :-) Thanks for sharing this with the
community and I'll be looking forward to the promised documentation
of decisions etc.

10 years ago by Pierre Joye

hi,

Fantastic move! The way to do it! Thanks a lot!

Cheers,

Pierre

@pierrejoye | http://www.libgd.org

10 years ago by Xinchen Hui

Hey:

Great, there are too many threads these days; I almost missed this mail :)

The main problem here is that we cannot get a real performance improvement
in real-life apps.

Open-sourcing this could maybe bring more ideas on how to improve it
(beyond the instruction-cache and data-cache misses we already knew about).

You're welcome to play with it and share your thoughts :)

Thanks

--
Xinchen Hui
@Laruence
http://www.laruence.com/

10 years ago by Yasuo Ohgaki

Hi Dmitry,

Awesome!
I wish I had more time to play with this.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net