Hi!
I've measured the overhead for method calls in a variety of environments
(Amazon, Travis, and 3v4l). The results are reliable; here's the 3v4l run:
http://3v4l.org/NsjJR
Some observations. First, as expected, direct function calls are faster than
static method calls, which are faster than instance method calls. Second, in
absolute times PHP7 outperforms HHVM3 substantially. Kudos, really impressive.
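Roughly, the comparison has this shape (a simplified sketch, not the exact
script behind the 3v4l link; the names and iteration count are made up):

<?php
// Simplified sketch: each call style is exercised N times through a
// generic callable and timed with microtime(true).
const N = 1000000;

function direct() {}
class C {
    public static function stat() {}
    public function inst() {}
}

$cases = [
    'direct call'        => function () { direct(); },
    'static method call' => function () { C::stat(); },
    'object method call' => function () { (new C())->inst(); },
];

foreach ($cases as $label => $case) {
    $t = microtime(true);
    for ($i = 0; $i < N; $i++) {
        $case();
    }
    printf("%s: %.4f s\n", $label, microtime(true) - $t);
}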
My question though is on relative times. Method call overhead is
consistently 50% to 150% over a direct call. Is my experiment invalid, or
is this overhead expected? Is the overhead in the allocation,
deallocation, GC?
Cheers,
bishop
I'd suggest you use a tool like Valgrind's callgrind (and visualize the results with e.g. KCachegrind).
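For example (with bench.php standing in for whatever script you want to profile): run it as valgrind --tool=callgrind php bench.php, then open the resulting callgrind.out.<pid> file in KCachegrind to see where the time actually goes.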
Regards,
Mike
I'd also take variable function calling out of the picture (or at least, have it as a separate dimension). Normal applications make pretty sparing use of variable functions (compared to actual direct calls), whereas your benchmark uses them exclusively. So you've explicitly chosen the most pessimistic path, and an unrepresentative one.
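For example, the two dimensions look like this (a hypothetical snippet, not taken from the benchmark):

<?php
// A direct call is resolved at compile time; a variable call forces the
// engine to look the function up at runtime from the value of $fn.
function work() {}

work();          // direct call

$fn = 'work';
$fn();           // variable call: "work" is looked up at runtime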
-Sara
Yep, as I explain in several articles [1] [2], direct calls are more
efficient than variable ones: a variable call makes the engine analyze the
variable and look up the function at runtime, whereas for a direct call this
is done at compile time.
Methods are heavier to trigger, because the engine needs to find the class,
then find the method in that class, before finally calling it. The compiler
can't pre-compute those lookups, because the class may not exist yet and may
be autoloaded at runtime. That adds flexibility to the language, but also
prevents us from optimizing things too deeply.
The lookups in a method call are done only the first time; the results are
cached in the VM frame for later reuse.
You may have a look at the ZEND_INIT_STATIC_METHOD_CALL handler if you want
to see what happens (for static calls).
For functions, if the function is known at the time it's called, and if the
call is direct, then the process is fully optimized.
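To make that concrete, here is a small illustration (my own sketch, not code
from the articles below): the compiler cannot resolve Greeter::hello() ahead
of time, because the Greeter class only comes into existence at runtime via
the autoloader.

<?php
// Sketch: the class behind a method call may only appear at runtime,
// so the engine has to look it up (and then cache the lookup) when the
// call is first executed.
spl_autoload_register(function ($class) {
    if ($class === 'Greeter') {
        // A real autoloader would include a file; eval() is used here
        // only to show the class being defined at runtime.
        eval('class Greeter { public static function hello() { return "hi"; } }');
    }
});

echo Greeter::hello(), "\n"; // class + method resolved at runtime

function hello() { return "hi"; }
echo hello(), "\n";          // direct call, resolved at compile time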
[1] http://jpauli.github.io/2015/02/05/zend-vm-executor.html
[2] http://jpauli.github.io/2015/01/22/on-php-function-calls.html
Julien Pauli
I've measured the overhead for method calls in a variety of environments
(Amazon, Travis, and 3v4l). The results are reliable; here's the 3v4l run:
http://3v4l.org/NsjJR
This is a better representation of what you are trying to show. It removes
all the magic callback stuff that could be adding to the slowness you are
seeing. In addition, it does not create a new object on every call for the
object method. Creating a new object is obviously going to slow things down,
but that's not related to the call time itself.
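In spirit, that means loops like the following sketch (not the actual script
being referred to): direct calls inside the timed loops, and a single object
created up front.

<?php
// Sketch: no callbacks, and the object is created once, outside the
// timed loops, so only the call itself is measured.
const N = 1000000;

function direct() {}
class C {
    public static function stat() {}
    public function inst() {}
}

$obj = new C();   // created once, not on every iteration

$t = microtime(true);
for ($i = 0; $i < N; $i++) { direct(); }
printf("direct: %.4f s\n", microtime(true) - $t);

$t = microtime(true);
for ($i = 0; $i < N; $i++) { C::stat(); }
printf("static: %.4f s\n", microtime(true) - $t);

$t = microtime(true);
for ($i = 0; $i < N; $i++) { $obj->inst(); }
printf("object: %.4f s\n", microtime(true) - $t);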
--
Brian.
Interesting data. Regarding your original question, I would expect
method calls to be somewhat more expensive since even with a known
method name there's polymorphism to take into account. I wouldn't
expect it to be massively more, but non-zero.
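For illustration (a hypothetical example, not from the benchmark): even when
the method name is written out literally, the implementation that runs
depends on the runtime class of the receiver.

<?php
// The call $job->doWork() can't be bound at compile time: which
// implementation runs depends on the object's class at runtime.
interface Job { public function doWork(); }
class A implements Job { public function doWork() { return 'A'; } }
class B implements Job { public function doWork() { return 'B'; } }

function run(Job $job) {
    return $job->doWork();   // resolved against the runtime class
}

echo run(new A()), "\n";
echo run(mt_rand(0, 1) ? new A() : new B()), "\n";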
I would still recommend using Callgrind as Mike suggested. It's going to
give you much more reliable (and useful) time numbers than microtime().
-Sara
Hi Brian,
This version adds up precision errors, because microtime() precision is not
high enough for the purpose. Since the error is random, it does not affect
the proportions much, but it does affect the total execution time. Code like
the following does a better job for these kinds of benchmarks; see the total
amount of execution time recorded.
If we really need pure execution time, it should also record the execution
time of a "for" loop with an empty body.
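A rough sketch of that kind of benchmark (illustrative only; the names and
iteration count are arbitrary):

<?php
// Rough sketch: large iteration count, total times reported, and an
// empty "for" loop timed as a baseline that can be subtracted.
const N = 5000000;

function direct() {}
class C { public function inst() {} }
$obj = new C();

$t = microtime(true);
for ($i = 0; $i < N; $i++) {}               // empty body: loop overhead only
$loop = microtime(true) - $t;

$t = microtime(true);
for ($i = 0; $i < N; $i++) { direct(); }
$func = microtime(true) - $t;

$t = microtime(true);
for ($i = 0; $i < N; $i++) { $obj->inst(); }
$method = microtime(true) - $t;

printf("empty loop: %.4f s\n", $loop);
printf("direct:     %.4f s (net %.4f s)\n", $func, $func - $loop);
printf("method:     %.4f s (net %.4f s)\n", $method, $method - $loop);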
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
One additional note.
Even when the "for" loop execution time is subtracted, to be precise we still
have to take the return-value handling time into account. Since the "for"
loop and the return-value handling take constant time, increasing the number
of iterations and ignoring that time is easier.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
If we really need pure execution time, it should also record the execution
time of a "for" loop with an empty body.
-> incl. displaying the time for the loop: http://3v4l.org/1vZJI
Marc Bennewitz wrote on 04/06/2015 10:01:
-> incl. displaying the time for the loop: http://3v4l.org/1vZJI
Here are the percentages relative to the base for loop, rather than to the
previous item: http://3v4l.org/l75kf
HHVM's percentages are lower primarily because its for-loop baseline is much
slower in absolute terms.
At the risk of distracting from the central topic, I'd like to point out that HHVM's times are almost certainly based on non-JITted code. The JIT doesn't kick in until the 11th request, and 3v4l scripts are, by definition, only run once.
PHP is optimized for blind interp, so it's doing better in that mode.
-Sara
Sara Golemon wrote on 04/06/2015 22:55:
At the risk of distracting from the central topic, I'd like to point out that HHVM's times are almost certainly based on non-JITted code. The JIT doesn't kick in until the 11th request, and 3v4l scripts are, by definition, only run once.
PHP is optimized for blind interp, so it's doing better in that mode.
That makes a lot of sense, and just goes to show how hard it is to get
useful information out of a simple benchmark.