Method call improvements

16 years ago by Timm Friebe

Hi,

in every programming language, method calls are expensive. Especially in
PHP, which does not spend any effort during compile time to resolve method
calls to their target (and cannot, due to the possibility of lazily loading
classes using include with variables). I recently did some performance
profiling on exactly how slow method calls are compared to other operations
such as, for example, incrementing an int (the factor is around seven) and
how they compare to compiled languages (the factor lies between 400 and
1400).

Here goes the test:

$instance->method();

...in different variants, using public, private and protected (the latter
are the slowest). On my machine I get somewhere around 700'000 method
calls per second, while C# scores 250'000'000, for example. Your mileage is
going to vary.

The difference in these numbers being quite discouraging, I started digging
a bit deeper into how method calls are handled by the Zend Engine. Again,
let's take the example from above, here's what happens (in zend_vm_def.h and
zend_object_handlers.c):

1. Finding the execution target
   a. $instance is a variable, so we have a zval*
   b. If Z_TYPE_P() of this zval is IS_OBJECT, OK.
   c. Z_OBJCE_P() will render the zend_class_entry* ce
   d. method is a zval*, its value being an IS_STRING
   e. Given ce's function_table, we can look up the zend_function*
      corresponding to the method entry by its (previously
      lowercased!) name
   f. If we can't find it and the ce has a __call, go for that,
      else zend_error()
2. Verifying it
   a. If the modifiers are PUBLIC, OK.
   b. If they're private, verify EG(scope) == ce. If they match,
      OK, if not, try for ce->__call, if that doesn't exist, error.
   c. If they're protected, verify instanceof_function(ce, EG(scope)).
      If that returns FAILURE, try ce->__call, if that doesn't exist,
      error. If it exists, OK.
3. Insurance
   a. Finally test if the zend_function* found is neither abstract
      nor deprecated.
   b. Test non-static methods aren't called statically, else issue
      a warning (or error, depending on the situation).
4. Execute
   a. Take EX(function_state).function->op_array and zend_execute()
      it.
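
To make the lookup and verification steps concrete, here is a minimal sketch in plain C. It is not Zend Engine code: klass, method, visibility, is_instance_of() and resolve_method() are hypothetical stand-ins for zend_class_entry, zend_function and the checks listed above. The method table is assumed to already contain inherited methods, and a NULL return stands for "fall back to __call or raise an error".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for zend_function / zend_class_entry. */
typedef enum { VIS_PUBLIC, VIS_PROTECTED, VIS_PRIVATE } visibility;

typedef struct {
    const char *lc_name;     /* method name, stored lowercased */
    visibility  vis;
    int         is_abstract;
} method;

typedef struct klass {
    const char         *name;
    const struct klass *parent;    /* NULL for a root class */
    const method       *methods;   /* stand-in for function_table */
    size_t              n_methods;
} klass;

/* Is `ce` the same class as `base`, or derived from it? */
int is_instance_of(const klass *ce, const klass *base) {
    for (; ce != NULL; ce = ce->parent)
        if (ce == base) return 1;
    return 0;
}

/* Look the method up by its lowercased name, then check visibility against
 * the calling scope (NULL when the call comes from outside any class). */
const method *resolve_method(const klass *ce, const char *name,
                             const klass *scope) {
    char lc[64];
    size_t len = strlen(name), i;

    assert(ce != NULL);
    if (len >= sizeof lc) return NULL;      /* sketch-only length limit */
    for (i = 0; i <= len; i++)              /* lowercases, copies the NUL too */
        lc[i] = (char)tolower((unsigned char)name[i]);

    for (i = 0; i < ce->n_methods; i++) {
        const method *m = &ce->methods[i];
        if (strcmp(m->lc_name, lc) != 0) continue;
        if (m->vis == VIS_PRIVATE && scope != ce) return NULL;
        if (m->vis == VIS_PROTECTED &&
            (scope == NULL || !is_instance_of(ce, scope))) return NULL;
        if (m->is_abstract) return NULL;
        return m;
    }
    return NULL;                            /* not found */
}
```

Every call repeats the lowercasing, the table search and the scope checks, which is the work the cache is meant to skip.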

You can clearly see the checks in #1 and #2 (most of which happen in
zend_std_get_method()) are quite extensive. Now the idea I developed was to
cache this information and I thus came up with the following:

At [1d], calculate a hash key for the following:
- method->name
- ce->name
- EG(scope) ? EG(scope)->name : ""
  These are the only variables used for verifying scope and
  modifiers, and the verification is always going to yield the
  same result as long as they stay the same.
Look this up in a hashtable (in generic-speak:
HashTable<ulong, zend_function*>). If found, return that,
continue with [1e] otherwise.
After [2c], store the found zend_function* to the hash.
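
The caching scheme can be sketched as follows. This is an illustration under assumptions, not the patch itself: cache_key(), cache_find(), cache_store(), the DJB2-style hash and the fixed table size are all hypothetical names and choices, and a `const void *` stands in for zend_function*.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 256   /* arbitrary size chosen for this sketch */

typedef struct {
    unsigned long key;    /* combined hash of method, class and scope name */
    const void   *fn;     /* stand-in for zend_function*; NULL = empty slot */
} cache_entry;

static cache_entry method_cache[CACHE_SLOTS];

/* DJB2-style hash continued from `h`; the final multiplication acts as a
 * separator so that ("ab", "c") and ("a", "bc") are hashed differently. */
unsigned long hash_str(unsigned long h, const char *s) {
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h * 33;
}

/* One numeric key from the three inputs named above. */
unsigned long cache_key(const char *method, const char *ce_name,
                        const char *scope_name) {
    unsigned long h = 5381;
    h = hash_str(h, method);
    h = hash_str(h, ce_name);
    return hash_str(h, scope_name ? scope_name : "");
}

/* Returns the cached function, or NULL on a miss. */
const void *cache_find(unsigned long key) {
    const cache_entry *e = &method_cache[key % CACHE_SLOTS];
    return (e->fn != NULL && e->key == key) ? e->fn : NULL;
}

/* Stores a verified lookup result; an older entry in the slot is dropped. */
void cache_store(unsigned long key, const void *fn) {
    cache_entry *e;
    assert(fn != NULL);
    e = &method_cache[key % CACHE_SLOTS];
    e->key = key;
    e->fn  = fn;
}
```

Note the caveat of a purely numeric key: two different (method, class, scope) triples that happen to hash to the same value would be treated as the same entry, so a production version would need to handle that case.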

I was curious how this would affect overall performance, both in synthetic
and in real-world situations. The first tests I ran were something along the
lines of:

for ($i= 0; $i < $times; $i++) {
  $instance->method();
}

...with and without the patch. This gave me a factor of 1.7 to 1.8 (that
is how many times faster the PHP I built with the patch was)! The real-world
situation was running the test suite of an object-oriented PHP framework,
taking 1.55 seconds before and 0.91 after. I would call this good, almost
doubling the speed. This is nowhere near the factors I mentioned before, but
I think it has potential. Of course, caching comes at a cost, but by using a
numeric key instead of a string I could reduce the overhead to a minimum,
the real-world application consuming about 20 KB more memory, which I'd call
negligible.

Last but not least I verified I hadn't utterly broken the way PHP works by
running the tests from Zend/tests and found no tests failing with the
patch that weren't already failing without it (some of them expected, some
not).

The simple idea is a ~50 line patch intended for the PHP_5_3 branch and
available at the following location:

http://sitten-polizei.de/php/method-call-cache.diff

It serves its purpose quite well in the CLI SAPI and would definitely need
fixing up for it to go into production (parts of it belong in zend_hash.c,
and the cache variable needs to be an EG() instead of static).

I'm interested in your opinions and if you think its addition would be worth
a try.

Timm

16 years ago by Arvids Godjuks

Hi Timm!
Worth it? I think it's an absolute necessity to add this patch, because if it
gives an almost 2x speed boost for typical frameworks, it's just fantastic
(well, the fewer calls, the smaller the performance boost, but some monsters
like Zend or CakePHP and others will definitely benefit from it). I'm
definitely going to do some tests with your patch and give feedback here.
Thanks for your work!

16 years ago by Marcus Boerger

Hello Timm,

Friday, January 16, 2009, 9:35:13 PM, you wrote:

At [1d], calculate a hash key for the following:

method->name

ce->name

EG(scope) ? EG(scope)->name : ""
These are the only variables used for verifying scope and
modifiers, and the verification is always going to yield the
same result as long as they stay the same.

Aren't we able to bind these at least partially to the function call
opcode, in case we know they are constant? If all of them are constant we
could even store the whole lookup in the opcode. Well, you'd have to convince
Zend to do that because so far they have always been against this
approach.

Also, the zend_class_entry lookup in [1c] is IMO completely useless. If
the zval object stored the class entry and the class entry had a
pointer to the handlers, then we would save another costly lookup and
simply follow a pointer instead.

Even more, we could have the object id be a pointer into the object
storage or a direct index into the storage (like we have right now),
as long as it is something that works faster and does not need so many
function calls. If we go for a pointer we can easily provide a means to
resolve that pointer into an object id for the one case where we need an
object id, which is var_dump().

Then your hash table sounds like a nice idea.

Look this up in a hashtable (in generic-speak:
HashTable<ulong, zend_function*>). If found, return that,
continue with [1e] otherwise.

After [2c], store the found zend_function* to the hash.

Best regards,
Marcus

16 years ago by Hannes Magnusson

(for the fun of it, I am CCing Andi, the CTO and senior VP of Zend)

Hello Timm,

Friday, January 16, 2009, 9:35:13 PM, you wrote:

At [1d], calculate a hash key for the following:

method->name

ce->name

EG(scope) ? EG(scope)->name : ""
These are the only variables used for verifying scope and
modifiers, and the verification is always going to yield the
same result as long as they stay the same.

Aren't we able to bind these at least partially to the function call
opcode, in case we know they are constant? If all of them are constant we
could even store the whole lookup in the opcode. Well, you'd have to convince
Zend to do that because so far they have always been against this
approach.

Wait what? Wtf?
Why would an opinion from a company matter to us?

Sure. Dmitry does a kickass good job, but I can guarantee you that
Arnaud, Felipe, Tony and yourself could do just as good a job.

If I was king (with m4d skillz) I would fork that "Zend engine", try
to fix the license (which lies, there is no way of "downloading the
engine" from their website) and try to optimize things as much as
possible before it ever hits (almost non-existing) optimizers.

By saying this kind of thing, Marcus, you make me very scared. You
shouldn't care what Zend, Yahoo, Facebook or whatever company says. If
you think it is the right decision then (quoting Christina Aguilera ;)
) "Do your thing honey!"

You were well on track with your Closures commits. Unfortunately that
didn't turn out quite as well as either of us had hoped, but that is
life. Sometimes your ideas don't work out. Sometimes they do.
It hasn't stopped you so far, so why would it now?

-Hannes

16 years ago by Guilherme Blanco

JFYI: Method calls increased from 3.2 million to 4 million when I have
this patch applied locally.

It's a valuable addition to core! =)

Short version: Commit!

--
Guilherme Blanco - Web Developer
CBC - Certified Bindows Consultant
Cell Phone: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
URL: http://blog.bisna.com
São Paulo - SP/Brazil

16 years ago by Stanislav Malyshev

Hi!

Aren't we able to bind these at least partially to the function call
opcode, in case we know they are constant? If all of them are constant we
could even store the whole lookup in the opcode. Well, you'd have to convince
Zend to do that because so far they have always been against this
approach.

Err, I'm not sure how you can store in the opcode something you don't know,
since the opcode may be generated well before the class or method exists, not
to speak of the object, of which you know nothing at the time of
opcode generation? The same opcode could call entirely different methods of
different classes. Am I missing something?

Also, the zend_class_entry lookup in [1c] is IMO completely useless. If
the zval object stored the class entry and the class entry had a
pointer to the handlers, then we would save another costly lookup and
simply follow a pointer instead.

Having only one handler table per class would make objects less flexible.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

16 years ago by Marcus Boerger

Hello Stanislav,

Monday, January 19, 2009, 10:12:46 AM, you wrote:

Hi!

Aren't we able to bind these at least partially to the function call
opcode, in case we know they are constant? If all of them are constant we
could even store the whole lookup in the opcode. Well, you'd have to convince
Zend to do that because so far they have always been against this
approach.

Err, I'm not sure how you can store in the opcode something you don't know,
since the opcode may be generated well before the class or method exists, not
to speak of the object, of which you know nothing at the time of
opcode generation? The same opcode could call entirely different methods of
different classes. Am I missing something?

Nope. But sometimes we are in a scope where it is always the same thing we
call. Or at least one of the three things is constant. And any of them
being constant would help.

Also, the zend_class_entry lookup in [1c] is IMO completely useless. If
the zval object stored the class entry and the class entry had a
pointer to the handlers, then we would save another costly lookup and
simply follow a pointer instead.

Having only one handler table per class would make objects less flexible.

How so? Tell me any case where that is different right now?

Right now every member of a class tree (each class derived from a specific
base class) has the same handler table. And that cannot be changed because
each of the class members has the same creation/destruction C functions.

The way to change that would be to have several creation functions that
create objects with the same class that have different handlers. While that
works at this level, we would end up in problems where we copy members of
these classes or perform other complex operations. Unfortunately we often
assume that objects of the same class have the same handlers. And last but
not least we already have some code that checks for handlers rather than for
classes, just because it is faster. My idea would in these cases add one more
pointer indirection but change the other way to simple pointer resolving.
Thus my idea would overall increase PHP speed.

Best regards,
Marcus

16 years ago by Stanislav Malyshev

Hi!

Nope. But sometimes we are in a scope where it is always the same thing we
call. Or at least one of the three things is constant. And any of them
being constant would help.

Well, the object can't be constant, so the only thing that can be constant
is the method name. We have it in the opcode. Everything else depends on the
object, and since we don't know the object at compile time we couldn't put
more in the opcode. We could cache some things at runtime, though not inside
opcodes.

Having only one handler table per class would make objects less flexible.

How so? Tell me any case where that is different right now?

Now the handler table is per-object.

Right now every member of a class tree (each class derived from a specific
base class) has the same handler table. And that cannot be changed because
each of the class members has the same creation/destruction C functions.

These C functions can have if()s which may produce different kinds of
objects.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

16 years ago by Dmitry Stogov

Marcus Boerger wrote:

Aren't we able to bind these at least partially to the function call
opcode, in case we know they are constant? If all of them are constant we
could even store the whole lookup in the opcode. Well, you'd have to convince
Zend to do that because so far they have always been against this
approach.

We can't modify the opcode itself as it'll break opcode caches.

However we can introduce some indirect table associated with op_array,
which can be used to implement inline caches without direct opcode
modification (in the same way as IS_CV variables work). There are a lot
of papers about polymorphic inline caches (e.g.
http://research.sun.com/self/papers/pics.html) which we probably should
use so as not to reinvent the wheel.
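
One way to picture such an indirect table is sketched below in plain C. All names (cache_slot, call_op, op_array_rt, dispatch(), slow_lookup()) are hypothetical and not taken from the engine. The assumption is that the compiler gives every call site a slot index, the opcode only carries that index, and the mutable slots live in a separate run-time table, so the opcodes themselves stay untouched. Visibility checks are left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; int id; } func;      /* stand-in for zend_function */
typedef struct { const char *name; const func *methods; size_t n; } class_entry;

/* One cache entry per call site: the class seen last and what it resolved to. */
typedef struct { const class_entry *ce; const func *fn; } cache_slot;

/* The opcode stays read-only; it only names the method and its slot index. */
typedef struct { const char *method_name; size_t slot; } call_op;

typedef struct {
    cache_slot *slots;         /* run-time table kept next to the op_array */
    size_t      n_slots;
    unsigned    slow_lookups;  /* counter, for illustration only */
} op_array_rt;

/* The expensive path: search the class's method table by name. */
const func *slow_lookup(const class_entry *ce, const char *name) {
    size_t i;
    for (i = 0; i < ce->n; i++)
        if (strcmp(ce->methods[i].name, name) == 0) return &ce->methods[i];
    return NULL;
}

const func *dispatch(op_array_rt *rt, const call_op *op, const class_entry *ce) {
    cache_slot *s;
    assert(op->slot < rt->n_slots);
    s = &rt->slots[op->slot];
    if (s->ce == ce) return s->fn;          /* hit: same class as last time */
    rt->slow_lookups++;
    s->fn = slow_lookup(ce, op->method_name);
    s->ce = s->fn ? ce : NULL;              /* cache successful lookups only */
    return s->fn;
}
```

A call site that always sees objects of one class pays for the name lookup once; a site that alternates between classes keeps missing, which is the case the polymorphic variants in the literature address.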

Thanks. Dmitry.

16 years ago by Guilherme Blanco

Hi guys,

What's the status on this one?!

It's an important optimization that should be considered. Isn't saving more
than a million method calls in a framework worth it?
Nobody gave a final word on this subject.

I could not see this committed in 5.3 or in HEAD.
So... can someone notify me about the status of this???

Cheers,

--
Guilherme Blanco - Web Developer
CBC - Certified Bindows Consultant
Cell Phone: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
URL: http://blog.bisna.com
São Paulo - SP/Brazil

16 years ago by Paul Biggar

On Mon, May 11, 2009 at 7:47 PM, Guilherme Blanco
guilhermeblanco@gmail.com wrote:

What's the status on this one?!

I think it died from neglect. But it was a really good idea.

One question that was raised was:

However we can introduce some indirect table associated with op_array, which
can be used to implement inline caches without direct opcode modification
(in the same way as IS_CV variables work). There are a lot of papers about
polymorphic inline caches (e.g.
http://research.sun.com/self/papers/pics.html) which we probably should use
so as not to reinvent the wheel.

You can't actually use PICs or even ICs with the Zend engine, because
you can't insert code into the callee method's header (you would need
a JIT). You also wouldn't want to, since PHP can't use the
recompilation techniques that Self had. You can use lookup caches,
which is exactly what the original patch was.

FWIW, since PHP has a static inheritance chain, the best approach
seems to be to build a virtual dispatch table, instead of a hashtable
for functions. However, there might be some esoteric extensions which
make this difficult.
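
For readers unfamiliar with the term, a virtual dispatch table can be sketched in a few lines of C. Everything here (vclass, vclass_inherit(), vclass_add(), vclass_override(), vcall() and the two sample methods) is hypothetical and only illustrates the general technique, not anything in the Zend Engine.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_METHODS 8     /* arbitrary limit for this sketch */

typedef int (*method_fn)(int);

typedef struct vclass {
    const struct vclass *parent;
    method_fn vtable[MAX_METHODS];
    size_t    n_methods;
} vclass;

/* The child starts with a copy of the parent's table, so every inherited
 * method keeps the slot index it has in the parent. */
void vclass_inherit(vclass *child, const vclass *parent) {
    size_t i;
    child->parent = parent;
    child->n_methods = parent->n_methods;
    for (i = 0; i < parent->n_methods; i++)
        child->vtable[i] = parent->vtable[i];
}

/* Appends a new method and returns its slot index. */
size_t vclass_add(vclass *c, method_fn fn) {
    assert(c->n_methods < MAX_METHODS);
    c->vtable[c->n_methods] = fn;
    return c->n_methods++;
}

/* Replaces an inherited method in place; the slot index does not change. */
void vclass_override(vclass *c, size_t slot, method_fn fn) {
    assert(slot < c->n_methods);
    c->vtable[slot] = fn;
}

/* Dispatch is a plain array index, with no hashing of the method name.
 * The caller has to know the slot index in advance. */
int vcall(const vclass *c, size_t slot, int arg) {
    assert(slot < c->n_methods);
    return c->vtable[slot](arg);
}

int base_twice(int x)  { return 2 * x; }
int child_twice(int x) { return 2 * x + 1; }
```

The speed comes from the caller already holding the slot index, so the scheme depends on that index being available where the call is compiled.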

Paul

--
Paul Biggar
paul.biggar@gmail.com

16 years ago by Stanislav Malyshev

Hi!

FWIW, since PHP has a static inheritance chain, the best approach
seems to be to build a virtual dispatch table, instead of a hashtable
for functions. However, there might be some esoteric extensions which
make this difficult.

IMHO it's not static enough. I.e., since PHP is not compiled, we cannot
create a VD table for the class until runtime inheritance, which means
that the code using this class can't use method resolution more efficient
than name->function, i.e. a hashtable. These lookups can be cached (i.e.
CV style) but I don't see how they can be altogether prevented.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

16 years ago by Paul Biggar

Hi Stas, Dmitry,

IMHO it's not static enough. I.e., since PHP is not compiled, we cannot
create a VD table for the class until runtime inheritance, which means that
the code using this class can't use method resolution more efficient than
name->function, i.e. a hashtable. These lookups can be cached (i.e. CV style)
but I don't see how they can be altogether prevented.

The real situation is even worse, as during compilation of a class its parent
class doesn't have to be known. So construction of VMTs becomes a bit
problematic. BTW we could think in this way...

Apologies, I'm not familiar with run-time inheritance in PHP. My
understanding was that when a class's source code is compiled, its
parent classes must be known. When is this not the case? Must it be
known for the class' first instantiation?

In the worst case, it might be cheaper to build it at instantiation
time, but I would have to look up how expensive that is in a more
static language to be sure. Certainly, it is currently so expensive
that almost anything else would be better (including the OP's patch).

Thanks,
Paul

--
Paul Biggar
paul.biggar@gmail.com

16 years ago by Dmitry Stogov

Paul Biggar wrote:

Hi Stas, Dmitry,

IMHO it's not static enough. I.e., since PHP is not compiled, we cannot
create a VD table for the class until runtime inheritance, which means that
the code using this class can't use method resolution more efficient than
name->function, i.e. a hashtable. These lookups can be cached (i.e. CV style)
but I don't see how they can be altogether prevented.

The real situation is even worse, as during compilation of a class its parent
class doesn't have to be known. So construction of VMTs becomes a bit
problematic. BTW we could think in this way...

Apologies, I'm not familiar with run-time inheritance in PHP. My
understanding was that when a class's source code is compiled, its
parent classes must be known. When is this not the case?

The parent class may be defined in another file that is loaded at runtime
using an include() statement. It's a very common case. So PHP first loads
the include file and then declares the child class at runtime.

Must it be known for the class' first instantiation?

Of course. :)

In the worst case, it might be cheaper to build it at instantiation
time, but I would have to look up how expensive that is in a more
static language to be sure. Certainly, it is currently so expensive
that almost anything else would be better (including the OP's patch).

I don't see how run-time VMT construction may help, because calls to a
virtual method must know the VMT offset at compile time.

Thanks. Dmitry.

16 years ago by Paul Biggar

Apologies, I'm not familiar with run-time inheritance in PHP. My
understanding was that when a class's source code is compiled, its
parent classes must be known. When is this not the case?

The parent class may be defined in another file that is loaded at runtime
using an include() statement. It's a very common case. So PHP first loads the
include file and then declares the child class at runtime.

Must it be known for the class' first instantiation?

Of course. :)

The real situation is even worse, as during compilation of a class its parent
class doesn't have to be known. So construction of VMTs becomes a bit
problematic. BTW we could think in this way...

OK, so I don't understand this exactly. Is it correct to say that if a
class uses inheritance its compilation will be deferred until its
first instantiation? Or is it compiled when it is seen, and its parent
backpatched in later? When is later?

But I think it's fair to say that it has static inheritance - that is,
its full inheritance chain is known before it can be instantiated, and
it can never be changed after that.

In the worst case, it might be cheaper to build it at instantiation
time, but I would have to look up how expensive that is in a more
static language to be sure. Certainly, it is currently so expensive
that almost anything else would be better (including the OP's patch).

I don't see how run-time VMT construction may help, because calls to a
virtual method must know the VMT offset at compile time.

Right. Construction is fine. Their use is not. I don't know what I was
thinking.

So it looks like the best way forwards is still the OP's patch?

Thanks,
Paul

--
Paul Biggar
paul.biggar@gmail.com

16 years ago by Dmitry Stogov

Paul Biggar wrote:

Apologies, I'm not familiar with run-time inheritence in PHP. My
understanding was that when a classes source code is compiled, its
parent classes must be known. When is this not the case?
The parent class may be defined in other file that is loaded at runtime
using include() statement. It's very usual case. So the PHP first loads the
include file and then declares child class at runtime.

Must it be known for the class' first instantiation?
Of course. :)

The real situation is even worse, as during compilation of a class its
parent class doesn't have to be known. So construction of VMTs becomes a
bit problematic. BTW we could think in this way...

OK, so I don't understand this exactly. Is it correct to say that if a
class uses inheritance its compilation will be deferred until its
first instantiation? Or is it compiled when it is seen, and its parent
backpatched in later? When is later?

Classes whose parent isn't known during compilation are inherited at
run-time by the DECLARE_INHERITED_CLASS opcode. It patches the property
and method tables, checks method compatibility, etc.
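
As a rough illustration of the run-time inheritance step described above,
here is a minimal C sketch. The types and the function
declare_inherited_class() are hypothetical stand-ins, not the engine's
actual structures or its opcode handler. The real engine uses hash tables
and also patches properties and checks method compatibility, all of which
this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 16

/* Hypothetical stand-ins for the engine's structures (not the Zend types). */
typedef int (*method_fn)(void);

typedef struct {
    const char *name;   /* lower-cased method name */
    method_fn   fn;
} method_entry;

typedef struct class_entry {
    const char         *name;
    struct class_entry *parent;              /* NULL until bound at run time */
    method_entry        methods[MAX_METHODS];
    size_t              method_count;
} class_entry;

/* Linear search by name; a real implementation would use a hash table. */
static method_fn find_method(const class_entry *ce, const char *name) {
    for (size_t i = 0; i < ce->method_count; i++) {
        if (strcmp(ce->methods[i].name, name) == 0) {
            return ce->methods[i].fn;
        }
    }
    return NULL;
}

/* Run-time inheritance: copy every parent method the child does not override. */
static void declare_inherited_class(class_entry *child, class_entry *parent) {
    assert(child->parent == NULL);  /* the parent is bound once, then fixed */
    for (size_t i = 0; i < parent->method_count; i++) {
        if (find_method(child, parent->methods[i].name) == NULL) {
            assert(child->method_count < MAX_METHODS);
            child->methods[child->method_count++] = parent->methods[i];
        }
    }
    child->parent = parent;
}

/* Example method bodies for a parent X and a child Y that overrides bar(). */
static int x_foo(void) { return 1; }
static int x_bar(void) { return 2; }
static int y_bar(void) { return 3; }
```

Declaring X with foo() and bar(), declaring Y with only bar(), and then
calling declare_inherited_class(&y, &x) leaves Y with two methods: its own
bar() and the inherited foo(). Until that call runs, Y's method table is
incomplete, which is why nothing that depends on the full table can be
fixed at compile time.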

But I think it's fair to say that it has static inheritance - that is,
its full inheritance chain is known before it can be instantiated, and
it can never be changed after that.

Right, but it has a lot of dynamic issues anyway. E.g. the parent class
may be changed or loaded from a different file.

Thanks. Dmitry.

In the worst case, it might be cheaper to build it at instantiation
time, but I would have to look up how expensive that is in a more
static language to be sure. Certainly, it is currently so expensive
that almost anything else would be better (including the OP's patch).

I don't see how run-time VMT construction may help, because calls to
virtual methods must know the VMT offset at compile-time.

Right. Construction is fine. Their use is not. I don't know what I was
thinking.

So it looks like the best way forwards is still the OP's patch?

Thanks,
Paul

16 years ago by Paul Biggar — view source

But I think it's fair to say that it has static inheritance - that is,
its full inheritance chain is known before it can be instantiated, and
it can never be changed after that.

Right, but it has a lot of dynamic issues anyway. E.g. the parent class
may be changed or loaded from a different file.

This is what I'm getting at. How can the parent class be changed? I
can see that it might be deferred, but I don't see how it can be
changed once it's set.

Thanks,
Paul

--
Paul Biggar
paul.biggar@gmail.com

16 years ago by Stanislav Malyshev — view source

Hi!

Apologies, I'm not familiar with run-time inheritance in PHP. My
understanding was that when a class's source code is compiled, its
parent classes must be known. When is this not the case? Must it be
known for the class' first instantiation?

No, the problems here are different. The process works as follows:

1. Class X source is compiled.
2. "X" is added to the class table
3. Class Y (extends X) source is compiled.
4. Since Y extends X, methods of X are added to methods of Y
5. "Y" is added to the class table

Now, adding bytecode caching. Bytecode caching replaces steps 1 and 3
with "loaded from cache" - however, since the identity of X can change
between requests, what is stored for step 3 cannot bind to X as it is
now - for that there's step 4, which is executed at runtime, when the
line where the class is defined is executed. That means a static table
describing class Y can exist only after step 4, and it is not cacheable
beyond the bounds of one request.

However, if we are now compiling code such as:
$a->foo();
we meet the following challenges:

1. We do not know what class $a is (suppose it's X, but in most cases we
won't know that)
2. If we did, we do not know what class X is (definition, as opposed to
just name) at the compile time (it could be defined later)
3. If we knew what class X definition is at compile time, the above
would preclude us from generating any code that binds to that definition
since such code would not be cacheable.

These are three independent challenges; without overcoming each of them
I do not see how a virtual table would be helpful.
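
To make the first challenge concrete, here is a small C sketch contrasting
a VMT-style call, where the slot index is fixed when the call site is
compiled, with a name-based call. All types and names here are invented
for illustration and do not correspond to the engine's structures. Two
unrelated classes both have foo(), but in different slots.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct object object;
typedef int (*method_fn)(object *self);

typedef struct {
    const char *name;
    method_fn   fn;
} method_entry;

typedef struct {
    const method_entry *methods;  /* doubles as a VMT: slot i is methods[i].fn */
    size_t              count;
} class_entry;

struct object {
    const class_entry *ce;
    int                value;
};

/* VMT-style call: the slot index is baked into the call site. */
static int call_by_slot(object *obj, size_t slot) {
    assert(slot < obj->ce->count);
    return obj->ce->methods[slot].fn(obj);
}

/* Name-based call: the call site only knows the method name. */
static int call_by_name(object *obj, const char *name, int *found) {
    for (size_t i = 0; i < obj->ce->count; i++) {
        if (strcmp(obj->ce->methods[i].name, name) == 0) {
            *found = 1;
            return obj->ce->methods[i].fn(obj);
        }
    }
    *found = 0;  /* the engine would try __call or raise an error here */
    return 0;
}

static int a_foo(object *self) { return self->value + 1; }
static int a_bar(object *self) { return self->value + 2; }
static int b_bar(object *self) { return self->value * 10; }
static int b_foo(object *self) { return self->value * 100; }

/* Class A declares foo() first; unrelated class B declares bar() first. */
static const method_entry a_methods[] = { { "foo", a_foo }, { "bar", a_bar } };
static const method_entry b_methods[] = { { "bar", b_bar }, { "foo", b_foo } };
static const class_entry class_a = { a_methods, 2 };
static const class_entry class_b = { b_methods, 2 };
```

With a call site compiled as "slot 0", an object of class A runs foo() but
an object of class B runs bar(). Only the name-based lookup reaches foo()
in both cases, and that is the lookup the compiler cannot avoid when it
does not know the class of $a.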

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

16 years ago by Paul Biggar — view source

Hi Stas,

Hi!

Apologies, I'm not familiar with run-time inheritance in PHP. My
understanding was that when a class's source code is compiled, its
parent classes must be known. When is this not the case? Must it be
known for the class' first instantiation?

No, the problems here are different. The process works as follows:

1. Class X source is compiled.
2. "X" is added to the class table
3. Class Y (extends X) source is compiled.
4. Since Y extends X, methods of X are added to methods of Y
5. "Y" is added to the class table

Now, adding bytecode caching. Bytecode caching replaces steps 1 and 3 with
"loaded from cache" - however, since the identity of X can change between
requests, what is stored for step 3 cannot bind to X as it is now - for
that there's step 4, which is executed at runtime, when the line where the
class is defined is executed. That means a static table describing class Y
can exist only after step 4, and it is not cacheable beyond the bounds of
one request.

Great explanation, thank you. As far as terminology goes, this is
still static inheritance, as you cannot change a class' parent after
it has been "set" in a request. Run-time inheritance is where it can
change, for example in JavaScript where an object's prototype can be
changed. I think you could do lookup caches (i.e. the OP's patch) either
way, but it's probably cheaper with static inheritance.

However, if we are now compiling code such as:
$a->foo();
we meet the following challenges:

1. We do not know what class $a is (suppose it's X, but in most cases we
won't know that)
2. If we did, we do not know what class X is (definition, as opposed to just
name) at the compile time (it could be defined later)
3. If we knew what class X definition is at compile time, the above would
preclude us from generating any code that binds to that definition since
such code would not be cacheable.

These are three independent challenges; without overcoming each of them I do
not see how a virtual table would be helpful.

Yes. As I replied to Dmitry, I clearly wasn't thinking when I
suggested this. FYI, I do type-inference on PHP, and the types here
are difficult to calculate in the general case.

Thanks,
Paul

--
Paul Biggar
paul.biggar@gmail.com

16 years ago by Dmitry Stogov — view source

Hi Paul,

Paul Biggar wrote:

On Mon, May 11, 2009 at 7:47 PM, Guilherme Blanco
guilhermeblanco@gmail.com wrote:

What's the status on this one?!

I think it died from neglect. But it was a really good idea.

One question that was raised was:

However we can introduce some indirect table associated with op_array, which
can be used to implement inline caches without direct opcode modification
(in the same way as IS_CV variables work). There are a lot of papers about
polymorphic inline caches (e.g.
http://research.sun.com/self/papers/pics.html) which we probably should use
so as not to reinvent the wheel.

You can't actually use PICs or even ICs with the Zend engine, because
you can't insert code into the callee method's header (you would need
a JIT). You also wouldn't want to, since PHP can't use the
recompilation techniques that Self had. You can use lookup caches,
which is exactly what the original patch was.

I know PHP's limitations, and I meant additional lookup caches for one or
a few results connected directly to ZEND_INIT_METHOD_CALL (and family)
opcodes.
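
A minimal C sketch of such a lookup cache might look as follows. It keeps
one cache slot per call site in a side table next to the op_array, so the
opcodes themselves are never modified. The structures, field names, and
resolve_call() are assumptions made up for this example, not the engine's
API, and the sketch ignores visibility checks and __call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*method_fn)(void);

typedef struct {
    const char *name;
    method_fn   fn;
} method_entry;

typedef struct {
    const method_entry *methods;
    size_t              count;
} class_entry;

/* One slot per call site, stored beside the op_array, not in the opcode. */
typedef struct {
    const class_entry *ce;  /* class seen on the last call, NULL if empty */
    method_fn          fn;  /* method resolved for that class */
} cache_slot;

typedef struct {
    const char *method_name;
    size_t      cache_index;  /* immutable index into op_array.cache */
} call_op;

typedef struct {
    const call_op *ops;           /* read-only, safe to share via opcode cache */
    cache_slot    *cache;         /* mutable, lives for one request */
    unsigned long  slow_lookups;  /* counts full lookups, for illustration */
} op_array;

/* Full lookup by name; stands in for the engine's hash table lookup. */
static method_fn slow_lookup(const class_entry *ce, const char *name) {
    for (size_t i = 0; i < ce->count; i++) {
        if (strcmp(ce->methods[i].name, name) == 0) {
            return ce->methods[i].fn;
        }
    }
    return NULL;
}

/* Resolve the target of a call site, consulting its cache slot first. */
static method_fn resolve_call(op_array *oa, size_t op_index,
                              const class_entry *ce) {
    assert(ce != NULL);
    const call_op *op = &oa->ops[op_index];
    cache_slot *slot = &oa->cache[op->cache_index];
    if (slot->ce == ce) {
        return slot->fn;  /* hit: same class as the previous call */
    }
    method_fn fn = slow_lookup(ce, op->method_name);
    oa->slow_lookups++;
    if (fn != NULL) {
        slot->ce = ce;    /* remember the result for the next call */
        slot->fn = fn;
    }
    return fn;
}

/* Example data: two classes that each define foo(). */
static int x_foo(void) { return 1; }
static int y_foo(void) { return 2; }
static const method_entry x_methods[] = { { "foo", x_foo } };
static const method_entry y_methods[] = { { "foo", y_foo } };
static const class_entry class_x = { x_methods, 1 };
static const class_entry class_y = { y_methods, 1 };
```

Repeated calls from the same call site on objects of the same class hit
the slot and skip the name lookup. A call with a different class misses,
does the full lookup, and overwrites the slot. Because the slot is keyed
on the class entry pointer and lives only for the request, it does not
depend on the class having the same identity in the next request.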

FWIW, since PHP has a static inheritance chain, the best approach
seems to be to build a virtual dispatch table, instead of a hashtable
for functions. However, there might be some esoteric extensions which
make this difficult.

The real situation is even worse, as during compilation of a class its
parent class doesn't have to be known. So construction of VMTs becomes a
bit problematic. BTW we could think in this way...

Thanks. Dmitry.

Paul

16 years ago by Dmitry Stogov — view source

Hi Guilherme,

5.3 is closed for major updates (it is in RC state). I would try to look
into this when we develop a strategy for the next PHP version.

Thanks. Dmitry.

Guilherme Blanco wrote:

Hi guys,

What's the status on this one?!

It's an important optimization that should be considered. Isn't saving
more than a million method calls in a framework worth it?
Nobody gave a final word on this subject.

I could not see this committed in 5.3 or in HEAD.
So...can someone notify me about the status of this???

Cheers,

Marcus Boerger wrote:

Aren't we able to bind these at least partially to the function call
opcode, in case we know they are constant? If all is constant we could
even store the whole lookup in the opcode. Well you'd have to convince
Zend to do that because so far they have always been against this
approach.

We can't modify the opcode itself as it'll break opcode caches.

However we can introduce some indirect table associated with op_array, which
can be used to implement inline caches without direct opcode modification
(in the same way as IS_CV variables work). There are a lot of papers about
polymorphic inline caches (e.g.
http://research.sun.com/self/papers/pics.html) which we probably should use
so as not to reinvent the wheel.

Thanks. Dmitry.

16 years ago by Guilherme Blanco — view source

Thanks Dmitry,

I imagined that. I just thought it was already applied, but it's not.
So I spoke a bit with Lukas and he suggested that I revive this
discussion, since it stopped all of a sudden.

Anyway... once you guys find a final patch, should I expect it to be at
least committed into HEAD?

Cheers,

Hi Guilherme,

5.3 is closed for major updates (it is in RC state). I would try to look
into this when we develop a strategy for the next PHP version.

Thanks. Dmitry.

Guilherme Blanco wrote:

Hi guys,

What's the status on this one?!

It's an important optimization that should be considered. Isn't saving
more than a million method calls in a framework worth it?
Nobody gave a final word on this subject.

I could not see this committed in 5.3 or in HEAD.
So...can someone notify me about the status of this???

Cheers,

Marcus Boerger wrote:

Aren't we able to bind these at least partially to the function call
opcode, in case we know they are constant? If all is constant we could
even store the whole lookup in the opcode. Well you'd have to convince
Zend to do that because so far they have always been against this
approach.

We can't modify the opcode itself as it'll break opcode caches.

However we can introduce some indirect table associated with op_array, which
can be used to implement inline caches without direct opcode modification
(in the same way as IS_CV variables work). There are a lot of papers about
polymorphic inline caches (e.g.
http://research.sun.com/self/papers/pics.html) which we probably should use
so as not to reinvent the wheel.

Thanks. Dmitry.

--
Guilherme Blanco - Web Developer
CBC - Certified Bindows Consultant
Cell Phone: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
URL: http://blog.bisna.com
São Paulo - SP/Brazil

having only one handler table per class would make objects less flexible.

These C functions can have if()s which may produce different kinds of objects.
