Hi,
I would like to know why the opcodes are not optimized, even with some
very simple optimizations like constant folding.
For example:
line  #  op    fetch  ext  return  operands
  60  0  ADD                ~0     5, 7
      1  ECHO                      ~0
which is "echo 5+7;"
-- Mathieu Suen
On 13.01.2010 12:18, mathieu.suen wrote:
I would like to know why the opcodes are not optimized.
Because any optimization, even a very simple one, imposes a performance
penalty in the default execution model of PHP, which does not use a
bytecode cache.
Only when the bytecode is not regenerated for each execution does it
make sense to invest time in the then one-time compilation.
--
Sebastian Bergmann Co-Founder and Principal Consultant
http://sebastian-bergmann.de/ http://thePHP.cc/
Sebastian Bergmann wrote:
On 13.01.2010 12:18, mathieu.suen wrote:
I would like to know why the opcodes are not optimized.
Because any optimization, even a very simple one, imposes a performance
penalty in the default execution model of PHP, which does not use a
bytecode cache.
For simple optimizations I don't think so. Take this simple example:
function foo()
{
$a = 45;
return $a;
}
Here, if you don't optimize, you are creating a variable, so you put
pressure on the GC and on memory.
A benchmark would be best.
By the way, why is there no native bytecode cache?
Only when the bytecode is not regenerated for each execution does it
make sense to invest time in the then one-time compilation.
Sorry, I don't understand; what do you mean?
-- Mathieu Suen
mathieu.suen wrote:
Sebastian Bergmann wrote:
On 13.01.2010 12:18, mathieu.suen wrote:
Because any optimization, even a very simple one, imposes a performance
penalty in the default execution model of PHP, which does not use a
bytecode cache.

For simple optimizations I don't think so. Take this simple example:
function foo()
{
$a = 45;
return $a;
}

Here, if you don't optimize, you are creating a variable, so you put
pressure on the GC and on memory.
But most of the time, the act of optimising will take longer than just
compiling and running the code, because you have to make decisions about
whether something can be optimised and the best way to do it. As
Sebastian said, it only makes sense to invest that time when you're
going to be reusing the compiler output. Without an opcode cache, PHP
just throws away the results of the compilation, so there are zero
advantages to optimisation.
A benchmark would be best.

By the way, why is there no native bytecode cache?

Only when the bytecode is not regenerated for each execution does it
make sense to invest time in the then one-time compilation.

Sorry, I don't understand; what do you mean?
What Sebastian means is that it would only make sense to optimise if
you're going to cache the output -- otherwise it is simply wasted time
that could be better spent on other things.
Dave
Hi,
Optimizations such as folding 5+7 into 12 really don't get you much. ZEND_ADD (and
other basic opcodes) are not in any way a slow point in a program. And
unfortunately to be able to optimize these you would probably need to put in
an extra pass in the compiler which would probably just slow things down
(unless you have a LOT of these types of additions).
As for the foo() example... This looks very simple; however, it is actually a
very hard problem that would most likely take far more time and resources to
solve in the compiler than it would to just leave it be. The problem here is
that you need to understand everywhere that $a is assigned a value and
everywhere its value is used. The problem becomes very hard in functions
that have loops and other types of control structures. It really just ends up
becoming a fairly complex mess to solve.
The same can be said about quite a few of the other optimizations you can
think of. On the surface they seem simple (and a few of them actually are),
but most of them are complex... largely due to some of the unique 'features'
of PHP.
In any case, optimization in PHP is not a lost cause. The first thing you
should really do is be using an opcode cache such as APC. Other than that
there are some solutions being worked on. There is Zend Optimizer and there
is pecl/optimizer (which, to warn you, is probably far from stable).
There are also a few efforts to compile PHP, such as the PHC compiler.
Overall though, more often than not PHP is not the bottleneck of your
program and thus optimization won't get you too much.
- Graham Kelly
Graham Kelly wrote:
In any case, optimization in PHP is not a lost cause. The first thing you
should really do is be using an opcode cache such as APC. Other than that
Unfortunately: APC does not work with PHP 5.3 -- I have a site where I would
love to use it but I cannot. I use APC to great effect elsewhere.
Can anyone say when APC will be fixed for PHP 5.3? And what about it being
ready for PHP 6?
there are some solutions being worked on. There is Zend Optimizer and there
is pecl/optimizer (which, to warn you, is probably far from stable).
There are also a few efforts to compile PHP, such as the PHC compiler.

Overall though, more often than not PHP is not the bottleneck of your
program and thus optimization won't get you too much.
--
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256 http://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: http://www.phcomp.co.uk/contact.php
Past chairman of UKUUG: http://www.ukuug.org/
#include <std_disclaimer.h>
Alain Williams wrote:
Unfortunately: APC does not work with PHP 5.3 -- I have a site where I would
love to use it but I cannot. I use APC to great effect elsewhere.
The svn version works ok with 5.3. Turn off gc though. You shouldn't
be writing code that requires garbage collection anyway if you are
looking for speed.
-Rasmus
Alain Williams wrote:
Unfortunately: APC does not work with PHP 5.3 -- I have a site where I would
love to use it but I cannot. I use APC to great effect elsewhere.

The svn version works ok with 5.3. Turn off gc though. You shouldn't
be writing code that requires garbage collection anyway if you are
looking for speed.

Thanks. That compiles nicely and seems to work - CentOS 5.4 - tested on
both 32 & 64 bit.
gc? I cannot see anything in php.ini about that -- other than session garbage
collection; I assume you don't mean that?
Much of my motivation for APC is using large programs & class libraries (e.g.
Smarty, MediaWiki) that take a huge amount of time to compile - but the
execution path is only a small fraction of the code.
BTW: 'make test' fails horribly because the modules/ directory doesn't contain
most of the modules used (it only contains apc.so). Is there a way of having
'extension_dir' be a ':'-separated PATH so that more than one can be listed?
Alain Williams wrote:
gc? I cannot see anything in php.ini about that -- other than session garbage
collection; I assume you don't mean that?
zend.enable_gc = Off
Hi.
Alain Williams wrote:
Unfortunately: APC does not work with PHP 5.3 -- I have a site where I would
love to use it but I cannot. I use APC to great effect elsewhere.
Hm. I have 5.3.1 with APC 3.1.3p1 and it runs fine. This is not a
production environment, but I have not yet had the impression APC was
broken.
What is it that doesn't work for you?
The svn version works ok with 5.3. Turn off gc though.
Why is that advisable? Any pointers to background information welcome.
Regards,
Karsten
Karsten Dambekalns wrote:
Why is that advisable? Any pointers to background information welcome.
The gc code when combined with apc is still a bit shaky in 5.3. I
haven't figured out why yet. And my motivation for figuring it out is
pretty low as code that relies on gc is slow.
-Rasmus
The gc code when combined with apc is still a bit shaky in 5.3. I
haven't figured out why yet. And my motivation for figuring it out is
pretty low as code that relies on gc is slow.

-Rasmus
Motivation for relying on GC in 5.3 is pretty low because 5.3 is still a bit
shaky...
Graham Kelly wrote:
Overall though, more often than not PHP is not the bottleneck of your
program and thus optimization wont get you too much.
In a lot of ways, PHP is already well-optimised. The hash tables are
fast, the executor is decent, as executors for weakly-typed languages
go. Many internal functions have quite reasonable C implementations.
Given this, sometimes it's easy to forget that PHP is pathologically
memory hungry, to the point of making simple tasks difficult or
impossible to perform in limited environments. It's the worst language
I've ever encountered in this respect. An array of small strings will
use on the order of 200 bytes per element. An array of integers will use
not much less. A simple object (due to being based on the same
inefficient data structure) may use a kilobyte or two.
Despite the large amount of time I've spent optimising MediaWiki for
memory usage, it still can't run reliably with memory_limit set less
than about 80MB. That means you need a server with 500MB if you want to
set MaxClients high enough to let a few people use it at the same time.
So if it were my job to set priorities for PHP development, I'd spend
less time thinking about folding constants and more time thinking about
things like:
- Objects that can optionally pack themselves into a class-dependent
  structure and unpack on demand
- Exposing strongly-typed list and vector data structures to the user,
  that don't have massive hashtable overheads
- An oparray format with fewer 64-bit pointers and more smallish integers
That sort of thing.
-- Tim Starling
Tim Starling wrote:
Given this, sometimes it's easy to forget that PHP is pathologically
memory hungry, to the point of making simple tasks difficult or
impossible to perform in limited environments. It's the worst language
I've ever encountered in this respect. An array of small strings will
use on the order of 200 bytes per element. An array of integers will use
not much less. A simple object (due to being based on the same
inefficient data structure) may use a kilobyte or two.
A zval is around 64 bytes. So, to use 200 bytes per string element,
each of your strings must be around 136 chars long.
For me, working in super high-load environments, this was never an issue
because memory was always way more plentiful than cpu. You can only
slice a cpu in so many slices. Even if you could run 1024 concurrent
Apache/PHP processes, you wouldn't want to unless you could somehow
shove 64 cpus into your machine. For high-performance high-load
environments you want to get each request serviced as fast as possible
and attempting to handle too many concurrent requests works against you
here.
-Rasmus
Rasmus Lerdorf wrote:
Tim Starling wrote:
Given this, sometimes it's easy to forget that PHP is pathologically
memory hungry, to the point of making simple tasks difficult or
impossible to perform in limited environments. It's the worst language
I've ever encountered in this respect. An array of small strings will
use on the order of 200 bytes per element. An array of integers will use
not much less. A simple object (due to being based on the same
inefficient data structure) may use a kilobyte or two.

A zval is around 64 bytes. So, to use 200 bytes per string element,
each of your strings must be around 136 chars long.
<?php
$m = memory_get_usage();
$a = explode(',', str_repeat(',', 100000));
print (memory_get_usage() - $m)/100000;
?>
I get 197 on 32-bit and 259 on 64-bit. Try it for yourself if you don't
believe me. I've cross-checked memory_get_usage() against "ps -o rss";
it's pretty accurate.
For me, working in super high-load environments, this was never an issue
because memory was always way more plentiful than cpu. You can only
slice a cpu in so many slices. Even if you could run 1024 concurrent
Apache/PHP processes, you wouldn't want to unless you could somehow
shove 64 cpus into your machine. For high-performance high-load
environments you want to get each request serviced as fast as possible
and attempting to handle too many concurrent requests works against you
here.
Maybe the tasks you do are usually with small data sets.
-- Tim Starling
Hi!
<?php
$m = memory_get_usage();
$a = explode(',', str_repeat(',', 100000));
print (memory_get_usage() - $m)/100000;
Says 93.2482 for me. Should be even less, since the string generated by
str_repeat itself is also counted as overhead (without it, it's 92.2474).
Aren't you perchance using a debug build? A debug build gives 196 for me.
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
Hi!
Says 93.2482 for me. Should be even less since string generated by
On 64-bit I get about 170 bytes for 5.2, don't have 5.3 build handy on
64-bit.
Says 93.2482 for me. Should be even less since string generated by
On 64-bit I get about 170 bytes for 5.2, don't have 5.3 build handy on 64-bit.
On 64bit (debug builds):
derick@kossu:~$ pe 5.3dev
derick@kossu:~$ php
<?php
$m = memory_get_usage();
$a = explode(',', str_repeat(',', 100000));
print (memory_get_usage() - $m)/100000;
?>
378.54448
derick@kossu:~$ pe 5.2dev
derick@kossu:~$ php
<?php
$m = memory_get_usage();
$a = explode(',', str_repeat(',', 100000));
print (memory_get_usage() - $m)/100000;
?>
370.57952
with kind regards,
Derick
http://derickrethans.nl | http://xdebug.org
twitter: @derickr
Stanislav Malyshev wrote:
Hi!
Says 93.2482 for me. Should be even less since string generated by
On 64-bit I get about 170 bytes for 5.2, don't have 5.3 build handy on
64-bit.
178.4972 5.3 non-debug 64-bit Linux
-Rasmus
Stanislav Malyshev wrote:
Hi!
<?php
$m = memory_get_usage();
$a = explode(',', str_repeat(',', 100000));
print (memory_get_usage() - $m)/100000;

Says 93.2482 for me. Should be even less, since the string generated by
str_repeat itself is also counted as overhead (without it, it's 92.2474).
Aren't you perchance using a debug build? A debug build gives 196 for me.
Yes, it was debug on 32-bit, but non-debug on 64-bit. So non-debug
memory usage on 64-bit is still 259 bytes per element. On 64-bit I am
using PHP 5.2.4-2ubuntu5.7wm1 from apt.wikimedia.org.
In another post:
HashTable uses 40 bytes, zval is 16 bytes, Bucket is 36 bytes, which
means if you use integer indexes, the overhead is 72 bytes per value
including memory block headers and alignments. It might be too much
for you, in which case I'd go towards making an extension that creates
an object storing strings more efficiently and implementing either
get/set handlers or ArrayAccess (or both). This of course would be
most useful if you access only a small part of the strings in each
function/method.
Fair enough, but we do have to support default installations. We do
already have a couple of optional extensions which reduce memory usage,
but they do more specific tasks than that.
I do not see what could be removed from Bucket or zval without hurting
the functionality.
Right, and that's why PHP is so bad compared to other languages. Its
one-size-fits-all data structure has to store a lot of data per element
to support every possible use case. However, there is room for
optimisation. For instance, an array could start off as being like a C++
std::vector. Then when someone inserts an item into it with a
non-integer key, it could be converted to a hashtable. This could
potentially give you a time saving as well, because conversion to a
hashtable could resize the destination hashtable in one step instead of
growing it O(log N) times.
Some other operations, like deleting items from the middle of the array
or adding items past the end (leaving gaps) would also have to trigger
conversion. The point would be to optimise the most common use cases for
integer-indexed arrays.
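A rough, self-contained sketch of what such a dual representation might
look like, with entirely invented names (the fixed-capacity stub table
below stands in for the real Zend HashTable):

/* Sketch of the vector-to-hashtable idea. Every name here is invented;
 * the stub table stands in for the real Zend HashTable and has a fixed
 * capacity, enough for this demo. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *key; long val; } bucket;
typedef struct { bucket *b; size_t cap, used; } stub_ht;

static size_t hash(const char *s, size_t cap)
{
    size_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % cap;
}

static stub_ht *ht_create(size_t hint)        /* sized once, up front */
{
    stub_ht *t = malloc(sizeof *t);
    t->cap = hint * 2 + 8;
    t->used = 0;
    t->b = calloc(t->cap, sizeof *t->b);
    return t;
}

static void ht_set(stub_ht *t, const char *key, long val)
{
    size_t i = hash(key, t->cap);
    while (t->b[i].key && strcmp(t->b[i].key, key) != 0)
        i = (i + 1) % t->cap;                 /* linear probing */
    if (!t->b[i].key) { t->b[i].key = strdup(key); t->used++; }
    t->b[i].val = val;
}

typedef enum { STORE_VECTOR, STORE_HASH } store_kind;

typedef struct {
    store_kind kind;
    long *vec;            /* vector form: contiguous values, no buckets */
    size_t len, cap;
    stub_ht *ht;          /* hash form, materialised only on demand     */
} flex_array;

static void fa_push(flex_array *a, long val)  /* fast path: int append */
{
    if (a->kind == STORE_HASH) {
        char k[32];
        snprintf(k, sizeof k, "%zu", a->len++);
        ht_set(a->ht, k, val);
        return;
    }
    if (a->len == a->cap) {
        a->cap = a->cap ? a->cap * 2 : 8;
        a->vec = realloc(a->vec, a->cap * sizeof *a->vec);
    }
    a->vec[a->len++] = val;
}

/* The first string key converts the array in one shot: the destination
 * table is sized once instead of being grown O(log N) times. */
static void fa_set_str(flex_array *a, const char *key, long val)
{
    if (a->kind == STORE_VECTOR) {
        a->ht = ht_create(a->len + 1);
        for (size_t i = 0; i < a->len; i++) {
            char k[32];
            snprintf(k, sizeof k, "%zu", i);
            ht_set(a->ht, k, a->vec[i]);
        }
        free(a->vec);
        a->kind = STORE_HASH;
    }
    ht_set(a->ht, key, val);
}

int main(void)
{
    flex_array a = { STORE_VECTOR, NULL, 0, 0, NULL };
    for (long i = 0; i < 5; i++) fa_push(&a, i * i);
    fa_set_str(&a, "name", 42);               /* triggers the conversion */
    printf("converted: %s, entries: %zu\n",
           a.kind == STORE_HASH ? "yes" : "no", a.ht->used);
    return 0;
}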
not much less. A simple object (due to being based on the same
inefficient data structure) may use a kilobyte or two.

A kilobyte looks like too much for a single simple object (unless we
have different notions of simple). Could you describe what exactly
makes up the kilobyte - what's in the object?
<?php
class C {
var $v1, $v2, $v3, $v4, $v5, $v6, $v7, $v8, $v9, $v10;
}
$m = memory_get_usage();
$a = array();
for ( $i = 0; $i < 10000; $i++ ) {
$a[] = new C;
}
print ((memory_get_usage() - $m) / 10000) . "\n";
?>
1927 bytes (I'll use 64-bit from now on since it gives the most shocking
numbers)
- Objects that can optionally pack themselves into a class-dependent
structure and unpack on demand

Objects can do pretty much anything in Zend Engine now, provided you
do some C :) For the engine, an object is basically a pointer and an
integer; the rest is changeable. Of course, on the PHP level we need to
have more, but that's because certain things are just not doable on the
PHP level. Do you have some specific use case that would allow to reduce
Basically I'm thinking along the same lines as the array optimisation I
suggested above. For my class C in the test above, the zend_class_entry
would have a hashtable like:
v1 => 0, v2 => 1, v3 => 2, v4 => 3, v5 => 4, v6 => 5, v7 => 6, v8 =>7,
v9 => 8, v10 => 9
Then the object could be stored as a zval[10]. Object member access
would be implemented by looking up the member name in the class entry
hashtable and then using the resulting index into the zval[10]. When the
object is unpacked (say if the user creates or deletes object members at
runtime), then the object value becomes a hashtable.
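As a hypothetical sketch of that layout (invented names, not the Zend
object model; the class-side lookup is a linear scan here, where the real
thing would be the class-entry hashtable described above):

/* Packed-object idea: the class holds one shared name-to-slot map;
 * each instance is just an array of slots until a dynamic property
 * would force it to unpack into a hash table. */
#include <stdio.h>
#include <string.h>

#define NPROPS 10

typedef struct {
    const char *prop_names[NPROPS];  /* shared per-class slot map; the  */
    int nprops;                      /* real thing would be a hashtable */
} toy_class;

typedef struct {
    toy_class *ce;
    long slots[NPROPS];   /* packed storage: one slot per declared prop */
} toy_object;

/* Property access: resolve the name to a slot index via the class,
 * then index into the packed array. */
static long *prop_fetch(toy_object *o, const char *name)
{
    for (int i = 0; i < o->ce->nprops; i++)
        if (strcmp(o->ce->prop_names[i], name) == 0)
            return &o->slots[i];
    return NULL;  /* unknown name: would trigger unpacking instead */
}

int main(void)
{
    toy_class c = { { "v1","v2","v3","v4","v5",
                      "v6","v7","v8","v9","v10" }, NPROPS };
    toy_object o = { &c, { 0 } };
    *prop_fetch(&o, "v3") = 99;
    printf("v3 = %ld\n", *prop_fetch(&o, "v3"));   /* v3 = 99 */
    return 0;
}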
- Exposing strongly-typed list and vector data structures to the user,
that don't have massive hashtable overheads
- An oparray format with fewer 64-bit pointers and more smallish integers

Ah, you're on 64-bit... That explains why your memory requirements are
larger :) But I'm not sure how the data the op array needs can be stored
without using pointers.
Making oplines use a variable amount of memory (like they do in machine
code) would be a great help.
For declarations, you could pack structures like zend_class_entry and
zend_function_entry on to the end of the opline, and access them by
casting the opline to the appropriate opcode-specific type. That would
save pointers and also allocator overhead.
At the more extreme end of the spectrum, the compiler could produce a
pointerless oparray, like JVM bytecode. Then when a function is executed
for the first time, the oparray could be expanded, with pointers added,
and the result cached. This would reduce memory usage for code which is
never executed. And it would have the added advantage of making APC
easier to implement, since it could just copy the whole unexpanded
oparray with memcpy().
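As a hypothetical illustration of that two-stage scheme (none of these
types exist in the engine): the stored form refers to strings by offset
into a trailing pool and is relocatable; the runtime form with real
pointers is built lazily on first call:

/* Pointer-free stored oparray, expanded into a pointer-full runtime
 * form on first execution. Invented types, for illustration only. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned opcode;
    unsigned name_off;     /* offset into the string pool, not a pointer */
} stored_op;

typedef struct {
    unsigned num_ops;
    unsigned pool_len;
    stored_op ops[];       /* ops[], then the string pool: one flat blob
                              that could be copied with a single memcpy() */
} stored_oparray;

typedef struct {
    unsigned opcode;
    const char *name;      /* real pointer, valid in this process only */
} runtime_op;

/* Built lazily on first call, then cached; code that never runs never
 * pays for the expansion. */
static runtime_op *expand(const stored_oparray *s)
{
    const char *pool = (const char *)(s->ops + s->num_ops);
    runtime_op *r = malloc(s->num_ops * sizeof *r);
    for (unsigned i = 0; i < s->num_ops; i++) {
        r[i].opcode = s->ops[i].opcode;
        r[i].name = pool + s->ops[i].name_off;   /* pointer fix-up */
    }
    return r;
}

int main(void)
{
    const char pool[] = "foo\0bar";              /* "foo" at 0, "bar" at 4 */
    stored_oparray *s = malloc(sizeof *s + 2 * sizeof(stored_op) + sizeof pool);
    s->num_ops = 2;
    s->pool_len = sizeof pool;
    s->ops[0] = (stored_op){ 1, 0 };
    s->ops[1] = (stored_op){ 2, 4 };
    memcpy((char *)(s->ops + s->num_ops), pool, sizeof pool);
    runtime_op *r = expand(s);                   /* r[1].name == "bar" */
    free(r); free(s);
    return 0;
}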
-- Tim Starling
Tim Starling wrote:
Some other operations, like deleting items from the middle of the array
or adding items past the end (leaving gaps) would also have to trigger
conversion. The point would be to optimise the most common use cases for
integer-indexed arrays.
I still say this isn't something most people run into. I have looked at
a lot of code in a lot of different use cases, and I always see things
being cpu-bound long before they are memory-bound.
-Rasmus
Tim Starling wrote:
<?php
class C {
var $v1, $v2, $v3, $v4, $v5, $v6, $v7, $v8, $v9, $v10;
}
$m = memory_get_usage();
$a = array();
for ( $i = 0; $i < 10000; $i++ ) {
$a[] = new C;
}
print ((memory_get_usage() - $m) / 10000) . "\n";
?>

1927 bytes (I'll use 64-bit from now on since it gives the most shocking
numbers)
PHP 5.3.3-dev (cli) (built: Jan 11 2010 11:26:25)
Linux colo 2.6.31-1-amd64 #1 SMP Sat Oct 24 17:50:31 UTC 2009 x86_64
php > class C {
php { var $v1, $v2, $v3, $v4, $v5, $v6, $v7, $v8, $v9, $v10;
php { }
php >
php > $m = memory_get_usage();
php > $a = array();
php > for ( $i = 0; $i < 10000; $i++ ) {
php { $a[] = new C;
php { }
php > print ((memory_get_usage() - $m) / 10000) . "\n";
1479.5632
So you need 1500 bytes per object in your array. I still fail to see
the problem for a web request. Maybe I am just old-fashioned in the way
I look at this stuff, but if you have more than 1000 objects loaded on a
single request, you are doing something wrong as far as I am concerned.
This is why we do things like unbuffered mysql queries, zero-copy stream
passing, etc. We never want entire result sets or entire files in
memory because even if we optimize the crap out of it, it is still going
to be way faster to simply not do that.
-Rasmus
Rasmus Lerdorf wrote:
This is why we do things like unbuffered mysql queries, zero-copy stream
passing, etc. We never want entire result sets or entire files in
memory because even if we optimize the crap out of it, it is still going
to be way faster to simply not do that.
Actually, with mysqlnd a buffered set might be faster, if you know what
you are doing, because the data won't be copied once more. With
unbuffered sets, data is copied from the network buffer to the zval. With
buffered sets, the zval just points to the network buffer. If you have
the RAM then buffered should be faster. Of course, you should use the set
and close it when you are finished, not fetch-close-then-process, because
then a copy is forced.
Best,
Andrey
Hi!
class C {
var $v1, $v2, $v3, $v4, $v5, $v6, $v7, $v8, $v9, $v10;
}
$m = memory_get_usage();
$a = array();
for ( $i = 0; $i < 10000; $i++ ) {
$a[] = new C;
}
print ((memory_get_usage() - $m) / 10000) . "\n";
?>

1927 bytes (I'll use 64-bit from now on since it gives the most shocking
numbers)
OK, you have an object with 10 vars - as we established, vars in an array
take 100-200 bytes of overhead (depending on bitness - 64-bit is fatter),
so it fits the pattern.
Then the object could be stored as a zval[10]. Object member access
would be implemented by looking up the member name in the class entry
hashtable and then using the resulting index into the zval[10]. When the
object is unpacked (say if the user creates or deletes object members at
runtime), then the object value becomes a hashtable.
That would mean having two object types - "packed" and "unpacked" - with
all (or most) operations basically duplicated. However, for objects it's
easier than for arrays, since the objects API is more abstract. I'm not
sure that would improve the situation, though - a lot of objects are
dynamic, and for those it would mean a penalty when the object is unpacked.
But this can be tested on the current engine (maybe even without
breaking BC!) and if it gives good results it may be an option.
Making oplines use a variable amount of memory (like they do in machine
code) would be a great help.

For declarations, you could pack structures like zend_class_entry and
zend_function_entry on to the end of the opline, and access them by
casting the opline to the appropriate opcode-specific type. That would
save pointers and also allocator overhead.
zend_class_entry is huge, why would you want to put it into the opline?
And what opline needs static zend_class_entry anyway?
At the more extreme end of the spectrum, the compiler could produce a
pointerless oparray, like JVM bytecode. Then when a function is executed
for the first time, the oparray could be expanded, with pointers added,
and the result cached. This would reduce memory usage for code which is
opcodes can be cached (bytecode caches do it) but op_array can't really
be cached between requests because it contains dynamic structures.
Unlike Java, PHP does full cleanup after each request, which means no
preserving dynamic data.
I'm not sure how avoiding pointers in the op_array in such a manner would
help, though - you'd still need to store things like function names, for
example, and since you need to store them somewhere, you'd also have some
pointer to this place. The same goes for a bunch of other op_array
properties - you'd need to store them somewhere and be able to find
them, so I don't see how you'd do it without a pointer of some kind
involved.
Stanislav Malyshev wrote:
opcodes can be cached (bytecode caches do it) but op_array can't
really be cached between requests because it contains dynamic
structures. Unlike Java, PHP does full cleanup after each request,
which means no preserving dynamic data.
APC deep-copies the whole zend_op_array, see apc_copy_op_array() in
apc_compile.c. It does it using an impressive pile of hacks which break
with every major release and in some minor releases too. Every time the
compiler allocates memory, there has to be a matching shared memory
allocation in APC.
But maybe you missed my point. I'm talking about a cache which is cheap
to construct and cleared at the end of each request. It would optimise
tight loops of calls to user-defined functions. The dynamic data, like
static variable hashtables, would be in it. The compact pointerless
structure could be stored between requests, and would not contain
dynamic data.
Basically a structure like the current zend_op_array would be created on
demand by the executor instead of in advance by the compiler.
I'm not sure how using pointers in op_array in such manner would help
though - you'd still need to store things like function names, for
example, and since you need to store it somewhere, you'd also have
some pointer to this place.
You can do it with a length field and a char[1] at the end of the
structure. When you allocate memory for the structure, you add some on
for the string. Then you copy the string into the char[1], overflowing it.
If you need several strings, then you can have several byte offsets,
which are added to the start of the char[1] to find the location of the
string in question. You can make the offset fields small, say 16 bits.
But it's mostly zend_op I'm interested in rather than zend_op_array.
Currently if a zend_op has a string literal argument, you'd make a zval
for it and copy it into op1.u.constant. But the zval allocation could be
avoided. The handler could cast the zend_op to a zend_op_with_a_string,
which would have a length field and an overflowed char[1] at the end for
the string argument.
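A small sketch of that layout (a hypothetical struct, not a real engine
type; it uses a C99 flexible array member rather than the literal
overflowed char[1], but the memory layout is the same):

/* The "overflowed char[1]" trick: the string lives inline at the end
 * of the op, so no separate zval allocation is needed. Hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned short size;     /* total size of this op, for iteration */
    unsigned char  opcode;
    unsigned short str_len;
    char           str[];    /* string stored inline with the op */
} op_with_string;

static op_with_string *make_op(unsigned char opcode, const char *s)
{
    size_t len = strlen(s);
    op_with_string *op = malloc(sizeof *op + len + 1);
    op->size = (unsigned short)(sizeof *op + len + 1);
    op->opcode = opcode;
    op->str_len = (unsigned short)len;
    memcpy(op->str, s, len + 1);   /* copy into the trailing buffer */
    return op;
}

int main(void)
{
    op_with_string *op = make_op(42, "hello");
    printf("opcode=%u size=%u str=%s\n", op->opcode, op->size, op->str);
    free(op);
    return 0;
}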
A variable op size would make iterating through zend_op_array.opcodes
slightly more awkward, something like:
for (; op < oparray_end; op = (zend_op*)((char*)op + op->size)) {
...
But obviously you could clean that up with a macro.
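Something like this (hypothetical) helper:

/* Step over variable-sized ops; assumes each op stores its own total
 * size, as in the loop above. */
#define NEXT_OP(op) ((zend_op *)((char *)(op) + (op)->size))

for (; op < oparray_end; op = NEXT_OP(op)) {
...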
For Mr. "everyone has 8GB of memory and tiny little data sets" Lerdorf,
I could point out that reducing the average zend_op size and placing
strings close to other op data will also make execution faster, due to
the improved CPU cache hit rate.
-- Tim Starling
Tim Starling wrote:
For Mr. "everyone has 8GB of memory and tiny little data sets" Lerdorf,
I could point out that reducing the average zend_op size and placing
strings close to other op data will also make execution faster, due to
the improved CPU cache hit rate.
Nice twist there. I simply related memory to cpu and the assumption was
that if you had a dual quad-core system, chances are that you also had
8G of ram. Having 8 cores with only 1G of ram would be a weird server
config.
-Rasmus
Hi!
Basically a structure like the current zend_op_array would be created on
demand by the executor instead of in advance by the compiler.
I guess we could have strings, etc. put in one big string buffer and
refer to them by 32-bit index; that would probably work with statically
allocated things (like filenames, etc.). But that'd only be useful in
the 64-bit case, and would just slow down 32-bit (since we probably
couldn't afford 16-bit indexes), which means we'd either have separate
code for 32 and 64 or a ton of macros for each string access. I'm not
sure that is worth the trouble.
We could do something that could improve things somewhat - namely,
organize all static strings into per-op-array string table (in op_array
and znode zvals) and refer to them by index. That also would give us
some advantages since we could precalculate hashes. IIRC Dmitry Stogov
had done some research on that.
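A rough sketch of what such a per-op-array string table might look like,
with invented names (the real znode/op_array layouts differ):

/* Operands would store a small index instead of a char pointer, and
 * the hash is precalculated once at compile time. Illustration only. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *str;
    size_t len;
    unsigned long hash;   /* precalculated, so runtime lookups skip it */
} interned_str;

typedef struct {
    interned_str *strings;
    unsigned count;
} string_table;

/* DJBX33A-style hash, the same scheme the engine uses for string keys. */
static unsigned long str_hash(const char *s, size_t len)
{
    unsigned long h = 5381;
    while (len--) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Add a string at compile time; the returned index is what an operand
 * would store in place of a pointer. */
static unsigned intern(string_table *t, const char *s)
{
    size_t len = strlen(s);
    t->strings = realloc(t->strings, (t->count + 1) * sizeof *t->strings);
    t->strings[t->count] = (interned_str){ strdup(s), len, str_hash(s, len) };
    return t->count++;
}

int main(void)
{
    string_table t = { NULL, 0 };
    unsigned idx = intern(&t, "strlen");   /* opline stores idx, not a char* */
    (void)idx;
    return 0;
}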
You can do it with a length field and a char[1] at the end of the
structure. When you allocate memory for the structure, you add some on
for the string. Then you copy the string into the char[1], overflowing it.

If you need several strings, then you can have several byte offsets,
which are added to the start of the char[1] to find the location of the
string in question. You can make the offset fields small, say 16 bits.
It'd definitely be too much trouble to work with such structures; it would
lead to a ton of bugs and be a nightmare to manage...
But it's mostly zend_op I'm interested in rather than zend_op_array.
Currently if a zend_op has a string literal argument, you'd make a zval
for it and copy it into op1.u.constant. But the zval allocation could be
No, the zval is part of the znode. There might be an allocation at the
compile stage, etc., but it's temporary - the zval itself is stored inside
the znode, not allocated elsewhere. See zend_compile.h.
avoided. The handler could cast the zend_op to a zend_op_with_a_string,
which would have a length field and an overflowed char[1] at the end for
the string argument.
Since we need to address zend_ops inside the array, variable-size ops would
be a major inconvenience. Also, since zval is a union, I'm not even
sure you'd be saving that much. A constant table, though, might allow some
savings, but would complicate opcodes somewhat.
A variable op size would make iterating through zend_op_array.opcodes
slightly more awkward, something like:

Note that we need not just iteration but also random access (and no, not
only for goto :) - many constructs are compiled into code including jumps).
BTW, as for more effective vars storage - did you look at SPL types,
especially SplFixedArray? It looks like exactly what you want with
fixed-size storage.
-----Original Message-----
From: Tim Starling [mailto:tstarling@wikimedia.org]
Sent: Wednesday, January 13, 2010 7:19 PM
To: Stas Malyshev
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] About optimization
You can do it with a length field and a char[1] at the end of the
structure. When you allocate memory for the structure, you add some on
for the string. Then you copy the string into the char[1], overflowing it.

If you need several strings, then you can have several byte offsets,
which are added to the start of the char[1] to find the location of the
string in question. You can make the offset fields small, say 16 bits.

But it's mostly zend_op I'm interested in rather than zend_op_array.
Currently if a zend_op has a string literal argument, you'd make a zval
for it and copy it into op1.u.constant. But the zval allocation could be
avoided. The handler could cast the zend_op to a zend_op_with_a_string,
which would have a length field and an overflowed char[1] at the end for
the string argument.
I tried the char[1] trick in the past. I can't quite remember why I
passed on it, but I think it was because it changed zval from having a
fixed size, and therefore zval allocations could no longer be cached
efficiently in the memory manager (and of course it does not work with
zend_opline-like structures where we have more than one zval in the
structure).
Andi
Tim Starling wrote:
Maybe the tasks you do are usually with small data sets.
Well, I was referring to Yahoo-sized stuff. So no, the datasets are
rather huge, but on a per-request basis you want to architect things so
you only load things you actually need on that one request.
If you really do need to play around with hundreds of thousands of
records of anything in memory on a single request, then you should
definitely be looking at writing an extension and doing that in a custom
data type streamlined for that particular type of data.
Keeping your Apache2 processes around 40M or below even for less than
efficient code was never much of a problem and that means you can do
about 50 processes in 2G of memory. You probably don't want to go much
beyond 50 concurrent requests on a single quad-core cpu since there just
won't be enough juice for each one to finish in a timely manner. Dual
quad-core and you can probably go to about 100, but you also tend to
have more ram in those. You can of course crank up the concurrency if
you are willing to take the latency hit.
For my own stuff that doesn't use any heavy framework code I easily keep
my per-Apache incremental memory usage under 10M.
-Rasmus
Hi!
Given this, sometimes it's easy to forget that PHP is pathologically
memory hungry, to the point of making simple tasks difficult or
impossible to perform in limited environments. It's the worst language
I've ever encountered in this respect. An array of small strings will
use on the order of 200 bytes per element. An array of integers will use
HashTable uses 40 bytes, zval is 16 bytes, Bucket is 36 bytes, which
means if you use integer indexes, the overhead is 72 bytes per value
including memory block headers and alignments. It might be too much for
you, in which case I'd go towards making an extension that creates an
object storing strings more efficiently and implementing either get/set
handlers or ArrayAccess (or both). This of course would be most useful
if you access only a small part of the strings in each function/method.
I do not see what could be removed from Bucket or zval without hurting
the functionality.
not much less. A simple object (due to being based on the same
inefficient data structure) may use a kilobyte or two.
A kilobyte looks like too much for a single simple object (unless we have
different notions of simple). Could you describe what exactly makes up
the kilobyte - what's in the object?
- Objects that can optionally pack themselves into a class-dependent
structure and unpack on demand
Objects can do pretty much anything in Zend Engine now, provided you do
some C :) For the engine, an object is basically a pointer and an integer;
the rest is changeable. Of course, on the PHP level we need to have more,
but that's because certain things are just not doable on the PHP level. Do
you have some specific use case that would allow to reduce
- Exposing strongly-typed list and vector data structures to the user,
that don't have massive hashtable overheads
- An oparray format with fewer 64-bit pointers and more smallish integers
Ah, you're on 64-bit... That explains why your memory requirements are
larger :) But I'm not sure how the data the op array needs can be stored
without using pointers.
- Exposing strongly-typed list and vector data structures to the user,
that don't have massive hashtable overheads
I'm actually working on a few things here... some more efficient sets and
hashes. Expect to see more soon.
regards,
Derick