Hi folks. I have a question about the PHP runtime that I hope is appropriate
for this list. (If not, please thwap me gently; I bruise easily.)
I know PHP does copy-on-write. However, how "deeply" does it copy when
dealing with nested arrays?
This is probably easiest to explain with an example...
$a['foo']['bar']['baz'] = 1;
$a['foo']['bar']['bob'] = 1;
$a['foo']['bar']['narf'] = 1;
$a['foo']['poink']['narf'] = 1;
function test($b) {
    // Assume each of the following lines in isolation...
    // Does this copy just the one variable baz, or the full array?
    $b['foo']['bar']['baz'] = 2;
    // Does this copy $b, or just $b['foo']['poink']?
    $b['foo']['poink']['stuff'] = 3;
    return $b;
}
// I know this is wasteful; I'm trying to figure out just how wasteful.
$a = test($a);
test() in this case should take $b by reference, but I'm trying to determine
how much of a difference it makes. (In practice my use case has a vastly larger
array, so any inefficiencies are multiplied.)
--Larry Garfield
It does the whole of $b. It has to, because when you change 'baz', a reference in
'bar' needs to change to point to the newly copied 'baz', so 'bar' is
written...and likewise 'foo' is written.
Ben.
That's what I was afraid of. So it does copy the entire array. Crap. :-)
Am I correct that each level in the array represents its own ZVal, with the
additional memory overhead a ZVal has (however many bytes that is)?
That is, the array in my example would have $a, foo, bar, baz, bob, narf, poink,
poink/narf = 8 ZVals? (That seems logical to me because each is its own
variable that just happens to be an array, but I want to be sure.)
--Larry Garfield
Yep. PHP does clock up memory very quickly for big arrays, objects with lots of
members and/or lots of small objects with large overheads. There are a LOT of
zvals and zobjects and things around the place, and their overhead isn't all that
small.
Of course, if you go to the trouble to construct arrays using references, you can
avoid some of that, because a copy-on-write will just copy the reference. It does
mean you're passing references, though.
$bar['baz'] = 1;
$poink['narf'] = 1;
$a['foo']['bar'] =& $bar;
$a['foo']['poink'] =& $poink;
Then if you test($a), $bar and $poink will be changed, since they are 'passed by
reference'--no copying needs to be done. It's almost as if $b were passed by
reference, but setting $b['blip'] wouldn't show up in $a, because $a itself would
be copied in that case, including the references, which would continue to refer to
$bar and $poink. So a much quicker copy, but obviously not the same level of
isolation that you might expect or desire. Unless you did some jiggerypokery like
$b_bar=$b['foo']['bar']; unset($b['foo']['bar']); $b['foo']['bar']=$b_bar; which
would break the reference and make a copy of just that part of the array (a plain
assignment on its own would write through the reference rather than break it). But
this is a pretty nasty caller-callee co-operative kind of thing. Just a thought to
throw into the mix, though.
Disclaimer: I'm somewhat out of my depth here. But I'm sure someone will jump on
me if I'm wrong.
Ben.
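To make the reference behaviour above concrete, here is a minimal sketch, assuming PHP 5.4 or later for the short array syntax. The function name testDetached() and the variables $r1 and $r2 are made up for illustration. It shows a nested write going through the reference into $bar, a top-level write staying in the local copy, and unset() being used to detach one branch before writing to it.

```php
<?php
$bar = ['baz' => 1];
$poink = ['narf' => 1];
$a = ['foo' => []];
$a['foo']['bar'] =& $bar;
$a['foo']['poink'] =& $poink;

function test($b) {
    $b['foo']['bar']['baz'] = 2;  // goes through the reference, so $bar changes
    $b['blip'] = 3;               // top-level write lands only in the local copy
    return $b;
}

// Hypothetical helper: detach one branch from its reference before writing.
function testDetached($b) {
    $copy = $b['foo']['poink'];   // value copy of the referenced array
    unset($b['foo']['poink']);    // drop the reference slot from the local copy
    $b['foo']['poink'] = $copy;   // plain element now, no longer tied to $poink
    $b['foo']['poink']['narf'] = 99;
    return $b;
}

$r1 = test($a);
$r2 = testDetached($a);

var_dump($bar['baz']);        // changed by test()
var_dump(isset($a['blip']));  // the caller's array never gained 'blip'
var_dump($poink['narf']);     // untouched by testDetached()
```

The copy made for $b is cheap because only the reference slots are duplicated, which is also why the callee can reach back into the caller's data.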
Using references does not speed up PHP. It does that already
internally, if I'm not mistaken. The point of my post was that
assigning values to tree arrays is in general faster than a full
array copy.
Hannes
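One rough way to check the claim about references is to compare memory growth for a read-only call by value and by reference. This is only a sketch, assuming PHP 5.4 or later. The function names sumByValue() and sumByReference() are invented for the example, and memory_get_usage() gives approximate figures that vary between PHP versions.

```php
<?php
$m0 = memory_get_usage();
$data = range(1, 100000);
$dataSize = memory_get_usage() - $m0;   // rough size of the array itself

// Reads the argument only, so the by-value parameter keeps sharing storage.
function sumByValue($arr) {
    $before = memory_get_usage();
    $total = array_sum($arr);
    return [$total, memory_get_usage() - $before];
}

// Same work, but the parameter is bound by reference.
function sumByReference(&$arr) {
    $before = memory_get_usage();
    $total = array_sum($arr);
    return [$total, memory_get_usage() - $before];
}

list($totalByValue, $growthByValue) = sumByValue($data);
list($totalByRef, $growthByRef) = sumByReference($data);

printf("array: %d bytes, by value: +%d, by reference: +%d\n",
    $dataSize, $growthByValue, $growthByRef);
```

If neither call grows memory by anything close to the size of the array, then passing by value did not copy it, which is the case as long as the callee does not write to the argument.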
What about objects?
class Foo {
    public $foo;
}
function test($o) {
    $o->foo->foo->foo = 2;
}
$bar = new Foo;
$bar->foo = new Foo;
$bar->foo->foo = new Foo;
test( $bar );
Also... is it better to pass an object as a parameter rather than many
values?
function withValues($anInteger, $aBool, $aString) {
    var_dump($anInteger, $aBool, $aString);
}
function withObject(ParamOject $o) {
    var_dump( $o->theInteger(), $o->theBool(), $o->theString() );
}
Martin Scotta
On Wed, Jan 19, 2011 at 5:03 AM, Hannes Landeholm landeholm@gmail.com wrote:
Using references does not speed up PHP. It does that already
internally, if I'm not mistaken. The point of my post was that
assigning values to tree arrays are in general faster than a full
array copy.
Hannes
On 19 January 2011 08:36, Ben Schmidt mail_ben_schmidt@yahoo.com.au wrote:
snip
On Wed, 19 Jan 2011 14:23:49 -0000, Martin Scotta martinscotta@gmail.com
wrote:
What about objects?
With objects less copying occurs because the object value (zval) data is
actually just a pointer and an id that for most purposes works as a
pointer.
However, it should be said that while a copy of an array forces more
memory to be copied, the inner zvals are not actually copied. In this
snippet:
$a = array(1, 2, array(3));
$b = $a;
function separate(&$dummy) { }
separate($a);
the copy that occurs when you force the separation of the zval that is
shared by $a and $b ($b = $a doesn't copy the array in $a to $b, it merely
copies the zval pointer of $a to $b and increments its reference count) is
just a shallow copy of the hash table and an increment of the first-level
zvals' refcounts. This means the zvals that have their pointers stored in
the array $a's HashTable are not themselves copied.
Interestingly (or should I say, unfortunately), this happens even if the
inner zvals are references. See
http://php.net/manual/en/language.references.whatdo.php the part on arrays.
class Foo {
    public $foo;
}
function test($o) {
    $o->foo->foo->foo = 2;
}
$bar = new Foo;
$bar->foo = new Foo;
$bar->foo->foo = new Foo;
test( $bar );
This example shows no copying (in the sense of "new zval allocation on
passing or assignment") at all.
Also... is it better to pass an object as a parameter rather than many
values?
function withValues($anInteger, $aBool, $aString) {
    var_dump($anInteger, $aBool, $aString);
}
function withObject(ParamOject $o) {
    var_dump( $o->theInteger(), $o->theBool(), $o->theString() );
}
It should be indifferent. In normal circumstances, there is no zval
copying at all (only the pointers of arguments' symbols are copied). Only
when you start throwing references into the mix will you start forcing
copies.
--
Gustavo Lopes
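The object case can be sketched as follows, reusing the Foo class from the question. The clone part and the variable names $afterCall, $afterCloneWrite and $shallow are additions for illustration only. They show that even an explicit clone is shallow, so nested objects stay shared unless each level is cloned.

```php
<?php
class Foo {
    public $foo;
}

function test($o) {
    // $o holds the same object handle as the caller's variable,
    // so this write is visible outside the function.
    $o->foo->foo->foo = 2;
}

$bar = new Foo;
$bar->foo = new Foo;
$bar->foo->foo = new Foo;

test($bar);
$afterCall = $bar->foo->foo->foo;        // the callee's write is visible here

// clone copies only the outer object; the nested Foo objects are still shared.
$shallow = clone $bar;
$shallow->foo->foo->foo = 3;
$afterCloneWrite = $bar->foo->foo->foo;  // changed through the clone as well

// Replacing a property on the clone affects the clone alone.
$shallow->foo = new Foo;
var_dump($afterCall, $afterCloneWrite, $bar->foo === $shallow->foo);
```

So isolation for objects has to be requested explicitly, level by level, whereas arrays get it implicitly through copy-on-write.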
So it sounds like the general answer is that if you pass a complex array
to a function by value and mess with it, data is duplicated for every
item you modify and its direct ancestors up to the root variable but not
for the rest of the tree.
For objects, because of their "pass by handle"-type behavior you are
(usually) modifying the same data directly so there's no duplication.
Does that sound correct?
Related: What is the overhead of a ZVal? I'm assuming it's a fixed
number of bytes.
--Larry Garfield
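The summary above can be checked with a rough measurement. This is a sketch under a few assumptions: PHP 5.4 or later, a made-up tree with one tiny branch and one large branch, a hypothetical helper touchSmallBranch(), and memory_get_usage() as an approximate gauge whose exact figures differ between PHP versions.

```php
<?php
// Hypothetical tree: 'small' is tiny, 'big' holds 100000 integers.
$m0 = memory_get_usage();
$a = ['small' => ['x' => 1], 'big' => range(1, 100000)];
$treeSize = memory_get_usage() - $m0;

function touchSmallBranch($b) {
    $before = memory_get_usage();
    // Separates $b and $b['small'] only; the 'big' branch stays shared with $a.
    $b['small']['x'] = 2;
    $cost = memory_get_usage() - $before;
    return [$b, $cost];
}

list($copy, $writeCost) = touchSmallBranch($a);

printf("tree: %d bytes, cost of one nested write: %d bytes\n",
    $treeSize, $writeCost);
```

If the whole tree were duplicated, the write would cost roughly as much as the tree itself; a much smaller figure indicates that only the modified path was copied.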
Related: What is the overhead of a ZVal? I'm assuming it's a fixed number
of bytes.
http://lmgtfy.com/?q=php+zval&l=1
Regards
Peter
--
<hype>
WWW: plphp.dk / plind.dk
LinkedIn: plind
BeWelcome/Couchsurfing: Fake51
Twitter: kafe15
</hype>
So it sounds like the general answer is that if you pass a complex array to a
function by value and mess with it, data is duplicated for every item you modify
and its direct ancestors up to the root variable but not for the rest of the tree.
For objects, because of their "pass by handle"-type behavior you are (usually)
modifying the same data directly so there's no duplication.
Does that sound correct?
Yes.
Related: What is the overhead of a ZVal? I'm assuming it's a fixed
number of bytes.
It seems not, though a zval has a fixed size. What that size is will
depend on the compiler and architecture of the system being used, or at
least on the ABI.
From zend.h:
typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;
struct _zval_struct {
    /* Variable information */
    zvalue_value value;         /* value */
    zend_uint refcount__gc;
    zend_uchar type;            /* active type */
    zend_uchar is_ref__gc;
};
The zvalue_value union will probably be 8 or 12 bytes, depending on the
architecture. The whole struct will then probably be between 14 and 24
bytes, depending on the architecture and structure alignment and so on.
For my system:
$ cd php-5.3.3
$ ./configure
$ cd Zend
$ gcc -I. -I../TSRM -x c - <<END
#include "zend.h"
int main(void) {
printf("%lu\n",sizeof(zval));
return 0;
}
END
$ file ./a.out
./a.out: Mach-O 64-bit executable
$ ./a.out
24
$ gcc -I. -I../TSRM -arch i386 -x c - <<END
#include "zend.h"
int main(void) {
printf("%lu\n",sizeof(zval));
return 0;
}
END
$ file ./a.out
./a.out: Mach-O executable i386
$ ./a.out
16
You can figure out what you think the overhead is from that. For a
string, arguably the whole structure is overhead, since the string is
stored elsewhere via pointer. Likewise for objects. For a double, the
payload is 8 bytes, and stored in the zval, so there's less overhead. An
integer, with a payload of 4 bytes, is somewhere in between.
Ben.
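For a figure that needs no C compiler, the per-element cost can also be estimated from PHP itself. This is a rough sketch, assuming PHP 5.4 or later. It measures the whole array slot (hash table bookkeeping included), not sizeof(zval) alone, and the result depends heavily on the PHP version, since PHP 7 and later use a different zval layout from the 5.3 struct quoted above.

```php
<?php
$n = 100000;

$before = memory_get_usage();
$ints = range(1, $n);                 // $n integer elements
$used = memory_get_usage() - $before;

$perElement = $used / $n;             // average bytes per array element

printf("about %.1f bytes per integer element (integer payload: %d bytes)\n",
    $perElement, PHP_INT_SIZE);
```

Whatever the number comes out as on a given build, the gap between it and PHP_INT_SIZE is the overhead in question.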
Related: What is the overhead of a ZVal? I'm assuming it's a fixed
number of bytes.
It seems not, though a zval has a fixed size. What that size is will
depend on the compiler and architecture of the system being used, or at
least on the ABI.
Ah, yes, of course. Oh C...
snip
The zvalue_value union will probably be 8 or 12 bytes, depending on the
architecture. The whole struct will then probably be between 14 and 24
bytes, depending on the architecture and structure alignment and so on.
snip
You can figure out what you think the overhead is from that. For a
string, arguably the whole structure is overhead, since the string is
stored elsewhere via pointer. Likewise for objects. For a double, the
payload is 8 bytes, and stored in the zval, so there's less overhead. An
integer, with a payload of 4 bytes, is somewhere in between.
Hm. OK, so if I'm assuming a 64-bit architecture (most servers these days,
I'd think) and just looking for a rough approximation, it sounds like 20 bytes
per zval/variable is a not unreasonable estimation. At least close enough for
determining the memory overhead of a general algorithm.
Thanks again!
--Larry Garfield