Hi all,
This is my first time posting to these lists, so I apologize if the
internals list is wrong. :-)
Before I describe my issue, I had been wondering how I can tell/estimate
how much memory PHP will use for arrays? Is there at least a rough formula
someone can tell me?
My environment: PHP 4.3.6 as an Apache 1.3 module on Windows 2000.
OK, now the array performance problem -- I have a simple script that
basically just has a 2 dimensional array with this structure: [4-10
character key][int key] = int value. For my examples, the array, $big_arr,
has ~40k elements, and ~100k total at the second level. I fill this array
in a loop. Then I have this code:
$list = implode("','", array_keys($big_arr));
$r = mysql_query(<query using each $big_arr key>);
while ($tmp = mysql_fetch_assoc($r))
{
    $foo[$tmp['key for string value']] = $tmp; // just 8 small columns in $tmp
}
The problem is that it takes over 2 seconds to fill the $foo array with
~40k array elements. Ugh! (Still ~1.4s if I use the same 123 scalar value
for each.) That's not counting any time to do the query or fetch the rows.
I'm thinking it has something to do with $big_arr using a lot of memory
(how much??), because a script with just this code:
for ($i = 0; $i < 40000; ++$i) { $test["test$i"] = $i; }
... creates 40k elements in 0.3s, including the loop's running time.
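For reference, that quick-fill case can be timed like this (a sketch; microtime(true) needs PHP 5+, and absolute numbers vary by machine):

```php
<?php
// Time how long it takes to create 40k string-keyed elements.
$start = microtime(true);
$test = array();
for ($i = 0; $i < 40000; ++$i) {
    $test["test$i"] = $i;
}
printf("created %d elements in %.3fs\n", count($test), microtime(true) - $start);
```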
Finally, I tried to unset($big_arr) before filling $foo, but that
increased running time by ~3.5s! :-( (The unset() took about 0.3s.) Why
is that causing $foo to be filled slower?! Is PHP searching for free memory
for $foo that had been used by $big_arr?
This is disappointing, especially being caused by an array that doesn't
seem too outrageously large (and I'd like to go larger). :-( It's going to
suck if I can't come up with a way to get it faster... Is there anything
that would be different on a version newer than 4.3.6, as I didn't try any
yet? Or if it's some Windows thing, then I don't care as much.
Comments, suggestions?
Thanks!
Matt
> Before I describe my issue, I had been wondering how I can tell/estimate
> how much memory PHP will use for arrays? Is there at least a rough formula
> someone can tell me?

42 * number of array elements + size of keys + data
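Derick's formula could be turned into a quick estimator (an approximation only -- real per-element overhead varies by PHP version and platform, and this sketch handles just one level of nesting):

```php
<?php
// Rough estimate per the PHP 4-era rule of thumb:
// 42 bytes per element + size of keys + size of data.
function estimate_array_bytes(array $arr)
{
    $bytes = 42 * count($arr);              // fixed per-element overhead
    foreach ($arr as $key => $value) {
        if (is_string($key)) {
            $bytes += strlen($key);         // string key storage
        }
        $bytes += is_string($value) ? strlen($value) : 8; // payload (ints ~8 bytes)
    }
    return $bytes;
}

$test = array();
for ($i = 0; $i < 40000; ++$i) {
    $test["test$i"] = $i;
}
echo estimate_array_bytes($test), " bytes (rough estimate)\n";
```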
> This is disappointing, especially being caused by an array that doesn't
> seem too outrageously large (and I'd like to go larger). :-( It's going to
> suck if I can't come up with a way to get it faster... Is there anything
> that would be different on a version newer than 4.3.6, as I didn't try any
> yet? Or if it's some Windows thing, then I don't care as much.
>
> Comments, suggestions?

Make a custom C extension.. I don't know why you need arrays this
large, but it sounds like an application design flaw.

Derick
----- Original Message -----
From: "Derick Rethans"
Sent: Thursday, December 09, 2004

>> Before I describe my issue, I had been wondering how I can tell/estimate
>> how much memory PHP will use for arrays? Is there at least a rough
>> formula someone can tell me?
>
> 42 * number of array elements + size of keys + data
Thanks Derick. Is that exact or just an estimate?
So PHP needs 42 bytes for each array element, OK. For a multi-dimensional
array, I assume the "number of elements" is the total from all dimensions
added up. e.g. $foo[] = array(123, 456); is 3 elements (1 + 2)?
>> This is disappointing, especially being caused by an array that doesn't
>> seem too outrageously large (and I'd like to go larger). :-( It's going
>> to suck if I can't come up with a way to get it faster... Is there
>> anything that would be different on a version newer than 4.3.6, as I
>> didn't try any yet? Or if it's some Windows thing, then I don't care as
>> much.
>>
>> Comments, suggestions?
>
> Make a custom C extension.. I don't know why you need arrays this
> large, but it sounds like an application design flaw.
I need arrays this large because it's building a search index, which could
have 100s of millions of entries; and I want to buffer many entries in
arrays before doing writes... So even these "large" arrays are just working
on a small chunk at a time. :-)
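The buffering idea above might look something like this sketch (flush_entries and BUF_MAX are hypothetical names, not from the original post; the real flush would be a bulk write to the index):

```php
<?php
// Cap the buffer so the hashtable never grows past a fixed size,
// then drop it and start small again after each flush.
define('BUF_MAX', 10000);

function flush_entries(array $buf)
{
    // Placeholder for the real index write (e.g. a bulk INSERT);
    // here it just reports how many entries it would have written.
    return count($buf);
}

$buffer = array();
$written = 0;
for ($i = 0; $i < 40000; ++$i) {
    $buffer["test$i"] = $i;
    if (count($buffer) >= BUF_MAX) {
        $written += flush_entries($buffer);
        $buffer = array();          // drop the buffer, restart small
    }
}
$written += flush_entries($buffer); // flush any remainder
echo $written, "\n"; // 40000
```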
I was playing with it more today trying to figure out why it was
dramatically slowing down while filling the second large array ($foo). Now
I understand more what's going on (thanks in part to that memory usage
formula), but there are still a couple of odd things. (And I realize that
having a 40k element array with an 8 element array for each value takes a
lot of memory!)
Somewhere in the early part of filling the 40k element array, PHP seems to
hit a "bump" where something (memory allocation?) takes much more time, and
then it fills the array quickly again (e.g. 40k doesn't take much longer to
create than 10k). That I can understand, I guess (though I always like to
know exactly what's happening :-)).
The most puzzling part to me still remains: when I unset() the first large
array ($big_arr), why does filling the second array take more than twice as
long (5+ seconds vs. 2+)? If a smaller second array is used, it does go
faster with the first array unset, which is what I'd expect. So why, once
it gets to a certain size, does having unset the first array make things
slower? :-/
Thanks if anyone has an explanation to help me better understand what's
going on internally (especially the unset() thing).
Matt
Sidestepping your original question, it sounds to me like you might
benefit from using sqlite here, either direct to disk or using an
in-memory sqlite database.
--Wez.
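For what it's worth, Wez's suggestion might look like this with the SQLite3 extension (a sketch; SQLite3 is the PHP 5.3+ API -- the ext/sqlite API available in 2004 differed):

```php
<?php
// Use an in-memory SQLite table as the write buffer instead of a
// big PHP array, then read it back (or flush it) in one pass.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE buf (k TEXT PRIMARY KEY, v INTEGER)');

$db->exec('BEGIN');                 // one transaction makes bulk inserts fast
$ins = $db->prepare('INSERT INTO buf (k, v) VALUES (:k, :v)');
for ($i = 0; $i < 40000; ++$i) {
    $ins->bindValue(':k', "test$i", SQLITE3_TEXT);
    $ins->bindValue(':v', $i, SQLITE3_INTEGER);
    $ins->execute();
    $ins->reset();
}
$db->exec('COMMIT');

echo $db->querySingle('SELECT COUNT(*) FROM buf'), "\n"; // 40000
```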
> I need arrays this large because it's building a search index, which could
> have 100s of millions of entries; and I want to buffer many entries in
> arrays before doing writes... So even these "large" arrays are just working
> on a small chunk at a time. :-)
Hi Matt,
The problem is most probably because the Zend array hash table reallocs
every time it starts to hold too many elements. The larger it gets, the
less likely it is to reallocate in the same memory area, and then it has to
allocate a new big memory chunk and memcpy() all the old data to it. This
is probably where you are feeling the hit.
Andi
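Andi's explanation can be probed from userland with a rough sketch like the one below (microtime(true) needs PHP 5+; absolute timings vary by machine, and the spikes are only suggestive of hashtable resizes, not proof):

```php
<?php
// Fill a big array and report the slowest single insert seen in each
// 10k chunk -- spikes tend to line up with the hashtable doubling
// its bucket count and (possibly) memcpy()ing the old data.
$arr = array();
for ($chunk = 0; $chunk < 4; ++$chunk) {
    $worst = 0.0;
    for ($i = 0; $i < 10000; ++$i) {
        $t = microtime(true);
        $arr["key" . ($chunk * 10000 + $i)] = $i;
        $d = microtime(true) - $t;
        if ($d > $worst) {
            $worst = $d;
        }
    }
    printf("chunk %d: worst insert %.6fs, size %d\n", $chunk, $worst, count($arr));
}
```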
At 05:02 AM 12/10/2004 -0600, Matt W wrote:
> The most puzzling part to me still remains: when I unset() the first large
> array ($big_arr), why does filling the second array take more than twice as
> long (5+ seconds vs. 2+)?