Dear Internals,
During development of DocBlox I encountered a (for me) unusual situation
with regard to memory usage.
I hope you can shed some light on this for me, as I do not understand it.
The situation is as follows:
I have a PHP file containing about 53 KLOC (including whitespace and
comments), which is about 2.1MB in size. When I call
memory_get_peak_usage() after running token_get_all() on its
contents, it reports that 232MB of RAM has been used in the process.
I am having trouble understanding how 244003984B (232MB) of RAM could be
used.
The following is what I have calculated:
- 640952B to start with (measured);
- 2.1MB to load the file contents into memory using file_get_contents
- 68 bytes for the resulting array
- 68 bytes for each child array representing a token
- 68 bytes for each element in a token array (which can be either 1 or
3, depending on whether it is actually a token or a literal)
- 2.1MB in total for the string contents of the token literals /
contents (equivalent to the byte size of the file)
I have used the count method to retrieve the number of tokens (276697)
and come to the following sum (everything is retrieved and calculated
in bytes):
640952+2165950+68+(276697*68)+(276697*3*68)+2165950=80234436 = 76M
This is a worst case formula where I assume that every token in the
array consists of 3 elements.
Based on this calculation I would be missing 156MB of memory; anybody
know where that went?
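As a rough sketch, the estimate above can be written out in PHP so that each term is visible. The variable names are illustrative, and the 68 bytes per zval is the assumption made in the mail, not a measured value. Summing the terms exactly as listed gives 80234504 bytes, which is 68 bytes more than the total quoted in the mail; both come to about 76M.

```php
<?php
// Figures taken from the mail; 68 bytes per zval is the mail's assumption.
$baseline   = 640952;   // memory in use before tokenizing (measured)
$fileSize   = 2165950;  // bytes read by file_get_contents()
$tokenCount = 276697;   // count($tokens)
$zvalSize   = 68;       // assumed size of one zval

$estimate = $baseline
    + $fileSize                      // file contents held as a string
    + $zvalSize                      // the outer result array
    + $tokenCount * $zvalSize        // one child array per token
    + $tokenCount * 3 * $zvalSize    // worst case: three elements per token
    + $fileSize;                     // string contents of all tokens

$measuredPeak = 244003984;           // reported by memory_get_peak_usage()

printf("estimate: %d bytes (%.1f MB)\n", $estimate, $estimate / 1048576);
printf("missing:  %.1f MB\n", ($measuredPeak - $estimate) / 1048576);
```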
I used the following snippet of code for my tests:
var_dump(memory_get_peak_usage());
$tokens = token_get_all(file_get_contents('<PATH>'));
var_dump(count($tokens));
var_dump(memory_get_peak_usage());
I hope this mail did not scare anyone ;)
Kind regards,
Mike van Riel
What does
var_dump(memory_get_peak_usage());
token_get_all(file_get_contents('<PATH>'));
var_dump(memory_get_peak_usage());
get you?
David
Hey David,
That gives me the following output:
int(640720)
int(244001144)
Mike
Smells like a memory leak if gc_collect_cycles()
doesn't fix it.
David
Seems like leak.
Try disabling ZendMM to see if something noticeable happens (memory
peak should be lower).
USE_ZEND_ALLOC=0
Cheers,
Julien
On Sun, Jun 5, 2011 at 2:01 PM, David Zülke
david.zuelke@bitextender.com wrote:
Smells like a memory leak if gc_collect_cycles()
doesn't fix it.
David
David and Pauli,
When I change the test script to:
var_dump(memory_get_peak_usage());
gc_collect_cycles();
token_get_all(file_get_contents(<FILE>));
gc_collect_cycles();
var_dump(memory_get_peak_usage());
And run the following bash line beforehand:
export USE_ZEND_ALLOC=0
I get the following output:
int(8240)
int(8240)
When I remove the gc_collect_cycles() calls I get the same result.
Even assigning the result to a variable does not increase the peak
memory.
FYI: When I pass 'true' as the argument to memory_get_peak_usage, I
get the following results:
int(262144)
int(262144)
This amount is astoundingly lower than the previous figures and lower
than my own calculations would suggest.
Of course this leads me to the following questions:
- Does it hurt to disable the Zend MM?
- Can it be done from inside a PHP Script?
- Why is the memory consumption so much lower, even lower than my
calculations?
I assume it is a good thing to at least try to create an easy way to
reproduce the issue (cannot include my test file) and create a bug
report about this :)
Thank you for your assistance thus far.
Mike
- Does it hurt to disable the Zend MM?
- Can it be done from inside a PHP Script?
- Why is the memory consumption so much lower, even lower than my
calculations?
When you disable Zend MM, PHP will not use it but will call the system's
allocator directly, so those allocations won't be counted by ZendMM. You
therefore lose all PHP-specific allocation improvements and the memory leak
protections. Nothing you'd actually want ;-)
Disabling the ZendMM is mostly useful for using memory debuggers (incl.
valgrind) to do further checks.
I assume it is a good thing to at least try to create an easy way to
reproduce the issue (cannot include my test file) and create a bug
report about this :)
Reproducible bug reports are always a good thing.
johannes
Please test the exact thing I suggested :)
var_dump(memory_get_usage());
token_get_all(file_get_contents(<FILE>));
gc_collect_cycles();
var_dump(memory_get_usage());
memory_get_peak_usage() is irrelevant, and USE_ZEND_ALLOC won't give accurate results anymore when looking at memory usage.
If the above gives the same numbers you got initially, then there's a memleak in token_get_all().
David
On Tue, Jun 7, 2011 at 4:28 PM, David Zülke david.zuelke@bitextender.com wrote:
Please test the exact thing I suggested :)
AFAIK he did.
" int(640720)
int(244001144)"
except if you suggested something else off-list.
Tyrael
Damn I'm an idiot. I meant memory_get_usage() all along. Sorry Mike.
Then it'll make sense... memory_get_usage(), but a gc_collect_cycles()
before the second call.
So, my first email should have had this code in it:
var_dump(memory_get_usage());
token_get_all(file_get_contents('<PATH>'));
var_dump(memory_get_usage());
And then, a comparison to this would be useful:
var_dump(memory_get_usage());
token_get_all(file_get_contents('<PATH>'));
gc_collect_cycles();
var_dump(memory_get_usage());
David
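One way to tell a real leak apart from a high but temporary peak is to repeat the call and watch whether memory_get_usage() keeps climbing. The following is only a sketch under assumptions: the generated source string is a small stand-in for the 2.1MB test file (which is not available), and the run count of 20 is arbitrary.

```php
<?php
// Small stand-in source; the original test file is not available.
$source = "<?php\n" . str_repeat("echo strlen('abc') + 1;\n", 1000);

// Warm up once so one-time allocations do not count as growth.
token_get_all($source);
gc_collect_cycles();

$start = memory_get_usage();
for ($i = 0; $i < 20; $i++) {
    token_get_all($source);   // result is discarded, as in the test above
    gc_collect_cycles();
}
$growth = memory_get_usage() - $start;

printf("growth after 20 runs: %d bytes\n", $growth);
```

If the growth stays near zero, the memory is being handed back after each call; a figure that rises with the number of runs would point towards a leak.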
I have run the script that you provided and got the following results:
int(635192)
int(635944)
Which is far less than the peak memory result.
I use memory_get_peak_usage to measure the worst-case memory usage
of my application. I expect this to be the actual memory used (and
thus the point at which the server starts swapping, if this number
exceeds the physical memory).
Is my assumption about the meaning of memory_get_peak_usage incorrect?
Mike
Before I forget; without gc_collect_cycles I get the following output
using memory_get_usage instead of memory_get_peak_usage:
int(634640)
int(635392)
Mike
memory_get_peak_usage() is the maximum amount of memory used by the VM of PHP (but not by some extensions for instance) up until the point where that function is called. So the actual memory usage may be even higher IIRC. But yeah, you're basically right. I've explained in another message why it might be so much more than you expected (zval overhead, basically)
David
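The difference between the two functions can be sketched as follows. The array built with range() is only a stand-in for the token array, and the size of 200000 elements is an arbitrary choice; the exact numbers depend on the PHP version and build.

```php
<?php
$before = memory_get_usage();

// Build a large temporary structure, then throw it away again.
$tmp    = range(1, 200000);
$during = memory_get_usage();
unset($tmp);

$after = memory_get_usage();
$peak  = memory_get_peak_usage();

printf("before: %d\nduring: %d\nafter:  %d\npeak:   %d\n",
    $before, $during, $after, $peak);
```

Current usage drops back once the temporary value is released, while the peak keeps the high-water mark. This is consistent with the numbers in this thread, where the result of token_get_all() was discarded right after the call.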
Am I then also correct to assume that the output of
memory_get_peak_usage is what is compared against the memory_limit?
Also: after correcting with your new information (zval = 114 bytes
instead of 68) I still have a rather large offset:
640952+2165950+114+(276697*114)+(276697*3*114)+2165950 = 131146798 = 125M
(not trying to be picky here; I just don't understand)
If my calculations are correct then a zval should be approx 216 bytes
(excluding string contents):
((244000000-640952-2165950-2165950) / 4) / 276697 = 215.9647B
Mike
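For reference, both calculations from this mail can be reproduced with a few lines of PHP. This is only a restatement of the arithmetic; the variable names are illustrative and the 114-byte figure is the one used in the mail.

```php
<?php
$baseline   = 640952;   // memory in use before tokenizing
$fileSize   = 2165950;  // size of the test file in bytes
$tokenCount = 276697;   // count($tokens)
$zvalSize   = 114;      // figure used in the mail

// Revised estimate, same terms as before with the new zval size.
$estimate = $baseline + $fileSize + $zvalSize
    + $tokenCount * $zvalSize
    + $tokenCount * 3 * $zvalSize
    + $fileSize;

// Working backwards: spread the remaining memory over 4 zvals per token.
$perZval = ((244000000 - $baseline - 2 * $fileSize) / 4) / $tokenCount;

printf("estimate: %d bytes\n", $estimate);
printf("implied size per zval: %.4f bytes\n", $perZval);
```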
144 (not 114!) bytes is for an integer; I'm not quite sure what the overheads are for arrays, which token_get_all() produces in abundance :) An empty array seems to occupy 312 bytes of memory.
Also, strings have memory allocated in 8 byte increments as far as I know, so "1" eats up 8 bytes, and "12345678901234567" will consume 24 bytes for the raw text, not 17.
David
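The 8-byte granularity described above can be illustrated with a small helper. The function name roundUpTo8 is hypothetical, and the sketch models only the rounding rule as stated in the mail ("as far as I know"), not what the allocator actually does on a given build.

```php
<?php
// Hypothetical helper: round a byte length up to the next multiple of 8.
function roundUpTo8($bytes)
{
    return (int) (ceil($bytes / 8) * 8);
}

foreach (array('1', '12345678901234567') as $text) {
    printf("%2d bytes of text -> %d bytes allocated\n",
        strlen($text), roundUpTo8(strlen($text)));
}
```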
I'm too lazy to do the actual math (well, best would be to check
sizeof(zval), sizeof(HashTable), sizeof(Bucket) on your system) and
there are a few things to consider:
* The sizes differ between 32-bit and 64-bit; with 64-bit
there's a difference between Windows and Unix/Linux (on Win a
long will still be 32 bit, but pointers 64 bit, on Linux/Unix
both are 64bit)
* On some architectures memory segments have to be aligned in some
way which might waste memory
* As David mentioned HashTables (Arrays) are more complex.
* token_get_all() returns an array of (string | array of (long,
string, long) )
* A long takes sizeof(zval)
* A string takes sizeof(zval)+strlen()+1
* An array is a HashTable + space for buckets, this includes
place for some unused elements
* Each element inside the HT needs additional space for a Bucket
with some meta data
* While running your script you also keep the complete script file
in memory. You also keep some temporary parser data in memory
while the resulting array is being filled.
In the end it's not fully trivial to gather the size needed. And I'm
sure my list is missing loooots of things.
http://schlueters.de/blog/archives/142-HashTables.html has a short
introduction to HashTables, skipping many of the details.
johannes
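The rules in the list above can be turned into a rough estimator. Everything here is a sketch: the helper name estimateValue is hypothetical, the three sizes are placeholder values that would have to be replaced by what sizeof() reports on the build in question, and counting one zval plus one HashTable per array is an assumption. Unused bucket slots, alignment and allocator overhead are ignored, so the result is at best a lower bound.

```php
<?php
// Placeholder sizes in bytes. These are assumptions for illustration only;
// the real values are whatever sizeof() reports on the target build.
$sizes = array('zval' => 24, 'hashtable' => 72, 'bucket' => 72);

// Hypothetical helper following the rules in the list above.
function estimateValue($value, array $sizes)
{
    if (is_array($value)) {
        // Assumed: one zval plus one HashTable per array.
        $bytes = $sizes['zval'] + $sizes['hashtable'];
        foreach ($value as $element) {
            // One Bucket per element, plus the element itself.
            $bytes += $sizes['bucket'] + estimateValue($element, $sizes);
        }
        return $bytes;
    }
    if (is_string($value)) {
        return $sizes['zval'] + strlen($value) + 1;
    }
    return $sizes['zval'];   // longs and other scalars
}

$tokens = token_get_all("<?php echo 1 + 2;");
printf("%d tokens, roughly %d bytes\n",
    count($tokens), estimateValue($tokens, $sizes));
```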
I wrote about ZendMM some time ago
(http://julien-pauli.developpez.com/tutoriels/php/internals/zend-memory-manager/),
that's in French ;-)
To shorten the conversation a little bit, I would suggest tracing the
memory with valgrind/massif. That's not too hard if you know what you
are doing; if not, it can take some time.
Basically, Johannes gave some good hints; but memory usage is
hard to compute and deal with. I suggest you don't try to figure
out how much memory it "would" take, as the computation is really too
hard to get accurate. Only memory debuggers will show you exactly what
happens.
BTW, there might be a little leak inside token_get_all() as it doesn't
seem to free memory it allocated. Not very easy to find as it plays
with the lex scanner.
Julien.P
One thing to keep in mind of course is that each zval incurs an overhead. $x = 1; requires 144 bytes of memory in total IIRC.
David