So, I've been reading articles for a decade now that say that readfile()
is great and wonderful except for memory usage. Specifically, that it
reads a file into memory entirely, and then prints it to stdout from
there. So if you're outputting a big file you will hit your memory limit
and kill the server. Thus, one should always loop over fread() instead.
The most recent article I found saying that was from 2007, with a
StackExchange thread saying the same from 2011. I've even found mention
of it in old PHP bugs.
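For reference, the fread() loop those articles recommend looks roughly
like this (a sketch; the path is a placeholder):

<?php
// The loop "conventional wisdom" recommends instead of readfile().
$fp = fopen('/path/to/big-file.zip', 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192); // send 8K at a time
    flush();               // push each chunk out to the client
}
fclose($fp);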
However, I cannot replicate that in my own testing. Earlier today I was
running some benchmarks of different file streaming techniques in PHP
(5.3.6 specifically) and found that fread() looping, fpassthru(),
readfile(), and stream_copy_to_stream() perform almost identically on
memory, and all are identical on CPU except for fread(), which is
slower, which makes sense since you're looping in PHP space.
What's more, I cranked my memory limit down to 10 MB and then tried
streaming a 20 MB file. No change. The PHP peak memory never left
around a half-meg or so, most of which I presume is just the Apache/PHP
overhead. But it's simply not possible for readfile() to be buffering
the whole file into memory before printing, or it would have died when
the file exceeded the memory limit. I verified that the data I'm getting
downloaded from the script is correct, and exactly matches the file that
it should be streaming.
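The test was essentially this shape (a simplified sketch; the path is a
placeholder):

<?php
// memory_limit is far below the file size, yet readfile() survives.
ini_set('memory_limit', '10M');
header('Content-Type: application/octet-stream');
readfile('/path/to/20mb-file.bin');
// Logged afterwards; peak stayed around half a meg.
error_log('peak: ' . memory_get_peak_usage(true));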
My first thought was that this is yet another case of PHP improving and
fixing a long-standing bug, but somehow the rest of the world not
knowing about it, so "conventional wisdom" persists long after it has
ceased to be wise. However, I found no mention of readfile() in the
PHP 5 change log[1] at all, aside from one note from back in 5.0.0
Beta 1 about improving performance under Windows. (I'm on Linux.)
So, what's going on here? Has readfile() been memory-safe for that long
without anyone noticing? Is my test completely flawed (although I don't
see how, since I can verify that the code works as expected)? Something
else?
Please un-confuse me!
(Note: Sending this to internals since this is an engine question, and I
am more likely to reach whoever it was that un-sucked readfile()
sometime in the silent past that way. <g>)
--Larry Garfield
Hi,
readfile() is internally implemented in the same way as fpassthru()
(actually the same backend function is called; readfile() merely opens a
stream before delegating to the passthru code). Both functions delegate
from PHP userspace to an internal streams-API function,
php_stream_passthru(). That function has two implementations:
- If the underlying stream allows MMAP, it uses memory mapping (mapping
the file to virtual memory) and copies the mapped buffer to output.
Please note that memory mapping does not load the file into memory; it
only maps the file contents to virtual memory, like a swap file
(http://en.wikipedia.org/wiki/Mmap).
- If this is not the case, it copies the whole file in blocks of 8192
bytes using a conventional loop.
I verified that this code is present in at least PHP 5.2 and 5.3, maybe
earlier, too.
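In userland terms the equivalence looks like this (a sketch; the path is
a placeholder):

<?php
readfile('/path/to/file.bin');

// ...is essentially the same as:
$fp = fopen('/path/to/file.bin', 'rb');
fpassthru($fp); // both end up in php_stream_passthru()
fclose($fp);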
Uwe
Uwe Schindler
thetaphi@php.net - http://www.php.net
NSAPI SAPI developer
Bremen, Germany
hi!
- If the underlying stream allows MMAP, it will use memory mapping [...]
Some additional notes:
mmap may use "normal" memory too, depending on the options (not sure
which are used exactly with readfile or stream's mmap).
About PHP memory usage, one has to use external tools to actually see
this memory usage, as it is not managed by the Zend memory manager.
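For example, on Linux one can compare the kernel's view with the Zend
memory manager's view (a quick sketch; Linux-only):

<?php
// Kernel's view of the process (includes mmap'd pages, etc.):
$status = file_get_contents('/proc/self/status');
preg_match('/^VmRSS:\s+(\d+) kB/m', $status, $m);
echo 'VmRSS: ' . $m[1] . " kB\n";
// Zend memory manager's view (only memory it manages):
echo 'ZMM peak: ' . memory_get_peak_usage(true) . " bytes\n";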
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Hi,
mmap may use "normal" memory too, depending on the options (not sure
which are used exactly with readfile or stream's mmap).
Mmapping of course uses memory, but the memory used here is not from
PHP's memory manager; it's memory that's already used for the O/S cache.
The memory mapping used here does not force loading the file contents
into the O/S cache; it just gets a virtual address into the O/S cache.
If the actual file contents are not yet in the O/S cache, the O/S will
hit a page fault and load the pages into memory. The Apache server uses
the same mechanism to serve files.
Maybe the "user confusion" about memory usage comes from that fact (they
see lots of virtual memory used by PHP when viewed in top). I know this
user confusion from my work on the Apache Lucene/Solr project, where one
option (used on 64-bit operating systems) is to memory-map the whole
Lucene full-text index. Users then see hundreds of gigabytes of "virtual
memory" usage in top / Windows Task Manager and are afraid of running
their machine out of memory. This is always hard to explain to people
who are not used to the term "virtual memory".
About PHP memory usage, one has to use external tools to actually see
this memory usage, as it is not managed by the Zend memory manager.
Of course...
Thanks,
Uwe
Mmapping of course uses memory, but the memory used here is not from
PHP's memory manager [...]
That's a very good point, and a detailed look at the stack can show some
of the underlying mechanics behind readfile() and how it's pretty much
implemented like fpassthru() delegating to a stream, as you said.
About PHP memory usage, one has to use external tools [...] Of course...
Also, by running valgrind and taking a closer look at what memory blocks
PHP is allocating here, it can be better determined what's leaking and
what isn't, of course...
I can certainly attest to users being deceived by memory readings in top
in the past. It can be very deceiving if you look at what free memory
top or free shows you, especially after some huge allocation. There's a
clear difference here, though, between the Zend memory manager
allocating these blocks in PHP and what's going on with readfile(). The
memory the system knows is available to it isn't being tied up in this
case.
Thanks all. So it sounds like the answer is:
- readfile() has always been memory-safe as far as PHP is concerned.
- Because it uses mmap(), GFL trying to understand its memory usage
from top.
- Operating systems have gotten better at such things in the past
decade.
- So given #2 and #3, the "readfile() will kill your memory, don't use
it" line is a persistent urban legend that belongs on Snopes as
debunked. Looping on fread() for performance is a red herring.
Is that an accurate summary? If so, I will blog my benchmark results
and this conversation.
--Larry Garfield
Hi Larry,
From my understanding this is correct, except the part about the code
history: I am not sure what PHP did in the past (before the new streams
API), so before saying that it was always this way, you should review
ext/standard/file.c and main/streams.c in the Git/SVN/CVS history! But
the other notes seem to be the reason for the "persistent urban
legends" :-)
Uwe
P.S.: By the way, I will have an MMap blog post, too, focusing on the
same urban legends about the use of Lucene's MMapDirectory in Apache
Lucene and Apache Solr. I am just a little bit swamped with work.
Uwe Schindler
thetaphi@php.net - http://www.php.net
NSAPI SAPI developer
Bremen, Germany
Hi Larry,
- So given #2 and #3, the "readfile() will kill your memory, don't use
it" line is a persistent urban legend that belongs on Snopes as
debunked. Looping on fread() for performance is a red herring.
I implemented this earlier this very year to avoid memory issues (a
quick look at project history shows me working on it in January). The
difference between using readfile() and some convoluted method from the
documentation comments was clear and immediate: from a corrupted
download with an out-of-memory error in the logs, to things working just
fine.
Let me re-create with a simple test script and share my server details
before we call snopes :)
paul
Fascinating. I even verified the md5sum of the file I got on the other
end just to be sure. I'll hold off on the blog post then. :-) I look
forward to your test setup.
--Larry Garfield
Hi Larry,
The server in question is still on PHP 5.2.13
Script:
<?php
ini_set('memory_limit', '8M');
$name = uniqid() . ".zip";
header('Content-type: application/zip');
header("Content-Disposition: attachment; filename=\"$name\"");
readfile('../../filestorage/4f9e9e3b9bcff.zip');
File Information:
[user@host public]$ ls -alh ../../filestorage/4f9e9e3b9bcff.zip
-rw-r--r-- 1 apache apache 27M Apr 30 10:14 ../../filestorage/4f9e9e3b9bcff.zip
Error:
[Tue May 01 09:30:48 2012] [error] [client 198.136.162.2] PHP Fatal
error: Allowed memory size of 8388608 bytes exhausted (tried to
allocate 27617281 bytes) in
/home/lots/of/path.org/stuff/public/rf822.php on line 6
I'll try something newer, but I wanted to prove myself not crazy and
do it on the server in question first.
paul
On Tue, 01 May 2012 15:39:56 +0200, Paul Reinheimer
preinheimer@gmail.com wrote:
The server in question is still on PHP 5.2.13
[...]
I'll try something newer, but I wanted to prove myself not crazy and
do it on the server in question first.
Unfortunately, you've ignored Uwe's e-mail... The problem is not the PHP
version; the problem is that you're buffering unlimited amounts of data.
Check your configuration and make sure ob_get_level() returns 0.
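Something like this before streaming (a sketch; the path is a
placeholder):

<?php
// Drain every active output buffer so readfile() writes straight
// through to the SAPI instead of into a PHP-side buffer.
while (ob_get_level() > 0) {
    ob_end_flush();
}
readfile('/path/to/big-file.zip');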
--
Gustavo Lopes
I know it doesn't really fit this problem in general, but I figured I
would point it out. Lighttpd introduced a brilliant concept by letting
the server serve the file directly. Basically, instead of using
readfile(), you would just send a header: X-SendFile: $filename...
It's available for Apache (as a module:
http://www.jasny.net/articles/how-i-php-x-sendfile/) and nginx as well.
The benefit is that the file never needs to be moved around in memory;
it can be mapped directly to the network card by the OS.
Just pointing it out (although it doesn't directly apply to the memory
usage, it solves the problem differently, and better)...
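In PHP it looks roughly like this (a sketch; assumes mod_xsendfile is
installed and configured, and the path is a placeholder):

<?php
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="download.zip"');
header('X-Sendfile: /var/files/download.zip');
// No readfile() at all; the web server streams the file itself.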
Anthony
Hi All,
Unfortunately, you've ignored Uwe's e-mail... The problem is not the PHP
version; the problem is that you're buffering unlimited amounts of data.
Check your configuration and make sure ob_get_level() returns 0.
My apologies for the delay: ob_get_level() returns 1, good catch.
phpinfo() reports output_buffering as 4096.
Does this push what I'm getting into expected behaviour?
paul
It sounds like it. In that case the memory spike is happening in the
output buffer, into which the file is streamed by readfile() in 8K
chunks until the output buffer explodes. :-)
So, I think we're back to "urban legend" territory.
--Larry Garfield
Hi,
That's good to know. Thanks, and my apologies for adding confusion to
the issue.
One question: with a value of 4096 for the ini directive, shouldn't it
be flushing data to the client long before I run into memory issues?
What have I missed here?
thanks
paul
Hi,
That's an interesting catch, and the answer is here:
http://lxr.php.net/opengrok/xref/PHP_5_3/main/output.c#596
In fact, even if your OB layer is, say, 4K long, PHP will reallocate it
to fit the data written into it
(http://lxr.php.net/opengrok/xref/PHP_5_3/main/output.c#395), growing in
block_size steps
(http://lxr.php.net/opengrok/xref/PHP_5_3/main/output.c#402).
In short, having an output buffer of, say, 4K will not prevent PHP from
allocating several MB of data if the OB layer suddenly gets fed that
much data in one write, which is exactly what readfile() does with the
MMAP strategy.
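A rough sketch of the effect (the str_repeat() string itself costs 8 MB
here too, but the point is that the buffer must grow to hold the entire
write before it can flush):

<?php
ob_start(null, 4096);                  // 4K "chunk size" hint
echo str_repeat('x', 8 * 1024 * 1024); // one single 8 MB write
// The buffer was reallocated to hold all 8 MB before flushing:
error_log('peak: ' . memory_get_peak_usage(true));
ob_end_flush();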
Mike could help as he's the main ob layer designer.
Bye :)
Julien.Pauli
Thanks for the sanity check, everyone. I've put together a blog post
with my findings. If anyone wants to check it to make sure I am not
saying anything grotesquely wrong before I post it, that would be much
appreciated. :-) It's set to world-commentable:
https://docs.google.com/document/d/1qfe4OUc5lbuoSZFUh6NZYP-6pbaiquxnOFwN_oBccBI/edit
--Larry Garfield
Hi Everyone
So, I think we're back to "urban legend" territory.
I've updated the documentation for readfile() to help send more people
down the path of checking for output buffering, and disabling that
rather than contriving loops with fread().
paul
--
Paul Reinheimer
Hi, Everyone
FYI: If you just want to check something before serving a file to the
client, you can also use something called X-Sendfile.
Apache: https://tn123.org/mod_xsendfile/
lighttpd: It's built in :)
nginx: http://wiki.nginx.org/XSendfile
Idea:
Do what you're doing in your PHP script and add the header
"X-Sendfile: $yourFile" (nginx uses another header). This header will
trigger the plugin, and the web server will serve the file instead of
your PHP process.
I personally use it on my webserver and it works quite fine. Debian
has a compiled package called libapache2-mod-xsendfile in version 0.9
Bye
Simon
Hi,
2012/5/11 Simon Schick simonsimcity@googlemail.com
FYI: If you just want to check something before serving a file to the
client, you can also use something called X-Sendfile. [...]
nginx: http://wiki.nginx.org/XSendfile
It's named "X-Accel" and built in, too (as long as it's not explicitly
disabled):
http://wiki.nginx.org/Modules
http://wiki.nginx.org/X-accel
Hi Larry,
The server in question is still on PHP 5.2.13
[...]
I'll try something newer, but I wanted to prove myself not crazy and
do it on the server in question first.
That's odd, because PHP 5.2 has identical code in this respect. Wez
committed these changes in 2002:
https://github.com/php/php-src/commit/a662f012bba5a6fdc50533673f3fff47bf9af219#diff-5
So it has been like this for quite a while. Does that server have
implicit unlimited output buffering turned on in your ini file?
-Rasmus
Hi Larry,
And actually, that patch just made it use streams. Even before the
internal streams API the function worked the same way. In PHP 4.2 it
used php_passthru_fd, which looked like this:
https://github.com/php/php-src/blob/PHP-4.2.0/ext/standard/file.c#L1526
Sascha added the mmap implementation we still use today in 1999:
https://github.com/php/php-src/commit/dda0b783df7d849df01fa831febbc1e34b5b8dd3
But even prior to that readfile would still buffer in 8k chunks. The PHP
2.0.1 implementation (very scary to look at code I wrote 17 or 18 years
ago):
/*
 * Read a file and write the ouput to stdout
 */
void ReadFile(void) {
    Stack *s;
    char buf[8192], temp[8];
    FILE *fp;
    int b, i, size;

    s = Pop();
    if(!s) {
        Error("Stack error in ReadFile");
        return;
    }
    if(!*(s->strval)) {
        Push("-1", LNUMBER);
        return;
    }
#if DEBUG
    Debug("Opening [%s]\n", s->strval);
#endif
    StripSlashes(s->strval);
#if PHP_SAFE_MODE
    if(!CheckUid(s->strval, 1)) {
        Error("SAFE MODE Restriction in effect. Invalid owner of file to be read.");
        Push("-1", LNUMBER);
        return;
    }
#endif
    fp = fopen(s->strval, "r");
    if(!fp) {
        Error("ReadFile(\"%s\") - %s", s->strval, strerror(errno));
        Push("-1", LNUMBER);
        return;
    }
    size = 0;
    php_header(0, NULL);
    while((b = fread(buf, 1, sizeof(buf), fp)) > 0) {
        for(i = 0; i < b; i++)
            PUTC(buf[i]);
        size += b;
    }
    fclose(fp);
    sprintf(temp, "%d", size);
    Push(temp, LNUMBER);
}
-Rasmus
Hi Larry,
I implemented this earlier this very year to avoid memory issues [...]
Let me re-create with a simple test script and share my server details
before we call snopes :)
Are you sure that you are not using ob_start()/ob_flush()/... (output
buffering)? If that is the case, readfile() writes to the output, but
everything is buffered in memory, because PHP's output buffering is
active.
Uwe
So, I've been reading articles for a decade now that say that readfile()
is great and wonderful except for memory usage. Specifically, that it
reads a file into memory entirely, and then prints it to stdout from
there. So if you're outputting a big file you will hit your memory limit
and kill the server. Thus, one should always loop over fread() instead.
The most recent article I found saying that was from 2007, with a
StackExchange thread saying the same from 2011. I've even found mention
of it in old PHP bugs.
Well, this is not true, but I haven't seen the StackExchange thread
you're referring to (I won't bother to comment on the specifics of a
discussion I wasn't privy to). I can say that readfile() actually uses
PHP's streams, so it's to the same effect. If you're just writing more
code to do this yourself, you haven't done anything different from what
readfile() already does. Here's your proof:
http://lxr.php.net/opengrok/xref/PHP_5_4/ext/standard/file.c#1346
However, I cannot replicate that in my own testing. Earlier today I was
running some benchmarks of different file streaming techniques in PHP
(5.3.6 specifically) and found that fread() looping, fpassthru(),
readfile(), and stream_copy_to_stream() perform almost identically on
memory, and all are identical on CPU except for fread(), which is
slower, which makes sense since you're looping in PHP space.
That's because they're all using pretty much the same code in PHP :)
What's more, I cranked my memory limit down to 10 MB and then tried
streaming a 20 MB file. No change. The PHP peak memory never left
around a half-meg or so, most of which I presume is just the Apache/PHP
overhead. But it's simply not possible for readfile() to be buffering
the whole file into memory before printing, or it would have died when
the file exceeded the memory limit. I verified that the data I'm getting
downloaded from the script is correct, and exactly matches the file that
it should be streaming.
You're absolutely correct. readfile() cleans up after itself. It's
basically just a stream that reads from the file and sends the output
directly for you. It doesn't load the entire file into memory first,
which is what file_get_contents() does, for example. Therefore you
aren't using any more memory in PHP with readfile() than if you did a
simple fopen(...); while(!feof($fp)) echo fread(...); fclose(); in PHP.
Here's a small test you can use to demonstrate that point effectively.
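Along these lines (a sketch; the path is a placeholder):

<?php
// Stream a 10MB file and log PHP's peak memory afterwards.
header('Content-Type: application/octet-stream');
readfile('/path/to/10mb-file.bin');
error_log('peak: ' . memory_get_peak_usage(true)); // stays in the KB range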
This code uses readfile() to read a 10MB file to output. Here's a
snapshot of the result of running this script on my local server in
Chrome: we transferred 10MB of data (albeit I have compression with
gzip) without ever peaking over roughly 270KB of memory usage in PHP.
Now here's the result of using the same code, except replacing
readfile() with file_get_contents():
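A sketch of that variant (same placeholder path):

<?php
// The whole file becomes a PHP string, so peak memory
// jumps by roughly the file size.
$data = file_get_contents('/path/to/10mb-file.bin');
error_log('peak: ' . memory_get_peak_usage(true)); // ~10MB higher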
(Keep in mind I didn't actually output the file here, but I was
demonstrating the clear difference in memory consumption.) Now,
file_get_contents() is still actually using the same streaming
capabilities PHP offers, with the exception that it calls fopen(),
fread(), and fclose() in one go and without cleaning up the memory. Its
purpose is to store/return the file contents in memory as a string so
that you can do something with it in your code. The purpose of
readfile() is just to output the file (you wouldn't care about doing
anything with it in your code here). So the two have very different
use-cases.
My first thought was that this is yet another case of PHP improving and
fixing a long-standing bug, but somehow the rest of the world not
knowing about it, so "conventional wisdom" persists long after it has
ceased to be wise. However, I found no mention of readfile() in the
PHP 5 change log[1] at all, aside from one note from back in 5.0.0
Beta 1 about improving performance under Windows. (I'm on Linux.)
I believe you're talking about Wez's commit (that was way back during
the PHP 5 beta days). I dug up another one that had to do with a minor
memory-leak issue in readfile() around the same time. That was basically
an issue with how PHP handles memory cleanup and shutdown after the user
has aborted the connection
(http://us.php.net/manual/en/function.ignore-user-abort.php).
So, what's going on here? Has readfile() been memory-safe for that long
without anyone noticing? Is my test completely flawed (although I don't
see how, since I can verify that the code works as expected)? Something
else?
Please un-confuse me!
I hope this sheds at least some light on your confusion. Please let me
know if there's anything else I missed.