Hi,
Please find attached a patch for adding large file size support to PHP
5.5.1.
Basically, it allows 32-bit machines to address files larger than 4GB, get
correct results when asking for their size, read and write them, etc.
On the PHP side, it does so by returning a double instead of an int from
the file size/offset functions when the size is larger than 2^31.
This means that files with size:
- up to 2^32 bytes work as before (integer returned / used)
- up to 2^52 bytes can be handled correctly (a double's mantissa is 52
bits, so no loss of precision here)
- from 2^52 up to 2^64 will have their size rounded, yet reading and
writing will work as expected since it's done in the PHP binary.
The changes are:
- Some size_t are changed to off_t wherever required.
- The code that used a single mmap now loops over mmap until the complete
file/stream has been processed.
- Fix for the mmap of a popen's pipe that can't work (unrelated, but easy
to fix while working on the mmap code).
- Change the return type based on the actual range of the manipulated
number (so if the value fits in an integer, an integer is used, and the
code that used to work still works, but if it does not fit, a double is
used, and the code that used to fail now works).
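The last item, picking the return type from the value's range, could look
roughly like the sketch below. This is only an illustration, not the
patch's code: the size_value struct and make_size_value function are
hypothetical names, and int32_t stands in for PHP's integer on a 32-bit
build (an assumption).

```c
#include <stdint.h>

/* Hypothetical tagged value: stands in for a PHP integer-or-double result. */
typedef enum { SIZE_IS_INT, SIZE_IS_DOUBLE } size_kind;

typedef struct {
    size_kind kind;
    union {
        int32_t i; /* used when the offset fits a 32-bit signed integer */
        double  d; /* used otherwise */
    } v;
} size_value;

/* Wrap a 64-bit file offset, keeping the integer type whenever it fits. */
static size_value make_size_value(int64_t offset)
{
    size_value r;
    if (offset >= INT32_MIN && offset <= INT32_MAX) {
        r.kind = SIZE_IS_INT;
        r.v.i = (int32_t)offset;
    } else {
        r.kind = SIZE_IS_DOUBLE;
        r.v.d = (double)offset;
    }
    return r;
}
```

With this shape, existing callers that only ever see small files keep
getting an integer, and only sizes beyond the 32-bit range switch to a
double.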
Please note that I'm not a PHP developer and I don't have any time left for
maintaining this patch, but I'm sure it has value. So I waive any rights to
it and put it in the public domain for whoever wants to improve or
integrate it.
Let me know your remarks.
Cyril
Hi,
Please find attached a patch for adding large file size support to
PHP 5.5.1.
The patch didn't make it. Please send as text/plain (i.e. using .txt
extension)
It does so by, from the PHP's side, getting double instead of int for
the file size/ offset functions, when the size is larger than 2^31.
This has some problems - for further handling one might need the exact
file size (i.e. content length headers, checking structures, reading
specific positions).
This means that files with size:
- up to 2^32 bytes works as previously (integer returned / used)
- up to 2^52 bytes can be handled correctly (double's mantissa is 52
bits, no loss in precision here)
This might work for the initial operation, but as soon as the user does
a calculation ("give me the last ten bytes") this will cause issues.
- from 2^52 up to 2^64 will have their size rounded, yet, reading and
writing will work as expected since it's done in the PHP's binary.
The changes are:
- Some size_t are changed to off_t wherever required.
This disqualifies it from 5.5 and allows use in 5.6 only.
johannes
On Fri, Aug 30, 2013 at 4:52 PM, Johannes Schlüter
<johannes@schlueters.de> wrote:
Hi,
Please find attached a patch for adding large file size support to
PHP 5.5.1.
The patch didn't make it. Please send as text/plain (i.e. using .txt
extension)
Done.
It does so by, from the PHP's side, getting double instead of int for
the file size/ offset functions, when the size is larger than 2^31.
This has some problems - for further handling one might need the exact
file size (i.e. content length headers, checking structures, reading
specific positions)
This means that files with size:
- up to 2^32 bytes works as previously (integer returned / used)
- up to 2^52 bytes can be handled correctly (double's mantissa is 52
bits, no loss in precision here)
This might work for the initial operation, but as soon as the user does
a calculation ("give me the last ten bytes") this will cause issues.
No, it won't. A double's mantissa is an integer and behaves like one.
So you have the complete 52 bits of precision here; not a single
rounding error can occur.
The exponent in that case will be 0, so "2^0 * integer_with_52_bits"
is still an integer with 52 bits of precision.
So "filesize - 10" is still exact as long as the filesize is less
than 2^52 (see below for when it's larger).
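The exactness claim can be checked with a small C sketch. The helper
names fits_double_exactly and tail_start are hypothetical, made up for
this illustration. Note that an IEEE 754 double actually carries 53
significant bits (52 stored plus one implicit), so the 2^52 bound used
in this thread is on the safe side.

```c
#include <stdint.h>

/* Returns 1 if a non-negative 64-bit offset survives a round trip
 * through double unchanged, 0 otherwise. */
static int fits_double_exactly(int64_t offset)
{
    /* volatile forces a real 64-bit double, even on x87 builds that
     * would otherwise keep extra precision in a register */
    volatile double d = (double)offset;
    if (d >= 9223372036854775808.0) /* 2^63: casting back would overflow */
        return 0;
    return (int64_t)d == offset;
}

/* "Give me the last n bytes": start offset computed on the double side,
 * the way a script holding a double file size would do it. */
static double tail_start(double file_size, double n)
{
    return file_size - n;
}
```

For any size below 2^52, tail_start((double)size, 10.0) converts back to
exactly size - 10, which is the case discussed above.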
- from 2^52 up to 2^64 will have their size rounded, yet, reading and
writing will work as expected since it's done in the PHP's binary.
I'll try to explain this better, since it's causing a misunderstanding.
Typically, when you store a value that does not fit in 52 bits, it is
shifted (that is, rounded to the closest multiple of two) until it fits
the range.
So for example, a file size that's "2^53 + 203" will be stored as
9007199254741196 instead of 9007199254741195 (the error is 1 here)
So the reported value will be wrong, but if you seek close to the
end position and read, you'll still be able to read the file completely
(since the <wrong> double value will be converted back to a <wrong>
64-bit integer in the C code). As long as you're dealing with
positions/seeks in multiples of 2^52, you'll be fine.
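The "2^53 + 203" example can be reproduced with a short C sketch that
sends a 64-bit size through a double and back, as would happen at the
boundary between a script and the C code. The function name
through_double is hypothetical, and the sketch assumes the default
round-to-nearest-even floating-point mode.

```c
#include <stdint.h>

/* Convert a non-negative file size to double and back to a 64-bit
 * integer. Sizes that need more than 53 significant bits come back
 * rounded. Only meant for values well below 2^63. */
static int64_t through_double(int64_t size)
{
    /* volatile forces rounding to a real 64-bit double */
    volatile double d = (double)size;
    return (int64_t)d;
}
```

Here through_double(9007199254741195), i.e. 2^53 + 203, yields
9007199254741196: the value is off by one, but it is still a valid
64-bit offset once back on the C side.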
So if you need a 100% correct value for such a large file, you can still
get it by looping, AT WORST, 4096 times (= 2^64 / 2^52) over a seek with
SEEK_CUR.
(Not a real showstopper when the current version does not even allow
you to know the actual size of the file, not to mention reading
it!)
If you have to handle such large files NOW, then chances are high
you're already using a 64-bit system.
This patch is for us poor souls stuck with 32-bit systems, yet
wanting to report the correct size of our movies in our web-based file
manager, stream such files correctly, and so on.
The changes are:
- Some size_t are changed to off_t wherever required.
This disqualifies from 5.5 and allows use in 5.6 only.
Ok
Cyril