Hi,
I'm submitting a patch to perform "on the fly" MD5/SHA1 digest
calculation of a file uploaded via the HTTP POST method. Since it is
not uncommon for applications to require some digest of a freshly
uploaded file, doing the math directly in the buffer where the file is
being read can save some time.
Digest calculation is triggered by setting the special input fields
COMPUTE_MD5 and/or COMPUTE_SHA1 to a non-zero value:
<input type="hidden" name="COMPUTE_SHA1" value="1">
(note that these assignments must precede the
<input type="file" name=...> field, as in the MAX_FILE_SIZE case.)
The result is found in the special variables
$_FILES["userfile"]["md5"] and $_FILES["userfile"]["sha1"].
These variables are only defined upon request of the corresponding
digest.
The patch was produced against the current CVS version of rfc1867.c
(PHP_5_0 branch, v1.160).
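For readers who want the idea in a few lines: the following is only a language-neutral sketch in Python, not the patch itself (which is C code in rfc1867.c). The function name receive_upload, its parameters and the "size" key are made up for illustration. It copies an upload stream to its destination, feeds each chunk to the requested digests on the way, and returns an entry in which "md5" and "sha1" exist only if they were asked for.

```python
import hashlib
import io


def receive_upload(stream, out, want_md5=False, want_sha1=False, chunk_size=8192):
    """Copy `stream` to `out`, hashing every chunk as it passes through.

    Hypothetical helper: it mimics the behaviour described in the mail,
    it is not the code of the patch.
    """
    hashers = {}
    if want_md5:
        hashers["md5"] = hashlib.md5()
    if want_sha1:
        hashers["sha1"] = hashlib.sha1()

    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out.write(chunk)              # the normal "save the upload" step
        for h in hashers.values():    # the extra work done on the same buffer
            h.update(chunk)
        size += len(chunk)

    entry = {"size": size}
    for name, h in hashers.items():
        entry[name] = h.hexdigest()   # only requested digests are defined
    return entry


if __name__ == "__main__":
    data = b"example upload body" * 100
    saved = io.BytesIO()
    print(receive_upload(io.BytesIO(data), saved, want_sha1=True))
```

The point of the design is that the data is touched once: the buffer that is written out is the same one that is hashed.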
Cheers,
David
David Santinoli, Milano + david@santinoli.com
Independent Linux/Unix consultant + http://www.santinoli.com
> I'm submitting a patch to perform "on the fly" MD5/SHA1 digest
> calculation of a file uploaded via the HTTP POST method. Since it is
> not uncommon for applications to require some digest of a freshly
> uploaded file, doing the math directly in the buffer where the file is
> being read can save some time.
I like this idea, and the patch looks OK (at a glance). Without
applying it I have a question though:
Does it handle cleaning up the digest structures correctly if the
file upload is aborted in some way?
regards,
Derick
> I like this idea, and the patch looks OK (at a glance). Without
> applying it I have a question though:
> Does it handle cleaning up the digest structures correctly if the
> file upload is aborted in some way?
Do you refer to the PHP_MD5_CTX and PHP_SHA1_CTX contexts?
AFAIK, given they're automatic variables with no malloc()ed buffers
inside them, it doesn't look like an explicit cleanup is needed.
Cheers,
David
PS I'm on the list! CC:ing to my private address is unnecessary.
-1 it seems somewhat redundant given that this signature can be easily
generated via a single function call, md5_file() or sha1_file().
Ilia
Ilia Alshanetsky wrote:
> -1 it seems somewhat redundant given that this signature can be easily
> generated via a single function call, md5_file() or sha1_file().
> Ilia
I think his point was that the hash is calculated as the file is being
uploaded, which saves having to read the whole file a second time after
uploading is finished. For big files, and repetitious uploading, this
could probably save on the CPU cycles and I/O load a bit.
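To make the difference concrete, here is a rough Python sketch (not PHP, and not the patch) comparing the two approaches. The helper names digest_file, save_and_digest and compare are invented for illustration; digest_file only approximates what a call like md5_file() has to do, namely reopen the saved file and read it all again. Both paths give the same digest, and the second pass reads exactly as many extra bytes as the upload is long.

```python
import hashlib
import os
import tempfile


def digest_file(path, chunk_size=8192):
    """Second pass: reopen the saved file and hash it from disk."""
    h = hashlib.md5()
    bytes_read = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            bytes_read += len(chunk)
    return h.hexdigest(), bytes_read


def save_and_digest(chunks, path):
    """Single pass: hash each chunk while it is being written to disk."""
    h = hashlib.md5()
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            h.update(chunk)
    return h.hexdigest()


def compare(chunks):
    """Return (on-the-fly digest, second-pass digest, extra bytes read)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        on_the_fly = save_and_digest(chunks, path)
        second_pass, extra_bytes = digest_file(path)
    finally:
        os.remove(path)
    return on_the_fly, second_pass, extra_bytes


if __name__ == "__main__":
    print(compare([b"a" * 5000, b"b" * 3000]))
```

Note that the hashing work itself is the same either way; what the single pass avoids is the second read of the file.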
I'd like to see an ini setting to disable this, though.
Dave
> I think his point was that the hash is calculated as the file is being
> uploaded, which saves having to read the whole file a second time after
> uploading is finished. For big files, and repetitious uploading, this
> could probably save on the CPU cycles and I/O load a bit.
That is true, but to really validate the file you'd still want to check the
data you have on disk, rather than what PHP supposedly saved to disk.
Adding hidden fields (that may conflict with the ones some people already use)
also seems like a recipe for trouble.
Ilia
> I think his point was that the hash is calculated as the file is
> being uploaded, which saves having to read the whole file a second
> time after uploading is finished. For big files, and repetitious
> uploading, this could probably save on the CPU cycles and I/O load a
> bit.
Exactly.
> That is true, but to really validate the file you'd still want to
> check the data you have on disk, rather than what PHP supposedly saved
> to disk.
Actually, I don't see this as a way to validate uploaded files; rather,
I consider it an optimized replacement for the upload -> md5_file()
sequence.
> Adding hidden fields (that may conflict with the ones some people
> already use) also seems like a recipe for trouble.
I admit that's a good point, although I can't really see many possible
uses for a hidden field named COMPUTE_MD5 in a file upload form - I bet
those people would be glad to switch to the new facility. ;-)
The suggestion by Derrell Lipman - requesting the hash calculation
by adding an attribute to the INPUT element - has some merit, but I
don't know how viable this road is (in terms of functionality already
present in the PHP core), and I'd also like to stick to "standard" HTML.
Cheers,
David
David Santinoli <u235@libero.it> writes:
> Hi,
> I'm submitting a patch to perform "on the fly" MD5/SHA1 digest
> calculation of a file uploaded via the HTTP POST method. Since it is
> not uncommon for applications to require some digest of a freshly
> uploaded file, doing the math directly in the buffer where the file is
> being read can save some time.
It's a nice feature to be able to compute the digest on the fly. However...
> Digest calculation is triggered by setting the special input fields
> COMPUTE_MD5 and/or COMPUTE_SHA1 to a non-zero value:
> <input type="hidden" name="COMPUTE_SHA1" value="1">
> (note that these assignments must precede the
> <input type="file" name=...> field, as in the MAX_FILE_SIZE case.)
... this seems like a somewhat kludgy method of specifying what you want done.
Specifically, it seems kludgy to add a hidden input field with a specific
name. (What if there are multiple type="file" input elements and only one of
them is to have its digest computed?)
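A small Python sketch of the concern, under loudly stated assumptions: it models form parts as an ordered list and assumes that a COMPUTE_* flag, once seen with a non-zero value, stays in effect for every file field that follows. The thread does not say whether the patch really behaves that way, and the names digests_requested and is_nonzero are invented for illustration.

```python
def is_nonzero(value):
    """Assumed reading of "non-zero value": an integer other than 0."""
    try:
        return int(value) != 0
    except (TypeError, ValueError):
        return False


def digests_requested(parts):
    """Map each file field to the digests in effect when it is reached.

    `parts` is an ordered list of (kind, name, value) tuples where kind is
    "field" or "file". Assumption: flags are never switched off again.
    """
    active = set()
    result = {}
    for kind, name, value in parts:
        if kind == "field":
            if name in ("COMPUTE_MD5", "COMPUTE_SHA1") and is_nonzero(value):
                active.add(name[len("COMPUTE_"):].lower())
        else:
            result[name] = sorted(active)
    return result


if __name__ == "__main__":
    form = [
        ("file", "first", None),           # before any flag: no digest
        ("field", "COMPUTE_MD5", "1"),
        ("file", "second", None),          # md5 requested
        ("file", "third", None),           # md5 again, wanted or not
    ]
    print(digests_requested(form))
```

Under these assumptions the only way to hash just one of several files is to place it last, which is the awkwardness being pointed out.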
A cleaner method might be (if you can get the information returned to you)
to add an attribute to the input element, something like:
<input type="file" digest="md5" ...>
At least this keeps the request with the element to which it applies. It's
still a bit kludgy in that it's adding a "digest" attribute. I suspect you'll
have trouble getting that digest attribute returned to you by the browser,
though.
I'd be inclined to keep things simple and call the digest functions in the
normal way rather than "extending" HTML syntax.
I do like the concept, though... Hmmm...
Derrell