Hello internals! Thanks for PHP!
I'm writing to gauge interest in two new functions to the PHP hash
extension, hash_serialize and hash_unserialize. These functions would
serialize and unserialize the internals of a HashContext object, allowing a
partially-computed hash to be saved, then restored and completed in a later
run.
EXAMPLE: Multi-part upload.
Say that a very large file is uploaded in pieces, big.001 through big.999,
and it is necessary to compute the SHA256 of the final concatenated file.
Current PHP must compute the hash in one go:
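$ctx = hash_init("sha256");
for ($i = 1; $i <= 999; ++$i) {
    // "%03d" zero-pads to three digits (big.001 ... big.999); PHP's sprintf
    // ignores a precision such as ".03" for the d specifier.
    hash_update_file($ctx, sprintf("big.%03d", $i));
}
$hash = hash_final($ctx);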
This in turn requires that all pieces be on the filesystem simultaneously.
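As a small runnable illustration of the one-go approach, the sketch below streams a few pieces through a single HashContext and checks the result against the hash of the concatenated data. The temporary directory, the three short pieces, and the variable names are stand-ins invented for the demo; they are not part of the proposal.

```php
<?php
// Hypothetical demo data: three small pieces standing in for big.001 ... big.999.
$dir = sys_get_temp_dir() . '/hash_demo_' . uniqid();
mkdir($dir);
$pieces = ['first part, ', 'second part, ', 'third part'];
foreach ($pieces as $i => $data) {
    file_put_contents(sprintf('%s/big.%03d', $dir, $i + 1), $data);
}

// Stream every piece through one HashContext, in order.
$ctx = hash_init('sha256');
for ($i = 1; $i <= count($pieces); ++$i) {
    hash_update_file($ctx, sprintf('%s/big.%03d', $dir, $i));
}
$streamed = hash_final($ctx);

// Reference value: SHA256 of the concatenated contents.
$whole = hash('sha256', implode('', $pieces));

// Remove the demo files again.
array_map('unlink', glob($dir . '/big.*'));
rmdir($dir);

var_dump($streamed === $whole);
```

Every piece has to be readable while the loop runs, which is the constraint described above.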
With hash_serialize and hash_unserialize, the hash can be computed
gradually, allowing pieces to be deleted as they are uploaded elsewhere.
$ctx = hash_init("sha256");
hash_update_file($ctx, "big.001");
SAVE_TO_DATABASE(hash_serialize($ctx));
...
$ctx = hash_unserialize(LOAD_FROM_DATABASE());
hash_update_file($ctx, "big.002");
SAVE_TO_DATABASE(hash_serialize($ctx));
...
etc.
I am happy to write up an RFC for these functions. An initial
implementation with tests is visible here:
https://github.com/kohler/php-src/commit/5a3a828f90b88cd7f660babec7db531cfc04b0a1
New functions hash_serialize and hash_unserialize appear to fit the
existing API well, and simplify implementation, but it's possible that
__serialize/__unserialize or the internal serialize/unserialize
functions would be preferred.
I'd be grateful for any feedback.
Thanks!
Eddie Kohler
> I'm writing to gauge interest in two new functions to the PHP hash
> extension, hash_serialize and hash_unserialize. These functions
> would serialize and unserialize the internals of a HashContext
> object, allowing a partially-computed hash to be saved, then restored
> and completed in a later run.

I would suggest to make the HashContext Serializable, then
serialize($hash_context);
works. Then it also fits when stored in other objects or something ...
johannes
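
A minimal sketch of what this suggestion could look like from user code, assuming a PHP build in which serialize() accepts HashContext objects (builds that do not are expected to throw, which the sketch catches). The two short strings are hypothetical stand-ins for uploaded pieces, and the storage step between runs is left out.

```php
<?php
// Assumption: this PHP build allows serialize() on HashContext objects.
// Where it does not, serialize() throws and the catch block reports it.
$part1 = 'contents of big.001 ';   // stand-ins for the uploaded pieces
$part2 = 'contents of big.002';

try {
    // First run: hash the first piece, then freeze the context.
    $ctx = hash_init('sha256');
    hash_update($ctx, $part1);
    $saved = serialize($ctx);      // string that could be stored between runs

    // Later run: thaw the context and continue with the next piece.
    $ctx = unserialize($saved, ['allowed_classes' => ['HashContext']]);
    hash_update($ctx, $part2);
    $resumed = hash_final($ctx);

    // The resumed hash should equal the hash of both pieces in one go.
    $matches = ($resumed === hash('sha256', $part1 . $part2));
    echo $matches ? "resumed hash matches\n" : "mismatch\n";
} catch (Throwable $e) {
    $matches = null;
    echo 'HashContext is not serializable here: ', $e->getMessage(), "\n";
}
```

Restricting allowed_classes to HashContext is a defensive choice for data read back from storage; it is not something the thread prescribes.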
Thanks for this suggestion. I've updated the implementation to make
HashContext implement Serializable.
I'd still be grateful for more feedback, or perhaps I should just create an
RFC?
Eddie
On Mon, Jun 8, 2020 at 9:28 AM Johannes Schlüter johannes@schlueters.de
wrote:
> I would suggest to make the HashContext Serializable, then
> serialize($hash_context);
> works. Then it also fits when stored in other objects or something ...
>
> johannes
> Thanks for this suggestion. I've updated the implementation to make
> HashContext implement Serializable.
>
> I'd still be grateful for more feedback, or perhaps I should just create an
> RFC?

Not sure if that would need an RFC; maybe just start by submitting a
pull request. :)
Thanks,
Christoph
On Thu, Jun 11, 2020 at 11:59 AM Eddie Kohler kohler@seas.harvard.edu
wrote:
> Thanks for this suggestion. I've updated the implementation to make
> HashContext implement Serializable.
>
> I'd still be grateful for more feedback, or perhaps I should just create
> an RFC?

Be careful what you ask for. :)
Overall +1 on the concept with a few notes:
- Please put this on a branch and make it a PR so we can comment on it
  directly.
- Consider using zend_parse_parameters_throws() and family so that the
  exception which is thrown contains the type error information rather than
  the generic RETURN_THROWS() macros.
- Consider using hex or base64 to serialize the contexts. This will
  reduce various transport/storage issues.
- It's great that you've thought about endianness, but the current
  implementation simply bails on endian mismatch. It'd be a nice-to-have for
  the user if these serializations were portable. I know this represents a
  lot of work for sort of an edge case so I won't hold it against you if you
  say 'no' and/or save this for later work if demand surfaces.
- Storing $key makes me nervous. I don't have a good solution to this
  since the deserialization doesn't actually give us a chance to specify it
  in the deserialization process. I wish I'd made $key/hmac an option to
  hash_final rather than hash_init. Maybe we can think about allowing that
  to be specified at either end. Let's expand on this topic while you work
  on your RFC.
- Yeah... I think you need an RFC because of #5. Sorry.
- TABS v SPACES indentation issues.
-Sara
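
To make the $key concern concrete, the sketch below shows how an HMAC context is created today: the key is passed to hash_init, so the context depends on it from the first update onward. The key and the chunk strings are hypothetical values chosen for the demo.

```php
<?php
$key    = 'hypothetical-secret-key';   // illustration only, not a real secret
$chunks = ['chunk one, ', 'chunk two'];

// The key is supplied up front, when the context is created.
$ctx = hash_init('sha256', HASH_HMAC, $key);
foreach ($chunks as $chunk) {
    hash_update($ctx, $chunk);
}
$incremental = hash_final($ctx);

// One-shot HMAC over the same data, for comparison.
$oneShot = hash_hmac('sha256', implode('', $chunks), $key);

var_dump(hash_equals($oneShot, $incremental));
```

Because the key enters at hash_init rather than hash_final, any saved copy of such a context would have to carry key-derived state, which is the situation the note above is worried about.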
Hi all,
I've opened up a pull request and responded to this message there. I'd love
any further comments.
https://github.com/php/php-src/pull/5702
Eddie