Opening https://wiki.php.net/rfc/password_registry for discussion.
It's all in the elevator pitch, but the TL;DR is to make
password_hash()
/password_verify() into a more easily extensible API
for multiple hashing mechanisms. Critically, this would allow us to
include new library dependent mechanisms (such as those found in
libsodium and libhydrogen) without actually forcing a library
dependency on the core build.
I don't consider the current internal API proposal fixed,
particularly, I'm not too keen on the algorithm identification. What
I've presented is a callback for a mechanism to say "Yes, I can verify
that signature", but this means we must ask all mechanisms. A more
direct means might involve "search for /^$mechanismName$/, but not
only is this already insuffcient for bcrypt (identified by $2y$), but
it'll probably be worse later on. If anyone has better ideas here,
I'm totes open.
-Sara
I don't consider the current internal API proposal fixed,
particularly, I'm not too keen on the algorithm identification. What
I've presented is a callback for a mechanism to say "Yes, I can verify
that signature", but this means we must ask all mechanisms. A more
direct means might involve "search for /^$mechanismName$/, but not
only is this already insuffcient for bcrypt (identified by $2y$), but
it'll probably be worse later on. If anyone has better ideas here,
I'm totes open.
As I understand it, the purpose of the $foo$ syntax is to uniquely identify
each algorithm, so would it make sense to pass the prefix string to the
register call, and maintain a lookup table internally of prefix => handler?
struct php_password_algo {
const char* name; // Symbolic name of the algorithm, e.g. "argon2id"
const char* prefix; // Prefix used for hashes in this algorithm, e.g. "2y"
zend_string* (hash)(const zend_string password, zend_array* options);
zend_bool (verify)(const zend_string password, const zend_string* hash);
zend_bool (needs_rehash)(const zend_string hash, zend_array *options);
int (*get_info)(zval return_value, const zend_string hash);
}
If an extension wants to reuse an implementation for more than one prefix
(e.g. minor variations in algorithm) it can just register multiple
"handlers" which happen to have the same function pointers; and if multiple
extensions try to register for the same prefix, the error can be detected
immediately at startup.
Determining the algorithm would then involve extracting the prefix from the
hash and looking it up in the registry.
You mention ext/sodium checking if ext/standard has already registered some
or all of its algorithms, but don't specify a method to do so; if the
registry was prefix-oriented, there could be a standard API such as:
PHPAPI const php_password_algo* php_password_algo_for_prefix(const
char* prefix);
Or simply:
PHPAPI zend_bool php_password_algo_is_registered(const char* prefix);
Regards,
Rowan Collins
[IMSoP]
As I understand it, the purpose of the $foo$ syntax is to uniquely identify
each algorithm, so would it make sense to pass the prefix string to the
register call, and maintain a lookup table internally of prefix => handler?
If that's an assumption we can definitely make, then absolutely, that
simplifies a lot of the design. When preparing this I wasn't 100%
confident that this would hold true (maybe
$foo${hashspecificstuff},v=1 is different from
$foo${hashspecificstuff},v=2), and I didn't want to create a problem
for future us. But I suppose we can always extend the protocol if it
comes to that.
You mention ext/sodium checking if ext/standard has already registered some
or all of its algorithms, but don't specify a method to do so; if the
registry was prefix-oriented, there could be a standard API such as:
Yeah, I realize I left out a means to actually enumerate the
registered method internally, that certainly needs to be added, even
if we use deterministic identifiers.
-Sara
On Tue, Oct 16, 2018 at 8:43 AM Rowan Collins rowan.collins@gmail.com
wrote:As I understand it, the purpose of the $foo$ syntax is to uniquely
identify
each algorithm, so would it make sense to pass the prefix string to the
register call, and maintain a lookup table internally of prefix =>
handler?If that's an assumption we can definitely make, then absolutely, that
simplifies a lot of the design. When preparing this I wasn't 100%
confident that this would hold true (maybe
$foo${hashspecificstuff},v=1 is different from
$foo${hashspecificstuff},v=2), and I didn't want to create a problem
for future us. But I suppose we can always extend the protocol if it
comes to that.
The format seems to have become a de facto standard, with the first part
identifying the algorithm, and later parts algorithm-specific. Among the
references on Wikipedia are:
- a man page discusssing glibc's implementation of the
crypt()
function [1]
describing the format as $id$salt$encrypted - a document from the Password Hashing Competition [2] describing it as
$<id>[$<param>=<value>(,<param>=<value>)*][$<salt>[$<hash>]]
I think encoding that assumption in PHP is following the general consensus,
and in the absence of a stronger standard we're not obliged to follow an
implementation that varies from that consensus anyway. In other words, if
someone implements MyAwesomeHash as a C library that doesn't use a unique
prefix of the appropriate form, the PHP wrapper can always prepend
$MyAwesomeHash$ to the C library's output, and strip it off again when
verifying.
In your $foo${hashspecificstuff},v=2 case, the handler could register for
prefix "foo", then delegate internally by parsing out the v=1 or v=2 token.
The chance of those two algorithms being in separate extensions which are
completely unaware of each other seems pretty slim.
[1] http://man7.org/linux/man-pages/man3/crypt.3.html
[2] https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md
Regards,
Rowan Collins
[IMSoP]
As I understand it, the purpose of the $foo$ syntax is to uniquely identify
each algorithm, so would it make sense to pass the prefix string to the
register call, and maintain a lookup table internally of prefix => handler?If that's an assumption we can definitely make, then absolutely, that
simplifies a lot of the design. When preparing this I wasn't 100%
confident that this would hold true (maybe
$foo${hashspecificstuff},v=1 is different from
$foo${hashspecificstuff},v=2), and I didn't want to create a problem
for future us. But I suppose we can always extend the protocol if it
comes to that.I think encoding that assumption in PHP is following the general consensus, and in the absence of a stronger standard we're not obliged to follow an implementation that varies from that consensus anyway. In other words, if someone implements MyAwesomeHash as a C library that doesn't use a unique prefix of the appropriate form, the PHP wrapper can always prepend $MyAwesomeHash$ to the C library's output, and strip it off again when verifying.
Thanks for the research! I was hoping this was the case, I just
hadn't managed to confirm it.
Unfortunately, I just sat down to implement it and noticed that we
have explicit test cases which verify that only hashes with a prefix
of "$2y" and a length of precisely 60 are identified as bcrypt. So
either we need to loosen that check (I'm trying to avoid BC breaks
here), or we create additional identification logic.
My personal take is that, given a preference for maintaining BC, we
should should keep a more generalized ident callback, even though the
fixed prefix is cleaner and more expedient. If we're willing to break
some BC, then I'd also vote in favor of making the algo identifiers
be strings instead of numbers.
-Sara
Unfortunately, I just sat down to implement it and noticed that we
have explicit test cases which verify that only hashes with a prefix
of "$2y" and a length of precisely 60 are identified as bcrypt. So
either we need to loosen that check (I'm trying to avoid BC breaks
here), or we create additional identification logic.
Hm... what does length != 60 currently generate - presumably it's just an
"unrecognised format" error of some sort?
If so, we could keep BC by having a validate method in each handler, but
only call it for hashes with the given prefix, and return an error if it
returns false.
So in PHP terms:
$prefix = extract_prefix($hash);
$handler = $registry[$prefix];
if ( is_null($handler) || ! $handler->validate($hash) ) {
throw new UnrecognisedHashError;
}
This would also allow handlers to reject other invalid strings, such as
$knownAlgo$nonExistentOption=error$abc123
Regards,
Rowan Collins
[IMSoP]
Unfortunately, I just sat down to implement it and noticed that we
have explicit test cases which verify that only hashes with a prefix
of "$2y" and a length of precisely 60 are identified as bcrypt. So
either we need to loosen that check (I'm trying to avoid BC breaks
here), or we create additional identification logic.Hm... what does length != 60 currently generate - presumably it's just an "unrecognised format" error of some sort?
For the purposes of rehash/verify, an "unknown algorithm" is
equivalent to bcrypt as a fallback. password_get_info()
makes the
distinction, however.
If so, we could keep BC by having a validate method in each handler, but only call it for hashes with the given prefix, and return an error if it returns false.
That would address this, and provide that more-general mechanism. +1
-Sara
If so, we could keep BC by having a validate method in each handler, but only call it for hashes with the given prefix, and return an error if it returns false.
That would address this, and provide that more-general mechanism. +1
I think you've also convinced me it's worth switching to string
identifiers from the current integers, even with the minor BC break.
I've updated the PR and RFC to reflect this.
Opening https://wiki.php.net/rfc/password_registry for discussion.
Should the registry support password hashing mechanisms defined in script
code? (I don't think so, but feel free to disagree)
Not for disagreeing but for the discussion: allowing userland to provide
algos would allow providing polyfills. This is proven useful already (see
https://github.com/symfony/polyfill/)
Of course, a userland algo should not be allowed to replace an existing
algo.
Nicolas