Random string generation (à la password_make_salt)

12 years ago by Nikita Popov

Hi all,

I just want to throw a quick thought in here:

The password API proposal includes a function called
password_make_salt(), that basically creates a random string, either
in raw binary form, or in the bcrypt salt format. Personally I don't
see much use for the function in the salt context as the password API
already generates the salt all by itself, but I do see a lot of use
for a random string function in general. People commonly want to
create random strings according to some format. Like CSRF tokens, ids,
etc.

So my thought was to drop password_make_salt() and instead add some
kind of generalized random_string() function:

    // this is a 20 byte random binary string
    $str = random_string(20);

    // ten random hex characters
    $str = random_string(10, "0123456789ABCDEF");

    // 15 characters from the bcrypt alphabet 0-9a-zA-Z./
    $str = random_string(15,
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ./");

    // if it's not too hard to implement, one could support this kind
    // of shortcut:
    $str = random_string(15, "0-9a-zA-Z./");

Thoughts?

Nikita
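
For illustration, the behaviour proposed in this message could be sketched in userland roughly as follows. This is only a sketch under assumptions, not the proposed C implementation: random_bytes() and random_int() require PHP 7 or later, the null default for the character list is made up here, and the "0-9a-zA-Z./" shortcut is not handled.

```php
<?php
/**
 * Hypothetical userland sketch of the proposed random_string().
 * With no character list it returns raw random bytes; otherwise it
 * picks each character independently from the given single-byte list.
 */
function random_string(int $length, ?string $characters = null): string
{
    if ($length < 1) {
        throw new InvalidArgumentException('Length must be at least 1.');
    }
    if ($characters === null) {
        // Raw binary form.
        return random_bytes($length);
    }
    $max = strlen($characters) - 1;
    if ($max < 0) {
        throw new InvalidArgumentException('Character list must not be empty.');
    }
    $result = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() includes both bounds.
        $result .= $characters[random_int(0, $max)];
    }
    return $result;
}

$raw = random_string(20);                     // 20 random bytes
$hex = random_string(10, "0123456789ABCDEF"); // ten random hex characters
```

Picking each position with its own random_int() call keeps the choice uniform for any list length, at the cost of one call per character.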

12 years ago by Andrew Faulds

This sounds very useful. To make it easier to use, why not also add
some string constants, something like CHARS_HEX, CHARS_BASE64,
CHARS_DECIMAL, etc? Then you could just do random_string(24, CHARS_HEX); to get a 24-char hex string.
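
As a rough illustration of the constants idea: the names below are the ones suggested in this message and do not exist in PHP, and the exact alphabets (lowercase hex, standard Base64) are assumptions. random_int() requires PHP 7 or later.

```php
<?php
// Hypothetical constants; none of these are defined by PHP itself.
const CHARS_DECIMAL = '0123456789';
const CHARS_HEX     = '0123456789abcdef';
const CHARS_BASE64  = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Build a 24-character hex string by picking from CHARS_HEX.
$token = '';
for ($i = 0; $i < 24; $i++) {
    $token .= CHARS_HEX[random_int(0, strlen(CHARS_HEX) - 1)];
}
```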

--
Andrew Faulds (AJF)
http://ajf.me/

12 years ago by Anthony Ferrara

I like the concept in principle, but implementing it is non-trivial.

First, you need a base-conversion function that will allow you to convert
between arbitrary bases (base_convert() won't work, because it only works
on fixed bases, and on numbers < INT_MAX)... Here's a utility class that
does just that:
https://github.com/ircmaxell/PHP-CryptLib/blob/master/lib/CryptLib/Core/BaseConverter.php

It works on arrays internally, since they are easier to work with in PHP,
but in C I would make it work with a char* array instead...

As far as the implementation itself, I would also add a third parameter for
crypto_safe. We could take mcrypt_create_iv's approach, and use DEV
constants:

// Crypto Secure
random_string(24, "chars", DEV_RANDOM);

// Crypto Strong, But Not Secure
random_string(24, "chars", DEV_URANDOM);

// Non-Crypto
random_string(24, "chars", DEV_RAND);

Having it default to DEV_RAND...

If this is something that's desired, I can update the password
implementation to include this change (since it depends on a function like
this internally)...

Anthony
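
The arbitrary-base conversion described in this message can be sketched on digit arrays with repeated long division. This is a simplified illustration, not the code of the linked BaseConverter class; the function names are made up here, and the result has no fixed length and is not guaranteed to be uniformly distributed over the alphabet.

```php
<?php
/**
 * Convert a big-endian array of digits from one base to another.
 * Each pass divides the whole number by $toBase and keeps the remainder
 * as the next least significant output digit.
 */
function convert_base(array $digits, int $fromBase, int $toBase): array
{
    if ($fromBase < 2 || $toBase < 2) {
        throw new InvalidArgumentException('Bases must be at least 2.');
    }
    $result = [];
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $digit) {
            $acc = $remainder * $fromBase + $digit;
            $q = intdiv($acc, $toBase);
            $remainder = $acc % $toBase;
            // Skip leading zeros of the quotient.
            if (count($quotient) > 0 || $q > 0) {
                $quotient[] = $q;
            }
        }
        array_unshift($result, $remainder);
        $digits = $quotient;
    }
    return $result;
}

/** Map a byte string (read as a base-256 number) onto an alphabet. */
function encode_bytes(string $bytes, string $alphabet): string
{
    $digits = array_values(unpack('C*', $bytes));
    $out = '';
    foreach (convert_base($digits, 256, strlen($alphabet)) as $d) {
        $out .= $alphabet[$d];
    }
    return $out;
}

$hex = encode_bytes("\xFF", '0123456789abcdef');
```

Working on arrays of small integers avoids the INT_MAX limit mentioned above, because no intermediate value exceeds roughly $fromBase * $toBase.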

On Mon, Jul 16, 2012 at 9:58 AM, Andrew Faulds <ajfweb@googlemail.com> wrote:

> This sounds very useful. To make it easier to use, why not also add
> some string constants, something like CHARS_HEX, CHARS_BASE64,
> CHARS_DECIMAL, etc? Then you could just do random_string(24, CHARS_HEX);
> to get a 24-char hex string.

12 years ago by keisial@gmail.com

> If this is something that's desired, I can update the password
> implementation to include this change (since it depends on a function like
> this internally)...
>
> Anthony

Looks good.

12 years ago by Alex Aulbach

I like it. I've looked through some code and found about 8
password-generation functions. 4 of them are built on more or less the
same idea.

The rest generate more complicated passwords, e.g. "minimum one
digit" or "first letter must be alphabetic". This is easy to implement.
Some generate passwords from syllables (don't ask, no one does that
anymore).

Three suggestions:

1a) If you want to support character classes, you can do it with pcre:
http://www.php.net/manual/en/regexp.reference.character-classes.php

The idea is the following:

    pseudofunction random_string($len, $characters)
    {
        ....
        $set = '';
        if ($characters "look like a RE consisting of just one character-class") {
            foreach ($charset as $char) {
                // If the regex matches one of the chars, it is in the character class!
                if (preg_match($characters, $char)) {
                    // add char to $set
                    $set .= $char;
                }
            }
        } else {
            $set = $characters;
        }
        ....
    }

-- "look like RE consisting of just one character-class": something
like "/^\/\[.*\]\/[^\/]*$/s" - I have not tested this, but to explain: it
searches for "/[...]/...". Some cases here are untested ([, ] and so on)
and need more thought when I have time, but it will be enough for a proof
of concept. Making it easier is always possible.
-- $charset: the chars from 0 to 255.

With this you can avoid parsing or defining the character classes
yourself, and it is normally fast enough. If you want to have it faster
see suggestion 3.
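
A runnable version of the idea in 1a) might look like the sketch below. The function names are hypothetical, the detection regex is only one guess at what "looks like a RE consisting of just one character-class" could mean, and only single bytes 0 to 255 are considered, so multi-byte encodings are out of scope.

```php
<?php
// Hypothetical helper: expand a regex holding one character class into
// the string of all single bytes (0 to 255) that it matches.
function charset_from_class(string $pattern): string
{
    $set = '';
    for ($byte = 0; $byte <= 255; $byte++) {
        $char = chr($byte);
        // If the regex matches the char, it belongs to the class.
        if (preg_match($pattern, $char) === 1) {
            $set .= $char;
        }
    }
    return $set;
}

// Treat "/[...]/flags" as a character class, anything else as a literal
// list of characters.
function resolve_charset(string $characters): string
{
    if (preg_match('/^\/\[.*\]\/[^\/]*$/s', $characters) === 1) {
        return charset_from_class($characters);
    }
    return $characters;
}

$digits  = resolve_charset('/[0-9]/');
$literal = resolve_charset('xyz');
```

The expanded set comes out in byte order, so a case-insensitive class such as "/[a-c]/i" lists the uppercase letters before the lowercase ones.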

1b) And it has some more functionality: for Germans the alphabet
consists of 30 characters. PCRE normally takes this into account! [:alpha:]
for German locales differs from [:alpha:] for English ones.

Is this wanted? I think localisation should be off by default;
nobody really needs to generate passwords with umlauts.

1c) For the standard cases like "a-zA-Z0-9" etc. constants could be useful.

What about Unicode? Do Japanese people want to have Japanese passwords?

Because generating a string from character-classes is very handy in
general for some other things (many string functions have it), I
suggest that it is not part of random_string(). Make a new function
str_from_character_class(), or if you use pcre like above
pcre_str_from_character_class()?

--
Alex Aulbach

12 years ago by Andrew Faulds

> What about Unicode? Do Japanese people want to have Japanese passwords?

No, Japanese and Chinese are entered using IMEs and would be
impractical to use in passwords.

Russian though, maybe. However I think most passwords are alphanumeric.

Besides, this isn't to generate passwords, it's to generate salts and
other random strings.

--
Andrew Faulds (AJF)
http://ajf.me/

12 years ago by keisial@gmail.com

> 1a) If you want to support character classes, you can do it with pcre:
> http://www.php.net/manual/en/regexp.reference.character-classes.php
>
> (...)

That's more or less what I had in mind.
If it's a string surrounded by square brackets, it's a character class;
otherwise treat it as a literal list of characters.
] and - can be supplied with the old tricks: put "]" first, and make "-"
the first or last character.

Quite easy to implement, however you can get into problems when dealing
with multiple locales. For instance, if the string is in UTF-8, you don't
want to randomly choose the first byte and then an ASCII character.
Maybe there should be a parameter for string encoding.
Having to detect character limits makes it uglier.
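
The bracket rule described here could be sketched as follows. It is an illustration only: the function name is made up, it handles single bytes and plain a-z style ranges, and it does not know POSIX classes such as [:alpha:].

```php
<?php
// Hypothetical helper: "[...]" is expanded as a character class,
// anything else is returned unchanged as a literal list of characters.
function expand_char_list(string $spec): string
{
    $len = strlen($spec);
    if ($len < 2 || $spec[0] !== '[' || $spec[$len - 1] !== ']') {
        return $spec;
    }
    // Everything between the outer brackets; a "]" placed first is
    // therefore just an ordinary member.
    $body = substr($spec, 1, -1);
    $n = strlen($body);
    $chars = [];
    for ($i = 0; $i < $n; $i++) {
        // "x-y" is a range; a "-" in first or last position is literal.
        if ($i + 2 < $n && $body[$i + 1] === '-') {
            $from = ord($body[$i]);
            $to = ord($body[$i + 2]);
            if ($from > $to) {
                throw new InvalidArgumentException('Range is out of order.');
            }
            for ($c = $from; $c <= $to; $c++) {
                $chars[chr($c)] = true;
            }
            $i += 2;
        } else {
            $chars[$body[$i]] = true;
        }
    }
    // Array keys keep insertion order and remove duplicates.
    return implode('', array_keys($chars));
}

$hex = expand_char_list('[0-9a-f]');
```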

> 1b) And it has some more functionality: for Germans the alphabet
> consists of 30 characters. PCRE normally takes this into account!
> [:alpha:] for German locales differs from [:alpha:] for English ones.
>
> Is this wanted? I think localisation should be off by default;
> nobody really needs to generate passwords with umlauts.

Not something to use as the default. You don't want to give users passwords
with characters they can't type.

About supporting POSIX classes, that could be cool. But you then need a way
to enumerate them. Note that isalpha() is provided by the C library, so you
can't count on having its data. It's possible that PCRE, which we bundle,
contains the needed Unicode tables.

> Because generating a string from character-classes is very handy in
> general for some other things (many string functions have it), I
> suggest that it is not part of random_string(). Make a new function
> str_from_character_class(), or if you use pcre like above
> pcre_str_from_character_class()?

How would you use such a function? If you want to make a string out of them,
you would use this new str_random(). If you want to verify whether a given
character matches a class, you have preg_match(). If you want one arbitrary
character from that class, just call str_random() with a length of 1.

12 years ago by Pierre Joye

hi,

> About supporting POSIX classes, that could be cool. But you then need a way
> to enumerate them. Note that isalpha() will be provided by the C
> library, so you can't count on having its data. It's possible that PCRE,
> which we bundle, contains the needed unicode tables.

If anything, then ICU data. POSIX is the worst thing ever when it
comes to locale support.

Cheers,

Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

12 years ago by Alex Aulbach

2012/7/16 Ángel González keisial@gmail.com:

>> 1a) If you want to support character classes, you can do it with pcre:
>> http://www.php.net/manual/en/regexp.reference.character-classes.php
>
> That's more or less what I have thought.
> If it's a string surrounded by square brackets, it's a character class,
> else treat as a literal list of characters.
> ] and - can be provided with the old trick of provide "] as first
> character", "make - the first or last one".

Right thought. But introducing a new scheme of character-class
identifiers or a new way of describing character classes is
confusing. As a PHP developer I think "Oh no, not new magic
charsets again".

I suggest again to use PCRE for that. The difference to your proposal
is not so big. Examples:

"/[[:alnum:]]/" will return "abc...XYZ0123456789". We can do this also
with "/[a-zA-Z0-9]/". Or "/[a-z0-9]/i". Or "/[[:alpha:][:digit:]]/"

You see: You can do things in much more different ways with PCRE. And
you continue to use this "standard".

[And PCRE supports UTF8. Currently not important. But who knows?]

And maybe we can think about removing the beginning "/[" and the
ending "]/", but a "/" at the end should be optionally possible to add
some regex-parameters (like "/i").

> Having to detect character limits makes it uglier.

Exactly. That's why I think we don't need so much magic in the second
parameter. The character list is just a list of characters. No magic.
We can extend this with a third parameter to tell the function which
charset it is in. And maybe a fourth to select the random algorithm,
but I think it may be better to have a function for each
algorithm, because that's how random currently works.

If I were to write it in PHP, it would look like this:

    pseudofunction str_random($len, $characters, $encoding = 'ASCII', $algo)
    {
        $result = '';
        $chlen = mb_strlen($characters, $encoding);
        for ($i = 0; $i < $len; $i++) {
            $result .= mb_substr($characters, myrandom(0, $chlen, $algo), 1);
        }
        return $result;
    }

Without testing anything. It's just an idea.

This is a working PHP function, but $encoding doesn't work (some
stupid error?) and $algo is not used:

    function str_random($len, $characters, $encoding = 'ASCII', $algo = null)
    {
        $result = '';
        $chlen = mb_strlen($characters, $encoding);
        for ($i = 0; $i < $len; $i++) {
            // rand() includes both bounds, so the last valid offset is $chlen - 1
            $result .= mb_substr($characters, rand(0, $chlen - 1), 1);
        }
        return $result;
    }
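
One possible explanation for the $encoding problem in the function above: mb_substr() is called without its fourth argument, so it falls back to the internal encoding instead of $encoding. The sketch below passes the encoding to both mbstring calls. The name str_random_mb and the UTF-8 default are made up for this example, the $algo parameter is left out, the mbstring extension is assumed to be loaded, and random_int() requires PHP 7 or later.

```php
<?php
// Hypothetical variant of str_random() that picks whole characters,
// not bytes, from a list in the given encoding.
function str_random_mb(int $len, string $characters, string $encoding = 'UTF-8'): string
{
    $chlen = mb_strlen($characters, $encoding);
    if ($len < 0 || $chlen < 1) {
        throw new InvalidArgumentException('Need a non-negative length and a non-empty list.');
    }
    $result = '';
    for ($i = 0; $i < $len; $i++) {
        // The same $encoding is used to count and to slice, so the
        // offset always lands on a character boundary.
        $result .= mb_substr($characters, random_int(0, $chlen - 1), 1, $encoding);
    }
    return $result;
}

$umlauts = str_random_mb(8, 'äöüß');
```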

> About supporting POSIX classes, that could be cool. But you then need a way
> to enumerate them. Note that isalpha() will be provided by the C
> library, so you can't count on having its data. It's possible that PCRE,
> which we bundle, contains the needed unicode tables.

With PHP code as written above it works without much thought, but I don't
know whether this could be done equally well in C.

>> Because generating a string from character-classes is very handy in
>> general for some other things (many string functions have it), I
>> suggest that it is not part of random_string(). Make a new function
>> str_from_character_class(), or if you use pcre like above
>> pcre_str_from_character_class()?
>
> How would you use such function? If you want to make a string out of them,

Oh, there are many cases to use it.

For example (I renamed the function to "str_charset()", because it is
just a string of a charset):

    // Search for whitespace characters
    strpbrk("Hello World", str_charset('/[\s]/'));

    // remove invisible chars at the beginning or end (does not make much
    // sense, because a regex is maybe faster in this case)
    trim("\rblaa\n", str_charset('/[^[:print:]]/'));

    // remove invisible chars: when doing this with very big strings it
    // could be much faster than with a regex.
    str_replace(str_split(str_charset('/[^[:print:]]/')), '', "\rblaa\n");

There are many other more or less useful things you can do with a
charset string. :)

--
Alex Aulbach

12 years ago by keisial@gmail.com

>> That's more or less what I have thought.
>> If it's a string surrounded by square brackets, it's a character class,
>> else treat as a literal list of characters.
>> ] and - can be provided with the old trick of provide "] as first
>> character", "make - the first or last one".
>
> Right thought. But introducing a new scheme of character-class
> identificators or a new kind of describing character-classes is
> confusing. As PHP developer I think "Oh no, not again new magic
> charsets".

Not really new. Those escapes are how you had to work with them in
character classes of traditional regular expressions.
But I agree it can be confusing. What about a flag parameter, then?

> And maybe we can think about removing the beginning "/[" and the
> ending "]/", but a "/" at the end should be optionally possible to add
> some regex-parameters (like "/i").

Those could be in the flag. The / are not really needed; they are an
additional syntax over regex provided by PHP (and the character can be a
different one, although usually / is picked).

>> About supporting POSIX classes, that could be cool. But you then need a way
>> to enumerate them. Note that isalpha() will be provided by the C
>> library, so you can't count on having its data. It's possible that PCRE,
>> which we bundle, contains the needed unicode tables.
>
> It works without thinking as above written in PHP code, but I dunno if
> this could be done in C equally.

The above code doesn't support POSIX character classes, just picking
characters out of a string (which I agree is simple).

> // Search spacer strings
> strpbrk ("Hello World", str_charset('/[\s]/'));

So you're expanding all spacing characters, then iterating over them
with strpbrk(); a preg_match() would have been more efficient.

> // remove invisible chars at begin or end (not very much sense,
> // because a regex in this case is maybe faster)
> trim("\rblaa\n", str_charset('/[^[:print:]]/'));
>
> // remove invisible chars: when doing this with very big strings it
> // could be much faster than with regex.
> str_replace(str_split(str_charset('/[^[:print:]]/')), '', "\rblaa\n");

I don't see why expanding to a string, then converting it to an array to
finally call str_replace() would be faster :S
Also, that str_split() for all non-printable characters (even assuming
you don't hit the memory limit with the many Unicode chars you
will meet) will fail with code points > 127 (str_split() works on bytes).

> There are many other more or less useful things you can do with a
> charset-string. :)

I'm not really convinced it's the right way to do them :)

12 years ago by Alex Aulbach

2012/7/17 Ángel González keisial@gmail.com:

> Those could be in the flag. The / are not really needed, they are an
> additional syntax over regex provided by PHP (and the character can be a different

That makes it a little bit like Perl. I see this as a "standard". So for me
a regex comes with delimiters, even if the lib doesn't need them.

> one, although usually / is picked).

I think it comes from sed. I usually use '#' - better visibility. :)

> The above code doesn't support POSIX character classes, just picking
> characters out of a string (which I agree is simple).

PCRE uses the Posix classes...

[btw. off topic: I read through http://www.pcre.org/pcre.txt - my oh my,
they have introduced things here... "backtracking control", "path
recording"... incredible]

>> // Search spacer strings
>> strpbrk ("Hello World", str_charset('/[\s]/'));
>
> So you're expanding all spacing characters, then iterating over them
> with strpbrk(), a preg_match() would have been more efficient.

Of course. Every example could be done more efficiently with a regex.
That's not the point! Once built, the str_charset() result is "ready": it
can be cached and reused for many more things. And then it's faster than
PCRE, even with a compiled pattern.

Hm.

Maybe back to the roots? Using range() is not so complicated:

    implode('',
        array_merge(
            range('a', 'z'),
            range('A', 'Z'),
            range('0', '9')
        )
    )

<shrug> So I think, if we don't need charset-encoding, we won't need
this special functionality.

--
Alex Aulbach