[RFC] [DISCUSSION] Reliable user-land CSPRNG

10 years ago by Tom Worster

I welcome the proposal for an easy-to-use PHP function for obtaining
crypto-secure randomness. I have a number of comments and suggestions.

I think the function name(s) should indicate that these functions are
for getting crypto-secure randomness. I proposed cs_random_bytes()
previously (https://wiki.php.net/rfc/csrandombytes) and I still think it
works. First, it's important that users understand this distinction.
Second, given all the shared environments, it's good for the world at
large if people aren't draining the operating system's "entropy pool"
for work that doesn't need crypto-secure randomness, e.g. Monte Carlo
simulations that are better served with a different source.

CS random strings are often required but I haven't ever seen
requirements for arbitrary alphabets, charsets and encodings. In Yii
we provided a method that returns a string using the 64-character set
[a-zA-Z0-9_-] which is nice because they are all transparent in URLs.
There are many uses for such strings and it seems to meet the needs of
most users, as they haven't requested more flexibility.

I don't understand the requirement for crypto-secure random integers.
I have never encountered this requirement. [Btw: the proposed patch
implements this function using a loop that's not guaranteed to
terminate in any given amount of time.]

I believe that simplicity is of paramount importance. I also believe
in only addressing the requirements of 90% of the users if addressing
the 10% specialists means complicating the API and adding potential
footguns.

For example, the number of users that actually need to do something
better than read from /dev/urandom is small. A user that is concerned
about the status of the system's "entropy pool" (whatever that might
mean) or that feels the need to check the "degree of seededness"
of the system's CSPRNG (again, whatever that might mean) is a very
specialized user. There's no need to cater to them in these "Reliable,
userfriendly RNG APIs". The (metaphorical) 1% can look after
themselves. Whatever its size, it's a small minority that genuinely
cannot accept the kind of randomness that /dev/urandom on Linux or
/dev/random on FreeBSD or OS X offer.

Further, the concepts of seededness of an RNG are very advanced
matters that are not well understood and that vary from one system to
another. Standardizing these semantics across platforms is hard. So
making these complications portable over different operating systems
is, I imagine, beyond difficult.

And if you aim to make an API that exposes such subtleties, you need
to be able to clearly explain in the manual what it means in both
technically accurate terms and in practical terms that a non-
specialist application developer can base a design decision on. I
certainly couldn't do that.

So, to summarize.

The requirement for an easy-to-use function to obtain crypto-secure
randomness is very clear. Has been for years in my view.

Name the functions so the crypto-secure feature is obvious, e.g.
cs_random_bytes().

The functions should not expose or allow selection of degrees of
"strength" of crypto-secureness (it's both a footgun and a semantic
tar pit). Just use the non-blocking system source and make a note in
the manual so the specialist users know what's going on.

A function to get a random text string drawn from the 64-character
alphabet of URL-transparent chars is very useful.

Don't complicate the random string getter without first establishing
the requirement for such complication.

Don't offer a crypto-secure random integer getter unless the
requirement for such a thing is clear.

Otherwise, great stuff, good luck!

Tom

10 years ago by Anthony Ferrara

Tom,

The requirement for an easy-to-use function to obtain crypto-secure
randomness is very clear. Has been for years in my view.

Agree 100%.

Name the functions so the crypto-secure feature is obvious, e.g.
cs_random_bytes()

I'm less sure on this point. I think people will get confused "what's
the difference between mt_rand and cs_random?" and then just use
rand() anyway.

I think the way we solve this is through documentation instead. Keep
the name simple, and document it well...

The functions should not expose or allow selection of degrees of
"strength" of crypto-secureness (it's both a footgun and a semantic
tar pit). Just use the non-blocking system source and make a note in
the manual so the specialist users know what's going on.

Agree 100%

A function to get a random text string drawn from the 64-byte
alphabet of URL-transparent chars is very useful.

Don't complicate the random string getter without first establishing
the requirement for such complication.

Don't offer a crypto-secure random integer getter unless the
requirement for such a thing is clear.

Well, there are cases such as UUID generation, contests, etc which
need to pick numbers in a defined range.

Also, if it's a drop-in replacement for mt_rand, existing code
could be migrated without having to be rewritten (a bonus for legacy
users).
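
To illustrate the two use cases Anthony mentions, here is a minimal sketch. It assumes the random_int($min, $max) signature discussed in this thread (the form that later shipped in PHP 7); the $entrants list is hypothetical.

```php
<?php
// Contest draw: pick one entrant uniformly at random.
// $entrants is a made-up list used only for illustration.
$entrants = ['alice', 'bob', 'carol', 'dave'];
$winner = $entrants[random_int(0, count($entrants) - 1)];

// Legacy migration: a call such as mt_rand(1, 6) keeps the same
// argument order and inclusive bounds when swapped for random_int().
$roll = random_int(1, 6);
```

Both bounds are inclusive, which is what makes the swap from mt_rand() mechanical.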

Anthony

10 years ago by Stanislav Malyshev

Hi!

For example, the number of users that actually need to do something
better than read from /dev/urandom is small. A user that is concerned

Good summary read on the topic: http://www.2uo.de/myths-about-urandom/
TLDR: it's ok to use /dev/urandom.

--
Stas Malyshev
smalyshev@gmail.com

10 years ago by Tom Worster

Good summary read on the topic: http://www.2uo.de/myths-about-urandom/
TLDR: it's ok to use /dev/urandom.

Yes! Thanks for the link. Much shorter but with pretty much the same
message, I like:
http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

The Linux RNG design and especially the urandom(4) man page have caused a
lot of trouble. I wonder how many more years before we can put it
behind us.

Tom

10 years ago by Leigh

Hi!

For example, the number of users that actually need to do something
better than read from /dev/urandom is small. A user that is concerned

Good summary read on the topic: http://www.2uo.de/myths-about-urandom/
TLDR: it's ok to use /dev/urandom.

Yes, it's OK. And it is what will be used on the overwhelming majority
of platforms.
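
For readers who want to see what "just read from /dev/urandom" amounts to in userland PHP, here is a rough sketch. The helper name sketch_urandom_bytes() is hypothetical, it only works on systems that expose /dev/urandom, and it is not the RFC's implementation.

```php
<?php
// Hypothetical helper: read $length bytes straight from the kernel CSPRNG.
function sketch_urandom_bytes(int $length): string
{
    if ($length < 1) {
        throw new InvalidArgumentException('$length must be at least 1');
    }
    $handle = @fopen('/dev/urandom', 'rb');
    if ($handle === false) {
        throw new RuntimeException('Cannot open /dev/urandom');
    }
    // Disable PHP's read buffering so we do not pull more bytes than needed.
    stream_set_read_buffer($handle, 0);

    $data = '';
    while (strlen($data) < $length) {
        $chunk = fread($handle, $length - strlen($data));
        if ($chunk === false || $chunk === '') {
            fclose($handle);
            throw new RuntimeException('Short read from /dev/urandom');
        }
        $data .= $chunk;
    }
    fclose($handle);
    return $data;
}
```

The sketch fails loudly instead of falling back to a weaker source, which matches the spirit of the discussion here.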

10 years ago by Leigh

Hey Tom,

I don't understand the requirement for crypto-secure random integers.
I have never encountered this requirement. [Btw: the proposed patch
implements this function using a loop that's not guaranteed to
terminate in any given amount of time.]

That's true, but you're going to have to be really unlucky for it to
have any significant impact. The worst case for a re-roll comes with a
probability of 0.5, and best case is 0.0. I did actually notice that
I've made the implementation worst-case by default when catering for a
negative lower bound. It's on my list to see how I can improve it.
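
The re-roll idea can be sketched as rejection sampling over a bit mask. This is an illustrative userland version, not the code from the proposed patch; sketch_random_int() is a hypothetical name, and random_bytes() is assumed to behave as it does in PHP 7.

```php
<?php
// Hypothetical helper: unbiased integer in [$min, $max] via rejection sampling.
function sketch_random_int(int $min, int $max): int
{
    if ($min > $max) {
        throw new InvalidArgumentException('$min must be <= $max');
    }
    $range = $max - $min;
    if (!is_int($range)) {
        // The subtraction overflowed to float; out of scope for this sketch.
        throw new RangeException('Range too large for this sketch');
    }
    if ($range === 0) {
        return $min;
    }

    // Count the bits needed to represent $range.
    $bits = 0;
    for ($tmp = $range; $tmp > 0; $tmp >>= 1) {
        $bits++;
    }
    $bytes = intdiv($bits + 7, 8);
    $mask = ($bits >= PHP_INT_SIZE * 8 - 1) ? PHP_INT_MAX : (1 << $bits) - 1;

    do {
        // Build a candidate from fresh random bytes and keep only $bits bits.
        $candidate = 0;
        foreach (unpack('C*', random_bytes($bytes)) as $byte) {
            $candidate = ($candidate << 8) | $byte;
        }
        $candidate &= $mask;
        // Re-roll when the candidate falls outside the range; accepting it
        // modulo the range would bias the low values.
    } while ($candidate > $range);

    return $min + $candidate;
}
```

Because the mask is the smallest power of two above the range, each draw is rejected with probability below one half, so the expected number of iterations stays under two even though no fixed upper bound exists.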

For example, the number of users that actually need to do something
better than read from /dev/urandom is small. A user that is concerned
about the status of the system's "entropy pool" (whatever that might
mean) or that feels the need to check the "degree of seededness"
of the system's CSPRNG (again, whatever that might mean) is a very
specialized user. There's no need to cater to them in these "Reliable,
userfriendly RNG APIs". The (metaphorical) 1% can look after
themselves. Whatever its size, it's a small minority that genuinely
cannot accept the kind of randomness that /dev/urandom on Linux or
/dev/random on FreeBSD or OS X offer.

On modern OpenBSD/FreeBSD/OSX /dev/random and /dev/urandom are both
aliases of /dev/arandom, which is quite literally an inexhaustible
supply of CS random backed by arc4random. On Linux I think you'll be
hard pressed to exhaust /dev/urandom from a minute or two after boot.
We're still discussing whether a userland (non-kernel userland)
implementation of arc4random is wise. We need to put some feelers out
and consult some experts on this, but if we do implement it, the same
is true. Inexhaustible CS random. Otherwise as mentioned, /dev/urandom
is going to be good enough.

Further, the concepts of seededness of an RNG are very advanced
matters that are not well understood and that vary from one system to
another. Standardizing these semantics across platforms is hard. So
making these complications portable over different operating systems
is, I imagine, beyond difficult.

At present I think we've picked the most appropriate sources and
precedence for these sources, without getting into the realm of a
portability nightmare.

And if you aim to make an API that exposes such subtleties, you need
to be able to clearly explain in the manual what it means in both
technically accurate terms and in practical terms that a non-
specialist application developer can base a design decision on. I
certainly couldn't do that.

We certainly do not want to expose this, or allow the user to choose.

So, to summarize.

The requirement for an easy-to-use function to obtain crypto-secure
randomness is very clear. Has been for years in my view.

Name the functions so the crypto-secure feature is obvious, e.g.
cs_random_bytes()

We thought the current naming was enough to distinguish between the
well known (to be bad) rand(), and true random. As Anthony says, we
don't want to use a prefix that's meaningless to the majority. I'd be
open to renaming with a "crypto_" prefix if it is deemed necessary.

The functions should not expose or allow selection of degrees of
"strength" of crypto-secureness (it's both a footgun and a semantic
tar pit). Just use the non-blocking system source and make a note in
the manual so the specialist users know what's going on.

Absolutely. We pick the best available so that the user does not have
to. If there is no source of CS random available, the function issues
a warning and returns false.

A function to get a random text string drawn from the 64-byte
alphabet of URL-transparent chars is very useful.

We've tossed this idea around a bit OTR, a potential function called
random_token(). We need to somehow make it clear that this is not to
be used for keys or IVs, but it's a good fit for things like session
IDs or CSRF tokens.

Don't offer a crypto-secure random integer getter unless the
requirement for such a thing is clear.

The random_int function will always provide an unbiased result. An
example use-case brought up recently was in the case of online games,
where a bias could give players an upper hand if they knew the
subtleties of the underlying RNG.

Thank you for your feedback!

Leigh.

10 years ago by Tom Worster

Hi Leigh,

We're still discussing whether a userland (non-kernel userland)
implementation of arc4random is wise. We need to put some feelers out
and consult some experts on this,

I wouldn't. As Thomas Ptacek put it, quoting the article I linked
before:

You want to use the kernel's CSPRNG, because:

The kernel has access to raw device entropy.
It can promise not to share the same state between applications.
A good kernel CSPRNG, like FreeBSD's, can also promise not to feed
you random data before it's seeded.

Study the last ten years of randomness failures and you'll read
a litany of userspace randomness failures. Debian's OpenSSH debacle?
Userspace random. Android Bitcoin wallets repeating ECDSA k's?
Userspace random. Gambling sites with predictable shuffles? Userspace
random.

Userspace generators almost always depend on the kernel's generator
anyways. Even if they don't, the security of your whole system sure
does. A userspace CSPRNG doesn't add defense-in-depth; instead, it
creates two single points of failure.

10 years ago by Anthony Ferrara

Tom

The question came up while discussing potential bandwidth limitations
of /dev/urandom and things like file descriptor exhaustion.

So one of the thoughts we bounced around was to use an internal arc4random
implementation that was seeded on first use, and then re-seeded
periodically (every x bytes or y seconds).

Another thought, to avoid the file descriptor issue, was to tie in the
Linux getrandom(2) syscall and the BSD getentropy(2) syscall directly
on newer kernels. But getentropy(2)'s docs specify that it's only
meant to seed an arc4 generator, not provide a stream of bytes.

Overall the complexity probably isn't worth it (as you state). But the
abstraction also doesn't depend on these details (so we can always
change it later).

Thanks

Anthony

10 years ago by Leigh

Might as well mention it because it has been discussed OTR.

We've thrown the idea around that we could cater for other sources of
random via a PHP extension. (I.e. if someone has a particular hardware
RNG they want to use). We're concerned that this may be misused, or
even abused as a way of deliberately weakening the source of random.
Our current preference is to not include this as a feature.

10 years ago by Pierre Joye

On modern OpenBSD/FreeBSD/OSX /dev/random and /dev/urandom are both
aliases of /dev/arandom, which is quite literally an inexhaustible
supply of CS random backed by arc4random. On Linux I think you'll be
hard pressed to exhaust /dev/urandom from a minute or two after boot.
We're still discussing whether a userland (non-kernel userland)
implementation of arc4random is wise. We need to put some feelers out
and consult some experts on this, but if we do implement it, the same
is true. Inexhaustible CS random. Otherwise as mentioned, /dev/urandom
is going to be good enough.

Do you mean to maintain our own CSPRNG based on arc4random (algorithm)?
I seriously hope it is not the case.

Further, the concepts of seededness of an RNG are very advanced
matters that are not well understood and that vary from one system to
another. Standardizing these semantics across platforms is hard. So
making these complications portable over different operating systems
is, I imagine, beyond difficult.

At present I think we've picked the most appropriate sources and
precedence for these sources, without getting into the realm of a
portability nightmare.

Minus the entropy setting, which should be a system setting, or we
should create one that a sysadmin can actually change according to
need. The use cases I mentioned before are not only about some lab's
R&D or other edge cases; there are actually quite a large number of
users that would benefit from that. Mass hosting is one, and the trend
toward denser app servers using containers only increases that need,
drastically.

Providing such a setting (system, not INI_ALL) won't affect the exposed
APIs in any way nor make the API itself less "secure".

And if you aim to make an API that exposes such subtleties, you need
to be able to clearly explain in the manual what it means in both
technically accurate terms and in practical terms that a non-
specialist application developer can base a design decision on. I
certainly couldn't do that.

We certainly do not want to expose this, or allow the user to choose.

It is a system decision, not a developer's (developers most likely
have no idea about it). System administrators usually do.

So, to summarize.

The requirement for an easy-to-use function to obtain crypto-secure
randomness is very clear. Has been for years in my view.

Name the functions so the crypto-secure feature is obvious, e.g.
cs_random_bytes()

We thought the current naming was enough to distinguish between the
well known (to be bad) rand(), and true random.

I also hope you are not going to deprecate (or anything close to it)
these functions. Moving the note about them not being crypto-safe into
a big red warning below the function signature (in the manual) should
be enough.

As Anthony says, we
don't want to use a prefix that's meaningless to the majority. I'd be
open to renaming with a "crypto_" prefix if it is deemed necessary.

I'd go with that, or actually random_crypto_*, although the only one
that is accurately CS will be random_bytes.

The functions should not expose or allow selection of degrees of
"strength" of crypto-secureness (it's both a footgun and a semantic
tar pit). Just use the non-blocking system source and make a note in
the manual so the specialist users know what's going on.

Absolutely. We pick the best available so that the user does not have
to. If there is no source of CS random available, the function issues
a warning and returns false.

It should not even build. As far as I remember, that is what happens with
session.entropy*

A function to get a random text string drawn from the 64-byte
alphabet of URL-transparent chars is very useful.

We've tossed this idea around a bit OTR, a potential function called
random_token(). We need to somehow make it clear that this is not to
be used for keys or IVs, but it's a good fit for things like session
IDs or CSRF tokens.

<hint>session.entropysource...</hint> :)

We should not duplicate features or settings. I said that back then
while asking for a general entropy source setting in PHP that users
can rely on (fopen back then) and that the upcoming new random
functions can use. That point has not changed: we should still not
duplicate a feature and confuse users. This is the same thing.

Don't offer a crypto-secure random integer getter unless the
requirement for such a thing is clear.

The random_int function will always provide an unbiased result. An
example use-case brought up recently was in the case of online games,
where a bias could give players an upper hand if they knew the
subtleties of the underlying RNG.

Which can be possible in some cases. I had this situation in this
exact case, for games. I have actually seen it in offline games as
well, where someone managed to collect sequences from many printed
random numbers and figure out the actual randomness (or lack of it) of
some sequences, due to the reduction in entropy quality introduced
by integer ranges. Just for the story: he earned some money with
that, until the trick was discovered ;)

Cheers,

Pierre

@pierrejoye | http://www.libgd.org

10 years ago by Pierre Joye

hi,

I totally forgot to mention one thing in all my replies:

I love this addition and it is cruelly needed :)

I only have doubts about the implementations and the details I
explained in my other replies. It is also something that could be
added later.

The only thing I would vote no on is if we start to implement our own
little CS function. That would be the worst idea ever, a disaster
waiting to explode in our poor faces :) But I may have misinterpreted
the answer about that :)

Cheers,

Pierre

@pierrejoye | http://www.libgd.org

10 years ago by Sammy Kaye Powers

I don't know why everyone says the internals list is so scary - you guys
are great! :)

I think the function name(s) should indicate that these functions are for
getting crypto-secure randomness. I proposed cs_random_bytes()

I'm cool with that idea but I also think it should be spelled out like random_crypto_*() as Pierre suggests. I like secure_random_bytes() but
that's because it's what Ruby names their CSPRNG. :)
http://ruby-doc.org/stdlib-2.1.2/libdoc/securerandom/rdoc/SecureRandom.html
But really a moot point.

CS random strings are often required but I haven't ever seen requirements
for arbitrary alphabets, charsets and encodings. In Yii we provided a
method that returns a string using the 64-character set [a-zA-Z0-9_-] which
is nice because they are all transparent in URLs. There are many uses for
such strings and it seems to meet the needs of most users, as they haven't
requested more flexibility.

I actually started down this RFC path out of frustration on this very point
of needing secure random alphanumeric strings. The original RFC & patch
contained a random_hex() function that would convert bytes from the
CSPRNG into hex. The use case that I have seen most needed in user-land is
in fact for random alphanumeric strings so that they can generate CSRF
tokens. Every CRUD app could be affected by this. So I'm still +1 for
having a built-in function to get back arbitrary alphanumeric strings. But
this can be done with bin2hex(random_bytes(16)) or
base64_encode(random_bytes(16)) so I won't fight it too much. :)
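
As a concrete illustration of the CSRF use case, here is a small sketch built on the bin2hex(random_bytes(16)) expression above. The function name sketch_csrf_matches() is hypothetical, and random_bytes() is assumed to be available as in PHP 7.

```php
<?php
// 16 random bytes carry 128 bits of randomness; hex encoding yields
// a 32-character token that is safe to embed in HTML forms.
$token = bin2hex(random_bytes(16));

// Hypothetical helper: compare the stored token with the submitted one
// in constant time, so the comparison leaks no timing information.
function sketch_csrf_matches(string $expected, string $submitted): bool
{
    return hash_equals($expected, $submitted);
}
```

In a real application the token would typically be stored in the session and checked on every state-changing request.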

I welcome the proposal for an easy-to-use PHP function for obtaining crypto-secure
randomness.

I love this addition and it is cruelly needed :)

Yay! Lots of love for a CSPRNG! :)

Thanks,
Sammy Kaye Powers
sammyk.me

230 S Clark St #194
Chicago, IL 60604

10 years ago by Stanislav Malyshev

Hi!

I'm cool with that idea but I also think it should be spelled out like random_crypto_*() as Pierre suggests. I like secure_random_bytes() but
that's because it's what Ruby names their CSPRNG. :)

The custom is that the first word names the function group (yes, I know
old functions do not follow it, but this is a new one). Unless we're going
to introduce a group of functions called secure_, random_ is a natural
choice. I would be careful with using words like "secure", "crypto" etc.
in general because they may be easily misunderstood - something like
`random_bytes()` would do as well I think.

Stas Malyshev
smalyshev@gmail.com

10 years ago by Pierre Joye

On Thu, Feb 26, 2015 at 12:48 AM, Stanislav Malyshev
smalyshev@gmail.com wrote:

Hi!

I'm cool with that idea but I also think it should be spelled out like random_crypto_*() as Pierre suggests. I like secure_random_bytes() but
that's because it's what Ruby names their CSPRNG. :)

The custom is that the first word names the function group (yes, I know
old functions do not follow it, but this is new one). Unless we're going
to introduce a group of functions called secure_, random_ is a natural
choice. I would be careful with using words like "secure", "crypto" etc.
in general because they may be easily misunderstood - something like
random_bytes() would do as well I think.

I agree. It should (and this is the case in the RFC) start with
random_. As for "crypto", it is different here, as it matches what the
function actually does: provide a crypto-safe PRNG. And the term
"crypto safe" is a well defined term. Yes, many users confuse "good",
"strong" and "crypto safe", but this is a documentation and education
issue and we should not invent new wording for industry standards.

--
Pierre

@pierrejoye | http://www.libgd.org

10 years ago by Tom Worster

I don't know why everyone says the internals list is so scary - you guys
are great! :)

Clearly php-internals participants are all very fine people. I am
nevertheless scared brickless of php-internals, which is not the
same thing;)

I actually started down this RFC path out of frustration on this very
point of needing secure random alphanumeric strings. The original RFC &
patch contained a random_hex() function that would convert bytes from
the CSPRNG into hex.

bin2hex(random_bytes(8)) is so easy I don't think a new shorthand
function is worth it.

The use case that I have seen most needed in user-land is in fact for
random alphanumeric strings so that they can generate CSRF tokens. Every
CRUD app could be affected by this. So I'm still +1 for having a built-in
function to get back arbitrary alphanumeric strings. But this can be done
with bin2hex(random_bytes(16)) or base64_encode(random_bytes(16)) so
I won't fight it too much. :)

Using a 64-character alphabet is a bit more involved and is needed
so often that we put it in Yii2's security class. base64 actually
uses 65 characters, 3 of which aren't transparent to URL encoding.
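
One way to get from base64 to the 64-character URL-transparent alphabet is sketched below. This is not Yii2's actual code; sketch_url_safe_token() is a hypothetical name, and random_bytes() is assumed to be available as in PHP 7.

```php
<?php
// Hypothetical helper: $length characters drawn from [A-Za-z0-9_-].
function sketch_url_safe_token(int $length): string
{
    if ($length < 1) {
        throw new InvalidArgumentException('$length must be at least 1');
    }
    // Each base64 character encodes 6 bits, so ceil($length * 3 / 4)
    // bytes always produce at least $length fully random characters.
    $bytes = random_bytes((int) ceil($length * 3 / 4));

    // Swap the two characters that need URL encoding and drop the
    // '=' padding, which is the 65th character of standard base64.
    $encoded = rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');

    return substr($encoded, 0, $length);
}
```

Truncating only discards the last, possibly partial, character, so every character that is kept covers six full random bits.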

10 years ago by Leigh

I actually started down this RFC path out of frustration on this very
point of needing secure random alphanumeric strings. The original RFC &
patch contained a random_hex() function that would convert bytes from
the CSPRNG into hex.

bin2hex(random_bytes(8)) is so easy i don't think a new shorthand
function is worth it.

I can't help but notice the output of this is 16 bytes.

Please, please tell me that you don't use the output of
bin2hex(random_bytes(8)) for a key or IV. This is so dangerous and I'm
actually worried about how many people actually do this.

Apologies if this is just a coincidence, but for the benefit of
anyone reading this, never ever ever (ever!), use an encoded value
for a key or IV. Not hex, not base64, not anything exotic you've
dreamed up to make things "more random". If you know a project doing
this, spread the word, this is something that needs to be fixed
through education.
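
To make the danger concrete, the following sketch compares two 16-byte strings. It assumes random_bytes() as shipped in PHP 7; the variable names are for illustration only.

```php
<?php
// Raw key material: 16 bytes, each byte can take any of 256 values,
// so the key carries 16 * 8 = 128 bits of randomness.
$rawKey = random_bytes(16);

// Hex-encoded value: also 16 bytes long, but each byte is one of only
// 16 ASCII characters, so it carries just 16 * 4 = 64 bits.
$hexValue = bin2hex(random_bytes(8));

// Encoding is fine for storage or transport, as long as the value is
// decoded back to raw bytes before it is used as a key or IV.
$stored = bin2hex($rawKey);
$keyForCipher = hex2bin($stored);
```

Both strings would be accepted by a cipher expecting a 16-byte key, which is exactly why the mistake goes unnoticed: the length looks right while half of the key strength is gone.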