[RFC] [DISCUSSION] Reliable user-land CSPRNG

10 years ago by Sammy Kaye Powers

The RFC to add a user-land API for an easy-to-use and reliable CSPRNG in
PHP is up for discussion: https://wiki.php.net/rfc/easy_userland_csprng

This proposes adding two functions, random_bytes() and random_int(), that
return cryptographically secure pseudo-random data.
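
For readers skimming the thread, here is a minimal usage sketch of the two proposed functions. It assumes the signatures that later shipped in PHP 7, where random_int() requires both bounds; the RFC was still under discussion when this message was written, and the variable names are illustrative only.

```php
<?php
// 16 random bytes, e.g. for an initialization vector or a token.
$bytes = random_bytes(16);

// Raw bytes are not printable, so hex-encode them for display.
$token = bin2hex($bytes);

// A uniformly distributed integer in the inclusive range 1..6.
$roll = random_int(1, 6);

printf("token=%s roll=%d\n", $token, $roll);
```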

This has been quite a team effort so far and we would love to hear your
feedback! :)

Thanks,
Sammy Kaye Powers
sammyk.me

230 S Clark St #194
Chicago, IL 60604

10 years ago by Andrey Andreev

Hi,

> The RFC to add a user-land API for an easy-to-use and reliable CSPRNG in
> PHP is up for discussion: https://wiki.php.net/rfc/easy_userland_csprng
>
> This proposes adding two functions, random_bytes() and random_int(), that
> return cryptographically secure pseudo-random data.
>
> This has been quite a team effort so far and we would love to hear your
> feedback! :)

I noticed that the patch checks for /dev/arandom availability first,
and I'm pretty sure that on systems that have it, /dev/urandom simply
redirects to /dev/arandom, so that might be a bit redundant ... Maybe
Leigh can say more about this if I'm missing something.

Also, you don't need 100s of lines of code to write the same thing in
userland ... you need ~30 lines, your Facebook SDK example is just
over-complicated. I'm sure everybody will agree that this is a feature
that PHP needs, so I think you should rather focus on explaining that
it's better than leaving it to userland implementations that may screw
up a lot of details.
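
To make the "~30 lines" estimate concrete, here is a rough sketch of what such a userland fallback might look like on a system that exposes /dev/urandom. The function name userland_random_bytes() is hypothetical and this is not the code from the RFC or from any particular library. It also shows the kind of details (read buffering, short reads) that are easy to get wrong.

```php
<?php
/**
 * Hypothetical userland helper: read $length bytes from /dev/urandom.
 */
function userland_random_bytes(int $length): string
{
    if ($length < 1) {
        throw new InvalidArgumentException('Length must be a positive integer');
    }

    $handle = @fopen('/dev/urandom', 'rb');
    if ($handle === false) {
        throw new RuntimeException('Unable to open /dev/urandom');
    }

    // Disable PHP's read buffering so no more data is pulled than requested.
    stream_set_read_buffer($handle, 0);

    $bytes = '';
    // fread() may return fewer bytes than requested, so loop until complete.
    while (strlen($bytes) < $length) {
        $chunk = fread($handle, $length - strlen($bytes));
        if ($chunk === false || $chunk === '') {
            fclose($handle);
            throw new RuntimeException('Short read from /dev/urandom');
        }
        $bytes .= $chunk;
    }
    fclose($handle);

    return $bytes;
}

var_dump(strlen(userland_random_bytes(32)));
```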

And finally, a suggestion to remove the default $length value of 16
for random_bytes() - it just happens to be what you need for e.g. an
AES-128 IV, but other than that it doesn't make sense to have a
default length.

Otherwise - great! I'm really looking forward to this, and many others
surely are as well. I've got no doubt that the RFC will pass and I
intend to write a compat package for use in pre-PHP7 environments, to
ease the new API's adoption.

Cheers,
Andrey.

10 years ago by Leigh

Hi Andrey,

> I noticed that the patch checks for /dev/arandom availability first,
> and I'm pretty sure that on systems that have it, /dev/urandom simply
> redirects to /dev/arandom, so that might be a bit redundant ... Maybe
> Leigh can say more about this if I'm missing something.

You're absolutely right, on modern releases of systems like OpenBSD
and OSX /dev/urandom is simply an alias of /dev/arandom. The problem
is, I'm not an expert in every version of every OS, and it might
not always be the case that this aliasing exists. I'd also like to
think this adds an element of future-proofing. If I wish for it hard
enough, maybe one day Linux in general will introduce /dev/arandom,
but maybe at first /dev/urandom does not alias it until some time
later.
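
The ordering described above can be sketched at the PHP level as follows. The actual patch is written in C; pick_entropy_device() is a hypothetical helper that only illustrates the "try the preferred device first, then fall back" idea.

```php
<?php
/**
 * Return the first readable device from an ordered candidate list,
 * or null if none is available. The list order encodes preference.
 */
function pick_entropy_device(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_readable($path)) {
            return $path;
        }
    }
    return null;
}

// Prefer /dev/arandom where it exists, otherwise fall back to /dev/urandom.
$device = pick_entropy_device(['/dev/arandom', '/dev/urandom']);
echo $device ?? 'no entropy device found', "\n";
```

Checking the preferred device first costs one extra lookup on systems where the two paths are aliases, which is the redundancy Andrey pointed out.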

> Also, you don't need 100s of lines of code to write the same thing in
> userland ... you need ~30 lines, your Facebook SDK example is just
> over-complicated. I'm sure everybody will agree that this is a feature
> that PHP needs, so I think you should rather focus on explaining that
> it's better than leaving it to userland implementations that may screw
> up a lot of details.

I agree, we can make a succinct explanation that focuses on the
importance of "getting it right".

> And finally, a suggestion to remove the default $length value of 16
> for random_bytes() - it just happens to be what you need for e.g. an
> AES-128 IV, but other than that it doesn't make sense to have a
> default length.

This is just a badly formatted part of the RFC. There is no default
for random_bytes().

The defaults for random_int(), however, are +/- PHP_INT_MAX.

> Otherwise - great! I'm really looking forward to this, and many others
> surely are as well. I've got no doubt that the RFC will pass and I
> intend to write a compat package for use in pre-PHP7 environments, to
> ease the new API's adoption.

Thanks :)

10 years ago by Pierre Joye

It should use the session.entropy_file setting as it aims to be the exact
same thing. It also allows custom entropy sources (better ones for higher
demands) as well.

10 years ago by Leigh

> It should use the session.entropy_file setting as it aims to be the exact
> same thing. It also allows custom entropy sources (better ones for higher
> demands) as well.

I disagree. We want to take responsibility away from the user to
choose the best source of entropy. The session.entropy_file setting
also does not allow arc4random to be used, which is a source of
cryptographic-quality randomness without using a file descriptor.

In fact I had planned for a future RFC where we allow
session.entropy_file to use random_bytes(). So the "best" source
is chosen automatically. (If you think there are better sources not
covered by this patch, please let me know, I would like it to be
complete)

There is an aspect of this that has been left for "future work", but
if the list thinks it is important I can implement it for this RFC.
The issue is that on Linux the patch still does not provide a way of
getting random bytes without using a file descriptor. This is important
for a couple of reasons: 1) it means chroot environments don't require
/dev/*random, and 2) it prevents fd-exhaustion attacks that force lower
quality randomness. LibreSSL-portable has a very good implementation
of this using the Linux getrandom syscall (Kernel >= 3.17) that I can
phpise and include if we think it is necessary.

10 years ago by Pierre Joye

>> It should use the session.entropy_file setting as it aims to be the exact
>> same thing. It also allows custom entropy sources (better ones for higher
>> demands) as well.
>
> I disagree. We want to take responsibility away from the user to
> choose the best source of entropy. The session.entropy_file setting
> also does not allow arc4random to be used, which is a source of
> cryptographic-quality randomness without using a file descriptor.

Sorry, my reply could have been confusing.

I was not trying to say it should use only this setting. However, it should
be able to use it, as it is a requirement and is part of the configure
requirements to begin with.

> In fact I had planned for a future RFC where we allow
> session.entropy_file to use random_bytes(). So the "best" source
> is chosen automatically. (If you think there are better sources not
> covered by this patch, please let me know, I would like it to be
> complete)

I would rather be able to choose. Maybe make it system-only instead of
INI_ALL if users' ability to choose wisely is questionable.

> There is an aspect of this that has been left for "future work", but
> if the list thinks it is important I can implement it for this RFC.
> The issue is that on Linux the patch still does not provide a way of
> getting random bytes without using a file descriptor. This is important
> for a couple of reasons: 1) it means chroot environments don't require
> /dev/*random, and 2) it prevents fd-exhaustion attacks that force lower
> quality randomness. LibreSSL-portable has a very good implementation
> of this using the Linux getrandom syscall (Kernel >= 3.17) that I can
> phpise and include if we think it is necessary.

It is all the same setting. What is done on Windows can be applied on any
other platform. Either use a path or a method, it just works fine.

There are many different RNG providers, daemons or other services. I have
seen customers using some of them (together) to generate enough data for
their needs (they need a lot of entropy data). By enough data I mean being
200% sure that the entropy is good enough at any time no matter how much
data is being fetched.

I am also unsure about random_get_int, why only integer? It is also
important to note that, by doing so, the result is per se not crypto safe
anymore. But it is still handy for code that requires random integers (or
other types). If it is kept, I would also prefer to name it
random_get_<type> instead, for clarity.

Cheers,

Pierre

10 years ago by Anthony Ferrara

Pierre,

>> In fact I had planned for a future RFC where we allow
>> session.entropy_file to use random_bytes(). So the "best" source
>> is chosen automatically. (If you think there are better sources not
>> covered by this patch, please let me know, I would like it to be
>> complete)
>
> I would rather be able to choose. Maybe make it system-only instead of
> INI_ALL if users' ability to choose wisely is questionable.

Why? Why should the user change their entropy location? What benefits
(especially from only a system perspective) does that provide to
users/sysadmins?

Besides letting an admin decrease the security of a security API?

>> There is an aspect of this that has been left for "future work", but
>> if the list thinks it is important I can implement it for this RFC.
>> The issue is that on Linux the patch still does not provide a way of
>> getting random bytes without using a file descriptor. This is important
>> for a couple of reasons: 1) it means chroot environments don't require
>> /dev/*random, and 2) it prevents fd-exhaustion attacks that force lower
>> quality randomness. LibreSSL-portable has a very good implementation
>> of this using the Linux getrandom syscall (Kernel >= 3.17) that I can
>> phpise and include if we think it is necessary.
>
> It is all the same setting. What is done on Windows can be applied on any
> other platform. Either use a path or a method, it just works fine.
>
> There are many different RNG providers, daemons or other services. I have
> seen customers using some of them (together) to generate enough data for
> their needs (they need a lot of entropy data). By enough data I mean being
> 200% sure that the entropy is good enough at any time no matter how much
> data is being fetched.

Yes, and they all can feed into /dev/arandom and /dev/urandom. In fact
that's how those systems are designed to operate (taking in entropy
from many sources).

PERHAPS, it could be written in such a way that a PECL extension can
alter the RNG to accommodate that usecase. But I'd be wary of that and
core supporting userland RNGs.

I think one of the mistakes that password_hash made was exposing the
salt to the user. If you give users a choice with respect to security,
they will on average make it worse (based on plenty of code snippets
I've seen, that's the case with salt generation).

I think the case you have to look at here is the target audience. Are
you looking to be all things to all users? Or are you attempting to be
an opinionated tool to help the 99%. Along with password_hash, I think
this random library serves the 99%.

It shouldn't try to be all things to everyone, because then it will
confuse or complicate things for the 99%. The reality is we've been
lacking a simple API for years. Seeing what users implement in the
wild shows that.

To quote a phrase I coined years ago: It should be easier to get right
than to screw up. And that includes using existing APIs.

So if random_int() is harder than mt_rand(), it's a non-starter.

If random_bytes() is harder than uniqid(), it's a non-starter.

> I am also unsure about random_get_int, why only integer? It is also
> important to note that, by doing so, the result is per se not crypto safe
> anymore. But it is still handy for code that requires random integers (or
> other types). If it is kept, I would also prefer to name it
> random_get_<type> instead, for clarity.

How is it not crypto safe? It uses a non-biased algorithm using a
crypto-safe primitive.
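
As an illustration of what "non-biased" can mean here, the sketch below uses rejection sampling on top of random_bytes(): draw bits, mask them down to the smallest power-of-two window covering the range, and retry whenever the candidate falls outside the range. This is one common technique and is not necessarily the algorithm used in the patch; unbiased_int() is a hypothetical name.

```php
<?php
function unbiased_int(int $min, int $max): int
{
    if ($min > $max) {
        throw new InvalidArgumentException('$min must not exceed $max');
    }
    $range = $max - $min;
    if (!is_int($range)) {
        // The subtraction overflowed into a float; out of scope for this sketch.
        throw new RangeException('Range too large for this sketch');
    }
    if ($range === 0) {
        return $min;
    }

    // Smallest all-ones bitmask that covers $range.
    $mask = $range;
    for ($shift = 1; $shift < PHP_INT_SIZE * 8; $shift <<= 1) {
        $mask |= $mask >> $shift;
    }

    do {
        $value = 0;
        foreach (str_split(random_bytes(PHP_INT_SIZE)) as $byte) {
            $value = ($value << 8) | ord($byte);
        }
        $value &= $mask;
        // Discard candidates above the range instead of folding them back
        // with a modulo, which would favour the low values.
    } while ($value > $range);

    return $min + $value;
}

echo unbiased_int(1, 6), "\n";
```

Because out-of-range candidates are thrown away rather than reduced, every value in the range is equally likely, and each attempt succeeds with probability above one half.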

As far as why only integer, a float can always be added too. But
strings are already supported, and what other types do you need?

That's my stance at least.

Anthony

10 years ago by Pierre Joye

> Pierre,
>
>>> In fact I had planned for a future RFC where we allow
>>> session.entropy_file to use random_bytes(). So the "best" source
>>> is chosen automatically. (If you think there are better sources not
>>> covered by this patch, please let me know, I would like it to be
>>> complete)
>>
>> I would rather be able to choose. Maybe make it system-only instead of
>> INI_ALL if users' ability to choose wisely is questionable.
>
> Why? Why should the user change their entropy location? What benefits
> (especially from only a system perspective) does that provide to
> users/sysadmins?
>
> Besides letting an admin decrease the security of a security API?

Increase is the point and it is one of the reasons why we made a setting
for the session entropy.

>>> There is an aspect of this that has been left for "future work", but
>>> if the list thinks it is important I can implement it for this RFC.
>>> The issue is that on Linux the patch still does not provide a way of
>>> getting random bytes without using a file descriptor. This is important
>>> for a couple of reasons: 1) it means chroot environments don't require
>>> /dev/*random, and 2) it prevents fd-exhaustion attacks that force lower
>>> quality randomness. LibreSSL-portable has a very good implementation
>>> of this using the Linux getrandom syscall (Kernel >= 3.17) that I can
>>> phpise and include if we think it is necessary.
>>
>> It is all the same setting. What is done on Windows can be applied on any
>> other platform. Either use a path or a method, it just works fine.
>>
>> There are many different RNG providers, daemons or other services. I have
>> seen customers using some of them (together) to generate enough data for
>> their needs (they need a lot of entropy data). By enough data I mean being
>> 200% sure that the entropy is good enough at any time no matter how much
>> data is being fetched.
>
> Yes, and they all can feed into /dev/arandom and /dev/urandom. In fact
> that's how those systems are designed to operate (taking in entropy
> from many sources).

Not always. And it is not always desired. Also, as pointed out in another
reply, it could be an API (but that is a next step for this feature).

> PERHAPS, it could be written in such a way that a PECL extension can
> alter the RNG to accommodate that usecase. But I'd be wary of that and
> core supporting userland RNGs.

Yes, driver based. That brings some risk but it is worth exploring this
possibility.

> I think one of the mistakes that password_hash made was exposing the
> salt to the user. If you give users a choice with respect to security,
> they will on average make it worse (based on plenty of code snippets
> I've seen, that's the case with salt generation).
>
> I think the case you have to look at here is the target audience. Are
> you looking to be all things to all users? Or are you attempting to be
> an opinionated tool to help the 99%. Along with password_hash, I think
> this random library serves the 99%.
>
> It shouldn't try to be all things to everyone, because then it will
> confuse or complicate things for the 99%. The reality is we've been
> lacking a simple API for years. Seeing what users implement in the
> wild shows that.
>
> To quote a phrase I coined years ago: It should be easier to get right
> than to screw up. And that includes using existing APIs.
>
> So if random_int() is harder than mt_rand(), it's a non-starter.
>
> If random_bytes() is harder than uniqid(), it's a non-starter.
>
>> I am also unsure about random_get_int, why only integer? It is also
>> important to note that, by doing so, the result is per se not crypto safe
>> anymore. But it is still handy for code that requires random integers (or
>> other types). If it is kept, I would also prefer to name it
>> random_get_<type> instead, for clarity.
>
> How is it not crypto safe? It uses a non-biased algorithm using a
> crypto-safe primitive.
>
> As far as why only integer, a float can always be added too. But
> strings are already supported, and what other types do you need?

You actually reduce the data set, at the byte level or higher. The
randomness of the data is then restricted or limited and sequences may
occur; in the worst case it could become easier to predict. I have seen
such cases in a couple of projects which rely heavily on entropy.

> That's my stance at least.
>
> Anthony

10 years ago by Leigh

>> PERHAPS, it could be written in such a way that a PECL extension can
>> alter the RNG to accommodate that usecase. But I'd be wary of that and
>> core supporting userland RNGs.
>
> Yes, driver based. That brings some risk but it is worth exploring this
> possibility.

We can make the function a pointer. That's not a problem. The problem
is when people assign their own function to this pointer :)

> You actually reduce the data set, at the byte level or higher. The
> randomness of the data is then restricted or limited and sequences may
> occur; in the worst case it could become easier to predict. I have seen
> such cases in a couple of projects which rely heavily on entropy.

If you need very high quality and high throughput entropy I can add
that to this patch. I left it out for now, because I didn't want this
to become over-complicated. If not having an fd-less, crypto-quality,
high-throughput source is a show stopper for you then let me know, we
can fix this.

10 years ago by Pierre Joye

>>> PERHAPS, it could be written in such a way that a PECL extension can
>>> alter the RNG to accommodate that usecase. But I'd be wary of that and
>>> core supporting userland RNGs.
>>
>> Yes, driver based. That brings some risk but it is worth exploring this
>> possibility.
>
> We can make the function a pointer. That's not a problem. The problem
> is when people assign their own function to this pointer :)
>
>> You actually reduce the data set, at the byte level or higher. The
>> randomness of the data is then restricted or limited and sequences may
>> occur; in the worst case it could become easier to predict. I have seen
>> such cases in a couple of projects which rely heavily on entropy.
>
> If you need very high quality and high throughput entropy I can add
> that to this patch. I left it out for now, because I didn't want this
> to become over-complicated. If not having an fd-less, crypto-quality,
> high-throughput source is a show stopper for you then let me know, we
> can fix this.

It is only about the amount of data. With the trend of running a large
number of apps on the same physical host, the entropy can be exhausted
quickly.

10 years ago by padraic.brady@gmail.com

> If random_bytes() is harder than uniqid(), it's a non-starter.

Technically, it will be harder than uniqid() if producing strictly
random bytes (if output needs to be printable/readable).
That's not a "bad" thing obviously!
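
To show the extra step involved, here is a small sketch of turning raw bytes into printable output, assuming random_bytes() behaves as proposed. The two encodings are common choices and are not part of the RFC.

```php
<?php
$raw = random_bytes(16);   // 128 bits, not printable as-is

$hex = bin2hex($raw);      // 32 characters from [0-9a-f]

// URL-safe base64 without padding: 22 characters for 16 bytes.
$b64url = rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');

echo uniqid(), "\n";       // time-based and predictable; shown only for comparison
echo $hex, "\n";
echo $b64url, "\n";
```

The encoding is one function call on top of random_bytes(), so the "harder than uniqid()" cost is small, but it is still a step users have to know about.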

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com

10 years ago by Anthony Ferrara

Padraic,

> Hi
>
>> If random_bytes() is harder than uniqid(), it's a non-starter.
>
> Technically, it will be harder than uniqid() if producing strictly
> random bytes (if output needs to be printable/readable).
> That's not a "bad" thing obviously!

Sure. But does that indicate the need for a "random_string()" function?

I don't know...

Anthony

10 years ago by padraic.brady@gmail.com

> Padraic,
>
>> Hi
>>
>>> If random_bytes() is harder than uniqid(), it's a non-starter.
>>
>> Technically, it will be harder than uniqid() if producing strictly
>> random bytes (if output needs to be printable/readable).
>> That's not a "bad" thing obviously!
>
> Sure. But does that indicate the need for a "random_string()" function?

It would be more random than a stream of 0-9 integer characters, and
probably useful compared to base64'ing a byte stream. I'm afraid to go
survey how it's done in the wild right now. Possibly?

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com

10 years ago by Anthony Ferrara

Padraic,

>>> Technically, it will be harder than uniqid() if producing strictly
>>> random bytes (if output needs to be printable/readable).
>>> That's not a "bad" thing obviously!
>>
>> Sure. But does that indicate the need for a "random_string()" function?
>
> It would be more random than a stream of 0-9 integer characters, and
> probably useful compared to base64'ing a byte stream. I'm afraid to go
> survey how it's done in the wild right now. Possibly?

I've done it two ways:

Bitmasks:
https://github.com/ircmaxell/RandomLib/blob/master/lib/RandomLib/Generator.php#L228

String of characters:
https://github.com/ircmaxell/random_compat/blob/master/lib/random.php#L147

I think the latter is the easiest, especially if we define constants
with "normal" char lists: "ALPHA", "ALPHA_NUMERIC", "BASE64", etc and
default to "ALPHA_NUMERIC"...
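
A rough sketch of the character-list approach, assuming random_int() is available as proposed. The function name random_string_sketch() and the constant values are hypothetical; this is not the code behind either link above.

```php
<?php
// Hypothetical constants; the names follow the suggestion in this message.
const ALPHA = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
const ALPHA_NUMERIC = ALPHA . '0123456789';

function random_string_sketch(int $length, string $chars = ALPHA_NUMERIC): string
{
    $last = strlen($chars) - 1;
    if ($length < 1 || $last < 1) {
        throw new InvalidArgumentException('Need a positive length and at least two characters');
    }

    $out = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() picks each index uniformly, so no character is favoured.
        $out .= $chars[random_int(0, $last)];
    }
    return $out;
}

echo random_string_sketch(20), "\n";
```

Indexing by byte offset only works for single-byte character lists; a list that repeats a character would also skew the distribution toward it.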

Though I am sure there are other ways out there.

Anthony

10 years ago by Yasuo Ohgaki

Hi all,

On Wed, Feb 25, 2015 at 6:33 AM, Anthony Ferrara <ircmaxell@gmail.com>
wrote:

> On Tue, Feb 24, 2015 at 4:17 PM, Pádraic Brady <padraic.brady@gmail.com>
> wrote:
>
>> Hi
>>
>> On 24 February 2015 at 20:04, Anthony Ferrara <ircmaxell@gmail.com>
>> wrote:
>>
>>> If random_bytes() is harder than uniqid(), it's a non-starter.
>>
>> Technically, it will be harder than uniqid() if producing strictly
>> random bytes (if output needs to be printable/readable).
>> That's not a "bad" thing obviously!
>
> Sure. But does that indicate the need for a "random_string()" function?
>
> I don't know...

Random bytes is better. People would use it for an IV or the like, with
the size of the IV. If we use a string, users lose effective bits.
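
The "effective bits" point can be made concrete with a little arithmetic: a string of n characters drawn uniformly from an alphabet of k symbols carries n * log2(k) bits. The helper below is a hypothetical illustration, assuming a 62-character alphanumeric alphabet for the string case.

```php
<?php
function bits_of_entropy(int $length, int $alphabetSize): float
{
    return $length * log($alphabetSize, 2);
}

printf("%.1f\n", bits_of_entropy(16, 256)); // 16 raw bytes: 128.0
printf("%.1f\n", bits_of_entropy(16, 62));  // 16 alphanumeric characters: 95.3
```

So a 16-character alphanumeric string falls well short of the 128 bits that 16 raw bytes provide; it would take 22 such characters to exceed 128 bits.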

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

10 years ago by padraic.brady@gmail.com

Hi Yasuo,

> Random bytes is better. People would use it for an IV or the like, with
> the size of the IV. If we use a string, users lose effective bits.

Suggestion was for an additional function, so random_bytes() would
still be there ;).

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com

10 years ago by Yasuo Ohgaki

Hi Padraic,

On Wed, Feb 25, 2015 at 7:54 AM, Pádraic Brady <padraic.brady@gmail.com>
wrote:

>> Random bytes is better. People would use it for an IV or the like, with
>> the size of the IV. If we use a string, users lose effective bits.
>
> Suggestion was for an additional function, so random_bytes() would
> still be there ;).

random_string() sounds good to me, too!
It can be used for system-generated passwords, etc.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

10 years ago by Larry Garfield

> Hi Padraic,
>
> On Wed, Feb 25, 2015 at 7:54 AM, Pádraic Brady <padraic.brady@gmail.com>
> wrote:
>
>>> Random bytes is better. People would use it for an IV or the like, with
>>> the size of the IV. If we use a string, users lose effective bits.
>>
>> Suggestion was for an additional function, so random_bytes() would
>> still be there ;).
>
> random_string() sounds good to me, too!
> It can be used for system-generated passwords, etc.

I can see the use for random_string(), but what about character sets?
Does it only generate random characters within ASCII / low-UTF-8?
Wouldn't someone in Novosibirsk want it to generate a random Cyrillic string?

That said, I am +1 on the original proposal. It's in a similar vein
to password_hash(): If users have to think, they'll screw up. Don't make
them think.

--Larry Garfield

10 years ago by padraic.brady@gmail.com

Hi Larry,

I think we'd be biting off too much to be worth chewing for other
character sets. Most uses are going to revolve around characters
allowed in URLs. Expanding that, to a degree, perhaps per an additional
character list, or character list flag, might not be too far, but
things will get interesting once you start requiring whole custom
character lists with multibyte chars thrown in.

Of course, random_string(LOTS_OF_FLAGS) might not be all that helpful
once you get enough variations involved to require a page of
explanatory text to cover them.
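
To give a feel for where the multibyte case gets "interesting", here is a sketch that accepts a custom UTF-8 character list. It assumes random_int() as proposed and PCRE with Unicode support; random_string_from_list() is a hypothetical name, not something proposed in this thread.

```php
<?php
function random_string_from_list(int $length, string $charList): string
{
    // Split on code points rather than bytes so multibyte characters survive.
    $chars = preg_split('//u', $charList, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || count($chars) < 2 || $length < 1) {
        throw new InvalidArgumentException('Need valid UTF-8 with at least two characters');
    }

    $last = count($chars) - 1;
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $chars[random_int(0, $last)];
    }
    return $out;
}

echo random_string_from_list(8, 'абвгдежзиклмнопрстуфхцчшщэюя'), "\n";
```

Even this small version has to decide how to treat invalid UTF-8, combining characters and repeated entries in the list, which is the kind of complexity that would need a page of explanatory text.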

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com