Hey Internals,
I realise I'm cutting it close with this one, but I want to propose some
changes to our standard random number generators.
The downside of this proposal is that our RNGs (rand() and mt_rand()) are
seedable and reproduce identical streams (platform dependent) for any given
seed. However their implementations are broken or inconsistent, so we need
to weigh up the cost of changing these sequences versus having solid
implementations.
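To make the seeding behaviour concrete, here is a minimal sketch (not taken from the RFC) of what "identical streams for any given seed" means in practice. The seed value 42 is arbitrary.

```php
<?php
// Seed the generator, draw three values, then re-seed with the same
// value and draw three more.
mt_srand(42);
$first = [mt_rand(), mt_rand(), mt_rand()];

mt_srand(42);
$second = [mt_rand(), mt_rand(), mt_rand()];

// Both runs match because the generator state is fully determined by the seed.
var_dump($first === $second); // bool(true)
```

Any change to the underlying algorithm keeps this property but changes the actual numbers a given seed produces, which is the compatibility cost being weighed here.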
It is my opinion that if we are going to make any changes to these
functions, we should make all of the changes at the same time and avoid any
future disruption to their output.
The RFC contains a few proposals, some of them depend on each other while
others are standalone. Throughout the discussion phase I hope to reduce the
number of proposals down to a consensus we can vote on in two weeks' time.
I will release a patch when I have a better feeling for the direction we
want to take.
The issues I want to bring up for discussion are:
- Replacing mt_rand() and rand() with a strong, modern RNG.
- Alternatively, fixing the current mt_rand() implementation to make it standard.
- Aliasing rand() to mt_rand() to improve output and cross-platform support.
- Fixing RAND_RANGE for large ranges.
- Replacing insecure uses of php_rand() with php_random_bytes().
- Making the array_rand() algorithm more efficient.
The RFC can be found here: https://wiki.php.net/rfc/rng_fixes
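For readers unfamiliar with the range problem, the following is a rough sketch of why scaling a raw value into a range by floating-point multiplication misbehaves. It is an illustration only: the helper name scale_to_range() and the toy generator with just eight possible outputs are assumptions made for the example, not PHP's actual macro or generator.

```php
<?php
// Hypothetical helper: map a raw value $n in [0, $tmax] onto [$min, $max]
// by multiplying the range size with the fraction $n / ($tmax + 1).
function scale_to_range(int $n, int $min, int $max, int $tmax): int
{
    return $min + (int) (($max - $min + 1.0) * ($n / ($tmax + 1.0)));
}

// Toy generator: only 8 possible raw outputs (0..7).
$tmax = 7;

$small = []; // target range of 3 values, smaller than the generator's range
$large = []; // target range of 16 values, larger than the generator's range
for ($n = 0; $n <= $tmax; $n++) {
    $small[] = scale_to_range($n, 0, 2, $tmax);
    $large[] = scale_to_range($n, 0, 15, $tmax);
}

// Small range: the 8 inputs cannot be split evenly over 3 outputs.
print_r(array_count_values($small)); // 0 => 3, 1 => 3, 2 => 2

// Large range: only 8 of the 16 values can ever be produced.
print_r($large); // 0, 2, 4, 6, 8, 10, 12, 14
```

The same effect appears at real scale whenever the requested range has more values than the generator can emit: some results become unreachable.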
If anyone knows of other fixes that should be made at the same time but I
have overlooked, please let me know so I can get them included.
Regards,
Leigh.
Good idea. I'm particularly fond of PCG over MT and LCG (but would not
ever use it for a CSPRNG).
Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises https://paragonie.com/
Why do we need so many functions to get a random int anyways if we now
have random_int()? I would like to see all of them deprecated and
removed in PHP 8.0.
- crypt() -> password_hash()
- rand() -> random_int()
- mt_getrandmax() -> PHP_INT_MAX
- mt_rand() -> random_int()
- mt_srand() -> -
- shuffle() -> array_shuffle()*
- srand() -> -
Mcrypt is meant to be replaced anyways and OpenSSL might be too if we
can come up with a nicer implementation that actually hides the
underlying library (e.g. sodium).
* Directly fix the name and get rid of the reference:
array_shuffle(array $array, int $num = 1): array
I do not see a problem to change array_rand(), array_shuffle(), nor
str_shuffle() since their output should be random anyways.
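A value-returning shuffle along the lines suggested above could be sketched in userland as follows. The function name array_shuffle_copy() is hypothetical (PHP ships no such function), and the $num parameter from the proposed signature is left out because its meaning is not explained in the message.

```php
<?php
// Hypothetical userland helper: returns a shuffled copy instead of
// modifying the caller's array through a reference.
function array_shuffle_copy(array $array): array
{
    // The parameter is passed by value, so shuffle() only touches the
    // local copy. Note that shuffle() discards the original keys.
    shuffle($array);
    return $array;
}

$original = [1, 2, 3, 4, 5];
$shuffled = array_shuffle_copy($original);

// $original is untouched; $shuffled holds the same values in random order.
```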
--
Richard "Fleshgrinder" Fussenegger
Why do we need so many functions to get a random int anyways if we now
have random_int()? I would like to see all of them deprecated and
removed in PHP 8.0.
In my opinion, we need at least one random function which yields
reproducible values.
--
Christoph M. Becker
In my opinion, we need at least one random function which yields
reproducible values.
--
Christoph M. Becker
Hi Christoph,
Even with the proposed changes both functions will still be capable of
reproducible sequences, just different sequences than before the changes.
In my opinion, we need at least one random function which yields
reproducible values.
Even with the proposed changes both functions will still be capable of
reproducible sequences, just different sequences than before the changes.
Yes, I'm aware of that, and that change isn't an issue for me (except
maybe that it might happen in a minor version). I was responding to
Richard (Fleshgrinder) who suggested to remove rand() and mt_rand()
altogether, because there is random_int().
--
Christoph M. Becker
Yes, I'm aware of that, and that change isn't an issue for me (except
maybe that it might happen in a minor version). I was responding to
Richard (Fleshgrinder) who suggested to remove rand() and mt_rand()
altogether, because there is random_int().
I understood how you mean it. :)
Call me ignorant but is this required in typical web applications?
Couldn't we move this functionality to PECL? I mean, it is required in
games but other than that.
Please correct me if that is wrong!
--
Richard "Fleshgrinder" Fussenegger
I think as this is a BC break it should require the 2/3 majority. I do
support fixing the RNGs though.
Have you done any checks on GitHub etc to see how widespread this usage is?
I'd like to get some data on that too.
Have you done any checks on GitHub etc to see how widespread this usage is?
I'd like to get some data on that too.
I don't have data, but a word of caution: Don't grep legacy crypto
libraries for use of rand() or mt_rand() for key/IV generation if you want
to feel any sense of optimism. Speaking from experience here! ;)
Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises https://paragonie.com/
I think as this is a BC break it should require the 2/3 majority. I do
support fixing the RNGs though.
Sure if there's a consensus on that, I have no problem with it.
Have you done any checks on GitHub etc to see how widespread this usage is?
I'd like to get some data on that too.
I have checked, but it's really difficult to find legitimate use cases of
srand and mt_srand. I spent some time stacking up "NOT xyz" terms in the
search, and still didn't find anything that actually relied on
deterministic streams of numbers.
Call me ignorant but is this required in typical web applications?
PHP is used for various things, not just web apps. I use it for various
other things because it's the language in which I am most fluent.
And the requirements of typical apps using PHP should not be the basis
for removing functions that are in fact used in existing programs.
It's possible to change programs so they don't use mt_rand() etc. but
most people won't thank you for forcing them to rewrite software that works.
Leigh, iiuc, is trying to fix bugs. Let's not change the discussion to
cleaning up PHP's API.
Tom
Call me ignorant but is this required in typical web applications?
PHP is used for various things, not just web apps. I use it for various
other things because it's the language in which I am most fluent.
PHP is and should remain: a pragmatic web-focused language
--- Rasmus Lerdorf
Please do not ignore our mission statement here. PHP is not a general
purpose language and even real general purpose languages do not offer
predictable RNGs.
And the requirements of typical apps using PHP should not be the basis
for removing functions that are in fact used in existing programs.
Moving to PECL is not considered a BC break and people are easily able to get
the functions back in if they really need to.
It's possible to change programs so they don't use mt_rand() etc. but
most people won't thank you for forcing them to rewrite software that
works.
The applications and libraries who are using it incorrectly right now
will thank us for making it harder to use the language incorrectly.
Leigh, iiuc, is trying to fix bugs. Let's not change the discussion to
cleaning up PHP's API.
This is not what my proposal is about. I would move all the broken and
weak stuff to PECL and offer the already existing good alternatives to
the developers. At the same time we are able to fix the problems in the
PECL modules and release a new major version of those packages.
We do not need to fix password_hash() nor random_int() since they work
and they are what is needed in a web-focused language.
--
Richard "Fleshgrinder" Fussenegger
Fleshgrinder php@fleshgrinder.com wrote on Wed, 15 June 2016, 19:55:
Call me ignorant but is this required in typical web applications?
PHP is used for various things, not just web apps. I use it for various
other things because it's the language in which I am most fluent.
PHP is and should remain: a pragmatic web-focused language
--- Rasmus Lerdorf
Please do not ignore our mission statement here. PHP is not a general
purpose language and even real general purpose languages do not offer
predictable RNGs.
Quoting from PHP.net:
PHP is a popular general-purpose scripting language that is especially
suited to web development.
Quoting from PHP.net:
PHP is a popular general-purpose scripting language that is especially
suited to web development.
Quoting from Wikipedia:
PHP is a server-side scripting language designed for web development
but also used as a general-purpose programming language.
But let us stop that now. I already wrote that someone should come up
with use cases for predictable random numbers other than creating
insecure secrets. This is the main problem that needs solving, people
using this stuff without knowing what they do.
Keep in mind that anyone or anything (company) that requires predictable
random numbers for their software (e.g. game) wants to have more control
over distribution and ways to tweak it. Hence, they will directly
implement it straight on their own anyways. Business rules are more
important in such domains than readily available built-in stuff.
Otherwise many people would not have jobs. :P
If they really don't want to they can still fall back to PECL. I really
do not see the shared hosting as a big argument here because shared
hosting directly falls back to web application and -- as I said before
-- in this context the requirement for predictable random numbers is
pretty much nil.
Just prove me wrong and show me where it is needed.
Drupal? Symfony? Zend? Wordpress? PhpBB? ...?
--
Richard "Fleshgrinder" Fussenegger
Keep in mind that anyone or anything (company) that requires predictable
random numbers for their software (e.g. game) wants to have more control
over distribution and ways to tweak it. Hence, they will directly
implement it straight on their own anyways.
No they don't all do it.
There are ways to achieve what you want in a nice way while not breaking
things. Let's consider them.
Cheers,
Pierre
No they don't all do it.
We don't know but I will try to find legitimate usages of (mt_)rand.
There are ways to achieve what you want in a nice way while not breaking
things. Let's consider them.
Moving to PECL does not break anything.
--
Richard "Fleshgrinder" Fussenegger
I think I've caught up on everything discussed now.
One thing I would like to point out, when people have searched for
"legitimate uses" of mt_rand(), you should have been looking for
legitimate uses of mt_srand() - this is the functionality that will be
broken.
There are ways to achieve what you want in a nice way while not breaking
things. Let's consider them.
Cheers,
Pierre
So what would you suggest? mt_rand_mode() with constants for correct
and legacy? (defaulting to correct, and a single fcall for users to
get the old behaviour back)
So what would you suggest? mt_rand_mode() with constants for correct
and legacy? (defaulting to correct, and a single fcall for users to
get the old behaviour back)
Yes, that would do it. Even if I would prefer the other way for at least
one version.
On Wednesday 15 June 2016, 21:43:05, Fleshgrinder wrote:
But let us stop that now. I already wrote that someone should come up
with use cases for predictable random numbers other than creating
insecure secrets. This is the main problem that needs solving, people
using this stuff without knowing what they do.
Keep in mind that anyone or anything (company) that requires predictable
random numbers for their software (e.g. game) wants to have more control
over distribution and ways to tweak it. Hence, they will directly
implement it straight on their own anyways. Business rules are more
important in such domains than readily available built-in stuff.
Otherwise many people would not have jobs. :P
If they really don't want to they can still fall back to PECL. I really
do not see the shared hosting as a big argument here because shared
hosting directly falls back to web application and -- as I said before
-- in this context the requirement for predictable random numbers is
pretty much nil.
Just prove me wrong and show me where it is needed.
Drupal? Symfony? Zend? Wordpress? PhpBB? ...?
Hello,
An example I can think of where a reproducible RNG could be needed (outside of the obvious case of games, and I’m not sure why that’s not enough) is the generation of random images based on a user’s information, as Gravatar does for instance.
So, for me PHP must have a way of providing reproducible random sequences. But that does not mean it has to be the same functions as before; I would be fine if (mt_)(s)rand are deprecated and some other method allows this.
But I’m a bit confused about whether people are arguing over keeping the rand functions or over whether we need a reproducible RNG at all.
Côme
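The image-generation use case mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not how Gravatar or any particular service works: the function name identicon_pattern(), the 5x5 mirrored grid, and the use of crc32() to derive a seed are all choices made for the example.

```php
<?php
// Hypothetical helper: derive a 5x5 left-right symmetric on/off grid
// from an identifier such as an e-mail address.
function identicon_pattern(string $identifier): array
{
    // Turn the identifier into an integer seed so the same input
    // always drives the generator through the same sequence.
    mt_srand(crc32($identifier));

    $rows = [];
    for ($y = 0; $y < 5; $y++) {
        $left = [mt_rand(0, 1), mt_rand(0, 1), mt_rand(0, 1)];
        // Mirror the first two columns to get a symmetric row.
        $rows[] = [$left[0], $left[1], $left[2], $left[1], $left[0]];
    }

    // Re-seed randomly so later mt_rand() calls elsewhere in the
    // program are not tied to the identifier.
    mt_srand();

    return $rows;
}

$a = identicon_pattern('user@example.com');
$b = identicon_pattern('user@example.com');
var_dump($a === $b); // bool(true)
```

A pattern generated this way stays stable only as long as the sequence for a given seed stays stable, which is why changes to the mt_rand() output matter for this kind of code.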
Call me ignorant but is this required in typical web applications?
Couldn't we move this functionality to PECL? I mean, it is required in
games but other than that.
I don't know, but I can imagine that it is (test suites come to mind). Of
course, it might be an option to move the functionality to PECL, but on
the other hand, what would we gain? For users, however, that might be a
major pain; consider software which runs on shared hosting.
--
Christoph M. Becker
I don't know, but I can imagine that it is (test suites come to mind). Of
course, it might be an option to move the functionality to PECL, but on
the other hand, what would we gain? For users, however, that might be a
major pain; consider software which runs on shared hosting.
https://github.com/sebastianbergmann/phpunit/pull/1408#issuecomment-53163159
rand() will not work for them. ;)
And then there is https://packagist.org/packages/leigh/mt-rand as Leigh
just pointed out.
I am sorry but this topic is crystal clear.
--
Richard "Fleshgrinder" Fussenegger
Why do we need so many functions to get a random int anyways if we now
have random_int()? I would like to see all of them deprecated and
removed in PHP 8.0.
Lets see if others support this option. (I'm not even sure I do right now)
I do not see a problem to change array_rand(), array_shuffle(), nor
str_shuffle() since their output should be random anyways.
Right now a call to srand() with a given seed will make these functions
return the same sequence of outputs for a particular set of inputs. This
behaviour is fine and sometimes even desirable. The changes in this RFC
will change those outputs; they will still be controllable with srand()
though.
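A small sketch of that behaviour, with an arbitrary seed and arbitrary inputs (the helper name seeded_results() is made up for the example):

```php
<?php
// Hypothetical helper: seed the generator, then run the array and
// string functions that draw from it.
function seeded_results(int $seed): array
{
    srand($seed);

    $items = range(1, 10);
    shuffle($items);

    return [
        'shuffle'     => $items,
        'str_shuffle' => str_shuffle('abcdef'),
        'array_rand'  => array_rand(['a' => 1, 'b' => 2, 'c' => 3]),
    ];
}

// Same seed and same inputs give the same results on the same PHP build.
var_dump(seeded_results(7) === seeded_results(7)); // bool(true)
```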
Why do we need so many functions to get a random int anyways if we now
have random_int()?
For backwards compatibility. There are programs that use these and
little to gain from breaking them.
Tom
Hi!
Why do we need so many functions to get a random int anyways if we now
have random_int()? I would like to see all of them deprecated and
removed in PHP 8.0.
I don't understand this drive to remove functions and break existing
code. What is the point of it? If you don't like them, don't use them.
They can share underlying implementation but there's absolutely no
reason to remove functions unless they do not work anymore, broken and
can not be fixed, etc.
PHP 8.0 is not some magic land where anything goes and BC does not
matter anymore. It still should run existing apps and do it well,
otherwise there's no point in releasing it at all - who's going to use it?
Mcrypt is meant to be replaced anyways and OpenSSL might be too if we
can come up with a nicer implementation that actually hides the
underlying library (e.g. sodium).
This is another problem. So we have OpenSSL, then we have mcrypt, then
we have another implementation like sodium... do we really expect our
users to rewrite crypto in their apps every couple of years? That would
be insane. OK, we could say "have your apps work as they worked, but use
new stuff for new things" - but you propose to remove stuff?
--
Stas Malyshev
smalyshev@gmail.com
I don't understand this drive to remove functions and break existing
code. What is the point of it? If you don't like them, don't use them.
They can share underlying implementation but there's absolutely no
reason to remove functions unless they do not work anymore, broken and
can not be fixed, etc.
I agree that removing such widely used functions should be done
extremely carefully, but marking them deprecated might be a good idea if
there is a stronger alternative available. That way we encourage people
to migrate, and at some point (PHP 9, 10, never, who knows) maybe they
are unused enough that we can remove them safely.
Same with magic quotes and such, it took 10 years, but they are finally
gone.
Cheers
--
Jordi Boggiano
@seldaek - http://seld.be
I don't understand the drive to holding on to obviously faulty stuff.
Nikic proposed already to deprecate rand() and I am only saying that we
can go one step further and implement pcg_rand() in favor of mt_rand()
and deprecate mt_rand(). Probably not something we want to do in 7.1
(because PCG needs some more time to mature) but something that we might
want to do on the road to PHP 8.0.
Deprecation is not the same as "simply removing and breaking all
existing code". It is just a deprecation that usually means a log entry,
if at all.
Removal does not mean that we need to delete the code and get rid of it
forever. We can move the functionality to PECL and/or supply userland
polyfills for users who do not wish to upgrade.
Where is the drive coming from? Simple: getting towards a cleaner
language that does not include cruft.
Cleaning APIs is a totally normal process. We just need to do it
carefully without BCs within minor or patch versions (which would now
allow me to make a potshot right due to some other RFCs) and without
introducing too many for the next major version because it makes
upgrading too complicated because too many LOCs need to be touched in
order to upgrade some software.
The var keyword was a good example of something that can safely be
deprecated and removed later but the rand stuff here is too. We already
have alternatives available (public, private, protected for var)
(random_int and mt_rand for rand) or the ability to implement good
alternatives (pcg_rand). The general community that is actively
developing software is already trying to get rid of the usage of both
(var and rand) and is only using mt_rand because there are no other
alternatives available (random_int was introduced with PHP 7 but the
community has developed a polyfill for themselves).
I do not understand the fear to deprecate cruft or badly designed stuff
from time to time. I am working at a big company that has a lot of
legacy PHP code but we are not afraid to actively invest into making it
better. We were actually one of the first big players to change to PHP 7
and this was even shared by Rasmus on his Twitter account. Yes it is
time consuming, yes it takes man power, but we are willing to invest it
in order to see increased performance, security, testability, and
readability of our code since our code is what makes our applications
and our applications is what makes our money. In the same vein,
searching for PHP developers is a hard thing because of all the various
complaints that we all know of and are already fed up with. If we would
just start addressing those issues things would change over time in this
field too and at some point finding a proper PHP developer would be as easy
as finding a Java developer.
Sorry for going so off topic but you asked for my drive and that is part
of it.
--
Richard "Fleshgrinder" Fussenegger
Hi!
I don't understand the drive to holding on to obviously faulty stuff.
What is for you "obviously faulty stuff" for literally thousands of
people is "code that works". I appreciate that there's a number of new
hip randomness tests that mt_rand may not satisfy, and there's new and
exciting number generator that we absolutely must implement because it's
new and exciting, but most users of mt_rand wouldn't really care, and
breaking their code for no reason that we have new and exciting RNG is
not what they would appreciate.
Deprecation is not the same as "simply removing and breaking all
existing code". It is just a deprecation that usually means a log entry,
if at all.
If we're not going to remove it, there's no reason to pollute the logs
with useless messages. If you want to tell people about better
alternatives, there's the manual for that.
Removal does not mean that we need to delete the code and get rid of it
forever. We can move the functionality to PECL and/or supply userland
polyfills for users who do not wish to upgrade.
The users that do not wish to upgrade just won't upgrade. Making them
jump through hoops just to have what they already have makes no sense.
Where is the drive coming from? Simple: getting towards a cleaner
language that does not include cruft.
This is a meaningless statement. "Cruft" is entirely subjective and
undefined, you essentially are saying "the language should look the way
I like it and contain only the code and syntax I like". That can not
work in a mature language with gigabytes of deployed code out there.
The var keyword was a good example of something that can safely be
deprecated and removed later but the rand stuff here is too. We already
No and no.
alternatives (pcg_rand). The general community that is actively
developing software is already trying to get rid of the usage of both
(var and rand) and is only using mt_rand because there are no other
"Trying to get rid of" is not going to help people that have code bases
rooted in PHP 4. And that's the majority of deployed code. It would be
enormous expense to convert all that code - and all that expense would
serve absolutely no goal except giving you satisfaction of "there's no
cruft". It is a useless goal and any time spent on reaching it is wasted.
I do not understand the fear to deprecate cruft or badly designed stuff
from time to time. I am working at a big company that has a lot of
legacy PHP code but we are not afraid to actively invest into making it
better. We were actually one of the first big players to change to PHP 7
Your company may have tons of money and manpower to rewrite their code
(and, I assume, every library you use and every dependency that library
has) every 2-3 years. Most companies most definitely do not. Our usage
numbers for latest versions are very bad - 5.3 is still the most used
PHP version. We do not need to make upgrading harder. In fact, if we
make it any substantially harder, I don't know who we'll be releasing it
for - 0.01% of PHP user base that would actually run it?
and our applications is what makes our money. In the same vein,
searching for PHP developers is a hard thing because of all the various
complaints that we all know of and are already fed up with. If we would
Did I misunderstand you or did you just claim nobody wants to code in PHP
because it has var and rand and that it's impossible to find a PHP
developer on the market? I am afraid my experience is exactly the
opposite. But if you are worried about this, making it harder to work
with PHP by fragmenting the ecosystem and making people remember which
function does random in which PHP version is not going to help.
Stas Malyshev
smalyshev@gmail.com
I don't understand the drive to holding on to obviously faulty stuff.
What is for you "obviously faulty stuff" for literally thousands of
people is "code that works". I appreciate that there's a number of new
hip randomness tests that mt_rand may not satisfy, and there's new and
exciting number generator that we absolutely must implement because it's
new and exciting, but most users of mt_rand wouldn't really care, and
breaking their code for no reason that we have new and exciting RNG is
not what they would appreciate.
Can someone explain why I should need 'crypto safe' random numbers when
ALL I use rand for is to give a random order to content items on the
page. Something more in sync with the shuffle and array_rand without the
need to recode to actually use the array functions, or simply select an
entry at random from a list.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Can someone explain why I should need 'crypto safe' random numbers when
ALL I use rand for is to give a random order to content items on the
page. Something more in sync with the shuffle and array_rand without the
need to recode to actually use the array functions, or simply select an
entry at random from a list.
There are actually only two properties of an RNG that are of interest to
you and that is resource consumption and performance since you do not
rely on predictable sequences, a certain amount of randomness, nor
portability.
mt_rand() provided by PHP is the best choice here right now:
rand = https://3v4l.org/nIIdb/perf#tabs
mt_rand = https://3v4l.org/Wb3ZA/perf#tabs
random_int = https://3v4l.org/5SZHW/perf#tabs
But notice how super tiny the difference from random_int() to the others
is. It is use'n'forget for almost all purposes (predictable sequences
is the only use case it does not cover). Especially note that
random_int() is pretty much as fast as rand() itself!
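For anyone who wants to repeat the comparison locally rather than rely on the linked runs, a rough micro-benchmark could look like the sketch below. The helper name time_calls() and the iteration count are arbitrary, and the closure call overhead is included in every measurement, so only the relative order of magnitude is meaningful.

```php
<?php
// Hypothetical helper: wall-clock time for $iterations calls of $fn.
function time_calls(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$iterations = 100000;

$timings = [
    'rand'       => time_calls(function () { return rand(0, 100); }, $iterations),
    'mt_rand'    => time_calls(function () { return mt_rand(0, 100); }, $iterations),
    'random_int' => time_calls(function () { return random_int(0, 100); }, $iterations),
];

foreach ($timings as $name => $seconds) {
    printf("%-10s %.4f s\n", $name, $seconds);
}
```

Results depend heavily on the PHP version, the operating system's entropy source, and machine load, so no expected numbers are given here.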
This led me to my initial question: why do we have so many random
functions in the first place?
Historical? Yes!
Needed?
--
Richard "Fleshgrinder" Fussenegger
Can someone explain why I should need 'crypto safe' random numbers when
ALL I use rand for is to give a random order to content items on the
page.
I cannot.
Something more in sync with the shuffle and array_rand without the
need to recode to actually use the array functions, or simply select an
entry at random from a list.
Similarly, when I randomize the time before a daemon next wakes up, I
don't think random_int() is appropriate. mt_rand() is entirely suitable.
And upgrading to a more efficient RNG like PCG would be daft.
Tom
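For reference, the non-security uses mentioned in the last two messages (random display order, picking one entry, jittering a wake-up time) might look roughly like this. The item names and the 60 second base interval are made up for illustration:

```php
<?php
// Made-up content items for illustration.
$items = ['news', 'blog', 'gallery', 'events'];

// Random display order: shuffle() reorders the array in place.
$order = $items;
shuffle($order);

// Pick a single entry at random: array_rand() returns a key, not a value.
$pick = $items[array_rand($items)];

// Daemon jitter: a base interval plus 0 to 30 extra seconds.
$base  = 60;
$delay = $base + mt_rand(0, 30);

printf("order: %s, pick: %s, sleep: %d s\n", implode(',', $order), $pick, $delay);
```

None of these values needs to be unpredictable to an attacker, which is the point being made above.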
What is for you "obviously faulty stuff" for literally thousands of
people is "code that works". I appreciate that there's a number of new
hip randomness tests that mt_rand may not satisfy, and there's new and
exciting number generator that we absolutely must implement because it's
new and exciting, but most users of mt_rand wouldn't really care, and
breaking their code for no reason that we have new and exciting RNG is
not what they would appreciate.
https://news.ycombinator.com/item?id=9941364
My favorite:
The PHP approach seems to be that any crazy behavior is acceptable as
long as it's documented.
There is plenty more of this stuff to be found on the Internet and the
only thing I want to show is that it is faulty. It is not broken, yes,
but faulty and hard to use right for many people out there. This is
usually reason enough to react and try to do better.
Why is it faulty? Well, it has many weird edge cases that people need to
learn about. Reading the manual is however not something people actually
do and this results in constant education being necessary. The question
is, why can't we fix that? The answer here seems to be: because of
legacy support. You are saying yourself that nobody is upgrading anyways
to anything beyond 5.3 so this argument is kind of weak because
obviously people have no problem with PHP 7.0, they seem to have a
problem with any of the recent PHP versions.
If we're not going to remove it, there's no reason to pollute the logs
with useless messages. If you want to tell people about better
alternatives, there's the manual for that.
Not within 7.x because that would be a serious BC break within a
non-major version. Of course we would remove rand() in 8.0 in favor of
mt_rand() or a better alternative if readily available within the
versions of PHP that were released before 8.0.
I proposed to deprecate mt_rand() right away but I had a closer look at
the newly available RNGs and none of them seem to be fit for production
yet. It is definitely better to wait here and just keep mt_rand() for
7.x. In 8.0 things might change but that's too far in the future to make
plans right now.
The users that do not wish to upgrade just won't upgrade. Making them
jump through hoops just to have what they already have makes no sense.
If they don't upgrade, they don't mind it being removed. ;)
This is a meaningless statement. "Cruft" is entirely subjective and
undefined, you essentially are saying "the language should look the way
I like it and contain only the code and syntax I like". That can not
work in a mature language with gigabytes of deployed code out there.
I never said that I want things to be like I want them to be. Stop
putting words in my mouth and reflect for a second on your own write ups
in which you always claim to know exactly what is best for everyone. I
am writing my posts from my perspective and you are writing your posts
from your perspective and I tell you a secret: that is how we humans
write stuff, from our perspectives.
Yes, cruft is a generic term: https://en.wikipedia.org/wiki/Cruft
It specifically contains "redundant" which is more than appropriate to
the current discussion since we have two random functions that do
literally the same but one worse than the other.
"Trying to get rid of" is not going to help people that have code bases
rooting in PHP 4. And that's the majority of deployed code. It would be
enormous expense to convert all that code - and all that expense would
serve absolutely no goal except giving you satisfaction of "there's no
cruft". It is a useless goal and any time spent on reaching it is wasted.
No since most (mt_)rand() calls are broken crypto crap. Just go out and
search for yourself like I did. It is pretty much impossible to find any
other usage ... Not saying that there are not legitimate usages of any
of those functions but they are super rare.
Your company may have tons of money and manpower to rewrite their code
(and, I assume, every library you use and every dependency that library
has) every 2-3 years. Most companies most definitely do not. Our usage
numbers for latest versions are very bad - 5.3 is still the most used
PHP version. We do not need to make upgrading harder. In fact, if we
make it any substantially harder, I don't know who we'll be releasing it
for - 0.01% of PHP user base that would actually run it?
Yes, we have manpower and the libraries we use are being upgraded
constantly. Symfony and PHPUnit are pretty reliable in that manner and
often ahead of our own libraries (PHP 5.x + 7.x support together is not
hard to achieve and also what most of our libraries currently offer; we
are not pure PHP 7 but PHP 5.4+).
5.3 is long gone and the fact that 5.4 was actually 6.0 was already
discussed very often here. One of the main reasons why upgrades are slow
is definitely related to shared hosters who do not upgrade their PHP as
well as the Linux distribution systems that are faulty and lead to old
packages being installed on many machines.
It is hard to judge whether the hidden BCs that are silently being
introduced by some RFCs are more harmful than a proper deprecate and
remove approach with proper version numbers. I would argue that the last
approach is better, at least other programming languages have more
success with it (Java).
Too much fragmentation is definitely not the goal but we are far away
from that right now. Node has a brutal fragmentation with 0.x 1.x 2.x
3.x 4.x and 5.x all being in use at the same time.
I really do not know how to address the issue that people are not
upgrading. Hidden BCs that are sold as improvements definitely do not
help, but whether controlled BCs do is unanswered in my opinion. Even if you
claim otherwise.
Did I misunderstand you or you just claimed nobody wants to code in PHP
because it has var and rand and there's impossible to find a PHP
developer on the market? I am afraid my experience is exactly the
opposite. But if you are worried about this, making it harder to work
with PHP by fragmenting the ecosystem and making people remember which
function does random in which PHP version is not going to help.
No, you simply jumped over a complete sentence:
searching for PHP developers is a hard thing because of all the
various complains that we all know of and are already fed up with.
PHP has a bad reputation among the industry especially in Europe/Germany
because of "the various complains that we all know of and are already
fed up with". This (mt_)rand() situation is just a super tiny part of
the whole.
--
Richard "Fleshgrinder" Fussenegger
Hi!
I am sorry, what this link is supposed to illustrate? That if one
doesn't read the docs and uses mt_rand wrong they'd get exactly what it
is supposed to do? Ok, true, and?
The PHP approach seems to be that any crazy behavior is acceptable
as long as it's documented.
This is an obviously false statement, so obviously that I am confused
about the purpose which could drive you to quote it. Unless it claims
that this is not actual approach but just "seems" to be to some person
on the internet. In which case I wonder why should I care about
obviously false impression of some person on the internet.
There is plenty more of this stuff to be found on the Internet and
the
There's plenty of any stuff found on the Internet. So what? If I fancied
to read only "PHP sucks" articles every day, I could probably occupy
myself full-time for months without reading any of them twice. 90% of it
is hair-splitting and "I didn't read the manual and it's your fault"
kind of stuff, 10% of it are real issues. Which are known to us and none
of them requires removal of functions and syntax.
only thing I want to show is that it is faulty. It is not broken,
yes, but faulty and hard to use right for many people out there. This
is
No, it's not hard to use for most people out there. In fact, it is
extremely easy to use to most people out there, which is easily proven
by people actually using it in millions.
usually reason enough to react and try to do better.
Nobody argues with doing better. But "let's remove this function
altogether because it has a corner case" is not doing better, that's
exactly my point.
Why is it faulty? Well, it has many weird edge cases that people need
to learn about. Reading the manual is however not something people
actually do and this results in constant education being necessary.
The question is, why can't we fix that? The answer here seems to be:
because of legacy support. You are saying yourself that nobody is
upgrading anyways
If you want to create a random function that would not require knowing
anything about randomness and how to use random functions, and still be
crypto-strong, and will have no corner cases at all, you will fail. If
you want to create a random function that would not have that particular
corner case, you will succeed easily. So what? It will have different
corner cases. The fact that some guy wrote on HN about this corner case
and didn't write about another one changes nothing.
We can and should have better random functions. It is not the reason to
remove ones that work. Even if they have corner cases which people once
talked about on HN.
Not within 7.x because that would be a serious BC within a non-major
version. Of course we would remove rand() in 8.0 in favor of
mt_rand() or a better alternative if readily available within the
versions of PHP that were released before 8.0.
So all the code that uses *rand() is broken, and we have nobody that
uses that code upgrading to 8.0 until they have a budget to rewrite and
re-test and re-audit all their code, which is probably sometime in the
next 20 years. And at the price of that we achieved... what
exactly? We certainly didn't help people that used rand() - they are now
using an unsupported PHP version with all bugs and security issues and
can't upgrade. We didn't help people starting new development - they
don't need rand() removed, they can just use new stuff.
But now we can go back to that HN thread and say "look at us, we have
no cruft anymore!" Big achievement unlocked?
The users that do not wish to upgrade just won't upgrade. Making
them jump through hoops just to have what they already have makes
no sense.
If they don't upgrade, they don't mind it being removed. ;)
True. But we want them to upgrade. And needlessly removing stuff
prevents that.
Yes, cruft is a generic term: https://en.wikipedia.org/wiki/Cruft
Generic and thus meaningless in the specific context, without
qualification at least. It's just a pejorative. "It needs to be removed
because it's bad". It's not argument for removal, it's just
restating the same.
It specifically contains "redundant" which is more than appropriate
to the current discussion since we have two random functions that do
literally the same but one worse than the other.
It doesn't have to be worse.
No since most (mt_)rand() calls are broken crypto crap. Just go out
and
That is based on what? I certainly have seen a lot of rand usage having
little to nothing to do with cryptography.
search for yourself like I did. It is pretty much impossible to find
any other usage ... Not saying that there are not legitimate usages
of any of those functions but they are super rare.
I found three usages within 5 minutes having nothing to do with crypto:
https://github.com/zendframework/zf1/blob/210190dab599e2897220648c9040bce9ff76f21f/library/Zend/Captcha/Image.php#L407
https://github.com/zendframework/zf1/blob/210190dab599e2897220648c9040bce9ff76f21f/library/Zend/Amf/Value/Messaging/AbstractMessage.php#L82
https://github.com/zendframework/zend-cache/blob/bb8a75c62d3e1c75b8d8bc53f8b2db98314d3a17/src/Storage/Plugin/OptimizeByFactor.php#L42
That's not the meaning of "impossible" I'm used to.
The very fact that you claim rand is not used by anything but crypto
suggests to me that you are overfocusing on one issue - which is a real
issue, no doubt about it - of using non-crypto-strong randomness in
crypto context and ignoring other aspects where randomness is in play.
Solving your problem neither requires removing mt_rand nor is achieved
by it. The solution would be better crypto API and education about which
randomness to use in which context.
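To make "which randomness in which context" concrete, here is a minimal sketch. random_bytes() and random_int() are the PHP 7.0 CSPRNG functions; the sampling factor of 100 is invented for illustration and only loosely modelled on the kind of usage linked above:

```php
<?php
// Security context: session tokens, reset links, API keys.
// random_bytes() and random_int() draw from the OS CSPRNG.
$token = bin2hex(random_bytes(16)); // 16 bytes become 32 hex characters
$pin   = random_int(0, 9999);

// Non-security context: probabilistic housekeeping.
// Made-up factor: run an expensive cleanup on roughly 1 in 100 requests.
$factor     = 100;
$runCleanup = (mt_rand(1, $factor) === 1);

printf("token=%s pin=%04d cleanup=%s\n", $token, $pin, $runCleanup ? 'yes' : 'no');
```

Swapping mt_rand() into the first half would be the mistake under discussion; swapping random_int() into the second half would be harmless but buys nothing.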
It is hard to judge whether the hidden BCs that are silently being
introduced by some RFCs are more harmful than a proper deprecate and
remove approach with proper version numbers. I would argue that the
last approach is better, at least other programming languages have
more success with it (Java).
Java pretty much never removes APIs. And I don't remember any instance
of Java syntax being removed. See for example:
https://www.quora.com/Has-Sun-or-Oracle-ever-removed-a-deprecated-Java-method-in-an-official-API/
Oracle people say they consider to maybe remove some com.sun.*
internal APIs that aren't even part of official documented API. From
what I know, these considerations are still just talk. Somehow they are
not worried about articles on HN and nobody wanting to use Java because
they have "cruft".
Also, Java never generates runtime messages on deprecated items.
PHP has a bad reputation among the industry especially in
Europe/Germany because of "the various complains that we all know of
and are already fed up with". This (mt_)rand() situation is just a
super tiny part of the whole.
PHP has millions of users and servers, including in Europe, including in
Germany. So whoever considers it beneath them because it has mt_rand can
use whatever language they see fit, but I am not very impressed by the
argument that somehow PHP is in a huge crisis and removing stuff that
works somehow is going to help solve it.
Stas Malyshev
smalyshev@gmail.com
I am sorry, what this link is supposed to illustrate? That if one
doesn't read the docs and uses mt_rand wrong they'd get exactly what it
is supposed to do? Ok, true, and?
This is an obviously false statement, so obviously that I am confused
about the purpose which could drive you to quote it. Unless it claims
that this is not actual approach but just "seems" to be to some person
on the internet. In which case I wonder why should I care about
obviously false impression of some person on the internet.
It is meant to illustrate that people do not read the docs and that
people get it wrong. I still claim that this indicates room for
improvement on our side.
There's plenty of any stuff found on the Internet. So what? If I fancied
to read only "PHP sucks" articles every day, I could probably occupy
myself full-time for months without reading any of them twice. 90% of it
is hair-splitting and "I didn't read the manual and it's your fault"
kind of stuff, 10% of it are real issues. Which are known to us and none
of them requires removal of functions and syntax.
We both could waste our time with that or try to improve PHP (you
probably more than me but I am just starting to invest more time into
this and hope to get more useful in the future). Randomly removing
functions and syntax does not help -- no question there -- but
selectively cleaning things up definitely does. Or why do you think that
there are so many internals threads and RFCs discussing exactly that. I
know that you have the impression of me that I always just want to
remove everything but we also were in various threads where I was of an
opposite opinion. However, it is true that I generally tend to go more
into the direction of clean-up rather than simply keeping and ignoring
something.
Nobody argues with doing better. But "let's remove this function
altogether because it has a corner case" is not doing better, that's
exactly my point.
Then let's stop discussing that point and try to do better together. ;)
I will ignore all the Hacker News stuff you wrote because it obviously
goes into a completely different direction than intended on my side.
Sorry for confusing you.
True. But we want them to upgrade. And needlessly removing stuff
prevents that.
Yes to the fact that we want the upgrades and that needless removals
will not help. I however stick to the belief that selective removals do.
Generic and thus meaningless in the specific context, without
qualification at least. It's just a pejorative. "It needs to be removed
because it's bad". It's not argument for removal, it's just
restating the same.
The arguments are in the thread (not in my answer to you).
That is based on what? I certainly have seen a lot of rand usage having
little to nothing to do with cryptography.
Open source code, issues, and pull requests that I searched through on
GitHub.
I found three usages within 5 minutes having nothing to do with crypto:
https://github.com/zendframework/zf1/blob/210190dab599e2897220648c9040bce9ff76f21f/library/Zend/Captcha/Image.php#L407
https://github.com/zendframework/zf1/blob/210190dab599e2897220648c9040bce9ff76f21f/library/Zend/Amf/Value/Messaging/AbstractMessage.php#L82
https://github.com/zendframework/zend-cache/blob/bb8a75c62d3e1c75b8d8bc53f8b2db98314d3a17/src/Storage/Plugin/OptimizeByFactor.php#L42
None of them is using rand() :P
That's not the meaning of "impossible" I'm used to.
The very fact that you claim rand is not used by anything but crypto
suggests to me that you are overfocusing on one issue - which is a real
issue, no doubt about it - of using non-crypto-strong randomness in
crypto context and ignoring other aspects where randomness is in play.
Solving your problem neither requires removing mt_rand nor is achieved
by it. The solution would be better crypto API and education about which
randomness to use in which context.
I think you are still misunderstanding where I would like to go with all
of this, probably because you did not read all of the thread or because
my messages do not contain the relevant information or I need to simply
improve with my writing skills.
Recap: I proposed to deprecate pretty much all of the functions and
people directly reacted with panic. Initially I was like, what the?!?
but I got the point and agreed that there should be one non-crypto rand
function. I already mentioned to you that I had a closer look at the
various cool new random algorithms and that none of them seems fit for
production yet. So where is my proposal at right now? Nice of you to ask
instead of simply continuing the discussion on various philosophical
topics where my answers are going to be super biased by my personal
opinion and usually nobody has the right answer anyways.
My current proposal would be to not touch any of the existing functions
because that would be a BC in our current 7 branch.
Invest some time and see how PCG matures since it looks like the most
promising algorithm of the ones available to us as of today.
If it matures enough, implement it as the algo that rand() uses instead
of forwarding it to the operating system in PHPng. Deprecate mt_rand()
in favor of rand() in the same release. Remove mt_rand() in PHPnng so
that there is only one non-crypto random function left within PHP. Of
course it would also be possible to alias mt_rand() to rand() in PHPnng
with an E_NOTICE or E_WARNING and remove it in PHPnnng just to ease the
upgrade path further. I would be more than fine with that (as a general
approach to such topics).
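The alias-with-a-notice step could be approximated in userland as below. This is only a sketch of the idea: compat_mt_rand() is a hypothetical name, the real change would live in the engine, and E_USER_DEPRECATED stands in for whatever level would actually be chosen:

```php
<?php
// Hypothetical userland stand-in for the proposed engine-level alias.
// compat_mt_rand() is not a real PHP function.
function compat_mt_rand(int $min = 0, ?int $max = null): int
{
    // Tell the caller about the preferred name, then carry on working.
    trigger_error('mt_rand() is deprecated, use rand() instead', E_USER_DEPRECATED);

    // Forward to the generic function; default upper bound as for rand().
    return rand($min, $max ?? getrandmax());
}
```

Existing call sites keep working and only gain a log entry, which is the "ease the upgrade path" part of the plan.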
This is not the best approach because most people were asked to migrate
from rand() to mt_rand() without any plan to do anything with rand().
Now many people exchanged their rand() with mt_rand() and with the above
they would need to change those mt_rand() calls back to rand(). However,
mt_rand() has a specific algo in its name which is bad. Usage of the
older and more generic rand() name is better to provide forward
compatibility in the future and allows us to exchange the underlying
algo again if necessary in a future major release of PHP.
The only situation where this approach would truly break something is if
an application relies on the predictable sequences that are currently
produced by rand() (on a specific platform). However, I do not see this
as a big problem for a major version to be honest. That would be up to
further discussion.
That's where I currently stand with my thoughts. Perfect would be to
have two functions that are named:
- random_int()
- crypto_random_int()
However, it might be too late to reach that goal ... :(
Java pretty much never removes APIs. And I don't remember any instance
of Java syntax being removed. See for example:
https://www.quora.com/Has-Sun-or-Oracle-ever-removed-a-deprecated-Java-method-in-an-official-API/
Oracle people say they consider to maybe remove some com.sun.*
internal APIs that aren't even part of official documented API. From
what I know, these considerations are still just talk. Somehow they are
not worried about articles on HN and nobody wanting to use Java because
they have "cruft".
Also, Java never generates runtime messages on deprecated items.
Did not know that they just keep on deprecating. I always update as soon
as I encounter something that is deprecated. That approach would bring
us back to your question regarding what deprecations are good for if you
plan to never remove it. :P
PHP has millions of users and servers, including in Europe, including in
Germany. So whoever considers it beneath them because it has mt_rand can
use whatever language they see fit, but I am not very impressed by the
argument that somehow PHP is in a huge crisis and removing stuff that
works somehow is going to help solve it.
Once more, I never said that PHP is in a huge crisis (the opposite is
the case) nor that simply removing stuff will help it (selectively
cleaning things that are weird or not needed might help to gain even
more momentum). I just said that it is much harder to find PHP
developers compared to some other languages because PHP is not actively
supported through the old school industry as well as universities who
look at the old school industry. Just look at the big players that are
around in Germany: car manufacturers, SAP, Siemens, ... Most stuff is
Java and sometimes Ruby if it is web related. PHP is left to people who
teach themselves at home and they are often lacking the theoretical
background when it comes to real coding.
--
Richard "Fleshgrinder" Fussenegger
My favorite:
The PHP approach seems to be that any crazy behavior is acceptable as
long as it's documented.
People love raggin' on PHP.
It's a virulent meme. It propagates so well in our coder culture because
it's easy, it's just provocative enough to attract attention but it's
pretty safe because you can always find people to agree with you.
Participants can feel smart and superior even if their contribution
amounts to "me too".
It's like people raggin' on C++. "They named the language after the
worst feature in C: pointer arithmetic!" Knowingly superior chuckles. I
wonder how many of the people who pile on (propagating the meme) have
the understanding of real experience. My own history with C++ was short
but so miserable that I sometimes join in with this one.
Same thing with Perl. Before that it was COBOL. Raggin' on COBOL is
legit even if you've no idea what COBOL code looks like.
It's a form of social behavior for establishing groups and belonging. It
works by using people's need for a sense of identity and validation.
Computer people at a party can use these memes as small-talk to develop
relations, either friendly or not. Our modern comms platforms'
gamification literally rewards this behavior.
But once you're aware of it, it's like American stand-up comics raggin'
on New Jersey. Usually good for a cheap laugh but in reality it's tired
out, past its due date, old, utterly unimaginative, very, very boring,
and, in the Frankfurtian sense[1], bullshit.
There. I finally said it. I've wanted to get it off my chest for years.
I apologize that I rely too much on American vernacular and culture. And
for totally hijacking Leigh's RFC thread.
Richard, nothing internals could do will stop PHP being the butt of
these dreary jokes and insults. And there are more effective ways to
push your agenda. Please consider using them.
Tom
Hi Tom,
I'm building a startup (my fourth) and I chose PHP and Laravel to make it work.
Use the right tool for the job and what not.
I think that you have been trolled (YHBT), you've lost by feeding the troll (YHL).
Have a nice day (HAND).
Also VIM is better than emacs, carry on with that if we're going to rant about minutia.
-E
What is for you "obviously faulty stuff" for literally thousands of
people is "code that works". I appreciate that there's a number of new
hip randomness tests that mt_rand may not satisfy
As far as I can tell
https://gist.github.com/tom--/a12175047578b3ae9ef8
mt_rand() tests just the same as MT19937 and both produce "good quality"
random variates.
(Btw: there's nothing hip or new about this kind of testing. It's old
and boring :p)
The issue I have with mt_rand() is speed and memory efficiency, where it
is orders of magnitude worse than alternatives. But this won't matter in
many uses of mt_rand(). I checked and it doesn't matter in any of the
uses in my software.
The only argument for removing it that has any legs is: "people use it
for security-related unpredictable randoms and PHP should fix that." My
judgement is that the real world benefits of doing this don't justify
the disruption.
Tom
Hi,
I already wrote this message once, but it seems to have evaporated into
the ether. So apologies if it reappears and this is revealed as a poor
duplicate of it!
I don't understand the drive to holding on to obviously faulty stuff.
Nikic proposed already to deprecate rand() and I am only saying that we
can go one step further and implement pcg_rand() in favor of mt_rand()
and deprecate mt_rand(). Probably not something we want to do in 7.1
(because PCG needs some more time to mature) but something that we might
want to do on the road to PHP 8.0.
I think the push to remove "cruft" would make more sense if the
replacements were less obviously "cruft" in their own right. If I have
to polyfill pcg_rand() on old servers and mt_rand() on new ones, I'd be
tempted to just implement wtfbbq_rand() and have done with it, because
the names, and even the algorithms they represent, are pretty
meaningless to me.
As with libsodium, I think we should avoid replacing one set of
overly-specific implementations with another, and saying "this time
we've got it right". Instead, we should look at what people actually
want the functions for, and hide the implementation as much as possible.
For instance, for reproducible (seedable) random sequences, how about a
function random_int_sequence($seed, $min, $max) which returns a
generator, so you could write "$user_seq =
random_int_sequence($user_seed, 0, 10); $user_pick = $user_seq();" Or
maybe it could return a closure, or an object - either way, something to
replace the global state of (mt_)srand.
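A minimal sketch of the closure variant, to show what "no global state" would mean in practice. random_int_sequence() is the hypothetical name from this message, not an existing function; the Park-Miller LCG inside is chosen only because it is short, and the code assumes 64-bit PHP integers and a range well below 2^31:

```php
<?php
// Hypothetical API from the proposal above; not part of PHP.
function random_int_sequence(int $seed, int $min, int $max): Closure
{
    if ($min > $max) {
        throw new InvalidArgumentException('$min must not exceed $max');
    }

    // Park-Miller "minimal standard" LCG, picked for brevity only.
    // Map any seed into the valid state range 1..2147483646.
    $state = ($seed & 0x7FFFFFFF) % 2147483646 + 1;
    $range = $max - $min + 1;

    // Each returned closure owns its own $state, so sequences are independent.
    return function () use (&$state, $min, $range): int {
        $state = ($state * 48271) % 2147483647;
        // Plain modulo has a slight bias; acceptable for a sketch.
        return $min + ($state % $range);
    };
}

$user_seed = 42;
$user_seq  = random_int_sequence($user_seed, 0, 10);
$user_pick = $user_seq();
```

Two sequences created with the same seed yield the same values, and advancing one does not disturb the other, which is what (mt_)srand cannot offer.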
Perhaps a random_int_fast() function with big warnings that it's not to
be trusted for crypto, but performs really well if all you're doing is
picking which image banner to show on your home page.
Use whatever RNG you want under the hood, make a declaration of whether
or not it's stable cross-platform and cross-version, and give users an
actual reason to change their code. Then consider other use cases: a
better uniqid(), shuffling, maybe built-in UUID support...
Then, IF the new APIs become popular, we can come back to talking about
removing rand() and mt_rand(), because we'll have replaced them with
something substantially better, not just another variant on the same thing.
Regards,
--
Rowan Collins
[IMSoP]
Yes, yes, yes! :)
I would still like to deprecate rand() but probably leave it to Nikic
because people actually listen to him. ;)
@Tom Worster: I will not answer to any other message in this thread
today but I think we are essentially on the same page regarding PCG and
its suitability for PHP, it just needs time to mature. I think we also
agree regarding the MT situation.
However, I think that it makes sense to tackle the problem that people
use mt_rand() incorrectly and Rowan's proposal here matches the last
proposal I made: it's just even better and he is as always much better
in summing things up. :)
Maybe we could do some name brain storming?
- random_int_sequence() [1]
- random_int_fast() [2]
- random_int_seedable() [3]
- random_pseudo_int() [4]
- pseudorandom_int() [5]
- random_deterministic_int() [6]
- deterministic_random_int() [7]
- ????
[1] Somehow unclear what sequence means here if you just want to
randomly pick an entry from an array.
[2] This might still suggest to people that they can use it for crypto,
just faster. I really like it but I don't think it's perfect.
[3] Sounds nice to me.
[4] What is a pseudo int?
[5] Would create a new prefix and probably not show up in some code
completions together with the other random functions. :( Other than that
it would describe it best.
[6] Long but to the point.
[7] Again too long and it creates a shitty new prefix.
The signature is clear in my opinion and exactly as Rowan had it:
prng(int $seed, ?int $min = 0, ?int $max = PHP_INT_MAX): int;
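To make the idea of a seedable sequence without global state concrete, here is a minimal userland sketch. It is not a PHP built-in: random_int_sequence() is the placeholder name from the proposal quoted above, and the Park-Miller generator and the plain modulo mapping are assumptions picked only to keep the example short. It also assumes 64-bit PHP integers.

```php
<?php
// Hypothetical userland sketch of random_int_sequence(); not part of PHP.
// Each returned closure owns its own state, so nothing like mt_srand() is shared.
function random_int_sequence(int $seed, int $min, int $max): Closure
{
    $modulus = 2147483647; // 2^31 - 1, the Park-Miller modulus
    if ($min > $max || $max - $min >= $modulus - 1) {
        throw new InvalidArgumentException('Unsupported range');
    }

    // Map any integer seed into the valid state range 1 .. modulus - 1.
    $state = $seed % ($modulus - 1);
    if ($state < 0) {
        $state += $modulus - 1;
    }
    $state += 1;
    $range = $max - $min + 1;

    return function () use (&$state, $modulus, $min, $range): int {
        // Park-Miller "minimal standard" step (illustrative choice only).
        $state = ($state * 48271) % $modulus;
        // Plain modulo keeps the sketch short but is slightly biased.
        return $min + $state % $range;
    };
}

$user_seq = random_int_sequence(1234, 0, 10);
$user_pick = $user_seq(); // the same seed always yields the same picks
```

Two sequences created with different seeds advance independently, which is the property the global (mt_)srand state cannot offer.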
--
Richard "Fleshgrinder" Fussenegger
Maybe we could do some name brain storming?
I would strongly recommend instead of that, sending fewer emails to
this list, with each email containing well thought through ideas.
Throwing out random ideas into this mailing list does not seem to lead
to a productive outcome. Other mediums, like IRC, chat-rooms, forums,
and even Reddit, seem better suited at pulling ideas together into a
coherent proposal to be discussed.
cheers
Dan
This is the official internals discussion forum. I admit that I tend to
write too much, no question there and I will improve, promise, but going
somewhere else where nobody from internals is makes no sense. :P
--
Richard "Fleshgrinder" Fussenegger
Yes, yes, yes! :)
I would still like to deprecate rand() but probably leave it to Nikic
because people actually listen to him. ;)
I haven't been following this thread, just jumping in to comment on this
point. My suggestion to deprecate rand() was motivated by the fact that
rand() produces extremely low quality random numbers on Windows, while at
the same time having the name people are most likely to try first if they
want to have a random number. It's a bad state of things if there's a
rand() and an mt_rand() function and the latter is preferable in all
situations, while the former is more likely to be used. However, this
concern is completely alleviated by aliasing rand() to mt_rand(). If we do
this, I see no reason to deprecate rand(), at least in the short term.
Btw, I fully support this RFC.
Nikita
Hi!
I haven't been following this thread, just jumping in to comment on this
point. My suggestion to deprecate rand() was motivated by the fact that
rand() produces extremely low quality random numbers on Windows, while at
Why not fix it then?
the same time having the name people are most likely to try first if they
want to have a random number. It's a bad state of things if there's a
rand() and an mt_rand() function and the latter is preferable in all
situations, while the former is more likely to be used. However, this
concern is completely alleviated by aliasing rand() to mt_rand(). If we do
Exactly, one of the ways. If mt_rand for some (unknown to me) reason is
not good enough, I'm sure we can find a better one.
--
Stas Malyshev
smalyshev@gmail.com
I haven't been following this thread, just jumping in to comment on this
point. My suggestion to deprecate rand() was motivated by the fact that
rand() produces extremely low quality random numbers on Windows, while at
the same time having the name people are most likely to try first if they
want to have a random number. It's a bad state of things if there's a
rand() and an mt_rand() function and the latter is preferable in all
situations, while the former is more likely to be used. However, this
concern is completely alleviated by aliasing rand() to mt_rand(). If we
do this, I see no reason to deprecate rand(), at least in the short term.
Alternatively, if you fix rand() by making it the new, fast,
platform-independent RNG (e.g. Xoroshiro128+) and leave mt_rand() alone
then:
- The "bad state of things" you described is resolved.
- The various complaints about mt_rand() become irrelevant because rand()
  will be preferable in all situations (except security and backwards
  compat).
Tom
Imho this is the worst solution of all. This means that prior to PHP 7.1
mt_rand() is preferable in all cases and starting with PHP 7.1 rand() is
preferable in all cases. Have fun writing code for that.
I personally have no problem changing mt_rand() to use something other than
MT19937. Given the fact that mt_rand() has been producing random numbers
that do not conform to the MT19937 sequence for years and years and it was
only noticed recently we can say that, without any doubt, nobody is using
mt_rand() to obtain sequences compatible with external MT implementations.
As such it doesn't matter if we switch to something else (apart from the
fact that the sequence changes in some way, which is a given with all the
changes we're discussing here.)
Whatever we do, please maintain the invariant that mt_rand() >= rand() in
terms of quality. I recommend doing this by making mt_rand() == rand().
Regards,
Nikita
Whatever we do, please maintain the invariant that mt_rand() >= rand()
in terms of quality. I recommend doing this by making mt_rand() == rand().
The relationship that I feel is poorly defined at the moment is
random_int() vs mt_rand(). There isn't actually anything in the manual
to say when not to use it. So the implication seems to be random_int() >
mt_rand() > rand().
I would prefer something like random_fast_int() == mt_rand() == rand(),
with clear documentation on when to use random_fast_int() instead of
random_int(), and a note on the others that "since 7.2, mt_rand() is an
alias for random_fast_int()" etc. (Not wedded to the name
random_fast_int, we can bikeshed that later.)
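As a rough illustration of how such a function could be polyfilled on versions that lack it, the sketch below simply delegates to mt_rand(). The name random_fast_int() is the placeholder from this message, not an existing PHP function, and the choice of mt_rand() as the backing generator is an assumption.

```php
<?php
// Hypothetical polyfill: random_fast_int() is only a name floated in this thread.
if (!function_exists('random_fast_int')) {
    function random_fast_int(int $min, int $max): int
    {
        // Not suitable for security purposes; use random_int() for those.
        return mt_rand($min, $max);
    }
}

// Typical non-security use: pick one of five banners.
$banner = random_fast_int(0, 4);
```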
Regards,
Rowan Collins
[IMSoP]
Hi!
I would prefer something like random_fast_int() == mt_rand() == rand(),
with clear documentation on when to use random_fast_int() instead of
random_int(), and a note on the others that "since 7.2, mt_rand() is an
alias for random_fast_int()" etc. (Not wedded to the name
random_fast_int, we can bikeshed that later.)
That sounds to me like a good way to proceed too. I don't think it's a
big deal if mt_rand won't be using specific MT algorithm anymore, I see
very small number of places where it would matter.
One thing to consider is that there might be test scenarios, sequences,
etc. that depend on specific seed and will be broken by changing the
implementation (tests relying on specific rand are not the very best
idea, but they do happen), but I think this kind of thing may be
acceptable for a major version. Would like to hear thoughts on this though.
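The kind of dependency described here can be sketched as follows. On one PHP build, seeding with the same value reproduces the same sequence, so a test that hard-codes values recorded from such a run would start failing once the algorithm behind mt_rand() changes. The helper name seeded_sequence() is made up for this illustration.

```php
<?php
// Hypothetical helper: collect $count values after seeding the global generator.
function seeded_sequence(int $seed, int $count): array
{
    mt_srand($seed); // resets PHP's global Mersenne Twister state
    $values = [];
    for ($i = 0; $i < $count; $i++) {
        $values[] = mt_rand(0, 99);
    }
    return $values;
}

$first = seeded_sequence(2016, 5);
$second = seeded_sequence(2016, 5);

// Same seed on the same build gives the same sequence. The concrete numbers
// are what a changed implementation would alter, so they are not listed here.
var_dump($first === $second);
```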
Stas Malyshev
smalyshev@gmail.com
I don't think it's a big deal if mt_rand won't be using specific MT
algorithm anymore, I see very small number of places where it would matter.
For these cases https://packagist.org/packages/leigh/mt-rand
Tom
+1 for the introduction of a new function that is named similar to
random_int() but clearly distinguishable, followed by aliasing of
mt_rand() and rand() in PHP 8.
The agenda would be:
- Decide on a signature for that new function.
- Decide on a name for that new function.
- Decide on an algorithm for that new function.
- Implement it.
- Update the documentation of rand() and mt_rand() to mention the plans
  for aliasing.
- Alias them in PHP 8.
Nice to see that we come to something that satisfies everyone. :)
--
Richard "Fleshgrinder" Fussenegger
hi,
On Thu, Jun 23, 2016 at 11:56 PM, Stanislav Malyshev
smalyshev@gmail.com wrote:
Hi!
I would prefer something like random_fast_int() == mt_rand() == rand(),
with clear documentation on when to use random_fast_int() instead of
random_int(), and a note on the others that "since 7.2, mt_rand() is an
alias for random_fast_int()" etc. (Not wedded to the name
random_fast_int, we can bikeshed that later.)
That sounds to me like a good way to proceed too. I don't think it's a
big deal if mt_rand won't be using specific MT algorithm anymore, I see
very small number of places where it would matter.
One thing to consider is that there might be test scenarios, sequences,
etc. that depend on specific seed and will be broken by changing the
implementation (tests relying on specific rand are not the very best
idea, but they do happen), but I think this kind of thing may be
acceptable for a major version. Would like to hear thoughts on this though.
fast_int or fast whatever looks bad to me. But we know that.
Also, about mt_rand, it costs nothing to have a compat mode for the
cases where the current (standard or not is not relevant)
implementation is being relied on.
More generally about the random APIs, there are many good non crypto
safe RNGs, and there will be more. Each of them has its own use
cases. I do not think that arbitrarily choosing, changing or updating
them, in major versions or not, is a good thing.
I would rather prefer to have a more flexible API allowing to choose
one algorithm for a run. It is also good to have a default per
category (strong, weak, fast for example) so one can use the current
default for a given need.
A good API for RNG is https://bitbucket.org/haypo/hachoir/wiki/Home.
It allows flexibility while having a default and the ability to choose a
specific implementation if desired. My main point is not to
penalize valid uses because some code will never ever do the right
thing (tm). A language cannot prevent bad behaviors but does not have to
punish good cases by removing useful features for the "good of all".
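One possible shape for such a selectable API, sketched in userland PHP. Every name here (RandomSource, SeededSource, SecureSource, make_source) is hypothetical, and the mapping of categories to generators is an assumption for illustration, not something proposed in the RFC.

```php
<?php
// Hypothetical interface: callers depend on a category, not on an algorithm.
interface RandomSource
{
    public function nextInt(int $min, int $max): int;
}

// "fast" category: backed by mt_rand(); note it still seeds global state.
final class SeededSource implements RandomSource
{
    public function __construct(int $seed)
    {
        mt_srand($seed);
    }

    public function nextInt(int $min, int $max): int
    {
        return mt_rand($min, $max);
    }
}

// "strong" category: backed by the CSPRNG function available since PHP 7.0.
final class SecureSource implements RandomSource
{
    public function nextInt(int $min, int $max): int
    {
        return random_int($min, $max);
    }
}

// Pick the current default for a category; a specific class can still be
// instantiated directly when a particular implementation is required.
function make_source(string $category, int $seed = 0): RandomSource
{
    switch ($category) {
        case 'strong':
            return new SecureSource();
        case 'fast':
            return new SeededSource($seed);
        default:
            throw new InvalidArgumentException("Unknown category: $category");
    }
}

$die = make_source('strong')->nextInt(1, 6);
```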
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
Last but not least, mt_rand must remain using Mersenne Twister.
Options can define which implementation (at runtime, no ini please).
It will be very weird that MT uses some randomly chosen
implementation. New APIs can expose other "better" algorithms.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
Mcrypt is meant to be replaced anyways and OpenSSL might be too if we
can come up with a nicer implementation that actually hides the
underlying library (e.g. sodium).
This is another problem. So we have OpenSSL, then we have mcrypt, then
we have another implementation like sodium... do we really expect our
users to rewrite crypto in their apps every couple of years? That would
be insane. OK, we could say "have your apps work as they worked, but use
new stuff for new things" - but you propose to remove stuff?
Forgot to answer to this part, so here it comes.
The mcrypt situation is just a legacy that we need to take care of.
Exposing OpenSSL was a bad idea from the very beginning if you ask me.
OpenSSL was well known for being problematic long before Heartbleed and
related things.
Ignoring the two specifics. Yes, I expect people to rewrite their crypto
every couple of years because, well, it is crypto and crypto is
something that changes every couple of years. Attacks are developed
further, key sizes are not sufficient anymore, and new technology makes
old cryptos unsafe.
Security is a topic where a language really needs to move fast if
necessary and users need to be prepared to do the same if they want to
provide good crypto. Way too many problems arise from ignoring that.
--
Richard "Fleshgrinder" Fussenegger
The RFC can be found here: https://wiki.php.net/rfc/rng_fixes
Hi Leigh,
Thanks for putting this together. I am strongly pro on two points and
moderately contra on the other two. I'd prefer separated votes, even
though I don't have a vote. I numbered the 4 bullets in your intro 1 thru 4.
- Insecure usage. I think we should replace the internal insecure uses
  of php_rand(). I can't see a reason not to.
- Poor scaling of bounded outputs. I think RAND_RANGE() should be
  fixed. Users surely expect unbiased distribution. There's a BC argument
  but the bug is pretty serious. I think this should apply to array_rand()
  too.
- Incorrect implementations.
  I don't think we should dictate that programs currently using mt_rand()
  shall in future use MT19937 any more than we should dictate that
  XorShift64 or any other PRNG better fits their requirements.
  The incorrectness of the mt_rand() implementation with respect to its
  documentation can be fixed either in the code or in the docs. Given
  that, as far as we know, mt_rand()'s byte-stream looks like a decent
  PRNG[1] it's not clear that the actual MT19937 sequence is more
  important than backward compatibility. I for one think it's very unlikely.
  [1] https://gist.github.com/tom--/a12175047578b3ae9ef8
  I also don't think we should assume the responsibility of correcting
  people's insecure programs using rand() or mt_rand() (e.g. for keys,
  IVs, salts) by changing the algorithm. Programs this bad need more
  rework than we can provide. These functions have had scary-colored
  cautions on them for a long time.
- Roughly the same arguments apply to rand(). The function is PHP's
  API to the OS's rand(3). There's value to that and probably people who
  rely on it.
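The point about scaling of bounded outputs can be illustrated with a toy case. The sketch below is not PHP's RAND_RANGE() macro: scaled_counts() shows, on a deliberately tiny input space, why scaling a fixed number of inputs onto a range that does not divide it evenly favours some results, and unbiased_range() shows rejection sampling as one common remedy. The function names and the 32-bit source next32() are assumptions, and the code assumes 64-bit PHP integers.

```php
<?php
// Scale every possible input 0 .. $inputs - 1 onto $outcomes buckets and
// count how many inputs land in each bucket.
function scaled_counts(int $inputs, int $outcomes): array
{
    $counts = array_fill(0, $outcomes, 0);
    for ($n = 0; $n < $inputs; $n++) {
        $counts[intdiv($n * $outcomes, $inputs)]++;
    }
    return $counts;
}

// With 8 inputs and 3 outcomes the buckets receive 3, 3 and 2 inputs,
// so the last outcome is less likely than the others.
print_r(scaled_counts(8, 3));

// Stand-in for any generator of uniform 32-bit values (assumption).
function next32(): int
{
    return random_int(0, 0xFFFFFFFF);
}

// Rejection sampling: redraw values from the incomplete final block so that
// every outcome in [$min, $max] is backed by the same number of inputs.
function unbiased_range(int $min, int $max): int
{
    $span = $max - $min + 1;
    if ($span < 1 || $span > 0x100000000) {
        throw new InvalidArgumentException('Unsupported range');
    }
    $limit = 0x100000000 - (0x100000000 % $span);
    do {
        $r = next32();
    } while ($r >= $limit);
    return $min + $r % $span;
}

$roll = unbiased_range(1, 6);
```

The cost of the rejection loop is small: fewer than half of the draws can ever be rejected, and for small ranges almost none are.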
Summarizing 2. and 3. it's not clear what we fix in the real world with
the proposed changes to rand() and mt_rand(). But I do see BC breakage.
I would prefer to fix these bugs in the docs.
With respect to PRNGs completely new to PHP (you mentioned Xoroshiro128+
and PCG), I would prefer to completely divorce this question from the bugs
discussed above. If some PHP users need efficient implementations of
such algorithms then I would urge whoever wants to write them to use a
new API and to provide them via PECL. In software, "better" is always
with respect to context. While there are specific, well-known uses for
random numbers (e.g. crypto) where we can make recommendations, in
general we cannot.
Tom
- Roughly the same arguments apply to rand(). The function is PHP's
  API to the OS's rand(3). There's value to that and probably people who
  rely on it.
Hm, at least traditionally the rand() implementation on Windows is
limited to non-negative short ints (16-bit signed), which seems
generally limiting, and might make it hard to write portable code. On
the other hand a developer could use mt_rand() instead, and we could
document that rand() is a rather low-level non-portable API.
I agree with everything said (except where noted).
--
Christoph M. Becker
The RFC can be found here: https://wiki.php.net/rfc/rng_fixes
Hi Leigh,
Thanks for putting this together. I am strongly pro on two points and
moderately contra on the other two. I'd prefer separated votes, even
though I don't have a vote. I numbered the 4 bullets in your intro 1 thru 4.
Noted, even though I'd like to "fix everything" at once, if separating the
votes is the only way to get the most important fixes in, then that's what
we'll have to resort to.
- Insecure usage. I think we should replace the internal insecure uses
  of php_rand(). I can't see a reason not to.
- Poor scaling of bounded outputs. I think RAND_RANGE() should be
  fixed. Users surely expect unbiased distribution. There's a BC argument
  but the bug is pretty serious. I think this should apply to array_rand()
  too.
Every point on the list causes a BC issue, it's up to us to judge which
ones are worth it. Some are independent and some cascade into each other. I
just don't want to be in a situation where we cause some now, and some in a
future version.
- Incorrect implementations.
  I don't think we should dictate that programs currently using mt_rand()
  shall in future use MT19937 any more than we should dictate that
  XorShift64 or any other PRNG better fits their requirements.
I get your point, but most people probably use mt_rand() because "it's
better than rand". mt_rand is also incredibly slow and has a huge state
when compared to modern algorithms. I should probably note the performance
gains in the RFC.
The incorrectness of the mt_rand() implementation with respect to its
documentation can be fixed either in the code or in the docs. Given
that, as far as we know, mt_rand()'s byte-stream looks like a decent
PRNG[1] it's not clear that the actual MT19937 sequence is more
important than backward compatibility. I for one think it's very unlikely.
I actually agree, (it was me who originally reverted the mt_rand fix in a
point release, citing BC as my reason to do so). I felt obligated to put
the decision up for a vote though, because I might have been wrong :)
I also don't think we should assume the responsibility of correcting
people's insecure programs using rand() or mt_rand() (e.g. for keys,
IVs, salts) by changing the algorithm. Programs this bad need more
rework than we can provide. These functions have had scary-colored
cautions on them for a long time.
We can only educate so far, I think we do have an obligation to provide the
best (no matter how subjective) possible algorithms to the end users.
Summarizing 2. and 3. it's not clear what we fix in the real world with
the proposed changes to rand() and mt_rand(). But I do see BC breakage.
I would prefer to fix these bugs in the docs.
Changing mt_rand I don't see any real gain, but rand on the other hand has
platform-dependent output.
With respect to PRNGs completely new to PHP (you mentioned Xoroshiro128+
and PCG), I would prefer completely divorce this question from the bugs
discussed above. If some PHP users need efficient implementations of
such algorithms then I would urge whoever wants to write them to use a
new API and to provide them via PECL. In software, "better" is always
with respect to context. While there are specific, well-known uses for
random numbers (e.g. crypto) where we can make recommendations, in
general we cannot.
I've been thinking of doing this anyway.
Hi Leigh,
I need to change stance wrt MT.
I get your point, but most people probably use mt_rand() because "it's
better than rand". mt_rand is also incredibly slow and has a huge state
when compared to modern algorithms. I should probably note the
performance gains in the RFC.
I spent some time trying to understand the weird PHP mt_rand(). I took the
binary MT19937_02 generator from TestU01 and made a variant with the PHP
bug. I added a side-by-side diff of the results from running BigCrush on
both here
https://gist.github.com/tom--/a12175047578b3ae9ef8
I can't see any significant difference between them.
More interesting was how this work changed my appreciation of Mersenne
Twister. I used to think it was a good RNG. But that dates back a long
time to when George Marsaglia had the best tests for RNGs and he was
challenging sci.math to factor enormous numbers to use in new generators
with ever more extravagant periods. I took it on authority that MT was
decent.
But after spending time with the code I see you're right! Its state and
period are crazy. It's one thing to be slow but on top of that it's
chewing up cache lines as though nothing else needs them.
My opinion on rand() is that it is historical, like the crummy old RNGs
kicking around in various libcs and elsewhere. Don't use them. Now I feel
the same about mt_rand() -- like MD4 and DES, it's interesting history.
I think every self-respecting programming environment should provide a
good deterministic RNG. And now it seems I've persuaded myself that it's
time for PHP to move on from MT.
So I need to update my opinion on your RFC. I still think rand() and
mt_rand() implementations can stay but I now agree with you that it's time
for a new RNG. And I agree that xoroshiro128+ is a good choice.
Specifically, rand() docs should say that the underlying RNGs are
obsolete, not portable and have questionable quality on some platforms.
mt_rand() docs should mention the poor performance and reference #71152.
Tom
Hi Leigh,
+1 in general.
There should be a way to produce compatible random sequences for a
reasonable period, 5+ years at least, IMHO. An INI switch for this is
required.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
The issues I want to bring up for discussion are.
- Replacing mt_rand() and rand() to a strong, modern RNG.
I do not think this is a good option, if by strong you mean another kind of RNG.
- Alternatively, fixing the current mt_rand() implementation to make it
  standard
That sounds more reasonable. An option (please no ini as it is a
programmatic flow feature, not a php configuration problem) to keep the
old behavior for BC. Having to add an option for 7.1 or 7.2+ is
reasonable enough for the cases where the current seed and predictable
sequences are desired (for example, generating the same data from one
seed).
- Replacing insecure uses of php_rand() with php_random_bytes()
- Making the array_rand() algorithm more efficient
Indeed
The RFC can be found here: https://wiki.php.net/rfc/rng_fixes
If anyone knows of other fixes that should be made at the same time but I
have overlooked, please let me know so I can get them included.
Also in the replies to this thread I see the word "crypto" mixed with
mt_rand/rand. It does not make sense to me. I agree that some apps
have misused these functions as a crypto-safe RNG, and some may even work
around the issues rather than change their code to use reliable
alternatives. However, this is a documentation/education issue. There
is no need to make mt_rand/rand crypto safe, but there is a use for a
reliable implementation of mt_rand.
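The distinction drawn here can be shown from userland. A minimal sketch, assuming PHP 7.0 or later, where random_bytes() and random_int() are available: secrets come from the CSPRNG functions, while mt_rand() is kept for reproducible, seed-driven data. The helper names make_token() and sample_data() are hypothetical.

```php
<?php
// Hypothetical helper: an unpredictable token, drawn from the CSPRNG.
function make_token(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));
}

// Hypothetical helper: repeatable test data, where predictability is the
// point. Note that mt_srand() reseeds the global generator.
function sample_data(int $seed, int $count): array
{
    mt_srand($seed);
    $out = [];
    for ($i = 0; $i < $count; $i++) {
        $out[] = mt_rand(1, 100);
    }
    return $out;
}

echo make_token(), "\n";                              // different on every run
var_dump(sample_data(42, 5) === sample_data(42, 5));  // bool(true): same seed, same data
```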
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
Hi Pierre,
I'm glad you mentioned a compatibility mode. Let's say we would offer:
int mt_rand ( $mode = MT_RAND_COMPAT )
int mt_rand ( int $min, int $max, $mode = MT_RAND_COMPAT )
MT_RAND_COMPAT = 1
MT_RAND_MT19937 = 2
A PHP user needs to make the right choice of what to use in their
situation. A technical description of the modes would be confusing and
unhelpful to most users. I have no idea how to document this simply,
honestly and accurately, and without jumping to conclusions about
suitability.
This is why I think a compat/correct mode switch doesn't improve PHP.
It's inconsistent with the spirit set out in the preamble of "PHP RFC:
Your Title Here"[1].
[1] https://wiki.php.net/rfc/template
Similarly, the $mode arg would allow us to add MT_RAND_XOROSHIRO128_PLUS or
whatever, but such additions (interesting to some of us, more "modern",
perhaps arguably more "strong" or otherwise "better") aren't improvements
to PHP unless users are asking for them.
Tom
Just a thought here, if the goal is to provide a better interface,
wouldn't it be better to use OO for this? Not because OO is better, but
because it would avoid having problems if two mt_rand-using pieces of
code are executed concurrently.
Something like this:
$gen1 = new SeededRandom(MT_RAND_MT19937, 23);
$gen2 = new SeededRandom(MT_RAND_MT19937, 23);
$gen1->rand(); // or get()? generate()?
$gen1->rand();
$gen2->rand(); // returns the same as the first call as both use seed 23
With the current situation, if you add a library that relies on mt_rand
you can suddenly break your ability to get consistent numbers.
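A minimal userland sketch of the idea, assuming a 64-bit PHP build (so that 32-bit values fit in a native int) and PHP 7.1 or later. The class reuses the hypothetical SeededRandom name from the snippet above but takes only a seed. It implements the textbook MT19937 recurrence with per-instance state, so two generators never disturb each other. next32() is an invented method name, and it returns the raw 32-bit output, which is not necessarily what mt_rand() returns.

```php
<?php
final class SeededRandom
{
    private const N = 624;
    private const M = 397;

    /** @var int[] the 624-word generator state, private to this instance */
    private $mt = [];
    private $index = self::N;

    public function __construct(int $seed)
    {
        $this->mt[0] = $seed & 0xFFFFFFFF;
        for ($i = 1; $i < self::N; $i++) {
            $prev = $this->mt[$i - 1];
            $this->mt[$i] = (1812433253 * ($prev ^ ($prev >> 30)) + $i) & 0xFFFFFFFF;
        }
    }

    // Regenerate all 624 state words in place.
    private function reload(): void
    {
        for ($i = 0; $i < self::N; $i++) {
            $y = ($this->mt[$i] & 0x80000000) | ($this->mt[($i + 1) % self::N] & 0x7FFFFFFF);
            $next = $this->mt[($i + self::M) % self::N] ^ ($y >> 1);
            if ($y & 1) {
                $next ^= 0x9908B0DF;
            }
            $this->mt[$i] = $next;
        }
        $this->index = 0;
    }

    // Return the next raw 32-bit output after tempering.
    public function next32(): int
    {
        if ($this->index >= self::N) {
            $this->reload();
        }
        $y = $this->mt[$this->index++];
        $y ^= $y >> 11;
        $y ^= ($y << 7) & 0x9D2C5680;
        $y ^= ($y << 15) & 0xEFC60000;
        $y ^= $y >> 18;
        return $y;
    }
}

$gen1 = new SeededRandom(23);
$gen2 = new SeededRandom(23);
$first = $gen1->next32();
$gen1->next32();                       // advancing $gen1 does not touch $gen2
var_dump($first === $gen2->next32());  // bool(true)
```

Because the state lives in the object, a third-party library calling the global mt_rand() or mt_srand() cannot change what either instance produces.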
Cheers
--
Jordi Boggiano
@seldaek - http://seld.be
Just a thought here, if the goal is to provide a better interface,
Hi Jordi,
IIUC, Leigh's goal, which I support, is to fix known bugs. It is not to
provide a better interface.
I already suggested that if people want new RNGs or a new API then we
should divorce that discussion from the bug fixes. Let's not use this
RFC or thread for that.
Tom
Just a thought here, if the goal is to provide a better interface,
wouldn't it be better to use OO for this? Not because OO is better, but
because it would avoid having problems if two mt_rand-using pieces of
code are executed concurrently.
The goal is to fix inconsistencies. We absolutely can provide a better
interface as something separate though.
RFC updated to include:
- A note about mt_rand()'s poor performance
- Separate votes for proposals so we can at least get the security fixes
through
- Updated vote from 50% to 2/3 as it does cause a BC issue.
I should also state that mt_rand is easily implementable in userland, so
the correct/legacy algorithm can be provided that way if changing it in
core does not pass (I have a library providing this)
So there have been a couple of suggestions of providing legacy
functionality via a PECL extension. If we were to make rand/mt_rand use
function pointers to their implementation it would be very easy for an
extension to override their behaviour. If people like this idea I'm more
than happy to provide this ext as part of the RFC.
Updated RFC
- Removed proposal to replace (mt_)rand with an alternative algorithm
as many have expressed concerns with this.
- Clarified that the output of mt_rand appears to be high quality as-is
- Added that the old mt_rand functionality will be available at
runtime via mt_rand_mode()
I'll have an implementation ready for review by the end of the week.
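For readers following along later, a runtime switch of this kind can be exercised from userland roughly as below. This sketch assumes PHP 7.1 or later, where the switch is exposed as a second argument to mt_srand() (with the MT_RAND_MT19937 and MT_RAND_PHP constants) and not as the mt_rand_mode() function named above. MT_RAND_PHP selects the legacy sequence and is deprecated as of PHP 8.3, so it appears only in a comment. The helper name first_values() is hypothetical.

```php
<?php
// Hypothetical helper: seed the global generator in a given mode and
// collect the first $count values.
function first_values(int $seed, int $mode, int $count): array
{
    mt_srand($seed, $mode);
    $values = [];
    for ($i = 0; $i < $count; $i++) {
        $values[] = mt_rand();
    }
    return $values;
}

$a = first_values(23, MT_RAND_MT19937, 3);
$b = first_values(23, MT_RAND_MT19937, 3);
var_dump($a === $b); // bool(true): same seed and mode, same sequence

// first_values(23, MT_RAND_PHP, 3) would request the legacy sequence instead.
```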
RFC updated to include:
- A note about mt_rand()'s poor performance
- Separate votes for proposals so we can at least get the security fixes
through
- Updated vote from 50% to 2/3 as it does cause a BC issue.
I should also state that mt_rand is easily implementable in userland, so the
correct/legacy algorithm can be provided that way if changing it in core
does not pass (I have a library providing this)
So there have been a couple of suggestions of providing legacy functionality
via a PECL extension. If we were to make rand/mt_rand use function pointers
to their implementation it would be very easy for an extension to override
their behaviour. If people like this idea I'm more than happy to provide
this ext as part of the RFC.