Hey internals,
As we’re moving to clean some things up in PHP 7, I think it might be worth looking at the rand()/mt_rand() situation.
Currently, we have two standard-library functions to obtain a random number: rand() and mt_rand(). rand() uses the C standard library function, which on many stdlibs is slow or has a limited range (e.g. Win32 rand()’s maximum return value is SHRT_MAX, i.e. 32767). As a result, a drop-in replacement was added, mt_rand(), which uses a Mersenne Twister-based random number generator that’s probably better than what the C standard library provides.
Having two different random number functions is confusing to new users, and an oft-cited issue with PHP’s standard library. Intuitively, a user would expect rand() to be the function to get a random number, but actually they should be using mt_rand(), which is usually better.
Another issue is the fact we allow random numbers to be seeded with srand() and mt_srand(). We have automatically seeded the random number generator for a while now, so there is no need to do it manually. We also explicitly do not guarantee that the random number generator will always produce the same output given the same seed, and in fact we’ve changed mt_rand()'s behaviour in the past. This eliminates the primary use-case for manual seeding when there is automatic seeding: predictable output for procedural generation.
Finally, a third issue is that mt_rand() does not produce good quality numbers on 64-bit platforms. mt_rand() will always produce a 32-bit value internally, and scale it up or down to fit the user-specified range. Unfortunately, this means that values for the $max parameter beyond 2^31 - 1 produce numbers with poor granularity.
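To make the granularity problem concrete, here is a toy model of the scaling step. It is only a sketch: scale_to_range() is a hypothetical helper, not PHP's actual source, and a 3-bit raw generator stands in for the real 32-bit one.

```php
<?php
// Hypothetical model of range scaling; NOT PHP's actual implementation.
// Maps a raw value in [0, $rawMax] proportionally onto [$min, $max].
function scale_to_range(int $raw, int $rawMax, int $min, int $max): int
{
    return $min + (int) (($max - $min + 1.0) * ($raw / ($rawMax + 1.0)));
}

// A toy generator with only 8 possible raw values (0..7),
// scaled up to a 16-value range (0..15).
$reachable = [];
for ($raw = 0; $raw <= 7; $raw++) {
    $reachable[] = scale_to_range($raw, 7, 0, 15);
}

// Only every second value of the target range can ever be produced.
print_r($reachable);
```

The same thing happens at full size: a generator with a fixed number of possible raw outputs cannot reach every value in a range wider than that, however the scaling is done.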
Given all these, I would suggest that for PHP 7, we:
- Get rid of rand(), srand() and getrandmax()
- Rename mt_rand(), mt_srand() and mt_getrandmax() to rand(), srand(), and getrandmax(), but add mt_* aliases for backwards-compatibility
- Make mt_srand() and srand() do nothing and produce a deprecation notice
- Use a 64-bit random number generation algorithm on 64-bit platforms (or invoke the 32-bit generator twice)
The end result should be that PHP has just one random number generation function, rand(), which can produce the full range from PHP_INT_MIN to PHP_INT_MAX with no scaling. This would be far more intuitive, I think.
Would that sound good?
Thanks!
Andrea Faulds
http://ajf.me/
From: Andrea Faulds [mailto:ajf@ajf.me]
Would that sound good?
That sounds very good to me :)
All this sounds really good, and I am all for uniformity when the downsides
are not too much to handle. I would have one suggestion though, and that
would be to add deprecation notices on every "mt_" alias so that in time they
can be safely removed and everything cleaned up.
On Sun, Jan 11, 2015 at 11:35 PM, François Laupretre francois@tekwire.net
wrote:
From: Andrea Faulds [mailto:ajf@ajf.me]
Would that sound good?
That sounds very good to me :)
Hey Stelian,
All this sounds really good, and I am all for uniformity when the downsides are not too much to handle. I would have one suggestion though, and that would be to add deprecation notices on every "mt_" alias so that in time they can be safely removed and everything cleaned up.
We could remove them in a future version, yes. Though that, specifically, would seem like a needless BC break. There’s little harm in keeping the old aliases around.
Thanks!
--
Andrea Faulds
http://ajf.me/
Overall it's a good plan IMO and it would clean things up for sure. On
the other hand if we don't remove old docs it will just make the docs
pages even more complex since the docs for rand() will highly depend on
which PHP version is used.
- Make mt_srand() and srand() do nothing and produce a deprecation notice
That is the only point with which I disagree. Looking at
http://3v4l.org/FLHBV we see that while indeed across PHP versions the
result has not always been the same, it generally is, and especially
within one given version setting a seed means you get predictable results.
This has benefits in some cases like fixtures generation where it might
not be important if the output changes when you upgrade PHP, but you
don't want entirely different fixtures every single time. Obviously if
we could guarantee the algo won't change it would be even better.
Bottom line is I think it's important to have the ability to set the
seed yourself.
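A minimal sketch of the fixture use case, assuming a stock PHP build where mt_srand() is honoured and the same PHP version is used for both runs (the seed value 42 is arbitrary):

```php
<?php
// Seed the global Mersenne Twister and draw three "fixture" values.
mt_srand(42);
$first = [mt_rand(1, 100), mt_rand(1, 100), mt_rand(1, 100)];

// Re-seeding with the same value replays the same sequence
// within the same PHP build.
mt_srand(42);
$second = [mt_rand(1, 100), mt_rand(1, 100), mt_rand(1, 100)];

var_dump($first === $second);
```

The concrete numbers may differ between PHP versions, which is exactly the caveat discussed above; only the repeatability within one environment is being relied on here.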
Cheers
--
Jordi Boggiano
@seldaek - http://nelm.io/jordi
Hey Yasuo and Jordi,
- Make mt_srand() and srand() do nothing and produce a deprecation notice
That is the only point with which I disagree. Looking at http://3v4l.org/FLHBV we see that while indeed across PHP versions the result has not always been the same, it generally is, and especially within one given version setting a seed means you get predictable results.
This has benefits in some cases like fixtures generation where it might not be important if the output changes when you upgrade PHP, but you don't want entirely different fixtures every single time. Obviously if we could guarantee the algo won't change it would be even better.
Bottom line is I think it's important to have the ability to set the seed yourself.
However, I object removal of srand.
Game programmers need "the same random sequence" on occasion.
There should be srand to get the same sequence for repeatable behaviors.
rand()/srand() may be renamed to sys_rand()/sys_srand() (or whatever
suitable name for them) in case user needs system random sequence
for whatever reasons. I don't insist to keep system's rand/srand, though.
I don’t disagree with having some mechanism for predictable random number generation from a seed, but I think the global random number generator is the wrong place to do it. It’s not guaranteed to be predictable, and everything uses it, so some library you’re using might advance it without you realising.
Much better would be to add a new, OOP API that gives you your own number generator (no global state) and requires explicitly specifying the algorithm (cross-version compatibility), with a guarantee that it won’t break in new PHP versions.
Something like this, maybe:
$numgen = new RandomNumberGenerator(RAND_MERSENNE_TWISTER, time()); // could auto-seed with time()
$randInt1 = $numgen->getInt(0, 100); // gets random integer and advances this generator
list($randInt2, $numgen) = $numgen->newGetInt(0, 100); // gets random integer and returns a new, advanced generator
$serialised = $numgen->serialiseState(); // Or maybe $numgen->getSeed() ?
Does that work?
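For illustration, the "own generator, no global state" idea can be sketched in userland today. Everything here is an assumption made for the example: SeededGenerator is a hypothetical class, not the proposed RandomNumberGenerator, and it uses a small linear congruential generator rather than Mersenne Twister to keep the sketch short. It also assumes a 64-bit PHP build.

```php
<?php
// Hypothetical userland sketch; SeededGenerator is not a real PHP class.
// Uses a simple linear congruential generator (LCG), NOT Mersenne Twister.
// Assumes 64-bit integers so the multiplication below cannot overflow.
final class SeededGenerator
{
    /** @var int Current 31-bit state, private to this instance. */
    private $state;

    public function __construct(int $seed)
    {
        $this->state = $seed & 0x7FFFFFFF;
    }

    // Returns an integer in [$min, $max] and advances only this generator.
    public function getInt(int $min, int $max): int
    {
        $this->state = ($this->state * 1103515245 + 12345) & 0x7FFFFFFF;
        // Drop the weak low bits; modulo is slightly biased but keeps this short.
        return $min + (($this->state >> 8) % ($max - $min + 1));
    }
}

$a = new SeededGenerator(1234);
$b = new SeededGenerator(1234);
mt_rand(); // advancing the global generator does not touch $a or $b

$fromA = [$a->getInt(0, 100), $a->getInt(0, 100)];
$fromB = [$b->getInt(0, 100), $b->getInt(0, 100)];
var_dump($fromA === $fromB); // same seed, same sequence
```

Because the state lives in the object, a library calling rand() or mt_rand() elsewhere cannot disturb the sequence, and the algorithm is fixed by the class rather than by the PHP version.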
--
Andrea Faulds
http://ajf.me/
Hi Andrea,
It works, but I prefer to have a procedural API as well (and an OO API if it is
needed).
I like multi-paradigm programming languages.
A pseudo-random number generator is only pseudo-random. We are better off with
a real random generator when cryptographic random numbers are needed. So
renaming mt_rand() -> rand() / rand() -> sys_rand() and keeping the mt_rand()
alias would be enough, IMO.
BTW, difference between 32bit and 64bit platforms would not be a problem
as long as it is documented.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Yasuo,
It works, but I prefer to have procedural API also (and OO API if it is needed.)
I like multi paradigm programming language.
I don’t see the point of a “procedural API” here, it’d just be a set of thin wrappers around the OOP API.
Multi-paradigm means actually providing different approaches. Every time PHP has had both a “procedural” and “OOP” API so far, it’s been nothing of the sort, the “procedural” API has been the OOP one in disguise. There is zero functional difference between $foo->bar($baz, $qux); and foo_bar($foo, $baz, $qux); and we should stop pretending there is one.
Are people seriously scared off by the syntax of “OOP” APIs or something? I just don’t get it. Surely it is not hard to explain that mysql_query and $mysqli->query are exactly the same thing?!
BTW, difference between 32bit and 64bit platforms would not be a problem
as long as it is documented.
It’s not necessarily a “problem”, but I don’t see why 64-bit integers shouldn’t be properly supported on 64-bit systems.
Thanks.
Andrea Faulds
http://ajf.me/
Hi Andrea,
It works, but I prefer to have procedural API also (and OO API if it is
needed.)
I like multi paradigm programming language.
I don’t see the point of a “procedural API” here, it’d just be a set of
thin wrappers around the OOP API.
Multi-paradigm means actually providing different approaches. Every time
PHP has had both a “procedural” and “OOP” API so far, it’s been nothing of
the sort, the “procedural” API has been the OOP one in disguise. There is
zero functional difference between $foo->bar($baz, $qux); and foo_bar($foo,
$baz, $qux); and we should stop pretending there is one.
Are people seriously scared off by the syntax of “OOP” APIs or something?
I just don’t get it. Surely it is not hard to explain that mysql_query and
$mysqli->query are exactly the same thing?!
I just would like to make sure that there is both.
We are better off with both OO and procedural API like Python, IMO.
BTW, difference between 32bit and 64bit platforms would not be a problem
as long as it is documented.
It’s not necessarily a “problem”, but I don’t see why 64-bit integers
shouldn’t be properly supported on 64-bit systems.
Nice idea.
Let's have 64 bit random in 32 bit systems. (i.e. same seed = same sequence
as long as min/max fits in range)
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi!
Something like this, maybe:
$numgen = new RandomNumberGenerator(RAND_MERSENNE_TWISTER, time()); // could auto-seed with time()
$randInt1 = $numgen->getInt(0, 100); // gets random integer and advances this generator
list($randInt2, $numgen) = $numgen->newGetInt(0, 100); // gets random integer and returns a new, advanced generator
$serialised = $numgen->serialiseState(); // Or maybe $numgen->getSeed() ?
Does that work?
Having new, better API is fine. Breaking existing working code because
new, better API is possible, is not fine.
--
Stas Malyshev
smalyshev@gmail.com
Hey Stas,
Having new, better API is fine. Breaking existing working code because
new, better API is possible, is not fine.
The manual explicitly guarantees that code should not rely on the random number generator being predictable.
If people want their existing code to continue to work, we could of course allow this new API to support the C stdlib rand() as an algorithm for BC reasons. But such code was never supposed to work in the first place.
--
Andrea Faulds
http://ajf.me/
Hey Stas,
On 12 Jan 2015, at 01:03, Stanislav Malyshev smalyshev@gmail.com wrote:
Having new, better API is fine. Breaking existing working code because
new, better API is possible, is not fine.
The manual explicitly guarantees that code should not rely on the random
number generator being predictable.
If people want their existing code to continue to work, we could of
course allow this new API to support the C stdlib rand() as an algorithm
for BC reasons. But such code was never supposed to work in the first
place.
Breaking APIs means dropping functions. Functions that are no longer present
break code. New implementations are fine and match the doc.
--
Andrea Faulds
http://ajf.me/
Hi!
The manual explicitly guarantees that code should not rely on the
random number generator being predictable.
Where exactly does it say that? The only note I've found is this:
http://php.net/manual/en/function.mt-srand.php
5.2.1 The Mersenne Twister implementation in PHP now uses a new seeding
algorithm by Richard Wagner. Identical seeds no longer produce the same
sequence of values they did in previous versions. This behavior is not
expected to change again, but it is considered unsafe to rely upon it
nonetheless.
Which just says we could change PRNG behavior between versions, and
nothing about PRNG not being predictable.
If people want their existing code to continue to work, we could of
course allow this new API to support the C stdlib rand() as an
algorithm for BC reasons. But such code was never supposed to work in
the first place.
If it works, breaking it should have a very good reason. I don't see any
reason to break srand().
--
Stas Malyshev
smalyshev@gmail.com
Hi,
The manual explicitly guarantees that code should not rely on the
random number generator being predictable.
Where exactly does it say that? The only note I've found is this:
http://php.net/manual/en/function.mt-srand.php
5.2.1 The Mersenne Twister implementation in PHP now uses a new seeding
algorithm by Richard Wagner. Identical seeds no longer produce the same
sequence of values they did in previous versions. This behavior is not
expected to change again, but it is considered unsafe to rely upon it
nonetheless.
Which just says we could change PRNG behavior between versions, and
nothing about PRNG not being predictable.
It says it’s unsafe to rely upon the behaviour of seeding.
Also, FWIW, anyone who used the Suhosin patch couldn’t use srand() because it disabled it.
If people want their existing code to continue to work, we could of
course allow this new API to support the C stdlib rand() as an
algorithm for BC reasons. But such code was never supposed to work in
the first place.
If it works, breaking it should have a very good reason. I don't see any
reason to break srand().
Because if we don’t break it, people will continue to rely on it, and this binds our hands for future versions.
Also, those people will have their code break anyway if they upgrade their OS and it changes its random number generator.
Just because people do rely on it doesn’t mean they should or that we should continue to allow them to.
--
Andrea Faulds
http://ajf.me/
Hi Andrea,
- Get rid of rand(), srand() and getrandmax()
- Rename mt_rand(), mt_srand() and mt_getrandmax() to rand(), srand(), and getrandmax(), but add mt_* aliases for backwards-compatibility
- Make mt_srand() and srand() do nothing and produce a deprecation notice
- Use a 64-bit random number generation algorithm on 64-bit platforms (or invoke the 32-bit generator twice)
I like your proposal in general.
However, I object to the removal of srand.
Game programmers need "the same random sequence" on occasion.
There should be srand to get the same sequence for repeatable behaviors.
rand()/srand() may be renamed to sys_rand()/sys_srand() (or whatever
suitable name for them) in case user needs system random sequence
for whatever reasons. I don't insist on keeping system's rand/srand, though.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
- Get rid of rand(), srand() and getrandmax()
- Rename mt_rand(), mt_srand() and mt_getrandmax() to rand(), srand(), and getrandmax(), but add mt_* aliases for backwards-compatibility
- Make mt_srand() and srand() do nothing and produce a deprecation notice
- Use a 64-bit random number generation algorithm on 64-bit platforms (or invoke the 32-bit generator twice)
I have to object to removing the C stdlib rand(). mt_rand() is
significantly slower and when I don't care about the "quality" of the
random numbers the choice is obvious. Now if that means mt_rand() goes
to rand(), and rand() goes to something like fast_rand(), that is
fine, I don't care, I just don't think we should ditch the standard
rand function.
I do however think the default limits should be appropriate for the
underlying platform, so 64 bit values returned by default on 64 bit
platforms.
The MT internal state is large enough for you to consume 64 bits at a
time, no need to call it twice.
Hey Leigh,
- Get rid of rand(), srand() and getrandmax()
- Rename mt_rand(), mt_srand() and mt_getrandmax() to rand(), srand(), and getrandmax(), but add mt_* aliases for backwards-compatibility
- Make mt_srand() and srand() do nothing and produce a deprecation notice
- Use a 64-bit random number generation algorithm on 64-bit platforms (or invoke the 32-bit generator twice)
I have to object to removing the C stdlib rand(). mt_rand() is
significantly slower and when I don't care about the "quality" of the
random numbers the choice is obvious.
For what application do you need to generate so many random numbers that mt_rand() is too slow? Bear in mind that rand() is only faster if you’re using a stdlib that’s faster.
Thanks.
--
Andrea Faulds
http://ajf.me/
- Get rid of rand(), srand() and getrandmax()
- Rename mt_rand(), mt_srand() and mt_getrandmax() to rand(), srand(), and getrandmax(), but add mt_* aliases for backwards-compatibility
Also, this breaks all code currently using rand() and srand(), as the
LCG and MT produce a different sequence of numbers for the same seed.
I can't judge how widely this is used, I don't expect it to be a lot,
but it's still something that has to be considered.
Hi all,
- Use a 64-bit random number generation algorithm on 64-bit platforms
(or invoke the 32-bit generator twice)
For those who do not know the 64-bit version of MT rand, please read
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html
With this algorithm, 32-bit and 64-bit machines get different random numbers
with the same seed.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi!
- Get rid of rand(), srand() and getrandmax()
- Rename mt_rand(), mt_srand() and mt_getrandmax() to rand(), srand(), and getrandmax(), but add mt_* aliases for backwards-compatibility
This means rand() and mt_rand() would do the same. That however assumes
that OS's libc random-number functions are and will always be inferior.
Is that the case that we believe?
In any case, I would rather disclaim any specifics about implementation
of rand() other than saying it is using best algorithm we have. If we
decide MT one is the best we can support, so be it.
- Make mt_srand() and srand() do nothing and produce a deprecation notice
I think this is not a good idea. While we can not guarantee the PRNG we
use always provides the same values for the same srand on every system
for every version, it is the case that it provides them in the same
environment, thus enabling the possibility of testing random-driven
algorithms. Disabling it precludes any such testing, except with
user-level workarounds, which will inevitably be more brittle and
error-prone. This all is for no observable gain.
- Use a 64-bit random number generation algorithm on 64-bit platforms (or
invoke the 32-bit generator twice)
This sounds like a good idea, though it may have BC implications. Is
there a use case we know of where it matters?
--
Stas Malyshev
smalyshev@gmail.com
Hey,
- Get rid of rand(), srand() and getrandmax()
- Rename mt_rand(), mt_srand() and mt_getrandmax() to rand(), srand(), and getrandmax(), but add mt_* aliases for backwards-compatibility
This means rand() and mt_rand() would do the same. That however assumes
that OS's libc random-number functions are and will always be inferior.
Is that the case that we believe?
Quite possibly not, but we know that some OSes do have inferior rand() implementations and using our own ensures cross-platform uniformity. Using our own implementation shields users from sucky stdlibs.
In any case, I would rather disclaim any specifics about implementation
of rand() other than saying it is using best algorithm we have. If we
decide MT one is the best we can support, so be it.
Yeah, I agree there.
- Make mt_srand() and srand() do nothing and produce a deprecation notice
I think this is not a good idea. While we can not guarantee the PRNG we
use always provides the same values for the same srand on every system
for every version, it is the case that it provides them in the same
environment, thus enabling the possibility of testing random-driven
algorithms. Disabling it precludes any such testing, except with
user-level workarounds, which will inevitably be more brittle and
error-prone. This all is for no observable gain.
Such algorithms really shouldn’t be using rand() or mt_rand(), they aren’t fit for purpose for reasons I’ve previously elaborated.
Code which needs to do what you’re describing should use an API made specifically for that purpose. There are userland packages for this. We could also add a standard library class like I suggested earlier. I don’t really see why a userland DRBG would necessarily be “brittle and error-prone”. Not unless they’re using some completely unmaintained library.
- Use a 64-bit random number generation algorithm on 64-bit platforms (or
invoke the 32-bit generator twice)
This sounds like a good idea, though it may have BC implications. Is
there a use case we know of where it matters?
I can’t immediately think of one, but I can’t imagine there wouldn’t be a need for values larger than 2^32.
By the way, I’m not sure what I’ll do for rand() for the bigint RFC/patch. I might just use the bigint library’s random function when the range is wider than that a native integer has, but I can’t do that if we allow seeding (because it breaks predictable sequence generation). Alternatively, I could just not make it work with bigints, since producing incredibly large numbers can of course be done manually with bitwise shifts and multiple mt_rand() invocations.
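As a rough illustration of the "bitwise shifts and multiple mt_rand() invocations" approach, a wider value can be assembled from two calls. This is a sketch only: wide_rand() is a hypothetical name, and it assumes a 64-bit PHP build where mt_getrandmax() is 2^31 - 1, so each call contributes 31 bits.

```php
<?php
// Sketch only; wide_rand() is a hypothetical helper, not a PHP built-in.
// Assumes a 64-bit build where mt_rand() returns values in [0, 2^31 - 1].
function wide_rand(): int
{
    // The first call fills the high 31 bits, the second the low 31 bits,
    // giving a non-negative 62-bit result.
    return (mt_rand() << 31) | mt_rand();
}

$value = wide_rand();
var_dump($value);
```

This covers 0 to 2^62 - 1 rather than the full PHP_INT_MIN to PHP_INT_MAX range, and mapping the result onto an arbitrary [$min, $max] without bias would need additional care.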
--
Andrea Faulds
http://ajf.me/
I am all in favor of having new RNG functions to make it easier and
safer to use RNG with PHP.
However, I can only confirm what Rasmus said. Removing them is
absolutely not an option. There are dozens of valid uses of these
functions, from game engines to image processing via data simulation.
Dropping them or changing how they work will be a major pain for many
of these cases and a show stopper for upgrading to 7. Let's not do that,
please.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
Hi Pierre,
I am all in favor of having new RNG functions to make it easier and
safer to use RNG with PHP.
Let's have it for PHP 7!
What's the status of your new crypt extension?
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net