Hello internals.
I have created a draft of the RFC.
https://wiki.php.net/rfc/rng_extension
This RFC is a proposal to implement the old object scope RNG in PHP.
You can read about the discussion so far below.
I believe that the problem with the previous RFC was that it was just an
ugly API. I propose to sort this out and deprecate the existing functions
that should be replaced.
I would like to ask for feedback on this draft RFC.
Regards,
Go kudo
Hi Go Kudo,
I think this proposal is much better than the one before. I believe we need
such functionality in PHP.
I have some questions I would like you to consider:
- I'm still not a fan of exposing the algorithms as separate classes in
PHP. Their names are confusing and I don't see how exposing them gives any
benefits. I would prefer that, like in password_hash, let the user chose
the algorithm using a constant then ask for seed as a second argument
to Randomizer. - Why is there a getter on Randomizer for the source?
- Why is the interface called source?
- You have generateInt() twice. Wouldn't it be better to have it only once
but make the arguments nullable or provide PHP_MIN_INT and PHP_MAX_INT as
defaults? - Why do we need to deprecate the old functions now? What's the rush?
Regards,
Kamil
- I'm still not a fan of exposing the algorithms as separate classes in
PHP. Their names are confusing and I don't see how exposing them gives any
benefits. I would prefer that, like in password_hash, let the user chose
the algorithm using a constant then ask for seed as a second argument
to Randomizer.
This may be useful if you are implementing a specific RNG in userland.
It also gets quite complicated when trying to implement extensions that add
corresponding algorithms.
Since we're going all out, I thought it might be appropriate to use the OOP
approach. What do you think?
- Why is there a getter on Randomizer for the source?
Some Source objects are serializable and can store state, to take advantage
of this feature.
- Why is the interface called source?
I couldn't think of an appropriate name. Do you have any ideas?
- You have generateInt() twice. Wouldn't it be better to have it only
once but make the arguments nullable or provide PHP_MIN_INT and PHP_MAX_INT
as defaults?
This is to maintain compatibility with the mt_rand()
function.
- Why do we need to deprecate the old functions now? What's the rush?
Keeping old implementations may lead to code being written that
unintentionally relies on the PHPs global state.
Deprecating them reduces the possibility that such code will be written
anew.
2021年5月19日(水) 2:29 Kamil Tekiela tekiela246@gmail.com:
Hi Go Kudo,
I think this proposal is much better than the one before. I believe we
need such functionality in PHP.
I have some questions I would like you to consider:
- I'm still not a fan of exposing the algorithms as separate classes in
PHP. Their names are confusing and I don't see how exposing them gives any
benefits. I would prefer that, like in password_hash, let the user chose
the algorithm using a constant then ask for seed as a second argument
to Randomizer.- Why is there a getter on Randomizer for the source?
- Why is the interface called source?
- You have generateInt() twice. Wouldn't it be better to have it only
once but make the arguments nullable or provide PHP_MIN_INT and PHP_MAX_INT
as defaults?- Why do we need to deprecate the old functions now? What's the rush?
Regards,
Kamil
I have created a draft of the RFC.
Hi,
At a glance, I think this looks like a good clear approach.
I think deprecating mt_srand makes sense, but could do with a heading to
make sure it's not overlooked, and include a clear statement of when it
would be removed (9.0? 10.0?). It could potentially even be a separate
vote, as keeping it doesn't actually harm users of the new classes.
I have a few concerns with the third part of the proposal:
Also, change the following function to use the same method as
random_byte() (the php_random_bytes() internal function) for processing,
instead of PHP's global state.
Firstly, making these functions independent of mt_srand()
is a breaking
change, so cannot happen until PHP 9.0 at the earliest. This could
happen at the same time as mt_srand()
is removed, but that connection
needs to be clear in the proposal.
Secondly, it seems inconsistent to make these functions use the
crypto-strong randomness source, but retain rand()
and mt_rand()
using
PRNGs. In fact, I don't see the need to change them at all - they can be
documented as not for cryptographic use, as mt_rand is, and users
pointed to the new API if they need something stronger.
Regards,
--
Rowan Tommins
[IMSoP]
Thanks.
Deprecation will be done only to prevent unintended MT state dependent
code. We don't plan to go as far as removing the implementation for now.
Firstly, making these functions independent of
mt_srand()
is a breaking
change, so cannot happen until PHP 9.0 at the earliest.
To be honest, I don't understand how far PHP is willing to go to accept
disruptive changes.
For example, mt_rand()
, when called in its uninitialized state, was
initialized using Combined LCG (time + PID/TID) before PHP 8.0.
This can bias the results of mt_rand()
, so I sent a PR on GitHub and it
was changed to use php_random_bytes()
if possible.
https://github.com/php/php-src/pull/6520
It was in PHP 7.1 that the shuffle()
, str_shuffle()
, and array_rand()
functions were changed to use MT-based random numbers instead of
libc-derived random numbers. This was clearly disruptive, and left no way
to even maintain backward compatibility.
Compared to that, this proposal is not disruptive in the sense that it
provides a complete alternative.
2021年5月19日(水) 2:43 Rowan Tommins rowan.collins@gmail.com:
I have created a draft of the RFC.
Hi,
At a glance, I think this looks like a good clear approach.
I think deprecating mt_srand makes sense, but could do with a heading to
make sure it's not overlooked, and include a clear statement of when it
would be removed (9.0? 10.0?). It could potentially even be a separate
vote, as keeping it doesn't actually harm users of the new classes.I have a few concerns with the third part of the proposal:
Also, change the following function to use the same method as
random_byte() (the php_random_bytes() internal function) for processing,
instead of PHP's global state.Firstly, making these functions independent of
mt_srand()
is a breaking
change, so cannot happen until PHP 9.0 at the earliest. This could
happen at the same time asmt_srand()
is removed, but that connection
needs to be clear in the proposal.Secondly, it seems inconsistent to make these functions use the
crypto-strong randomness source, but retainrand()
andmt_rand()
using
PRNGs. In fact, I don't see the need to change them at all - they can be
documented as not for cryptographic use, as mt_rand is, and users
pointed to the new API if they need something stronger.Regards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
Thanks.
Deprecation will be done only to prevent unintended MT state dependent
code. We don't plan to go as far as removing the implementation for now.
I think every deprecation notice should mean a plan to remove. If you
just want to encourage people to discover the new functions, I'm not
sure a message in their logs every time mt_srand runs is the right way
to do it.
Firstly, making these functions independent of
mt_srand()
is a
breaking change, so cannot happen until PHP 9.0 at the earliest.To be honest, I don't understand how far PHP is willing to go to
accept disruptive changes.
The official policy is here: https://wiki.php.net/rfc/releaseprocess It
relies on the phrase "backwards compatibility", which is admittedly
quite hard to define.
You are right that the change from using rand()
to using mt_rand()
in
7.1 is arguably not backwards compatible. However, the following code
can be used to reproducibly shuffle two arrays in any version of PHP:
srand(123456);
shuffle($array1);
shuffle($array2);
The exact sequence returned will be different in 7.1 from 7.0, but
because srand()
is now an alias of mt_srand()
, it still affects the
shuffle()
function in the expected way. If we switch shuffle()
to an
algorithm that can't be seeded at all, this code will simply stop
working, and have to be completely rewritten.
A "polyfill" of sorts is possible, by storing an instance of the new
RNG\Randomizer in a global or static variable, but it's more than just a
find-and-replace, especially since many users will need to support
version 8.0 and 8.1 in the same code base.
As I said previously, I'm not convinced these functions need to stop
using the MT algorithm at all, and certainly see no reason to rush the
change.
Regards,
--
Rowan Tommins
[IMSoP]
Thank you. I'm convinced for the most part.
I will probably have to separate this into a separate RFC. I'd like to
remove the existing reference to mt_rand()
from this RFC, and not include
the BC break.
Regards,
Go Kudo
2021年5月20日(木) 17:46 Rowan Tommins rowan.collins@gmail.com:
Thanks.
Deprecation will be done only to prevent unintended MT state dependent
code. We don't plan to go as far as removing the implementation for now.I think every deprecation notice should mean a plan to remove. If you
just want to encourage people to discover the new functions, I'm not
sure a message in their logs every time mt_srand runs is the right way
to do it.Firstly, making these functions independent of
mt_srand()
is a
breaking change, so cannot happen until PHP 9.0 at the earliest.To be honest, I don't understand how far PHP is willing to go to
accept disruptive changes.The official policy is here: https://wiki.php.net/rfc/releaseprocess It
relies on the phrase "backwards compatibility", which is admittedly
quite hard to define.You are right that the change from using
rand()
to usingmt_rand()
in
7.1 is arguably not backwards compatible. However, the following code
can be used to reproducibly shuffle two arrays in any version of PHP:srand(123456);
shuffle($array1);
shuffle($array2);The exact sequence returned will be different in 7.1 from 7.0, but
becausesrand()
is now an alias ofmt_srand()
, it still affects the
shuffle()
function in the expected way. If we switchshuffle()
to an
algorithm that can't be seeded at all, this code will simply stop
working, and have to be completely rewritten.A "polyfill" of sorts is possible, by storing an instance of the new
RNG\Randomizer in a global or static variable, but it's more than just a
find-and-replace, especially since many users will need to support
version 8.0 and 8.1 in the same code base.As I said previously, I'm not convinced these functions need to stop
using the MT algorithm at all, and certainly see no reason to rush the
change.Regards,
--
Rowan Tommins
[IMSoP]--
To unsubscribe, visit: https://www.php.net/unsub.php
Hello internals.
I have created a draft of the RFC.
Hello Go Kudo,
First of all, thank you for the (re)work.
I think the API looks better, but I still have some remarks/questions
(roughly in order of appearance, with some overlap):
-
How many bytes is
Source::next()
supposed to generate? Is it even
intended to be called directly (vsRandomizer::generateBytes(int $length)
)? -
The stubs in "Proposal" should show
__construct()
signatures for the
classes implementingSource
. -
About the
MT19937PHP
variant: The PHP manual formt_srand()
describes
theMT_RAND_PHP
mode like this: "Uses an incorrect Mersenne Twister
implementation which was used as the default up till PHP 7.1.0. This mode
is available for backward compatibility."; do we really need/want to port
it to this new extension? -
I think you said in the past that
XorShift128Plus
should be the
preferred/default implementation, andMT19937
is only provided for
compatibility/migration? But if so, that's not clear at all in the RFC. -
PlatformProvided
feels like the "odd one out" here: its constructor
[assuming same as previous RFC] doesn't take a seed, and it isn't
serializable; I understand why, but the main point of the RFC is "RNG
reproducibility", which this class cannot achieve.
Also, we already haverandom_bytes()
/random_int()
, and you're
proposing to changeshuffle()
/str_shuffle()
/array_rand()
's internal
RNG frommt_rand()
-like torandom_bytes()
-like, so what reason would
there be to use this class? onlygenerateFloat()
? but couldn't that be a
new global functionrandom_float()
?
All in all, wouldn't it be simpler to dropPlatformProvided
, and have
allSource
implementations take a seed on construction and be
serializable? -
Technical, about
Randomizer::generateFloat()
: Is the returned float
guaranteed to be finite (i.e. neverNAN
nor +/-INF)? And may it ever return
-0.0 (negative zero)? and even if not, is 0.0 twice as likely as other
floats?
Also, nogenerateFloat(float $min, float $max)
variant like for
generateInt()
? -
About main usage of the API: To get e.g. $n (say 3) pseudo-random
integers between $min (say 0) and $max (say 100) using the "best"
implementation (which we have to know/guess is XorShift128+, isn't it?), we
must: first choose a $seed (time()
, maybe?), then do$source = new XorShift128Plus($seed);
and$rng = new Randomizer($source);
[even if
$source (and $seed) could be inlined, that's still several "steps"], then
we can finally call$rng->generateInt($min, $max);
$n times. Is that
correct?
If so, it may make sense from the implementer side, but from the user
side it doesn't look like the most friendly.
Also, that lets the possibility of constructing a second Randomizer using
the same $source (or$rng->getSource()
), which could be undesirable for
reproducibility, isn't it?
Wouldn't it be better to "merge" Randomizer into the interface/classes?
Actually I see that's the case in your other project ext/orng (
https://github.com/zeriyoshi/php-ext-orng/blob/master/rng/rnginterface.stub.php
and
https://github.com/zeriyoshi/php-ext-orng/blob/master/rng/xorshift128plus.stub.php
), why the different approach for ext/rng?
Maybe even better (and already suggested), have Randomizer constructor
take an optional algorithm (string, default XorShift128Plus::class?) and
an optional seed (int, defaulttime()
?) and construct the source
internally (and don't expose it)? So we could do just$rng = new Randomizer();
to rely on the defaults (then call$rng->generateInt($min, $max);
$n times). -
Finally, I think the vote should be split into 3 parts: the new classes,
the deprecations (which itself could be split between [mt_]srand and
[mt_]rand), and the change of internal RNG for
shuffle/str_shuffle/array_rand.
Best regards,
--
Guilliam Xavier
Sorry. I couldn't make it in time for my lunch break.
How many bytes is
Source::next()
supposed to generate? Is it even
intended to be called directly (vsRandomizer::generateBytes(int $length)
)?
No, this is not intended to be used directly.
In fact, as many people have pointed out, I think it would be more
appropriate to merge it into a single class.
As per the original implementation (ext/orng), I didn't think it needed to
be separated originally. One of the comments on a previous RFC suggested
that it would probably be better to separate them, and I took that into
account.
About the
MT19937PHP
variant: The PHP manual formt_srand()
describes
theMT_RAND_PHP
mode like this: "Uses an incorrect Mersenne Twister
implementation which was used as the default up till PHP 7.1.0. This mode
is available for backward compatibility."; do we really need/want to port
it to this new extension?
This is for backward compatibility. If it is not needed, we would like to
eliminate the implementation itself.
Well, to keep the language core clean, maybe this should be provided by an
appropriate external extension.
-
PlatformProvided
feels like the "odd one out" here: its constructor
[assuming same as previous RFC] doesn't take a seed, and it isn't
serializable; I understand why, but the main point of the RFC is "RNG
reproducibility", which this class cannot achieve.
You're absolutely right. I would like to remove this.
Technical, about
Randomizer::generateFloat()
: Is the returned float
guaranteed to be finite (i.e. neverNAN
nor +/-INF)? And may it ever return
-0.0 (negative zero)? and even if not, is 0.0 twice as likely as other
floats?
Also, nogenerateFloat(float $min, float $max)
variant like for
generateInt()
?
As a matter of fact, we believe that the functions provided by this class
are sufficient for mt_rand()
, shuffle()
, str_shuffle()
, and
array_rand()
equivalents. At most, the equivalent of random_byte()
might be provided.
I'll reorganize these.
Thanks for the detailed remarks. Based on these, I would like to clean up
the RFC.
Regards,
Go Kudo
2021年5月19日(水) 18:46 Guilliam Xavier guilliam.xavier@gmail.com:
Hello internals.
I have created a draft of the RFC.
Hello Go Kudo,
First of all, thank you for the (re)work.
I think the API looks better, but I still have some remarks/questions
(roughly in order of appearance, with some overlap):
How many bytes is
Source::next()
supposed to generate? Is it even
intended to be called directly (vsRandomizer::generateBytes(int $length)
)?The stubs in "Proposal" should show
__construct()
signatures for the
classes implementingSource
.About the
MT19937PHP
variant: The PHP manual formt_srand()
describes theMT_RAND_PHP
mode like this: "Uses an incorrect Mersenne
Twister implementation which was used as the default up till PHP 7.1.0.
This mode is available for backward compatibility."; do we really need/want
to port it to this new extension?I think you said in the past that
XorShift128Plus
should be the
preferred/default implementation, andMT19937
is only provided for
compatibility/migration? But if so, that's not clear at all in the RFC.
PlatformProvided
feels like the "odd one out" here: its constructor
[assuming same as previous RFC] doesn't take a seed, and it isn't
serializable; I understand why, but the main point of the RFC is "RNG
reproducibility", which this class cannot achieve.
Also, we already haverandom_bytes()
/random_int()
, and you're
proposing to changeshuffle()
/str_shuffle()
/array_rand()
's internal
RNG frommt_rand()
-like torandom_bytes()
-like, so what reason would
there be to use this class? onlygenerateFloat()
? but couldn't that be a
new global functionrandom_float()
?
All in all, wouldn't it be simpler to dropPlatformProvided
, and have
allSource
implementations take a seed on construction and be
serializable?Technical, about
Randomizer::generateFloat()
: Is the returned float
guaranteed to be finite (i.e. neverNAN
nor +/-INF)? And may it ever return
-0.0 (negative zero)? and even if not, is 0.0 twice as likely as other
floats?
Also, nogenerateFloat(float $min, float $max)
variant like for
generateInt()
?About main usage of the API: To get e.g. $n (say 3) pseudo-random
integers between $min (say 0) and $max (say 100) using the "best"
implementation (which we have to know/guess is XorShift128+, isn't it?), we
must: first choose a $seed (time()
, maybe?), then do$source = new XorShift128Plus($seed);
and$rng = new Randomizer($source);
[even if
$source (and $seed) could be inlined, that's still several "steps"], then
we can finally call$rng->generateInt($min, $max);
$n times. Is that
correct?
If so, it may make sense from the implementer side, but from the user
side it doesn't look like the most friendly.
Also, that lets the possibility of constructing a second Randomizer
using the same $source (or$rng->getSource()
), which could be undesirable
for reproducibility, isn't it?
Wouldn't it be better to "merge" Randomizer into the interface/classes?
Actually I see that's the case in your other project ext/orng (
https://github.com/zeriyoshi/php-ext-orng/blob/master/rng/rnginterface.stub.php
and
https://github.com/zeriyoshi/php-ext-orng/blob/master/rng/xorshift128plus.stub.php
), why the different approach for ext/rng?
Maybe even better (and already suggested), have Randomizer constructor
take an optional algorithm (string, default XorShift128Plus::class?) and
an optional seed (int, defaulttime()
?) and construct the source
internally (and don't expose it)? So we could do just$rng = new Randomizer();
to rely on the defaults (then call$rng->generateInt($min, $max);
$n times).Finally, I think the vote should be split into 3 parts: the new classes,
the deprecations (which itself could be split between [mt_]srand and
[mt_]rand), and the change of internal RNG for
shuffle/str_shuffle/array_rand.Best regards,
--
Guilliam Xavier
Sorry. I couldn't make it in time for my lunch break.
No worries, there was quite a lot to read/reply ;)
How many bytes is
Source::next()
supposed to generate? Is it even
intended to be called directly (vsRandomizer::generateBytes(int $length)
)?No, this is not intended to be used directly.
In fact, as many people have pointed out, I think it would be more
appropriate to merge it into a single class.As per the original implementation (ext/orng), I didn't think it needed to
be separated originally. One of the comments on a previous RFC suggested
that it would probably be better to separate them, and I took that into
account.
I see. Sorry that you had so many back-and-forth suggestions :s I guess
that the simplest-to-use API (for the main intended/expected use cases) is
more likely to get acceptance.
About the
MT19937PHP
variant: The PHP manual formt_srand()
describes theMT_RAND_PHP
mode like this: "Uses an incorrect Mersenne
Twister implementation which was used as the default up till PHP 7.1.0.
This mode is available for backward compatibility."; do we really need/want
to port it to this new extension?This is for backward compatibility. If it is not needed, we would like to
eliminate the implementation itself.
Well, to keep the language core clean, maybe this should be provided by an
appropriate external extension.
Yes, I think that a new core extension for PHP 8.1 needn't [or even
shouldn't] provide BC for an incorrect implementation that is not used
anymore since PHP 7.1 (unless someone here argues?).
PlatformProvided
feels like the "odd one out" here: its constructor
[assuming same as previous RFC] doesn't take a seed, and it isn't
serializable; I understand why, but the main point of the RFC is "RNG
reproducibility", which this class cannot achieve.You're absolutely right. I would like to remove this.
Caution though: after your recent discussion with Rowan, if you remove
the "change shuffle()
/str_shuffle()/array_rand()'s internal
RNG from mt_rand()
-like to random_bytes()
-like" part from the proposal,
then this class becomes potentially "useful" again, not for reproducibility
but for shuffling an array/string by a "truly random" [actually CSPRNG]
algorithm, isn't it?
Technical, about
Randomizer::generateFloat()
: Is the returned float
guaranteed to be finite (i.e. neverNAN
nor +/-INF)? And may it ever return
-0.0 (negative zero)? and even if not, is 0.0 twice as likely as other
floats?
Also, nogenerateFloat(float $min, float $max)
variant like for
generateInt()
?As a matter of fact, we believe that the functions provided by this class
are sufficient formt_rand()
,shuffle()
,str_shuffle()
, and
array_rand()
equivalents. At most, the equivalent ofrandom_byte()
might be provided.
I'll reorganize these.
Sorry I'm not sure to understand your answer here, are you planning to
remove generateFloat() and generateBool()?
Well if generateBool() is a shorthand for (bool) generateInt(0, 1)
[or
generateInt(PHP_INT_MIN, PHP_INT_MAX) >= 0
] then it's not necessary, but
maybe convenient?
As for generateFloat(), I didn't know if you intended something like
(float) generateInt(PHP_INT_MIN, PHP_INT_MAX) / PHP_INT_MAX
or rather
"call generateBytes(8) then reinterpret those 64 bits as an IEEE 754 double
[in C, not PHP]"?
But it's true that I haven't seen many people requesting new
"random_bool()" and/or "random_float()" functions, so...
By the way, for generateInt()
(without explicit $min/$max args), I assume
that $max defaults to PHP_INT_MAX, but does $min default to PHP_INT_MIN
or
actually 0 [if it's like [mt_]rand() I guess that the answer is probably 0,
but that could be written clear in the RFC]?
Thanks for the detailed remarks. Based on these, I would like to clean up
the RFC.Regards,
Go Kudo
Regards,
--
Guilliam Xavier
Hello internals.
I have created a draft of the RFC.
https://wiki.php.net/rfc/rng_extension
This RFC is a proposal to implement the old object scope RNG in PHP.
You can read about the discussion so far below.
I believe that the problem with the previous RFC was that it was just an
ugly API. I propose to sort this out and deprecate the existing functions
that should be replaced.I would like to ask for feedback on this draft RFC.
Regards,
Go kudo
This looks good.
-
instead of ext/rng and the RNG namespace, why not use ext/random and
Random namespace. And incorporate also other random_* functions maybe by
creating some aliases. -
instead of Randomizer, maybe we can use just Random or RNG (if previous
suggestion does not stand). -
maybe some helper functions like Random\new_random($algo, $seed):
Random\Random can be used as a factory for internal implementations. Or
RNG\new_randomizer ($algo, $seed): RNG\Randomizer (if previous suggestions
does not stand). -
for public function
next()
: string, how many bytes should be returned?
Regards,
Alex