I would like to suggest changing the name or removing the nextFloat() method,
which is coming to PHP 8.3.
The Random\Randomizer class will have two new methods for generating
random numbers:
- getFloat($min, $max [, $boundary])
- nextFloat()
Now please try to guess what these methods do and, more importantly,
how they differ. I bet no one can guess. Which is, of course, a
symptom of poor naming. And as we know, naming along with cache
invalidation are the most complicated things :-)
What is the correct answer? It differs in that nextFloat() returns a
random number in the interval 0...1, so it does the same thing as
getFloat(0, 1). See
https://wiki.php.net/rfc/randomizer_additions#nextfloat
A) The problem with nextFloat() is that the name creates a false
expectation. Take a look at the following code:
$randomizer = new \Random\Randomizer();
$a = $randomizer->getFloat(100, 200); // number between 100..200
$b = $randomizer->nextFloat(); // another number between 100..200 ???
One would expect that in $b
there would be a number in the interval
100..200 again, but surprisingly not, there will be a number in the
interval 0..1.
B) When trying to think of a more appropriate name (e.g.
getFloat01()
?), I find that the simple getFloat(0, 1) is concise and succinct enough. If it were really useful to have a method that returns numbers in this interval, wouldn't it be better to give the $min and $max parameters the default values of 0 and 1, and thus call
getFloat()` directly?
C) PHP 8.3 is still in testing and this change can be made. If it is
not done now, it will never be done and programmers will have to use
and also read unintuitive APIs for years. Similarly, in the days
before PHP 8.0 was released, I suggested changing PhpToken::getAll()
to today's tokenize() and there was no problem with that
https://bugs.php.net/bug.php?id=80328
D) There are libraries in other languages (Java, ASP.NET, etc.) that
also use methods named next()
to generate numbers. This is perfectly
fine. The point is that they don't also use methods named get() in the
same time. The problem I am pointing out is that methods differing in
the prefix get/next differ only in the interval from which they return
numbers. You won't find this in any other language.
Thank for your time
DG
A) The problem with nextFloat() is that the name creates a false
expectation. Take a look at the following code:$randomizer = new \Random\Randomizer(); $a = $randomizer->getFloat(100, 200); // number between 100..200 $b = $randomizer->nextFloat(); // another number between 100..200 ???
I don't even get the point of nextFloat. Its a random function, there is
no thing as "next".
I think the major pet peeve i have is that the "Randomizer" class is
encapsulating all random functions.
I would like to see each randomizer have its own class, since they are
under the Random namespace
anyway, I believe they should be Random/StringRandomizer,
Random/IntRandomizer,
Random/FloatRandomizer, so on so forth. Basically Ramdom/{Type}Randomizer.
This would
also allow more granular and specific functionality for each type.
For example, for this discussion, the class, Random/FloatRandomizer (and
IntRandomizer)
could be instatiated with its boundries.
$randomFloat = new Random/FloatRandomizer($lowerBound, $upperBound,
$boundryInterval)
$a = $randomFloat->get();
$b = $randomFLoat->next();
Is that not more intuitive?
Hi
I think the major pet peeve i have is that the "Randomizer" class is
encapsulating all random functions.
I would like to see each randomizer have its own class, since they are
under the Random namespace
anyway, I believe they should be Random/StringRandomizer,
Random/IntRandomizer,
Random/FloatRandomizer, so on so forth. Basically Ramdom/{Type}Randomizer.
This would
also allow more granular and specific functionality for each type.For example, for this discussion, the class, Random/FloatRandomizer (and
IntRandomizer)
could be instatiated with its boundries.$randomFloat = new Random/FloatRandomizer($lowerBound, $upperBound, $boundryInterval) $a = $randomFloat->get(); $b = $randomFLoat->next();
Is that not more intuitive?
Creating a separate class for each possible distribution would certainly
be the cleanest API from an "academic" point of view. I'm not sure if it
would be more intuitive, but I'm sure that it would not be easier to
use. The API of Random\Randomizer is probably not perfect, but I believe
it does quite a few things quite right:
- It is secure by default, if you do not provide an engine, it defaults
to the CSPRNG. - It is discoverable, your IDE will autocomplete the available methods
and all the methods are listed right beneath each other in the
documentation. - It is succinct for common cases and (mostly) does what it says on the tin.
- It allows you to easily build additional (userland) APIs on top of it,
thus serving as a "building block" (to reuse the phrasing from the
Randomizer additions RFC). Such a userland API could look like the API
you proposed (though it would probably make sense to also include the
"Uniform" somewhere within the name, because you could also have a
"Normal" distribution, so we're right in bike-shedding territory, API
design is hard).
To give an example:
// Inspired by Rust's API.
$engine = new \Random\Engine\Secure();
$oneToHundred = new \Random\Distribution\UniformInt(1, 100);
$randomInt = $oneToHundred($engine);
$anotherRandomInt = $oneToHundred($engine);
vs
// The default PHP API.
$randomizer = new \Random\Randomizer();
$randomInt = $randomizer->getInt(1, 100);
$anotherRandomInt = $randomizer->getInt(1, 100);
I would certainly prefer PHP's API for most of my applications, because
it's much less boilerplate. It also avoids constructing many short-lived
objects [1]: You can just throw both the Engine and Randomizer into your
dependency injection container.
Best regards
Tim Düsterhus
[1] I expect the performance impact of constructing and destructing
objects to be greater than a regular function call, but might be wrong here.
Hi
random number in the interval 0...1, so it does the same thing as
getFloat(0, 1). See
https://wiki.php.net/rfc/randomizer_additions#nextfloat
For the record: That is correct.
B) When trying to think of a more appropriate name (e.g.
getFloat01()
?), I find that the simplegetFloat(0, 1) is concise and succinct enough. If it were really useful to have a method that returns numbers in this interval, wouldn't it be better to give the $min and $max parameters the default values of 0 and 1, and thus call
getFloat()` directly?
Yes, the right-open interval [0, 1) is useful to have as a special case,
because it directly maps onto percentages, allowing you to easily
generate booleans with a specific chance.
Making those default values of getFloat() wouldn't be better, though. It
is not meaningful to only provide one of the boundaries (specifically
$min), but PHP does not support overloaded signatures without
func_num_args()
trickery and this was banned for internal functions as
of this RFC:
https://wiki.php.net/rfc/deprecate_functions_with_overloaded_signatures#policy_about_new_functions
Making getFloat(float $min = 0.0, float $max = 1.0, IntervalBoundary
$boundary = IntervalBoundary::ClosedOpen) would seemingly make it legal
to call ->getFloat(0.5), which I consider to be worse than nextFloat().
Side note: The implementation of nextFloat() is much more efficient than
that of getFloat(0.0, 1.0) and the output will differ even for the same
seeded engine. The domain of legal return values is identical.
C) PHP 8.3 is still in testing and this change can be made. If it is
not done now, it will never be done and programmers will have to use
and also read unintuitive APIs for years. Similarly, in the days
before PHP 8.0 was released, I suggested changing PhpToken::getAll()
to today's tokenize() and there was no problem with that
https://bugs.php.net/bug.php?id=80328
The release managers would need to decide whether a change would be
legal at this point in the release process, but I strongly doubt it,
especially since naming is inherently bike-shedding territory and since
the method name is already in various blog articles and in the
documentation.
For the same reason I'm not sure if comparing this to PhpToken::getAll()
/ tokenize() is a useful comparison, since the tokenizer is a very
specialized feature that only few developers will care about. Making a
breaking change late in the release process will have a much smaller
impact to existing documentation / blog articles / feature videos / etc.
Best regards
Tim Düsterhus
Am 15.10.2023 um 19:24 schrieb Tim Düsterhus tim@bastelstu.be:
Making getFloat(float $min = 0.0, float $max = 1.0, IntervalBoundary $boundary = IntervalBoundary::ClosedOpen) would seemingly make it legal to call ->getFloat(0.5), which I consider to be worse than nextFloat().
While I understand that you find getFloat(0.5) undesirable I would not consider it illegal, e.g. getFloat(max:5).
Additionally I would consider getInt(max:100) to be a very valid usage too.
Side note: The implementation of nextFloat() is much more efficient than that of getFloat(0.0, 1.0) and the output will differ even for the same seeded engine. The domain of legal return values is identical.
This sounds like an implementation detail as I'm pretty sure you could switch to that fast path easily enough if neither of the arguments were given.
Side-note: The case of nextInt() is a bit trickier and strikes me as a bit of a weird API and I'd consider dropping it too: Having the max value depend on the Randomizer engine makes the generated values unstable in regards to switch the Engine. And if I read the documentation correctly then it only generates 32 bit values (or should it be 31 bits as it returns positive values only?). I think it would be clearer to require the Engine to provide at least 4 bytes and then specify the default max value of getInt() to be 2^31, optionally overridable with something up to PHP_INT_MAX.
So in summary I'd support both adding default values to getInt()/getFloat() as well as dropping nextFloat() in favor of getFloat() (and possible nextInt() with getInt()).
Regards,
- Chris
On Mon, 16 Oct 2023 at 09:34, Christian Schneider cschneid@cschneid.com
wrote:
Side note: The implementation of nextFloat() is much more efficient than
that of getFloat(0.0, 1.0) and the output will differ even for the same
seeded engine. The domain of legal return values is identical.This sounds like an implementation detail as I'm pretty sure you could
switch to that fast path easily enough if neither of the arguments were
given.
This is not how the engine works, it would require an if-clause on every
single call of getFloat() that would probably defeat the point of it.
Best regards,
George P. Banyard
Hi
Am 15.10.2023 um 19:24 schrieb Tim Düsterhus tim@bastelstu.be:
Making getFloat(float $min = 0.0, float $max = 1.0, IntervalBoundary $boundary = IntervalBoundary::ClosedOpen) would seemingly make it legal to call ->getFloat(0.5), which I consider to be worse than nextFloat().
While I understand that you find getFloat(0.5) undesirable I would not consider it illegal, e.g. getFloat(max:5).
Additionally I would consider getInt(max:100) to be a very valid usage too.
I don't, because I prefer explicit calls. If the upper bound is
specified explicitly, the lower bound should be as well. In fact the
0,
is even shorter than spelling out max:
.
Side note: The implementation of nextFloat() is much more efficient than that of getFloat(0.0, 1.0) and the output will differ even for the same seeded engine. The domain of legal return values is identical.
This sounds like an implementation detail as I'm pretty sure you could switch to that fast path easily enough if neither of the arguments were given.
Yes, that's why it's a side note.
Side-note: The case of nextInt() is a bit trickier and strikes me as a bit of a weird API and I'd consider dropping it too: Having the max value depend on the Randomizer engine makes the generated values unstable in regards to switch the Engine. And if I read the documentation correctly then it only generates 32 bit values (or should it be 31 bits as it returns positive values only?). I think it would be clearer to require the Engine to provide at least 4 bytes and then specify the default max value of getInt() to be 2^31, optionally overridable with something up to PHP_INT_MAX.
Yes, Randomizer::nextInt() is terrible. Let me provide the context why
it exists like it exists:
-
The Randomizer was designed to be drop-in compatible with the
existing randomness functionality (mt_rand, array_rand, shuffle,
strshuffle) when combining it with the Mt19937 engine. -
Thus mt_rand is equivalent to Randomizer::getInt().
-
It is legal to call
mt_rand()
without parameters. -
This behavior was carried over into getInt().
-
During the final review, it was noticed that this is an overloaded
function and a decision was made not to do this, because overloaded
functions are terrible. -
Thus the getInt()-without-parameters call was split into nextInt().
I think that the goal of making the new API a drop-in replacement has
been a useful goal to make it easier to migrate and I assume that's why
no one questioned whether retrieving "an arbitrary positive integer" is
even useful in the first place. While writing the documentation I first
realized how useless that is, because I was unable to write something
useful for Randomizer::nextInt(), but at that point it was way too late.
Hindsight is 20/20.
Given that knowledge, is naming nextFloat()
like nextInt()
a
mistake? I'm not sure. It was the obvious naming choice to me back when
I wrote the RFC a year ago and no one pointed out how the name was not
ideal, not during the RFC discussion phase, nor during the vote or in
the half year leading up to the feature freeze. Now 6 weeks before the
gold release, with 4 release candidates released and several blog posts
written, the method name suddenly is a bad choice?
So in summary I'd support both adding default values to getInt()/getFloat() as well as dropping nextFloat() in favor of getFloat() (and possible nextInt() with getInt()).
I'd be on board with deprecating nextInt() in PHP 8.4. I'd also be on
board with creating a better-named alias for nextFloat(), but I believe
that it should remaing a standalone method, because of its usefulness.
Best regards
Tim Düsterhus