Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:100293 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 22248 invoked from network); 23 Aug 2017 20:05:22 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 23 Aug 2017 20:05:22 -0000 Authentication-Results: pb1.pair.com header.from=solar@openwall.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=solar@openwall.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain openwall.com designates 195.42.179.200 as permitted sender) X-PHP-List-Original-Sender: solar@openwall.com X-Host-Fingerprint: 195.42.179.200 mother.openwall.net Received: from [195.42.179.200] ([195.42.179.200:62086] helo=mother.openwall.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id EB/4F-34801-100ED995 for ; Wed, 23 Aug 2017 16:05:22 -0400 Received: (qmail 24346 invoked from network); 23 Aug 2017 20:05:17 -0000 Received: from localhost (HELO pvt.openwall.com) (127.0.0.1) by localhost with SMTP; 23 Aug 2017 20:05:17 -0000 Received: by pvt.openwall.com (Postfix, from userid 503) id 43FF1AB18D; Wed, 23 Aug 2017 22:05:09 +0200 (CEST) Date: Wed, 23 Aug 2017 22:05:09 +0200 To: Nikita Popov Cc: Leigh , PHP internals Message-ID: <20170823200509.GA2195@openwall.com> References: <20170816191247.GA12324@openwall.com> <20170816214155.GA12831@openwall.com> <20170816220242.GA13012@openwall.com> <20170817131830.GB14477@openwall.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170817131830.GB14477@openwall.com> User-Agent: Mutt/1.4.2.3i Subject: Re: [PHP-DEV] PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug From: solar@openwall.com (Solar Designer) On Thu, Aug 17, 2017 at 03:18:30PM +0200, Solar Designer wrote: > On Thu, Aug 17, 2017 at 12:57:56AM +0200, Nikita Popov wrote: > > On Thu, Aug 17, 2017 at 12:02 AM, Solar Designer wrote: > > > One difference I didn't notice at first: the currently committed code > > > does only one php_mt_rand() call per loop iteration when it's skipping > > > "numbers over the limit", even in the 64-bit case (thus, it replaces 32 > > > bits of the value at a time). Your 64-bit version (and my revision of > > > it) does two calls per iteration (replacing the whole number). > > > > > > Arguably, the currently committed code is smarter and more efficient in > > > that respect, whereas yours is cleaner. Regardless, it's an extra > > > change that will affect the generated random numbers in some cases where > > > we could avoid making that change (it's not fixing any bug). > > > > > > I think it's preferable not to unnecessarily change what numbers are > > > generated. > > > Good point. However, I think that this optimization is not actually > > correct. For example, let's take umax = 0x1_0000_0001, which is the > > smallest value for which this codepath is taken. In this case limit would > > be 0xffff_ffff_ffff_fffe, so we only resample if the value is exactly > > 0xffff_ffff_ffff_ffff. So if the resampling codepath is taken in this case, > > and we only generate one new 32-bit value, the top word of the result will > > be fixed to 0xffff_ffff. (A very small bias in this case, but there's > > probably more significant cases.) > > Great point. More generally, we can't reuse the same random number for > decision-making (to skip it) and as part of the next (not so) random > number, without introducing bias. So we shouldn't, and we should in > fact move away from the old "smarter" behavior as a bug fix. Thank you! So I guess both the bug I reported and this one Nikita found are going to get fixed soon? For 7.2.0 maybe? Meanwhile, I released php_mt_seed 4.0 yesterday with support for latest PHP's mt_rand(), as well as with support for PHP 5.2.0 and below (as it happens, all the way to 3.0.7, although that's overkill). Near the end of its documentation, I included a lengthy section entitled "PHP version curiosities", which I include below in this e-mail in case any of this is useful for PHP's own documentation. It starts from PHP 3.0.6, but then actually spends half of the text on PHP 7.1.0+. Here we go: --- While php_mt_seed supports 3 major revisions of PHP's mt_rand() algorithm and that sort of covers PHP 3.0.7 through 7.1.0+ (up to the latest as of this writing and probably beyond), the reality is somewhat trickier than that. From older versions to newer: As a mere historical curiosity, php_mt_seed is in fact able to crack seeds of PHP 3.0.6, which is the very first version that introduced mt_rand(), but only as long as no range was passed into mt_rand(). That version had broken support for ranges, and indeed there's no point in supporting that short-lived breakage in php_mt_seed now. With this detail, php_mt_seed has some support for all mt_rand() capable versions of PHP released so far. Then there's PHP 3.0.7 through 5.2.0, where Mersenne Twister's state initialization is with multiples of 69069. This enables our stateless implementation to quickly jump to the state array element needed to compute the first mt_rand() output by using a precomputed value for 69069 raised to the power 396 (mod 2**32), which is MT's M-1. Another curiosity of those versions, which we take advantage of too, is that they treat adjacent even and odd seeds the same, so the effective seed space is 31-bit. PHP 3.0.6 to 4.1.2 used a default seed of 4357 (and thus also 4356) if mt_srand() was not called. PHP 4.2.0 changed that to automatic seeding using system time and PHP process ID (still predictable and now also leaky, but no longer a constant), but there was "Bug #25007 rand & mt_rand seed RNG every call" until 4.3.3, which presumably affected how cracked seeds could (not) be used. PHP 5.2.1 changed MT state initialization to MT authors' new recommended algorithm, which is no longer linear so we have to compute the first 397 state elements (out of 624) even though in the simplest case we only need (and only store) the first and last one of those (or we could use a time-memory trade-off, which we currently don't). PHP 5.2.1 also introduced a bug into its implementation of MT (use of a wrong variable, whereas pre-5.2.1 code was correct in that respect). This bug lets us skip a few operations for every other seed, which we do, although this optimization is so minor that we could as well not bother. PHP 7.1.0 fixed this bug (reverting to pre-5.2.1 code in that respect, so we use the same logic for pre-5.2.1 and 7.1.0+ there). In PHP versions from 3.0.7 to 7.0.x, if mt_rand() was called with its optional output range specified, a 31-bit (0 to 2147483647) MT PRNG output was scaled to that range using floating-point math. This meant that if a range wider than 31-bit was requested on a 64-bit build of PHP, some values would never occur. This also meant that even for most ranges smaller than 31-bit a bias was introduced (some output values became more likely than others), as compared to MT's raw output (which was relatively unbiased). PHP 7.1.0 tried to fix those biases by dropping the floating-point math and instead mapping the raw 32-bit MT PRNG outputs to the target range using integer modulo division. To avoid inherent bias when the target range isn't a whole power of 2 of possible integer values, a loop was introduced to skip raw 32-bit PRNG outputs (until a suitable one is seen) that would result in such bias. A bug in that code was found and reported due to work on php_mt_seed. As it turned out, the loop only works right in 32-bit builds of PHP, and is ineffective on 64-bit (except with 64-bit ranges, see below). Luckily, this actually makes things simpler for php_mt_seed, and currently php_mt_seed fully supports the behavior of 64-bit builds only (for ranges up to 0 to 2147483646). There's currently no intent to add to php_mt_seed the complication of bias-avoidance of 32-bit builds of PHP 7.1.0+, as well as of 64-bit builds of future versions where the bug will presumably get fixed. What this means in practice is that for 32-bit builds of PHP and future versions of PHP, php_mt_seed may occasionally find wrong and miss correct seeds for mt_rand() invoked with a range, but the probability of this happening is very low except for very wide ranges that are not a whole power of 2 of possible integer values. For example, mt_rand(0, 61) or mt_rand(111, 222) are very unlikely to trigger the problem, mt_rand(0, 255) can't trigger the problem, whereas mt_rand(1000000000, 2000000000) is somewhat likely to trigger it. Such likely problematic ranges are probably rarely used and are of little relevance to uses of php_mt_seed. Also, supporting this buggy vs. correct behavior would require treating 32- and 64-bit builds of PHP separately and reporting on them differently. PHP 7.1.0 also tried to introduce proper support for 64-bit ranges in 64-bit builds. It generates two raw 32-bit PRNG outputs to derive one mt_rand() output when the target range spans more than a 32-bit space. Unfortunately, the implementation is buggy in a way where it'd introduce biases into such mt_rand() outputs. The bug will presumably get fixed as well, but regardless there's currently no intent to support wider than 31-bit ranges in php_mt_seed. This is obscure functionality (arguably, originally an accidental misfeature, which the PHP developers didn't really have to make official) that is only available on 64-bit builds of PHP. Currently, php_mt_seed does not allow specifying larger than 31-bit integers on its command line (it will report an error when a larger value is specified). Prior to PHP 7.1.0, mt_rand(0, 2147483647) was equivalent to mt_rand() without a range, and php_mt_seed still assumes so. This assumption is no longer valid for PHP 7.1.0+, which means that when searching for seeds for PHP 7.1.0+ for mt_rand() called with a range specified, you can specify at most a range one smaller than that, thus "0 2147483646" being the maximum that php_mt_seed supports for those versions. This minor limitation shouldn't matter in practice, except that you might need to be aware you can continue to specify a range of "0 2147483647" to indicate that no range was passed into mt_rand(). PHP 7.1.0 also aliased rand() to mt_rand() and srand() to mt_srand(). This means that on one hand you can use php_mt_seed to crack rand() seeds for PHP 7.1.0+ (since those are also mt_rand() seeds), but on the other hand this cross-seeding and cross-consumption of random numbers can affect which attacks work or don't work, and exactly how, against specific applications that make use of both sets of PHP functions. PHP 7.1.0 also introduced MT_RAND_PHP as optional second parameter to mt_srand(). When specified, it correctly enables behavior identical to that of PHP versions 5.2.1 to 7.0.x. Thus, seeds that php_mt_seed reports as valid for 5.2.1 to 7.0.x are always also valid for 7.1.0+ with MT_RAND_PHP, and conversely seeds that php_mt_seed reports as valid for 7.1.0+ are often invalid for 7.1.0+ with MT_RAND_PHP (except when the same seeds are also valid for 5.2.1 to 7.0.x, which is common). --- Alexander