Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:100453 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 75550 invoked from network); 7 Sep 2017 18:23:26 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 7 Sep 2017 18:23:26 -0000 Authentication-Results: pb1.pair.com smtp.mail=nikita.ppv@gmail.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=nikita.ppv@gmail.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.214.48 as permitted sender) X-PHP-List-Original-Sender: nikita.ppv@gmail.com X-Host-Fingerprint: 209.85.214.48 mail-it0-f48.google.com Received: from [209.85.214.48] ([209.85.214.48:35227] helo=mail-it0-f48.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id D1/B5-10715-D9E81B95 for ; Thu, 07 Sep 2017 14:23:26 -0400 Received: by mail-it0-f48.google.com with SMTP id v19so1092589ite.0 for ; Thu, 07 Sep 2017 11:23:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=9QGqOl3tl7Duw1Mn4kbcHo77mRa0QiPqvhDmyvf+qgM=; b=N0A5FVF0f3emqDt0aEilAXV+YfuO739So+yd6Q53YFpEg3/9g7d+E0tL82c3WHs42r YJpYr0+H7iAsi3Pw/3iXV71ECZ5YP+1coeOQV63eA9/V3zVzYoAqmmOa6Ajjb86r3i29 E0Yh3yzLZtxN1PKLiakfTwbX8dx5FusQnwiunKzrnCiJ00lomgGAWsteJyrye7zrxgAz W05+FkifZE0WnFqFGukVE+d5xZl8GetCLxVVG1j07/qaArB8YLyYoFOtMdBX406561OS okyDZSKXCDRfj4rtwiokZf2WVtf3U0LfbYXW1HgxHhn4m9JOidjLsmf5uJtFBg0rY8I5 USRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=9QGqOl3tl7Duw1Mn4kbcHo77mRa0QiPqvhDmyvf+qgM=; b=iSbEVKwmAXOkSVyzCb9gSHOt8MxfOPP6y2WuygcXbBumMMMsYLbTZpBOBrpXyuHW4U 6V6PVjkrW4FpOWC6iGfQEfrXxMnD+xPy+Xcrs6/ovaAzlbDccv7gkMoYXogXT108KZKf K0Qga4CgcIHA2QDcMMcHeDHNQfqgyXphX3HO23AOepQzj4QgRnEqRJES8Vl+VzwIywMm 0+1a7E/gZ/rWbfmh+ffLDrWfVue+1Ei/ON2QNhnsCpDxBkT8h/Rnw8eRnqR9BbhDgYya pk4cZ4LTs+HSKFSP6B1FlTrnG/n8J330YpYf7qqmlqmfPdajvdiIBvGLm8NZVQNJIMbU TgNA== X-Gm-Message-State: AHPjjUjmQbx7jUAgRlxRtmtZgSzagekPZfKAqpDR+/nI9URyKwJjzagZ T2R7Ik4qV1825ToggXlD5jvQp88GsQ== X-Google-Smtp-Source: ADKCNb7Mw3WdIFUDbsi0AKT6CFATvxZLWawE2bMco+h1494zl70jRwSsLu1c+SfAt+dXlgrzGY362zt6vYKjm///Jlg= X-Received: by 10.36.78.137 with SMTP id r131mr190191ita.111.1504808603100; Thu, 07 Sep 2017 11:23:23 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.13.3 with HTTP; Thu, 7 Sep 2017 11:23:22 -0700 (PDT) In-Reply-To: <20170823200509.GA2195@openwall.com> References: <20170816191247.GA12324@openwall.com> <20170816214155.GA12831@openwall.com> <20170816220242.GA13012@openwall.com> <20170817131830.GB14477@openwall.com> <20170823200509.GA2195@openwall.com> Date: Thu, 7 Sep 2017 20:23:22 +0200 Message-ID: To: Solar Designer Cc: Leigh , PHP internals , Davey Shafik , Joe Watkins Content-Type: multipart/alternative; boundary="001a11402b2a3e2b9a05589d8e20" Subject: Re: [PHP-DEV] PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug From: nikita.ppv@gmail.com (Nikita Popov) --001a11402b2a3e2b9a05589d8e20 Content-Type: text/plain; charset="UTF-8" On Wed, Aug 23, 2017 at 10:05 PM, Solar Designer wrote: > On Thu, Aug 17, 2017 at 03:18:30PM +0200, Solar Designer wrote: > > On Thu, Aug 17, 2017 at 12:57:56AM +0200, Nikita Popov wrote: > > > On Thu, Aug 17, 2017 at 12:02 AM, Solar Designer > wrote: > > > > One difference I didn't notice at first: the currently committed code > > > > does only one php_mt_rand() call per loop iteration when it's > skipping > > > > "numbers over the limit", even in the 64-bit case (thus, it replaces > 32 > > > > bits of the value at a time). Your 64-bit version (and my revision > of > > > > it) does two calls per iteration (replacing the whole number). > > > > > > > > Arguably, the currently committed code is smarter and more efficient > in > > > > that respect, whereas yours is cleaner. Regardless, it's an extra > > > > change that will affect the generated random numbers in some cases > where > > > > we could avoid making that change (it's not fixing any bug). > > > > > > > > I think it's preferable not to unnecessarily change what numbers are > > > > generated. > > > > > Good point. However, I think that this optimization is not actually > > > correct. For example, let's take umax = 0x1_0000_0001, which is the > > > smallest value for which this codepath is taken. In this case limit > would > > > be 0xffff_ffff_ffff_fffe, so we only resample if the value is exactly > > > 0xffff_ffff_ffff_ffff. So if the resampling codepath is taken in this > case, > > > and we only generate one new 32-bit value, the top word of the result > will > > > be fixed to 0xffff_ffff. (A very small bias in this case, but there's > > > probably more significant cases.) > > > > Great point. More generally, we can't reuse the same random number for > > decision-making (to skip it) and as part of the next (not so) random > > number, without introducing bias. So we shouldn't, and we should in > > fact move away from the old "smarter" behavior as a bug fix. Thank you! > > So I guess both the bug I reported and this one Nikita found are going > to get fixed soon? For 7.2.0 maybe? > > Meanwhile, I released php_mt_seed 4.0 yesterday with support for latest > PHP's mt_rand(), as well as with support for PHP 5.2.0 and below (as it > happens, all the way to 3.0.7, although that's overkill). Near the end > of its documentation, I included a lengthy section entitled "PHP version > curiosities", which I include below in this e-mail in case any of this > is useful for PHP's own documentation. It starts from PHP 3.0.6, but > then actually spends half of the text on PHP 7.1.0+. Here we go: > > --- > While php_mt_seed supports 3 major revisions of PHP's mt_rand() > algorithm and that sort of covers PHP 3.0.7 through 7.1.0+ (up to the > latest as of this writing and probably beyond), the reality is somewhat > trickier than that. From older versions to newer: > > As a mere historical curiosity, php_mt_seed is in fact able to crack > seeds of PHP 3.0.6, which is the very first version that introduced > mt_rand(), but only as long as no range was passed into mt_rand(). That > version had broken support for ranges, and indeed there's no point in > supporting that short-lived breakage in php_mt_seed now. With this > detail, php_mt_seed has some support for all mt_rand() capable versions > of PHP released so far. > > Then there's PHP 3.0.7 through 5.2.0, where Mersenne Twister's state > initialization is with multiples of 69069. This enables our stateless > implementation to quickly jump to the state array element needed to > compute the first mt_rand() output by using a precomputed value for > 69069 raised to the power 396 (mod 2**32), which is MT's M-1. Another > curiosity of those versions, which we take advantage of too, is that > they treat adjacent even and odd seeds the same, so the effective seed > space is 31-bit. > > PHP 3.0.6 to 4.1.2 used a default seed of 4357 (and thus also 4356) if > mt_srand() was not called. PHP 4.2.0 changed that to automatic seeding > using system time and PHP process ID (still predictable and now also > leaky, but no longer a constant), but there was "Bug #25007 rand & > mt_rand seed RNG every call" until 4.3.3, which presumably affected how > cracked seeds could (not) be used. > > PHP 5.2.1 changed MT state initialization to MT authors' new recommended > algorithm, which is no longer linear so we have to compute the first 397 > state elements (out of 624) even though in the simplest case we only > need (and only store) the first and last one of those (or we could use a > time-memory trade-off, which we currently don't). > > PHP 5.2.1 also introduced a bug into its implementation of MT (use of a > wrong variable, whereas pre-5.2.1 code was correct in that respect). > This bug lets us skip a few operations for every other seed, which we > do, although this optimization is so minor that we could as well not > bother. PHP 7.1.0 fixed this bug (reverting to pre-5.2.1 code in that > respect, so we use the same logic for pre-5.2.1 and 7.1.0+ there). > > In PHP versions from 3.0.7 to 7.0.x, if mt_rand() was called with its > optional output range specified, a 31-bit (0 to 2147483647 > <%28214%29%20748-3647>) MT PRNG > output was scaled to that range using floating-point math. This meant > that if a range wider than 31-bit was requested on a 64-bit build of > PHP, some values would never occur. This also meant that even for most > ranges smaller than 31-bit a bias was introduced (some output values > became more likely than others), as compared to MT's raw output (which > was relatively unbiased). > > PHP 7.1.0 tried to fix those biases by dropping the floating-point math > and instead mapping the raw 32-bit MT PRNG outputs to the target range > using integer modulo division. To avoid inherent bias when the target > range isn't a whole power of 2 of possible integer values, a loop was > introduced to skip raw 32-bit PRNG outputs (until a suitable one is > seen) that would result in such bias. A bug in that code was found and > reported due to work on php_mt_seed. As it turned out, the loop only > works right in 32-bit builds of PHP, and is ineffective on 64-bit > (except with 64-bit ranges, see below). Luckily, this actually makes > things simpler for php_mt_seed, and currently php_mt_seed fully supports > the behavior of 64-bit builds only (for ranges up to 0 to 2147483646 > <%28214%29%20748-3646>). > > There's currently no intent to add to php_mt_seed the complication of > bias-avoidance of 32-bit builds of PHP 7.1.0+, as well as of 64-bit > builds of future versions where the bug will presumably get fixed. What > this means in practice is that for 32-bit builds of PHP and future > versions of PHP, php_mt_seed may occasionally find wrong and miss > correct seeds for mt_rand() invoked with a range, but the probability of > this happening is very low except for very wide ranges that are not a > whole power of 2 of possible integer values. For example, mt_rand(0, > 61) or mt_rand(111, 222) are very unlikely to trigger the problem, > mt_rand(0, 255) can't trigger the problem, whereas mt_rand(1000000000, > 2000000000) is somewhat likely to trigger it. Such likely problematic > ranges are probably rarely used and are of little relevance to uses of > php_mt_seed. Also, supporting this buggy vs. correct behavior would > require treating 32- and 64-bit builds of PHP separately and reporting > on them differently. > > PHP 7.1.0 also tried to introduce proper support for 64-bit ranges in > 64-bit builds. It generates two raw 32-bit PRNG outputs to derive one > mt_rand() output when the target range spans more than a 32-bit space. > Unfortunately, the implementation is buggy in a way where it'd introduce > biases into such mt_rand() outputs. The bug will presumably get fixed > as well, but regardless there's currently no intent to support wider > than 31-bit ranges in php_mt_seed. This is obscure functionality > (arguably, originally an accidental misfeature, which the PHP developers > didn't really have to make official) that is only available on 64-bit > builds of PHP. Currently, php_mt_seed does not allow specifying larger > than 31-bit integers on its command line (it will report an error when a > larger value is specified). > > Prior to PHP 7.1.0, mt_rand(0, 2147483647 <%28214%29%20748-3647>) was > equivalent to mt_rand() > without a range, and php_mt_seed still assumes so. This assumption is > no longer valid for PHP 7.1.0+, which means that when searching for > seeds for PHP 7.1.0+ for mt_rand() called with a range specified, you > can specify at most a range one smaller than that, thus "0 2147483646 > <%28214%29%20748-3646>" > being the maximum that php_mt_seed supports for those versions. This > minor limitation shouldn't matter in practice, except that you might > need to be aware you can continue to specify a range of "0 2147483647 > <%28214%29%20748-3647>" > to indicate that no range was passed into mt_rand(). > > PHP 7.1.0 also aliased rand() to mt_rand() and srand() to mt_srand(). > This means that on one hand you can use php_mt_seed to crack rand() > seeds for PHP 7.1.0+ (since those are also mt_rand() seeds), but on the > other hand this cross-seeding and cross-consumption of random numbers > can affect which attacks work or don't work, and exactly how, against > specific applications that make use of both sets of PHP functions. > > PHP 7.1.0 also introduced MT_RAND_PHP as optional second parameter to > mt_srand(). When specified, it correctly enables behavior identical to > that of PHP versions 5.2.1 to 7.0.x. Thus, seeds that php_mt_seed > reports as valid for 5.2.1 to 7.0.x are always also valid for 7.1.0+ > with MT_RAND_PHP, and conversely seeds that php_mt_seed reports as valid > for 7.1.0+ are often invalid for 7.1.0+ with MT_RAND_PHP (except when > the same seeds are also valid for 5.2.1 to 7.0.x, which is common). > --- > > Alexander > Sorry for the long delay. I've just applied https://github.com/php/php-src/commit/fd07302024bc47082b13b32217147fd39d1e9e61 to the 7.2 branch. Davey, Joe, do we want to take action here for 7.1? It's a pretty severe bias, but fixing it is going to change seed sequences. I think at this point we're too far in the 7.1 cycle to apply this kind of change. Nikita --001a11402b2a3e2b9a05589d8e20--