Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:100229 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 62711 invoked from network); 16 Aug 2017 19:13:00 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 16 Aug 2017 19:13:00 -0000 Authentication-Results: pb1.pair.com smtp.mail=solar@openwall.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=solar@openwall.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain openwall.com designates 195.42.179.200 as permitted sender) X-PHP-List-Original-Sender: solar@openwall.com X-Host-Fingerprint: 195.42.179.200 mother.openwall.net Received: from [195.42.179.200] ([195.42.179.200:64431] helo=mother.openwall.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 0C/26-34801-93994995 for ; Wed, 16 Aug 2017 15:12:58 -0400 Received: (qmail 11265 invoked from network); 16 Aug 2017 19:12:54 -0000 Received: from localhost (HELO pvt.openwall.com) (127.0.0.1) by localhost with SMTP; 16 Aug 2017 19:12:54 -0000 Received: by pvt.openwall.com (Postfix, from userid 503) id B2846AB18D; Wed, 16 Aug 2017 21:12:47 +0200 (CEST) Date: Wed, 16 Aug 2017 21:12:47 +0200 To: PHP internals Message-ID: <20170816191247.GA12324@openwall.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i Subject: PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug From: solar@openwall.com (Solar Designer) Hi, It is well-known that it is impossible to map e.g. a 32-bit random number with a uniform distribution over its full range of values onto a range with fewer different values while maintaining a uniform distribution, except when the target range contains a whole power of 2 number of different values. Prior to 7.1.0, mt_rand()'s mapping onto the target range was done with some floating-point math, so there was no modulo division exposed in code, yet the problem above is fundamental and indeed it applied, e.g.: https://3v4l.org/ihCT8 > 1) & 1]++; } printf("%.1f%% vs. %.1f%%\n", 100. * $halves[0] / $total, 100. * $halves[1] / $total); ?> Output for 7.1.0 - 7.2.0beta2 49.9% vs. 50.1% Output for 5.2.1 - 5.6.30, hhvm-3.10.1 - 3.21.0, 7.0.0 - 7.0.20 40.0% vs. 60.0% Output for 4.3.0 - 5.2.0 39.8% vs. 60.2% PHP 7.1.0 moved to explicit integer modulo division yet tried to avoid the modulo bias by generating multiple original 32-bit or 64-bit random numbers in cases where the modulo bias would occur. Unfortunately, because of an implementation bug it failed, e.g.: https://3v4l.org/kBpqh Output for 7.1.0 - 7.2.0beta2 60.0% vs. 40.0% Output for 4.3.0 - 5.6.30, hhvm-3.10.1 - 3.21.0, 7.0.0 - 7.0.20 50.0% vs. 50.0% (Even higher bias can be seen in both cases by changing the 0x66666666 to 0xaaaaaaaa.) I first found the bug through code review while working on adding PHP 7.1.0+ support to php_mt_seed. The above 3v4l was only to confirm it. As I said in the Twitter thread (certainly not a proper place): https://twitter.com/solardiz/status/897617315008839686 "The bug is unconditional use of 64-bit ZEND_ULONG_MAX in the bias avoidance, even when the intermediate result is 32-bit (up to UINT32_MAX)." Leigh confirmed this understanding, tweeting: "Yep, happens when (max - min) is less than UINT32_MAX. A check on (umax > UINT32_MAX) and setting the limit fixes it." This sounds like almost the correct fix to me. The code is: PHPAPI zend_long php_mt_rand_range(zend_long min, zend_long max) { zend_ulong umax = max - min; zend_ulong limit; zend_ulong result; result = php_mt_rand(); #if ZEND_ULONG_MAX > UINT32_MAX if (umax > UINT32_MAX) { result = (result << 32) | php_mt_rand(); } #endif /* Special case where no modulus is required */ if (UNEXPECTED(umax == ZEND_ULONG_MAX)) { return (zend_long)result; } /* Increment the max so the range is inclusive of max */ umax++; /* Powers of two are not biased */ if (EXPECTED((umax & (umax - 1)) != 0)) { /* Ceiling under which ZEND_LONG_MAX % max == 0 */ limit = ZEND_ULONG_MAX - (ZEND_ULONG_MAX % umax) - 1; /* Discard numbers over the limit to avoid modulo bias */ while (UNEXPECTED(result > limit)) { #if ZEND_ULONG_MAX > UINT32_MAX if (umax > UINT32_MAX) { result = (result << 32) | php_mt_rand(); } else { result = php_mt_rand(); } #else result = php_mt_rand(); #endif } } return (zend_long)((result % umax) + min); } and what Leigh means is something like: /* Ceiling under which *_MAX % max == 0 */ if (umax > UINT32_MAX) limit = ZEND_ULONG_MAX - (ZEND_ULONG_MAX % umax) - 1; else limit = UINT32_MAX - (UINT32_MAX % umax) - 1; Strictly speaking, the check should be the same as the one earlier in the function, which decided how "result" was obtained with one or two calls to php_mt_rand(). The above proposed check isn't exactly the same because of the "umax++" inbetween. The special case of "umax" being exactly UINT32_MAX prior to the increment needs more thought. Probably the check in the first instance of: if (umax > UINT32_MAX) { result = (result << 32) | php_mt_rand(); } is buggy and should actually be: if (umax >= UINT32_MAX) { result = (result << 32) | php_mt_rand(); } whereas further code should continue with "umax > UINT32_MAX" because of the increment. Also, why even bother to support ranges beyond 32-bit? Sounds like a misfeature to me, considering it won't(?) be universally available on all PHP builds anyway (not on 32-bit ones, right?) and thus shouldn't(?) be relied upon by applications (although it might become reasonable for application developers not to care about 32-bit soon). I also see few use cases for it, even if it were universally available. The main bug reported in this message (wrongly using ZEND_ULONG_MAX for computing "limit" when "result" is 32-bit) also means that PHP 7.1.0 to 7.2.0beta2 probably produce partially different sequences of mt_rand() outputs between 32- and 64-bit builds for the same seed, for most ranges that fit within 32 bits. Specifically, 32-bit builds probably correctly avoid the modulo bias. (However, I didn't confirm this with a test. Unfortunately, 3v4l.org appears to be 64-bit only.) Possibility of bugs like this is yet another reason not to have introduced the little-needed 64-bit build specific functionality. OTOH, this was also a (missed) opportunity to detect this bug with more testing (requiring same behavior of 32- and 64-bit builds for 32-bit inputs). Alexander