Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:100245 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 37863 invoked from network); 17 Aug 2017 13:19:25 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 17 Aug 2017 13:19:25 -0000 Authentication-Results: pb1.pair.com smtp.mail=solar@openwall.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=solar@openwall.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain openwall.com designates 195.42.179.200 as permitted sender) X-PHP-List-Original-Sender: solar@openwall.com X-Host-Fingerprint: 195.42.179.200 mother.openwall.net Received: from [195.42.179.200] ([195.42.179.200:64877] helo=mother.openwall.net) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id FF/1F-34801-9D795995 for ; Thu, 17 Aug 2017 09:19:22 -0400 Received: (qmail 31959 invoked from network); 17 Aug 2017 13:19:17 -0000 Received: from localhost (HELO pvt.openwall.com) (127.0.0.1) by localhost with SMTP; 17 Aug 2017 13:19:17 -0000 Received: by pvt.openwall.com (Postfix, from userid 503) id 367A6AB18D; Thu, 17 Aug 2017 15:18:31 +0200 (CEST) Date: Thu, 17 Aug 2017 15:18:31 +0200 To: Nikita Popov Cc: Leigh , PHP internals Message-ID: <20170817131830.GB14477@openwall.com> References: <20170816191247.GA12324@openwall.com> <20170816214155.GA12831@openwall.com> <20170816220242.GA13012@openwall.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i Subject: Re: [PHP-DEV] PHP 7.1.0 to 7.2.0beta2 mt_rand() modulo bias bug From: solar@openwall.com (Solar Designer) On Thu, Aug 17, 2017 at 12:57:56AM +0200, Nikita Popov wrote: > On Thu, Aug 17, 2017 at 12:02 AM, Solar Designer wrote: > > One difference I didn't notice at first: the currently committed code > > does only one php_mt_rand() call per loop iteration when it's skipping > > "numbers over the limit", even in the 64-bit case (thus, it replaces 32 > > bits of the value at a time). Your 64-bit version (and my revision of > > it) does two calls per iteration (replacing the whole number). > > > > Arguably, the currently committed code is smarter and more efficient in > > that respect, whereas yours is cleaner. Regardless, it's an extra > > change that will affect the generated random numbers in some cases where > > we could avoid making that change (it's not fixing any bug). > > > > I think it's preferable not to unnecessarily change what numbers are > > generated. > Good point. However, I think that this optimization is not actually > correct. For example, let's take umax = 0x1_0000_0001, which is the > smallest value for which this codepath is taken. In this case limit would > be 0xffff_ffff_ffff_fffe, so we only resample if the value is exactly > 0xffff_ffff_ffff_ffff. So if the resampling codepath is taken in this case, > and we only generate one new 32-bit value, the top word of the result will > be fixed to 0xffff_ffff. (A very small bias in this case, but there's > probably more significant cases.) Great point. More generally, we can't reuse the same random number for decision-making (to skip it) and as part of the next (not so) random number, without introducing bias. So we shouldn't, and we should in fact move away from the old "smarter" behavior as a bug fix. Thank you! Alexander