Newsgroups: php.internals
Message-ID: <2c667812-88c8-0b7b-3558-561a1348d0b2@bastelstu.be>
Date: Tue, 15 Feb 2022 11:02:59 +0100
From: tim@bastelstu.be (Tim Düsterhus)
To: Go Kudo
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] [Under Discussion] Random Extension 4.0

Hi

On 2/15/22 04:58, Go Kudo wrote:
>> Regarding "unintuitive": I disagree. I find it unintuitive that there are
>> some RNG sequences that I can't access when providing a seed.
>
> This is also the case for RNG implementations in many other languages. For
> example, Java also uses long (64-bit) as the seed value of the argument for
> Math.
>
> https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Random.html#%3Cinit%3E(long)

java.util.Random is an LCG with only 48 bits of state. A single 64-bit signed long is sufficient to represent the state.

> On the other hand, some languages have access to the complete internal
> state. Python, for example, accepts bytes or bytearrays.
>
> https://docs.python.org/3/library/random.html#random.seed
>
> However, making strings available in PHP may lead to incorrect usage.
>
> I think we can safely do this by making the seed argument accept both int
> and string, and only using it as the internal state if string is specified
> and it's 128-bits long.

That's a solution that would work for me.

>> 1. Would you expect those two 'var_dump' calls to result in the same
>> output?
>
> Added __debugInfo() magic method supports.
>
> https://github.com/php/php-src/pull/8094/commits/78efd2bd1e0ac5db48c272b364a615a5611e8caa

Don't forget to update the RFC accordingly. It would probably be helpful if you would put the full class stubs into the RFC. I find that easier to understand than a list of methods.

>> generate() should return raw bytes instead of a number (as I suggested
>> before).
>
> I don't think this is a very good idea.
>
> The RNG is a random number generator and should probably not be generating
> strings.

I'd say that the 'number' part in RNG is not technically accurate.
All RNGs are effectively generators for a random sequence of bits. The number part is just an interpretation of that random sequence of bits (e.g. 64 of them).

> Of course, I am aware that strings represent binary sequences in PHP.
> However, this is not user-friendly.
>
> The generation of a binary string is a barrier when trying to implement
> some kind of operation using numeric computation.

I believe the average user of the RNG API would use the Randomizer class, instead of the raw generators, thus they would not come in contact with the raw bytes coming from the generator.

However, by getting PHP integers out of the generator it is much harder for me to process the raw bits and bytes, if that's something I need for my use case.

As an example, consider implementing the following in userland. When getting raw bytes:

- For Randomizer::getBytes() I can just concatenate the raw bytes.
- For a random uint16BE I can grab 2 bytes and call unpack('n', $bytes).

If I get random 64-bit integers then:

- For Randomizer::getBytes() I need to use pack() and I'm not even sure whether I need to use 'q', 'Q', 'J' or 'P' to receive an unbiased result.
- For uint16BE I can use "& 0xFFFF", but would waste 48 bits, unless I also perform bit shifting to access the other bytes. But then there's also the same signedness issue.

Interpreting numbers as bytes and vice versa in C / C++ is very easy. However, in PHP userland I believe the bytes -> numbers direction is easy-ish. The numbers -> bytes direction is full of edge cases.

> If you want to deal with the problem of generated size, it would be more
> appropriate to define a method such as getGenerateSize() in the interface.
> Even in this case, generation widths greater than PHP_INT_SIZE cannot be
> supported, but generation widths greater than 64-bit are not very useful in
> the first place.
>
>> The 'Randomizer' object should buffer unused bytes internally and only
>> call generate() if the internal buffer is drained.
>
> Likewise, I think this is not a good idea. Buffering reintroduces the
> problem of complex state management, which has been made so easy. The user
> will always have to worry about the buffering size of the Randomizer.

Unfortunately you did not answer the primary question. The ones you answered were just follow-up conclusions from the answer I would give:

var_dump(\bin2hex($r1->getBytes(8)));
var_dump(\bin2hex($r2->getBytes(4)) . \bin2hex($r2->getBytes(4)));

As a user: Would you expect those two 'var_dump' calls to result in the same output?

>> Why xorshift instead of xoshiro / xoroshiro?
>
> The XorShift128Plus algorithm is still in use in major browsers and is dead
> in a good way.

I believe that the underlying RNG in web browsers is considered an implementation detail, no? For PHP this would be part of the API surface and would need to be maintained indefinitely. Certainly it would make sense to use the latest and greatest RNG, instead of something that is outdated when it's first shipped, no?

> Also, in our local testing, SplitMix64 + XorShift128Plus performed well in
> terms of performance and random number quality, so I don't think it is
> necessary to choose a different algorithm.
>
> If this RFC passes, it will be easier to add algorithms in the future. If a
> new algorithm is needed, it can be implemented immediately.

Best regards
Tim Düsterhus
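
PS: To make the bytes vs. integers comparison above a bit more concrete, here is a rough userland sketch. It is not part of the RFC or the PR: generateBytes() and generateInt() are hypothetical stand-ins for the two possible return types of generate(), backed by random_bytes() / random_int() only so that the snippet runs on its own, and the helper names are made up for illustration. I'm assuming a 64-bit PHP build and that big-endian output ('J') is what is wanted.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a generator that returns raw bytes.
function generateBytes(int $length): string
{
    return random_bytes($length);
}

// Hypothetical stand-in for a generator that returns a 64-bit PHP integer.
// Assumes a 64-bit build, i.e. PHP_INT_SIZE === 8.
function generateInt(): int
{
    return random_int(PHP_INT_MIN, PHP_INT_MAX);
}

// bytes -> numbers: grab 2 bytes and interpret them as uint16BE.
function uint16FromBytes(string $twoBytes): int
{
    return unpack('n', $twoBytes)[1];
}

// numbers -> bytes: a pack() format has to be chosen.
// 'J' (64-bit, big endian) is an assumption made for this sketch.
function bytesFromInt(int $value): string
{
    return pack('J', $value);
}

// numbers -> four uint16 values without wasting 48 bits.
// The mask is applied after the shift, because >> sign-extends
// negative values in PHP.
function uint16sFromInt(int $value): array
{
    $result = [];
    for ($shift = 48; $shift >= 0; $shift -= 16) {
        $result[] = ($value >> $shift) & 0xFFFF;
    }

    return $result;
}

// Raw bytes: getBytes() is plain concatenation.
$bytes = generateBytes(4) . generateBytes(4);
$fromBytes = uint16FromBytes(generateBytes(2));

// Integers: every step needs a pack() format or shifting plus masking.
$packed = bytesFromInt(generateInt());
$fromInt = uint16sFromInt(generateInt());
```

With raw bytes both operations are one-liners. With integers I have to pick a pack() format and remember to mask after shifting, which is the kind of edge case I meant.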