Newsgroups: php.internals
Message-ID: <26a8c3ee-9f0a-793c-10c0-7e642eedf1d0@bastelstu.be>
Date: Fri, 18 Feb 2022 11:46:13 +0100
To: Go Kudo
Cc: internals@lists.php.net
References: <41a1b458-4941-f34e-f1b4-e25b3298b80a@bastelstu.be> <553ba7ca-3821-c2d9-f88f-b216013a887b@bastelstu.be>
 <2c667812-88c8-0b7b-3558-561a1348d0b2@bastelstu.be> <5f496cf9-8754-b009-9cb5-b978222b2249@bastelstu.be>
Subject: Re: [PHP-DEV] [RFC] [Under Discussion] Random Extension 4.0
From: tim@bastelstu.be (Tim Düsterhus)

Hi

On 2/18/22 07:31, Go Kudo wrote:
> I have been looking into output buffering, but don't know the right way to
> do it. The buffering works fine if all RNG generation widths are static,
> but if they are dynamic so complicated.

I believe the primary issue here is that the engines are expected to
return a uint64_t instead of a buffer of raw bytes. This requires many
conversions between the uint64_t and the raw buffer.

When calling Randomizer::getBytes() for a custom engine, the following
needs to happen:

- The Engine returns a bytestring.
- This bytestring is then internally converted into a uint64_t.
- Then, inside Randomizer::getBytes(), this uint64_t needs to be
  converted back into a bytestring.

To avoid those conversions without sacrificing too much performance, it
might be possible to return a struct that contains a single 4- or 8-byte
array:

    struct four_bytes {
        unsigned char val[4];
    };

    /* Inside the engine, with `result` being the generated 32-bit value: */
    struct four_bytes r;
    r.val[0] = (result >> 0) & 0xff;
    r.val[1] = (result >> 8) & 0xff;
    r.val[2] = (result >> 16) & 0xff;
    r.val[3] = (result >> 24) & 0xff;
    return r;

.val can be treated as a bytestring, but it does not require dynamic
allocation. By doing that, the internal engines (e.g. Xoshiro) would be
consistent with the userland engines.

> It is possible to solve this problem by allowing generate() itself to
> specify the size it wants, but this would significantly slow down
> performance.

I don't think it's a good idea to add a size parameter to generate().

> I've looked at the sample code, but do you really need support for
> Randomizer? Engine::generate() can output dynamic binaries up to 64 bits.
> You can use Engine directly, instead of Randomizer::getBytes().
>
> What exactly is the situation where buffering by Randomizer is needed?

*I* don't need anything. I'm just trying to think of use-cases and
edge-cases. Basically: What would a user attempt to do and what would
their expectations be?

I'm not saying that this buffering *must* be implemented, but it is
something we need to think about, because changing the behavior later is
pretty much impossible: users might rely on a specific behavior for
their seeded sequences. The behavior might also need to be part of the
documentation.

Basically, what we need to think about is which guarantees we give. As
an example:

1. Calling Engine::generate() with the same seed results in the same
   sequence (This guarantee we give, and it is useful).
2. Calling Randomizer::getInt() with the same seeded engine results in
   the same numbers for the same parameters (I think this also is
   useful).
3. Calling Randomizer::getBytes() with the same seeded engine results in
   the same byte sequence (This is something we are currently
   discussing).
4. Calling Randomizer::getBytes() simply concatenates the raw bytes
   retrieved from the Engine (This ties into (3)).
5. Calling Randomizer::shuffleArray() with the same seeded engine
   results in the same result for the same array (This one is more
   debatable, because then we must maintain the exact same
   shuffleArray() implementation forever).

All these guarantees should be properly documented within the RFC. The
RFC template (https://wiki.php.net/rfc/template) says:

> Remember that the RFC contents should be easily reusable in the PHP Documentation.

So by thinking about this now and putting it in the RFC, the
explanations can easily be copied into the documentation if the RFC
passes the vote. One should not need to look into the implementation to
understand how the Engines and the Randomizer are supposed to work.

> Also worried that buffering will cut off random numbers at arbitrary sizes.
> It may cause bias in the generated results.
>

If there's bias in specific bits or bytes of the generated number, then
getBytes(32) will already be biased even without buffering, as the raw
bytes are what's of interest here. It does not matter if they are at the
1st or 4th position (for a 32-bit engine).

Best regards
Tim Düsterhus
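
PS: To make guarantee (4) and the struct idea from above more concrete,
here is a minimal sketch in C. It is not the RFC's implementation:
toy_engine, toy_engine_generate() and toy_get_bytes() are hypothetical
names, and the engine is a plain 32-bit LCG chosen only because it is
short. The sketch assumes that no bytes are buffered across calls, so
the unused tail of the last block is discarded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size block of raw bytes, as in the struct proposed above. */
struct four_bytes {
    unsigned char val[4];
};

/* Hypothetical toy engine: a 32-bit LCG, NOT one of the RFC's engines. */
struct toy_engine {
    uint32_t state;
};

/* Advance the engine and return its output as 4 little-endian bytes. */
struct four_bytes toy_engine_generate(struct toy_engine *e)
{
    struct four_bytes r;
    uint32_t result;

    e->state = e->state * 1664525u + 1013904223u;
    result = e->state;

    r.val[0] = (unsigned char)((result >> 0) & 0xff);
    r.val[1] = (unsigned char)((result >> 8) & 0xff);
    r.val[2] = (unsigned char)((result >> 16) & 0xff);
    r.val[3] = (unsigned char)((result >> 24) & 0xff);
    return r;
}

/*
 * Hypothetical getBytes(): fill out[0..len) by concatenating the raw
 * blocks returned by the engine. If len is not a multiple of 4, the
 * remaining bytes of the last block are thrown away (assumption: no
 * buffering between calls).
 */
void toy_get_bytes(struct toy_engine *e, unsigned char *out, size_t len)
{
    size_t pos = 0;

    while (pos < len) {
        struct four_bytes block = toy_engine_generate(e);
        size_t n = len - pos;

        if (n > sizeof block.val) {
            n = sizeof block.val;
        }
        memcpy(out + pos, block.val, n);
        pos += n;
    }
}
```

With this shape no conversion to and from uint64_t is needed, and the
output of toy_get_bytes() for a given seed is exactly the concatenation
of the engine's blocks. Whether the discarded tail should instead be
buffered for the next call is the open question from this thread.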