Newsgroups: php.internals
Xref: news.php.net php.internals:117084
Date: Mon, 21 Feb 2022 11:57:37 +0900
To: Tim Düsterhus
Cc: Go Kudo, internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] [Under Discussion] Random Extension 4.0
From: g-kudo@colopl.co.jp (Go Kudo)

On Fri, Feb 18, 2022 at 19:46, Tim Düsterhus wrote:

> Hi
>
> On 2/18/22 07:31, Go Kudo wrote:
> > I have been looking into output buffering, but don't know the right
> > way to do it.
> > The buffering works fine if all RNG generation widths are static,
> > but it gets complicated if they are dynamic.
>
> I believe the primary issue here is that the engines are expected to
> return a uint64_t instead of a buffer of raw bytes. This requires
> you to perform many conversions between the uint64_t and the raw buffer.
>
> When calling Randomizer::getBytes() for a custom engine, the following
> needs to happen:
>
> - The Engine returns a byte string.
> - This byte string is then internally converted into a uint64_t.
> - When calling Randomizer::getBytes(), this uint64_t needs to be
>   converted back to a byte string.
>
> To avoid those conversions without sacrificing too much performance, it
> might be possible to return a struct that contains a single 4- or 8-byte
> array:
>
>     struct four_bytes {
>         unsigned char val[4];
>     };
>
>     struct four_bytes r;
>     r.val[0] = (result >> 0) & 0xff;
>     r.val[1] = (result >> 8) & 0xff;
>     r.val[2] = (result >> 16) & 0xff;
>     r.val[3] = (result >> 24) & 0xff;
>
>     return r;
>
> .val can be treated as a byte string, but it does not require dynamic
> allocation. By doing that, the internal engines (e.g. Xoshiro) would be
> consistent with the userland engines.
>
> > It is possible to solve this problem by allowing generate() itself to
> > specify the size it wants, but this would significantly slow down
> > performance.
>
> I don't think it's a good idea to add a size parameter to generate().
>
> > I've looked at the sample code, but do you really need support for
> > Randomizer? Engine::generate() can output dynamic binaries up to
> > 64 bits. You can use Engine directly, instead of
> > Randomizer::getBytes().
> >
> > What exactly is the situation where buffering by Randomizer is needed?
>
> *I* don't need anything. I'm just trying to think of use-cases and
> edge-cases. Basically: What would a user attempt to do and what would
> their expectations be?
>
> I'm not saying that this buffering *must* be implemented, but this is
> something we need to think about, because changing the behavior later is
> pretty much impossible: users might rely on a specific behavior for
> their seeded sequences. The behavior might also need to be part of the
> documentation.
>
> Basically, what we need to think about is what guarantees we give. As an
> example:
>
> 1. Calling Engine::generate() with the same seed results in the same
>    sequence (This guarantee we give, and it is useful).
> 2. Calling Randomizer::getInt() with the same seeded engine results in
>    the same numbers for the same parameters (I think this also is useful).
> 3. Calling Randomizer::getBytes() with the same seeded engine results in
>    the same byte sequence (This is something we are currently discussing).
> 4. Calling Randomizer::getBytes() simply concatenates the raw bytes
>    retrieved from the Engine (This ties into (3)).
> 5. Calling Randomizer::shuffleArray() with the same seeded engine
>    results in the same result for the same array (This one is more
>    debatable, because then we must maintain the exact same shuffleArray()
>    implementation forever).
>
> All these guarantees should be properly documented within the RFC. The
> RFC template (https://wiki.php.net/rfc/template) says:
>
> > Remember that the RFC contents should be easily reusable in the PHP
> > Documentation.
>
> So by thinking about this now and putting it in the RFC, the
> explanations can easily be copied into the documentation if the RFC
> passes the vote.
>
> One should not need to look into the implementation to understand how
> the Engines and the Randomizer are supposed to work.
>
> > Also worried that buffering will cut off random numbers at arbitrary
> > sizes. It may cause bias in the generated results.
>
> If there's bias in specific bits or bytes of the generated number, then
> getBytes(32) will already be biased even without buffering, as the raw
> bytes are what's of interest here. It does not matter if they are at the
> 1st or 4th position (for a 32-bit engine).
>
> Best regards
> Tim Düsterhus
>

Hi

I am sorry for the delay in replying.

Thank you for the clear explanation.
It is true that the RFC in its current form lacks explanation. I'll try to
fix this first.

Also, as I look into other languages' implementations, I see the need to
add some RNGs such as PCG. I will update the RFC to include these.

Here is a Rust example: https://docs.rs/rand/latest/rand/
PCG: https://www.pcg-random.org/index.html

Regards
Go Kudo
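For reference, the struct-of-bytes idea and guarantees (3) and (4) from the
quoted message could be sketched roughly as follows. This is only an
illustration under assumptions, not the RFC implementation: toy_engine (a
small linear congruential generator), byte_reader, reader_init and
reader_get_bytes are hypothetical names invented for this sketch; only
struct four_bytes and the little-endian byte split come from the quoted
proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size result as in the quoted proposal: no dynamic allocation. */
struct four_bytes {
    unsigned char val[4];
};

/* Hypothetical toy 32-bit engine (an LCG), for illustration only. */
struct toy_engine {
    uint32_t state;
};

/* Advance the engine and return its output as little-endian raw bytes. */
static struct four_bytes toy_generate(struct toy_engine *e)
{
    struct four_bytes r;
    uint32_t result;

    e->state = (uint32_t)(e->state * 1664525u + 1013904223u);
    result = e->state;
    r.val[0] = (result >> 0) & 0xff;
    r.val[1] = (result >> 8) & 0xff;
    r.val[2] = (result >> 16) & 0xff;
    r.val[3] = (result >> 24) & 0xff;
    return r;
}

/* Hypothetical buffered reader: leftover engine bytes survive between calls. */
struct byte_reader {
    struct toy_engine engine;
    struct four_bytes buf;
    size_t used; /* bytes of buf already handed out; 4 means "empty" */
};

static void reader_init(struct byte_reader *rd, uint32_t seed)
{
    rd->engine.state = seed;
    rd->used = 4; /* force a refill on the first read */
}

/* Concatenate raw engine bytes into out, refilling only when buf is spent. */
static void reader_get_bytes(struct byte_reader *rd, unsigned char *out, size_t len)
{
    while (len > 0) {
        assert(rd->used <= 4);
        if (rd->used == 4) {
            rd->buf = toy_generate(&rd->engine);
            rd->used = 0;
        }
        *out++ = rd->buf.val[rd->used++];
        len--;
    }
}
```

With this kind of buffering, reading 8 bytes in one call and reading 3 bytes
followed by 5 bytes from an identically seeded reader yield the same byte
sequence. Without the buffer, the second call would start at a fresh engine
output and the two sequences would diverge, which is the behavior that would
need to be documented either way.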