Newsgroups: php.internals
From: g-kudo@colopl.co.jp (Go Kudo)
To: Tim Düsterhus
Cc: internals@lists.php.net
Date: Tue, 15 Feb 2022 20:48:29 +0900
In-Reply-To: <2c667812-88c8-0b7b-3558-561a1348d0b2@bastelstu.be>
Subject: Re: [PHP-DEV] [RFC] [Under Discussion] Random Extension 4.0

On Tue, Feb 15, 2022 at 19:03, Tim Düsterhus wrote:

> Hi
>
> On 2/15/22 04:58, Go Kudo wrote:
> >> Regarding "unintuitive": I disagree. I find it unintuitive that there are
> >> some RNG sequences that I can't access when providing a seed.
> >
> > This is also the case for RNG implementations in many other languages. For
> > example, Java also uses long (64-bit) as the seed value of the argument for
> > Math.
> >
> > https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Random.html#%3Cinit%3E(long)
>
> java.util.Random is a LCG with only 48 Bits of state. A single 64-bit
> signed long is sufficient to represent the state.
>
> > On the other hand, some languages have access to the complete internal
> > state. Python, for example, accepts bytes or bytearrays.
> >
> > https://docs.python.org/3/library/random.html#random.seed
> >
> > However, making strings available in PHP may lead to incorrect usage.
> >
> > I think we can safely do this by making the seed argument accept both int
> > and string, and only using it as the internal state if string is specified
> > and it's 128-bits long.
>
> That's a solution that would work for me.
>
> >> 1. Would you expect those two 'var_dump' calls to result in the same
> >> output?
> >
> > Added __debugInfo() magic method supports.
> >
> > https://github.com/php/php-src/pull/8094/commits/78efd2bd1e0ac5db48c272b364a615a5611e8caa
>
> Don't forget to update the RFC accordingly. It would probably be helpful
> if you would put the full class stubs into the RFC. I find that easier
> to understand than a list of methods.
>
> >> generate() should return raw bytes instead of a number (as I suggested
> >> before).
> >
> > I don't think this is a very good idea.
> >
> > The RNG is a random number generator and should probably not be generating
> > strings.
>
> I'd say that the 'number' part in RNG is not technically accurate. All
> RNGs are effectively generators for a random sequence of bits. The
> number part is just an interpretation of those random sequence of bits
> (e.g. 64 of them).
>
> > Of course, I am aware that strings represent binary sequences in PHP.
> > However, this is not user-friendly.
> >
> > The generation of a binary string is a barrier when trying to implement
> > some kind of operation using numeric computation.
>
> I believe the average user of the RNG API would use the Randomizer
> class, instead of the raw generators, thus they would not come in
> contact with the raw bytes coming from the generator.
>
> However by getting PHP integers out of the generator it is much harder
> for me to process the raw bits and bytes, if that's something I need for
> my use case.
>
> As an example if I want to implement the following in userland. Then
> with getting raw bytes:
> - For Randomizer::getBytes() I can just concatenate the raw bytes.
> - For a random uint16BE I can grab 2 bytes and call unpack('n', $bytes)
>
> If I get random 64 Bit integers then:
> - For Randomizer::getBytes() I need to use pack and I'm not even sure,
>   whether I need to use 'q', 'Q', 'J', 'P' to receive an unbiased result.
> - For uint16BE I can use "& 0xFFFF", but would waste 48 Bits, unless I
>   also perform bit shifting to access the other bytes. But then there's
>   also the same signedness issue.
>
> Interpreting numbers as bytes and vice versa in C / C++ is very easy.
> However in PHP userland I believe the bytes -> numbers direction is
> easy-ish. The numbers -> bytes direction is full of edge cases.
>
> > If you want to deal with the problem of generated size, it would be more
> > appropriate to define a method such as getGenerateSize() in the interface.
> > Even in this case, generation widths greater than PHP_INT_SIZE cannot be
> > supported, but generation widths greater than 64-bit are not very useful in
> > the first place.
> >
> >> The 'Randomizer' object should buffer unused bytes internally and only
> >> call generate() if the internal buffer is drained.
> >
> > Likewise, I think this is not a good idea. Buffering reintroduces the
> > problem of complex state management, which has been made so easy. The user
> > will always have to worry about the buffering size of the Randomizer.
>
> Unfortunately you did not answer the primary question. The ones you
> answered were just follow-up conclusions from the answer I would give:
>
> var_dump(\bin2hex($r1->getBytes(8)));
> var_dump(\bin2hex($r2->getBytes(4)) . \bin2hex($r2->getBytes(4)));
>
> As a user: Would you expect those two 'var_dump' calls to result in the
> same output?
>
> >> Why xorshift instead of xoshiro / xoroshiro?
> >
> > The XorShift128Plus algorithm is still in use in major browsers and is dead
> > in a good way.
>
> I believe that that the underlying RNG in web browsers is considered an
> implementation detail, no?
>
> For PHP this would be part of the API surface and would need to be
> maintained indefinitely. Certainly it would make sense to use the latest
> and greatest RNG, instead of something that is outdated when its first
> shipped, no?
>
> > Also, in our local testing, SplitMix64 + XorShift128Plus performed well in
> > terms of performance and random number quality, so I don't think it is
> > necessary to choose a different algorithm.
> >
> > If this RFC passes, it will be easier to add algorithms in the future. If a
> > new algorithm is needed, it can be implemented immediately.
>
> Best regards
> Tim Düsterhus

> java.util.Random is a LCG with only 48 Bits of state. A single 64-bit
> signed long is sufficient to represent the state.

Sorry about that. Java is not affected by this problem.

First, I have updated the RFC to reflect the latest state.

https://wiki.php.net/rfc/rng_extension

I need some time to think about the remaining issue. I understand the
usefulness of returning raw bytes, but I feel uncomfortable with a
NumberGenerator that generates a string.

I am also still considering the point about replacing XorShift128Plus with
another RNG. There are a number of derived implementations; which one do you
think is more suitable?

Regards,
Go Kudo