Newsgroups: php.internals
Date: Tue, 15 Feb 2022 12:58:20 +0900
From: g-kudo@colopl.co.jp (Go Kudo)
To: Tim Düsterhus
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] [Under Discussion] Random Extension 4.0

On Tue, Feb 15, 2022 at 1:46, Tim Düsterhus wrote:

> Hi
>
> On 2/14/22 16:44, Tim Düsterhus wrote:
> > Unfortunately your PR doesn't compile for me, so I can't test:
> >
> > make: *** No rule to make target 'php-src/ext/standard/lcg.c', needed by
> > 'ext/standard/lcg.lo'. Stop.
>
> I've managed to compile it by cleaning the whole directory and rerunning
> of the build steps. Not sure what I missed the first time.
>
> I've now been able to play around with it and have some additional
> discussion points:
>
> 1) Consider the following script:
>
> <?php
>
> use Random\NumberGenerator\XorShift128Plus;
> use Random\Randomizer;
>
> $g1 = new XorShift128Plus(2);
> $g2 = clone $g1;
>
> $r1 = new Randomizer($g1);
> $r2 = new Randomizer($g2);
>
> var_dump(\bin2hex($r1->getBytes(8)));
> var_dump(\bin2hex($r2->getBytes(4)) . \bin2hex($r2->getBytes(4)));
>
> As a user: Would you expect those two 'var_dump' calls to result in the
> same output?
>
> Personally I would. For me that implies:
>
> 1. generate() should return raw bytes instead of a number (as I
> suggested before).
> 2. The 'Randomizer' object should buffer unused bytes internally and
> only call generate() if the internal buffer is drained.
>
> 2) Why xorshift instead of xoshiro / xoroshiro?
>
> https://vigna.di.unimi.it/xorshift/ says that:
>
> > Information about my previous xorshift-based generators can be found
> > here, but they have been entirely superseded by the new ones, which
> > are faster and better.
>
> That would imply to me that xorshift should not be used in new
> developments.
>
> 3) Consider the following script:
>
> <?php
>
> use Random\NumberGenerator\XorShift128Plus;
>
> $g1 = new XorShift128Plus(2);
>
> var_dump($g1);
>
> exit;
>
> Should the user be able to see the internal state of the Generator in
> the var_dump() output?
>
> 4) Both xorshift as well as xoshiro / xoroshiro's reference
> implementations include a 'jump()' function that allows one to easily
> retrieve generators with distinct sequences, without needing to generate
> seeds manually which might or might not introduce a bias.
>
> Is this something that we should provide as well?
>
> 5) As a follow-up to (4): Should the 'generate()' method be called
> 'next()' or 'step()' instead? Perhaps it should even be '__invoke()'?
>
> Best regards
> Tim Düsterhus
>

If there are no objections, I will change the default NumberGenerator
used by Randomizer to Secure.

> Regarding "unintuitive": I disagree. I find it unintuitive that there
> are some RNG sequences that I can't access when providing a seed.

This is also the case for RNG implementations in many other languages.
For example, Java also takes a long (64-bit) value as the seed argument
of java.util.Random.

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Random.html#%3Cinit%3E(long)

As mentioned above, V8 also uses a 64-bit value as the seed and
generates with XorShift128+.

https://github.com/v8/v8/blob/main/src/base/utils/random-number-generator.h

On the other hand, some languages give access to the complete internal
state. Python, for example, accepts bytes or bytearrays.

https://docs.python.org/3/library/random.html#random.seed

However, accepting strings in PHP may lead to incorrect usage. I think
we can do this safely by letting the seed argument accept both int and
string, and using it as the internal state only if a string is given
and it is 128 bits long.

> I've managed to compile it by cleaning the whole directory and
> rerunning of the build steps. Not sure what I missed the first time.

This is probably due to a major change in config.m4. ./buildconf and
./configure need to be rerun properly.

> 1. Would you expect those two 'var_dump' calls to result in the same
> output?

Added support for the __debugInfo() magic method.

https://github.com/php/php-src/pull/8094/commits/78efd2bd1e0ac5db48c272b364a615a5611e8caa

> generate() should return raw bytes instead of a number (as I suggested
> before).

I don't think this is a very good idea. The RNG is a random number
generator and should probably not be generating strings. Of course, I
am aware that strings represent binary sequences in PHP. However, this
is not user-friendly.
The generation of a binary string is a barrier when trying to implement
some kind of operation using numeric computation.

If you want to deal with the problem of the generated size, it would be
more appropriate to define a method such as getGenerateSize() in the
interface. Even in this case, generation widths greater than
PHP_INT_SIZE cannot be supported, but generation widths greater than
64 bits are not very useful in the first place.

> The 'Randomizer' object should buffer unused bytes internally and only
> call generate() if the internal buffer is drained.

Likewise, I think this is not a good idea. Buffering reintroduces the
problem of complex state management, which has otherwise been made so
easy. The user would always have to worry about the buffering size of
the Randomizer.

> Why xorshift instead of xoshiro / xoroshiro?

The XorShift128Plus algorithm is still in use in major browsers and is
mature, well-proven technology. Also, in our local testing,
SplitMix64 + XorShift128Plus did well in terms of both speed and random
number quality, so I don't think it is necessary to choose a different
algorithm.

If this RFC passes, it will be easier to add algorithms in the future.
If a new algorithm is needed, it can be implemented immediately.

Regards,
Go Kudo
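
For reference, the buffering question (getBytes(8) versus two
getBytes(4) calls) can be made concrete with a toy userland sketch. It
is not the PR's implementation: ToyGenerator, UnbufferedRandomizer,
BufferedRandomizer and sameOutput() are hypothetical names, and the
generator hashes a counter instead of running XorShift128Plus. The only
assumption carried over from the discussion is that one generate() step
yields 8 bytes.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a 64-bit generator: each step yields 8 bytes.
// It hashes a counter, so it is deterministic but NOT XorShift128Plus.
final class ToyGenerator
{
    private int $counter;

    public function __construct(int $seed)
    {
        $this->counter = $seed;
    }

    public function generate(): string
    {
        // pack('J', ...) encodes the counter as 8 big-endian bytes.
        return substr(hash('sha256', pack('J', $this->counter++), true), 0, 8);
    }
}

// Unbuffered: bytes left over from the last step are thrown away.
final class UnbufferedRandomizer
{
    private ToyGenerator $generator;

    public function __construct(ToyGenerator $generator)
    {
        $this->generator = $generator;
    }

    public function getBytes(int $length): string
    {
        $out = '';
        while (strlen($out) < $length) {
            $out .= $this->generator->generate();
        }
        return substr($out, 0, $length);
    }
}

// Buffered: leftover bytes are kept and served first on the next call.
final class BufferedRandomizer
{
    private ToyGenerator $generator;
    private string $buffer = '';

    public function __construct(ToyGenerator $generator)
    {
        $this->generator = $generator;
    }

    public function getBytes(int $length): string
    {
        while (strlen($this->buffer) < $length) {
            $this->buffer .= $this->generator->generate();
        }
        $out = substr($this->buffer, 0, $length);
        $this->buffer = substr($this->buffer, $length);
        return $out;
    }
}

// True if one 8-byte read equals two consecutive 4-byte reads.
function sameOutput(object $r1, object $r2): bool
{
    return $r1->getBytes(8) === $r2->getBytes(4) . $r2->getBytes(4);
}

// Two generators with the same seed stand in for "$g2 = clone $g1".
var_dump(sameOutput(
    new UnbufferedRandomizer(new ToyGenerator(2)),
    new UnbufferedRandomizer(new ToyGenerator(2))
));
var_dump(sameOutput(
    new BufferedRandomizer(new ToyGenerator(2)),
    new BufferedRandomizer(new ToyGenerator(2))
));
```

In the unbuffered variant the second getBytes(4) call starts a new
step, so the two reads will in general differ from a single 8-byte
read. The buffered variant always matches, at the cost of the extra
per-Randomizer state discussed above.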