Newsgroups: php.internals
Xref: news.php.net php.internals:118842
Content-Type: multipart/mixed; boundary="------------bMG139Z0KI8SA98p71KXpMDd"
Date: Tue, 18 Oct 2022 19:22:23 +0200
MIME-Version: 1.0
Content-Language: en-US
To: Dan Ackroyd, Joshua Rüsweg
Cc: internals@lists.php.net
References: <5ceebae4-a3fb-5d29-cdb7-dceed7b07c78@wcflabs.de>
Subject: Re: [PHP-DEV] RFC [Discussion]: Randomizer Additions
From: tim@bastelstu.be (Tim Düsterhus)

--------------bMG139Z0KI8SA98p71KXpMDd
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Hi

On 10/16/22 22:24, Dan Ackroyd wrote:
>> Shall an option be added to getFloat() that changes the logic to
>> select from [$min, $max] (i.e. allowing the maximum to be returned)? And
>> how should that look like? Boolean parameter? Enum?
>
> An enum would probably be nice, and possibly be for all four cases of
> min_(inclusive|exclusive)_max_(inclusive|exclusive) unless there is a
> technical reason to not include all of them.

No technical reason. The paper for the γ-section algorithm by Prof.
Goualard includes implementations for all four combinations.

The [closed, open) and [closed, closed] variants together are the most
useful combination, though. The former can be cleanly split into
subintervals, whereas the latter is symmetric which is useful if your
use case is as well. With rejection sampling the [closed, closed]
variant can also be turned into any of the other three without
introducing any bias.

>> Generating a random string containing specific characters...thus requires multiple lines of code for what effectively is a very simple operation.
>
> Yeah, though those lines of code add distinction and emphasis for is
> meant by character.
>
> In particular, users might be surprised when they give this string
> "abc😋👨‍👩‍👦"* and get a non-ascii result.

That's why the method includes 'bytes' in its name. This term is also
used in ->shuffleBytes() which was renamed in
https://wiki.php.net/rfc/random_extension_improvement due to this exact
problem.
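
To illustrate the byte-wise semantics that the name is meant to signal, here is a minimal sketch. It assumes UTF-8 input and a non-empty alphabet. pick_bytes() is a hypothetical userland stand-in built on random_int(), not the proposed method itself:

```php
<?php

// Hypothetical userland stand-in for byte-wise selection; it is NOT the
// proposed Randomizer method, only an illustration of the semantics.
// Assumes a non-empty alphabet.
function pick_bytes(string $alphabet, int $n): string
{
    $max = strlen($alphabet) - 1; // strlen() counts bytes, not characters
    $result = '';
    for ($i = 0; $i < $n; $i++) {
        $result .= $alphabet[random_int(0, $max)];
    }
    return $result;
}

// "abc" followed by U+1F60B, which takes 4 bytes in UTF-8.
$alphabet = "abc\u{1F60B}";

var_dump(strlen($alphabet)); // int(7)

// Individual bytes of the emoji may be selected on their own, so the
// result is not guaranteed to be valid UTF-8.
$picked = pick_bytes($alphabet, 16);
var_dump(strlen($picked));                  // int(16)
var_dump(preg_match('//u', $picked) === 1); // true only if valid UTF-8
```

With an ASCII-only alphabet such as '0123456789' bytes and characters coincide, so the distinction only matters for multi-byte input.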

In fact ->shuffleBytes() can be considered the companion method to the
proposed ->getBytesFromAlphabet():

->shuffleBytes() allows simulating a multivariate hypergeometric
distribution ("sampling without replacement") by shuffling the input
string and then using 'substr' on the result to select a number of
bytes 'n'.

->getBytesFromAlphabet() directly maps to a multinomial distribution
("sampling with replacement").

> You're going to need to be really precise on the naming and I'm not at
> all sure there is a single version that would be useful enough to
> belong in core.

Personally I believe the proposed ->getBytesFromAlphabet() to be useful
enough to belong in core. There are quite a few use cases that are
naturally restricted to ASCII: basically everything that can be
considered an identifier.

Arbitrary numeric strings alone likely provide plenty of use cases:

- Backup codes for multi-factor authentication (restricting the input to
  digits allows you to leverage a numeric keyboard, reducing the chance
  of input errors).
- Random phone numbers for testing purposes.
- Random credit card numbers (you just need to calculate the checksum
  yourself).

While hexadecimal strings can easily be generated by applying bin2hex to
the output of ->getBytes(), that is also unintuitive, because the result
length is twice the number of bytes.

>> whereas a 64 Bit engine could generate randomness for 8 characters at once.
>
> I'm really not sure that many programs are going to be speed limited
> by random number generation.

The syscall to retrieve bytes from the 'Secure' engine (which is the
default engine) can be expensive, especially on older operating system
versions and depending on how many more Meltdown/Spectre-style
vulnerabilities are found.

A userland implementation that generates 1000 random numeric strings
with 100 characters each using the 'Secure' engine takes 146 ms on my
computer.
The native implementation without optimization takes 101 ms and the
optimized native implementation 26 ms. For Xoshiro256** (the fastest
engine) the numbers are 89 ms (userland), 7 ms (native without
optimization) and 3 ms (optimized native) respectively. Benchmark
attached.

> For those that are, writing their own generator to consume all 64 bits
> of randomness for each call sounds reasonably sensible, unless a
> useful general api can be thought of.

This cannot be reasonably done in userland, because you pay an increased
cost to turn the bytes into numbers and then to perform the necessary
bit fiddling to debias the numbers.

> For the float side of the RFC, as there are technical limitations on
> which platforms it would be usable on, there needs to be a way of
> determining whether the nextFloat and getFloat methods are going to
> work. The way this is done on Imagick is to put appropriate defines in
> the stub file and in the C code implementations so that the methods
> aren't available on the class for the platforms where it isn't going
> to function correctly.
>
> I made a PR for that to Tim's repo, though I don't know of an
> environment where it can be tested.

I've seen the PR and we briefly discussed this in chat. I would defer
this to the code review, because this amounts to an implementation
detail, especially since all reasonable server platforms use IEEE 754.
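
Coming back to the userland generator discussed further up: the following rough sketch shows the kind of byte-to-number conversion and debiasing such code would have to perform. It assumes an alphabet of 1 to 256 bytes and a mask-and-reject strategy. The function name is hypothetical and the strategy is an assumption made for illustration, not necessarily what the native implementation does:

```php
<?php

// Hypothetical userland generator: fetches 8 random bytes per CSPRNG call
// and uses mask-and-reject to avoid modulo bias.
// Assumes an alphabet of 1 to 256 bytes.
function batched_bytes_from_alphabet(string $alphabet, int $n): string
{
    $len = strlen($alphabet);
    if ($len < 1 || $len > 256 || $n < 0) {
        throw new InvalidArgumentException('unsupported alphabet or length');
    }

    // Smallest all-ones bitmask that covers the indices 0 .. $len - 1.
    $mask = 0;
    while ($mask < $len - 1) {
        $mask = ($mask << 1) | 1;
    }

    $result = '';
    while (strlen($result) < $n) {
        // Turning the raw bytes into integers is part of the userland cost.
        foreach (unpack('C*', random_bytes(8)) as $byte) {
            $index = $byte & $mask;
            if ($index >= $len) {
                continue; // reject out-of-range candidates to stay unbiased
            }
            $result .= $alphabet[$index];
            if (strlen($result) === $n) {
                break;
            }
        }
    }

    return $result;
}

var_dump(batched_bytes_from_alphabet('0123456789', 20));
```

For a 10-byte alphabet the mask is 15, so on average 6 out of 16 candidates are rejected. Every step of the loop runs as interpreted PHP, which is the overhead referred to above.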

Best regards
Tim Düsterhus

--------------bMG139Z0KI8SA98p71KXpMDd
Content-Type: application/x-php; name="alphabet-benchmark.php"
Content-Disposition: attachment; filename="alphabet-benchmark.php"
Content-Transfer-Encoding: 7bit

<?php

use Random\Engine;
use Random\Randomizer;



$loops = 1000;
$length = 100;

$r = new Randomizer();

$start = hrtime(true);
    for ($i = 0; $i < $loops; $i++) $r->getBytesFromAlphabet('0123456789', $length);
$end = hrtime(true);
var_dump(($end - $start) / 1e9);

$start = hrtime(true);
    for ($i = 0; $i < $loops; $i++) $r->getBytesFromAlphabetNoOptimization('0123456789', $length);
$end = hrtime(true);
var_dump(($end - $start) / 1e9);

$start = hrtime(true);
    for ($i = 0; $i < 1000; $i++) my_bytes_from_alphabet('0123456789', $length, $r);
$end = hrtime(true);
var_dump(($end - $start) / 1e9);

$r = new Randomizer(new Engine\Xoshiro256StarStar);

$start = hrtime(true);
    for ($i = 0; $i < $loops; $i++) $r->getBytesFromAlphabet('0123456789', $length);
$end = hrtime(true);
var_dump(($end - $start) / 1e9);

$start = hrtime(true);
    for ($i = 0; $i < $loops; $i++) $r->getBytesFromAlphabetNoOptimization('0123456789', $length);
$end = hrtime(true);
var_dump(($end - $start) / 1e9);

$start = hrtime(true);
    for ($i = 0; $i < 1000; $i++) my_bytes_from_alphabet('0123456789', $length, $r);
$end = hrtime(true);
var_dump(($end - $start) / 1e9);

function my_bytes_from_alphabet(string $alphabet, int $n, Randomizer $r): string
{
    $max = \strlen($alphabet) - 1;
    $result = '';


    for ($i = 0; $i < $n; $i++) {
        $result .= $alphabet[$r->getInt(0, $max)];
    }

    return $result;
}

--------------bMG139Z0KI8SA98p71KXpMDd--