Use decent entropy for uniqid($prefix, TRUE)

8 years ago by Yasuo Ohgaki — view source

unread

Hi all,

I thought we must fix due to proposed PHPMailer bug fix patch. (See below
for detail) Previous discussion went wrong because of compatibility
misunderstanding. There is no additional BC issue. Please keep in mind
this.

This is simple change proposal replacing weak entropy to string one.

Background:

uniqid() returns timestamp or timestamp+random value which is based on
current timestamp. It is obvious that current uniqid() has fatal problems,
uniqid() does not produce acceptable unique ID unlike the name implies.
https://php.net/uniqid

What are the fatal problems:

uniqid() (w/o more_entropy option) returns timestamp simply, tries
to guarantee uniqueness by sleeping short amount of time for a
process. This behavior may create duplicates among processes and by
system clock adjustment. i.e. NTP or like. This issue is not addressed
by proposed patch.
uniqid('', TRUE) (w/ more_entropy option) should return acceptable
unique ID, but it's not. This is due to more_entropy is generated by
timestamp, i.e. combined_lcg(), and issue like 1) exists. Proposed
patch fixes this issue.

Use of combined LCG for uniqid() entropy does not make sense now.

Solution:

Current PHP has php_random_int/bytes() which are much better
entropy than combined_lcg(). php_random_int/bytes() use CSPRNG.

Note:

There is no additional compatibility issue at all with
php_random_int/bytes(). Previously merged patch discussion regarding
uniqid() improvement was ruined compatibility FUD. open_basedir
restriction is irrelevant to internal functions. PRNG failure is
unlikely, and the system is unusable anyway if it fails. e.g. random
functions, session, crypt related functions do not work at all.

Although stronger entropy is needed, uniqid('', TRUE) will have
acceptable uniqueness similar to UUID by replacing CSPRNG entropy.
https://en.wikipedia.org/wiki/Universally_unique_identifier

This kind of change is simple enough change to be merged as "Simple
improvement for new release" or even "Simple bug fix for released
versions", IMO. I'm posting this mail to ask comments because of
previous discussion for this issue.

Following patch is basically the same as patch I committed and
reverted before. I suppose nobody insists RFC for this change now.

I'll merge the patch to master (7.2) if there is no comment.

Patch:

$ git diff
diff --git a/ext/standard/uniqid.c b/ext/standard/uniqid.c
index f429e6d..80dacdb 100644
--- a/ext/standard/uniqid.c
+++ b/ext/standard/uniqid.c
@@ -35,7 +35,7 @@
#include <sys/time.h>
#endif

-#include "php_lcg.h"
+#include "php_random.h"
#include "uniqid.h"

/* {{{ proto string uniqid([string prefix [, bool more_entropy]])
@@ -77,7 +77,9 @@ PHP_FUNCTION(uniqid)
* digits for usecs.
*/
if (more_entropy) {

          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,

usec, php_combined_lcg() * 10);

```
          zend_long rand;
```

          php_random_int(1000000000, 9999999999, &rand, 1);

          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,

usec, (double)rand/10000000000);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}

Proposed patch for PHPMailer command injection issue:

I found following code(patch) for PHPMailer security issue.
https://core.trac.wordpress.org/attachment/ticket/37210/0001
-Upgrade-PHPMailer-from-5.2.14-to-5.2.19.patch

 2086     * Create unique ID
 2087     * @return string
 2088     */
 2089    protected function generateId() {
 2090        return md5(uniqid(time()));

2024 2091 }
2025 2092
2026 2093 /**
……class PHPMailer
2034 2101 {
2035 2102 $body = '';
2036 2103 //Create unique IDs and preset boundaries
2037 $this->uniqueid = md5(uniqid(time()));
2104 $this->uniqueid = $this->generateId();
2038 2105 $this->boundary[1] = 'b1_' . $this->uniqueid;
2039 2106 $this->boundary[2] = 'b2_' . $this->uniqueid;
2040 2107 $this->boundary[3] = 'b3_' . $this->uniqueid;

Although I never recommend such code, the ID is good enough for this
specific usage. I think we should remove the goccha, "uniqid() is not
unique". This code explains why.

Related RFC:

Improve uniqid() uniqueness
https://wiki.php.net/rfc/uniqid

IMHO, we should enable more_entropy by default, with stronger entropy
prefered.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Kazuo Oishi — view source

unread

Hi,

I'll merge the patch to master (7.2) if there is no comment.

Patch:

$ git diff
diff --git a/ext/standard/uniqid.c b/ext/standard/uniqid.c
index f429e6d..80dacdb 100644
--- a/ext/standard/uniqid.c
+++ b/ext/standard/uniqid.c
@@ -35,7 +35,7 @@
#include <sys/time.h>
#endif

-#include "php_lcg.h"
+#include "php_random.h"
#include "uniqid.h"

/* {{{ proto string uniqid([string prefix [, bool more_entropy]])
@@ -77,7 +77,9 @@ PHP_FUNCTION(uniqid)
* digits for usecs.
*/
if (more_entropy) {
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec, php_combined_lcg() * 10);
          zend_long rand;
          php_random_int(1000000000, 9999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec, (double)rand/10000000000);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}

I think it would be

php_random_int(0, 9999999999, &rand, 1);
uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, (double)rand/1000000000);

[Please note that "(double)rand/10000000000" is changed to
"(double)rand/1000000000".]

I don't oppose this patch, but I cannot understand your argument.

I thought we must fix due to proposed PHPMailer bug fix patch. (See below
for detail) Previous discussion went wrong because of compatibility
misunderstanding. There is no additional BC issue. Please keep in mind
this.
...
Proposed patch for PHPMailer command injection issue:

I found following code(patch) for PHPMailer security issue.
https://core.trac.wordpress.org/attachment/ticket/37210/0001
-Upgrade-PHPMailer-from-5.2.14-to-5.2.19.patch
 2086     * Create unique ID
 2087     * @return string
 2088     */
 2089    protected function generateId() {
 2090        return md5(uniqid(time()));
2024 2091 }
2025 2092
2026 2093 /**
……class PHPMailer
2034 2101 {
2035 2102 $body = '';
2036 2103 //Create unique IDs and preset boundaries
2037 $this->uniqueid = md5(uniqid(time()));
2104 $this->uniqueid = $this->generateId();
2038 2105 $this->boundary[1] = 'b1_' . $this->uniqueid;
2039 2106 $this->boundary[2] = 'b2_' . $this->uniqueid;
2040 2107 $this->boundary[3] = 'b3_' . $this->uniqueid;

Although I never recommend such code, the ID is good enough for this
specific usage. I think we should remove the goccha, "uniqid() is not
unique". This code explains why.

Obviously, this is not related to your patch. "we must fix due to
proposed PHPMailer bug fix patch" is "FUD". Behavior of uniqid without
$more_entropy=TRUE is not changed.

What's your intention?

BTW, I agree that uniqid() is good enough to use as MIME boundary key, if
collision to body content is checked. But non-timestamp-based, simple
random value generator is more useful for this purpose.

IMHO, we should enable more_entropy by default, with stronger entropy
prefered.

This is BC-break, and I don't think such change is wanted.

(IMHO, what is wanted is new simple random string generator that uses
only specified characters.)

--
Kazuo Oishi

8 years ago by Yasuo Ohgaki — view source

unread

Hi Kazuo,

I thought we must fix due to proposed PHPMailer bug fix patch. (See below
for detail) Previous discussion went wrong because of compatibility
misunderstanding. There is no additional BC issue. Please keep in mind
this.
...
Proposed patch for PHPMailer command injection issue:

I found following code(patch) for PHPMailer security issue.
https://core.trac.wordpress.org/attachment/ticket/37210/0001
-Upgrade-PHPMailer-from-5.2.14-to-5.2.19.patch
 2086     * Create unique ID
 2087     * @return string
 2088     */
 2089    protected function generateId() {
 2090        return md5(uniqid(time()));
2024 2091 }
2025 2092
2026 2093 /**
……class PHPMailer
2034 2101 {
2035 2102 $body = '';
2036 2103 //Create unique IDs and preset boundaries
2037 $this->uniqueid = md5(uniqid(time()));
2104 $this->uniqueid = $this->generateId();
2038 2105 $this->boundary[1] = 'b1_' . $this->uniqueid;
2039 2106 $this->boundary[2] = 'b2_' . $this->uniqueid;
2040 2107 $this->boundary[3] = 'b3_' . $this->uniqueid;

Although I never recommend such code, the ID is good enough for this
specific usage. I think we should remove the goccha, "uniqid() is not
unique". This code explains why.
Obviously, this is not related to your patch. "we must fix due to
proposed PHPMailer bug fix patch" is "FUD". Behavior of uniqid without
$more_entropy=TRUE is not changed.

You misunderstand the mail.
PHPMailer and uniqid() fix is unrelated, but uniqid() is misused proposed
patch in obvious way.

What's your intention?

The point we should learn from the code is, it is clear that users
misunderstand how uniqid() works. You'll find number of such usages if you
search net.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki — view source

unread

Hi Kazuo,

You misunderstand the mail.
PHPMailer and uniqid() fix is unrelated, but uniqid() is misused proposed
patch in obvious way.

What's your intention?

The point we should learn from the code is, it is clear that users
misunderstand how uniqid() works. You'll find number of such usages if you
search net.

There is uniqid() improvement RFC
https://wiki.php.net/rfc/uniqid

The proposed patch for PHPmailer proves we should improve 'more_entropy'
option and enable it by default. That's my point. Enabling 'more_entropy'
option by default will be handled by RFC process, but we can simply improve
randomness of 'more_entropy' w/o BC for now. This is what I proposed.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by lauri.kentta@gmail.com — view source

unread

```
          zend_long rand;
```

          php_random_int(1000000000, 9999999999, &rand, 1);

          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,

usec, (double)rand/10000000000);

Your code is broken. It produces 0.10000000 - 0.99999999 when it should
produce 0.00000000 - 9.99999999. Also, you have integer overflow on
32-bit systems.

Why do you mess with oversized integers and doubles and at all? It would
be cleaner and simpler to use just regular 32-bit integers like this:

```
          zend_long rand;
```

          php_random_int(0, 999999999, &rand, 1);

          uniqid = strpprintf(0, "%s%08x%05x%d.%08d", prefix, sec,

usec, rand % 10, rand / 10);

Also, your argument about PHPMailer has nothing to do with your main
complaint about lcg_value, since collisions of lcg_value are not the
problem there.

Why don't you put your effort into a more useful solution such as
random_string or something?
random_string(PHP_STRING_HEX_LOWER, 32) would produce md5-style output.
random_string(PHP_STRING_BASE64, 32) would produce a lot more entropy.
random_string("my_charset", 20) would cover the general case.
random_array([1,2,3], 20) could extend this to arbitrary arrays.

--
Lauri Kenttä

8 years ago by Yasuo Ohgaki — view source

unread

Hi Lauri,

          zend_long rand;
          php_random_int(1000000000, 9999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec, (double)rand/10000000000);
Your code is broken. It produces 0.10000000 - 0.99999999 when it should
produce 0.00000000 - 9.99999999. Also, you have integer overflow on 32-bit
systems.

Why do you mess with oversized integers and doubles and at all? It would
be cleaner and simpler to use just regular 32-bit integers like this:
          zend_long rand;
          php_random_int(0, 999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%d.%08d", prefix, sec,
usec, rand % 10, rand / 10);

Also, your argument about PHPMailer has nothing to do with your main
complaint about lcg_value, since collisions of lcg_value are not the
problem there.

          php_random_int(1000000000, 9999999999, &rand, 1);

This should be

          php_random_int(0, 9999999999, &rand, 1);

Anyway, I forgot about 32 bit systems. Thank you for catching this.

My original patch is better, then,

[yohgaki@dev PHP-master]$ git show 48f1a17886d874dc90867c669481804de90
commit 48f1a17886d874dc90867c669481804de90509e8
Author: Yasuo Ohgaki yohgaki@php.net
Date: Tue Oct 18 09:04:57 2016 +0900

Fix bug #47890 #73215 `uniqid()` should use better random source

diff --git a/ext/standard/uniqid.c b/ext/standard/uniqid.c
index f429e6d..207cf01 100644
--- a/ext/standard/uniqid.c
+++ b/ext/standard/uniqid.c
@@ -35,9 +35,11 @@
#include <sys/time.h>
#endif

-#include "php_lcg.h"
+#include "php_random.h"
#include "uniqid.h"

+#define PHP_UNIQID_ENTROPY_LEN 10

/* {{{ proto string uniqid([string prefix [, bool more_entropy]])
Generates a unique ID */
#ifdef HAVE_GETTIMEOFDAY
@@ -77,7 +79,22 @@ PHP_FUNCTION(uniqid)
* digits for usecs.
*/
if (more_entropy) {

          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec,

php_combined_lcg() * 10);

```
          int i;
```

          unsigned char c, entropy[PHP_UNIQID_ENTROPY_LEN+1];

          for(i = 0; i < PHP_UNIQID_ENTROPY_LEN;) {

                  php_random_bytes_throw(&c, sizeof(c));

                  /* Avoid modulo bias */

```
                  if (c > 249) {
```
```
                          continue;
```
```
                  }
```

                  entropy[i] = c % 10 + '0';

```
                  i++;
```
```
          }
```

          /* Set . for compatibility */

```
          entropy[1] = '.';
```

          entropy[PHP_UNIQID_ENTROPY_LEN] = '\0';

          uniqid = strpprintf(0, "%s%08x%05x%s", prefix, sec, usec,

entropy);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}

I'll revert the revert commit to master.

Why don't you put your effort into a more useful solution such as
random_string or something?
random_string(PHP_STRING_HEX_LOWER, 32) would produce md5-style output.
random_string(PHP_STRING_BASE64, 32) would produce a lot more entropy.
random_string("my_charset", 20) would cover the general case.
random_array([1,2,3], 20) could extend this to arbitrary arrays.

They are useful. I would like to have such functions, too. However, they
don't fix uniqid() problem. i.e. They are out of scope of uniqid()
improvement. As I mentioned, we should get rid of useless goccha like
uniqid() is merely a system timestamp. i.e. more_entropy option should be
enabled by default. Additionally, we may provide stronger entropy such as
128 bits random value.

Do you think uniqid() is good function that we should keep as it is now,
forever? I guess you do not. Why not improve it

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Niklas Keller — view source

unread

2017-01-07 2:15 GMT+01:00 Yasuo Ohgaki yohgaki@ohgaki.net:

Hi Lauri,

On Wed, Jan 4, 2017 at 4:56 AM, Lauri Kenttä lauri.kentta@gmail.com
wrote:
          zend_long rand;
          php_random_int(1000000000, 9999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec, (double)rand/10000000000);
Your code is broken. It produces 0.10000000 - 0.99999999 when it should
produce 0.00000000 - 9.99999999. Also, you have integer overflow on
32-bit
systems.

Why do you mess with oversized integers and doubles and at all? It would
be cleaner and simpler to use just regular 32-bit integers like this:
          zend_long rand;
          php_random_int(0, 999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%d.%08d", prefix, sec,
usec, rand % 10, rand / 10);

Also, your argument about PHPMailer has nothing to do with your main
complaint about lcg_value, since collisions of lcg_value are not the
problem there.
          php_random_int(1000000000, 9999999999, &rand, 1);
This should be
          php_random_int(0, 9999999999, &rand, 1);
Anyway, I forgot about 32 bit systems. Thank you for catching this.

My original patch is better, then,

[yohgaki@dev PHP-master]$ git show 48f1a17886d874dc90867c669481804de90
commit 48f1a17886d874dc90867c669481804de90509e8
Author: Yasuo Ohgaki yohgaki@php.net
Date: Tue Oct 18 09:04:57 2016 +0900
Fix bug #47890 #73215 `uniqid()` should use better random source
diff --git a/ext/standard/uniqid.c b/ext/standard/uniqid.c
index f429e6d..207cf01 100644
--- a/ext/standard/uniqid.c
+++ b/ext/standard/uniqid.c
@@ -35,9 +35,11 @@
#include <sys/time.h>
#endif

-#include "php_lcg.h"
+#include "php_random.h"
#include "uniqid.h"

+#define PHP_UNIQID_ENTROPY_LEN 10

/* {{{ proto string uniqid([string prefix [, bool more_entropy]])
Generates a unique ID */
#ifdef HAVE_GETTIMEOFDAY
@@ -77,7 +79,22 @@ PHP_FUNCTION(uniqid)
* digits for usecs.
*/
if (more_entropy) {
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec,
php_combined_lcg() * 10);
          int i;
          unsigned char c, entropy[PHP_UNIQID_ENTROPY_LEN+1];
          for(i = 0; i < PHP_UNIQID_ENTROPY_LEN;) {
                  php_random_bytes_throw(&c, sizeof(c));
                  /* Avoid modulo bias */
                  if (c > 249) {
                          continue;
                  }
                  entropy[i] = c % 10 + '0';
                  i++;
          }
          /* Set . for compatibility */
          entropy[1] = '.';
          entropy[PHP_UNIQID_ENTROPY_LEN] = '\0';
          uniqid = strpprintf(0, "%s%08x%05x%s", prefix, sec, usec,
entropy);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}

I'll revert the revert commit to master.

Why don't you put your effort into a more useful solution such as
random_string or something?
random_string(PHP_STRING_HEX_LOWER, 32) would produce md5-style output.
random_string(PHP_STRING_BASE64, 32) would produce a lot more entropy.
random_string("my_charset", 20) would cover the general case.
random_array([1,2,3], 20) could extend this to arbitrary arrays.

They are useful. I would like to have such functions, too. However, they
don't fix uniqid() problem. i.e. They are out of scope of uniqid()
improvement. As I mentioned, we should get rid of useless goccha like
uniqid() is merely a system timestamp. i.e. more_entropy option should be
enabled by default. Additionally, we may provide stronger entropy such as
128 bits random value.

Do you think uniqid() is good function that we should keep as it is now,
forever? I guess you do not. Why not improve it

Because it's a BC break. A new function with the deprecation of the old one
is way easier for developers to handle. They don't need version checks.
Maybe it will work on older versions, but will behave differently.

Regards, Niklas

8 years ago by Yasuo Ohgaki — view source

unread

Hi Niklas,

2017-01-07 2:15 GMT+01:00 Yasuo Ohgaki yohgaki@ohgaki.net:
Hi Lauri,

On Wed, Jan 4, 2017 at 4:56 AM, Lauri Kenttä lauri.kentta@gmail.com
wrote:
          zend_long rand;
          php_random_int(1000000000, 9999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec, (double)rand/10000000000);
Your code is broken. It produces 0.10000000 - 0.99999999 when it should
produce 0.00000000 - 9.99999999. Also, you have integer overflow on
32-bit
systems.

Why do you mess with oversized integers and doubles and at all? It would
be cleaner and simpler to use just regular 32-bit integers like this:
          zend_long rand;
          php_random_int(0, 999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%d.%08d", prefix, sec,
usec, rand % 10, rand / 10);

Also, your argument about PHPMailer has nothing to do with your main
complaint about lcg_value, since collisions of lcg_value are not the
problem there.
          php_random_int(1000000000, 9999999999, &rand, 1);
This should be
          php_random_int(0, 9999999999, &rand, 1);
Anyway, I forgot about 32 bit systems. Thank you for catching this.

My original patch is better, then,

[yohgaki@dev PHP-master]$ git show 48f1a17886d874dc90867c669481804de90
commit 48f1a17886d874dc90867c669481804de90509e8
Author: Yasuo Ohgaki yohgaki@php.net
Date: Tue Oct 18 09:04:57 2016 +0900
Fix bug #47890 #73215 `uniqid()` should use better random source
diff --git a/ext/standard/uniqid.c b/ext/standard/uniqid.c
index f429e6d..207cf01 100644
--- a/ext/standard/uniqid.c
+++ b/ext/standard/uniqid.c
@@ -35,9 +35,11 @@
#include <sys/time.h>
#endif

-#include "php_lcg.h"
+#include "php_random.h"
#include "uniqid.h"

+#define PHP_UNIQID_ENTROPY_LEN 10

/* {{{ proto string uniqid([string prefix [, bool more_entropy]])
Generates a unique ID */
#ifdef HAVE_GETTIMEOFDAY
@@ -77,7 +79,22 @@ PHP_FUNCTION(uniqid)
* digits for usecs.
*/
if (more_entropy) {
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec,
php_combined_lcg() * 10);
          int i;
          unsigned char c, entropy[PHP_UNIQID_ENTROPY_LEN+1];
          for(i = 0; i < PHP_UNIQID_ENTROPY_LEN;) {
                  php_random_bytes_throw(&c, sizeof(c));
                  /* Avoid modulo bias */
                  if (c > 249) {
                          continue;
                  }
                  entropy[i] = c % 10 + '0';
                  i++;
          }
          /* Set . for compatibility */
          entropy[1] = '.';
          entropy[PHP_UNIQID_ENTROPY_LEN] = '\0';
          uniqid = strpprintf(0, "%s%08x%05x%s", prefix, sec, usec,
entropy);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}

I'll revert the revert commit to master.

Why don't you put your effort into a more useful solution such as
random_string or something?
random_string(PHP_STRING_HEX_LOWER, 32) would produce md5-style output.
random_string(PHP_STRING_BASE64, 32) would produce a lot more entropy.
random_string("my_charset", 20) would cover the general case.
random_array([1,2,3], 20) could extend this to arbitrary arrays.

They are useful. I would like to have such functions, too. However, they
don't fix uniqid() problem. i.e. They are out of scope of uniqid()
improvement. As I mentioned, we should get rid of useless goccha like
uniqid() is merely a system timestamp. i.e. more_entropy option should be
enabled by default. Additionally, we may provide stronger entropy such as
128 bits random value.

Do you think uniqid() is good function that we should keep as it is now,
forever? I guess you do not. Why not improve it
Because it's a BC break. A new function with the deprecation of the old
one is way easier for developers to handle. They don't need version checks.
Maybe it will work on older versions, but will behave differently.

Recent rand() BC is more severe. i.e. rand() is predictable PRNG and
backend was replaced to MT rand by 7.1.

BC is important, but I couldn't find code that will be broken by
"more_entropy" by default. uniqid() is popular function yet misused a lot.
I don't think this will change because removing uniqid() is more severe BC
and RFC for it wouldn't pass. We knew slight change could break something,
but question is "Does it worth it". Wordpress's PHPMailer patch explains it
worths.

Anyway, are there any objections for revert the revert patch that uses
CSPRNG for uniqid() entropy?

48f1a17886d874dc90867c669481804de90509e8

I'll wait few days more.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by lauri.kentta@gmail.com — view source

unread

          php_random_int(1000000000, 9999999999, &rand, 1);

This should be

          php_random_int(0, 9999999999, &rand, 1);

No, it shouldn't. That fixes none of the reported problems. You still
have too many numbers (integer overflow) and still produce 0.abcdefgh
instead of a.bcdefghi.

If you can't fix it, maybe you shouldn't be doing it in the first
place...

--
Lauri Kenttä

8 years ago by Yasuo Ohgaki — view source

unread

>
>
>> + php_random_int(1000000000, 9999999999, &rand, 1);
>>
>> This should be
>>
>> + php_random_int(0, 9999999999, &rand, 1);
>>
>
> No, it shouldn't. That fixes none of the reported problems. You still have
> too many numbers (integer overflow) and still produce 0.abcdefgh instead of
> a.bcdefghi.
>
> If you can't fix it, maybe you shouldn't be doing it in the first place...

Did you read my mail?
Please read mail again.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki — view source

unread

> On Mon, Jan 9, 2017 at 2:29 AM, Lauri Kenttä
> wrote:
>
>>
>>
>>> + php_random_int(1000000000, 9999999999, &rand, 1);
>>>
>>> This should be
>>>
>>> + php_random_int(0, 9999999999, &rand, 1);
>>>
>>
>> No, it shouldn't. That fixes none of the reported problems. You still
>> have too many numbers (integer overflow) and still produce 0.abcdefgh
>> instead of a.bcdefghi.
>>
>> If you can't fix it, maybe you shouldn't be doing it in the first place...
>
>
> Did you read my mail?
> Please read mail again.
>

Anyway, I agree your way is optimal for 9 digit chars entropy. I don't care
about extending entropy strength, longer length and use of non digits, for
now. Are we OK with the patch Lauri proposed?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Kazuo Oishi — view source

unread

Hi,

          zend_long rand;
          php_random_int(1000000000, 9999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec, (double)rand/10000000000);
Your code is broken. It produces 0.10000000 - 0.99999999 when it should
produce 0.00000000 - 9.99999999. Also, you have integer overflow on 32-bit
systems.

Why do you mess with oversized integers and doubles and at all? It would
be cleaner and simpler to use just regular 32-bit integers like this:
          zend_long rand;
          php_random_int(0, 999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%d.%08d", prefix, sec,
usec, rand % 10, rand / 10);
...
My original patch is better, then,
[yohgaki@dev PHP-master]$ git show 48f1a17886d874dc90867c669481804de90
commit 48f1a17886d874dc90867c669481804de90509e8
Author: Yasuo Ohgaki yohgaki@php.net
Date: Tue Oct 18 09:04:57 2016 +0900
Fix bug #47890 #73215 `uniqid()` should use better random source
diff --git a/ext/standard/uniqid.c b/ext/standard/uniqid.c
index f429e6d..207cf01 100644
--- a/ext/standard/uniqid.c
+++ b/ext/standard/uniqid.c
@@ -35,9 +35,11 @@
#include <sys/time.h>
#endif

-#include "php_lcg.h"
+#include "php_random.h"
#include "uniqid.h"

+#define PHP_UNIQID_ENTROPY_LEN 10

/* {{{ proto string uniqid([string prefix [, bool more_entropy]])
Generates a unique ID */
#ifdef HAVE_GETTIMEOFDAY
@@ -77,7 +79,22 @@ PHP_FUNCTION(uniqid)
* digits for usecs.
*/
if (more_entropy) {
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec,
php_combined_lcg() * 10);
          int i;
          unsigned char c, entropy[PHP_UNIQID_ENTROPY_LEN+1];
          for(i = 0; i < PHP_UNIQID_ENTROPY_LEN;) {
                  php_random_bytes_throw(&c, sizeof(c));
                  /* Avoid modulo bias */
                  if (c > 249) {
                          continue;
                  }
                  entropy[i] = c % 10 + '0';
                  i++;
          }
          /* Set . for compatibility */
          entropy[1] = '.';
          entropy[PHP_UNIQID_ENTROPY_LEN] = '\0';
          uniqid = strpprintf(0, "%s%08x%05x%s", prefix, sec, usec,
entropy);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}

I'll revert the revert commit to master.

No. Lauri's version is better.

Your php_random_bytes_throw() version is significantly slow. Lauri's
version is faster and cleaner.

[original uniqid() using php_combined_lcg]
$ time ./php_uniqid_orig -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m0.366s
user 0m0.350s
sys 0m0.010s

[your php_random_bytes_throw version (commit 48f1a17886d874dc90867c669481804de90509e8)]
$ time ./php_uniqid_yohgaki -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m4.509s
user 0m0.430s
sys 0m4.070s

[Lauri's php_random_int version]
$ time ./php_uniqid_lauri -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m0.664s
user 0m0.260s
sys 0m0.400s

--
Kazuo Oishi

8 years ago by Yasuo Ohgaki — view source

unread

Hi Kazuo,

No. Lauri's version is better.

Your php_random_bytes_throw() version is significantly slow. Lauri's
version is faster and cleaner.

[original uniqid() using php_combined_lcg]
$ time ./php_uniqid_orig -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m0.366s
user 0m0.350s
sys 0m0.010s

[your php_random_bytes_throw version (commit 48f1a17886d874dc90867c66948180
4de90509e8)]
$ time ./php_uniqid_yohgaki -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m4.509s
user 0m0.430s
sys 0m4.070s

[Lauri's php_random_int version]
$ time ./php_uniqid_lauri -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m0.664s
user 0m0.260s
sys 0m0.400s

Interesting result. AFAIK, I didn't get significant difference when I made
the patch.
What is your system? It seems your PRNG is significantly slow.

BTW, my patch is extension in mind. i.e. Use of non numeric chars

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki — view source

unread

No. Lauri's version is better.

Your php_random_bytes_throw() version is significantly slow. Lauri's
version is faster and cleaner.

[original uniqid() using php_combined_lcg]
$ time ./php_uniqid_orig -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m0.366s
user 0m0.350s
sys 0m0.010s

[your php_random_bytes_throw version (commit
48f1a17886d874dc90867c669481804de90509e8)]
$ time ./php_uniqid_yohgaki -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m4.509s
user 0m0.430s
sys 0m4.070s

[Lauri's php_random_int version]
$ time ./php_uniqid_lauri -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m0.664s
user 0m0.260s
sys 0m0.400s

Interesting result. AFAIK, I didn't get significant difference when I made
the patch.
What is your system? It seems your PRNG is significantly slow.

The performance will be improved by reducing multiple PRNG calls to 1.
I'll modify patch later, could you test it with your system?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Kazuo Oishi — view source

unread

Hi,

[original uniqid() using php_combined_lcg]
$ time ./php_uniqid_orig -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m0.366s
user 0m0.350s
sys 0m0.010s

[your php_random_bytes_throw version (commit
48f1a17886d874dc90867c669481804de90509e8)]
$ time ./php_uniqid_yohgaki -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m4.509s
user 0m0.430s
sys 0m4.070s

[Lauri's php_random_int version]
$ time ./php_uniqid_lauri -r 'for($i=0; $i<1000000;$i++) uniqid("",true);'
real 0m0.664s
user 0m0.260s
sys 0m0.400s

Interesting result. AFAIK, I didn't get significant difference when I made
the patch.
What is your system? It seems your PRNG is significantly slow.

Core i7-5600U 2.60GHz
Linux version 4.8.10, gcc version 4.9.3, gentoo

The performance will be improved by reducing multiple PRNG calls to 1.
I'll modify patch later, could you test it with your system?

Sure. But as you said, Lauri's version would be optimal.

--
Kazuo Oishi

8 years ago by Yasuo Ohgaki — view source

unread

Hi Kazuo,

[original uniqid() using php_combined_lcg]
$ time ./php_uniqid_orig -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m0.366s
user 0m0.350s
sys 0m0.010s

[your php_random_bytes_throw version (commit
48f1a17886d874dc90867c669481804de90509e8)]
$ time ./php_uniqid_yohgaki -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m4.509s
user 0m0.430s
sys 0m4.070s

[Lauri's php_random_int version]
$ time ./php_uniqid_lauri -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m0.664s
user 0m0.260s
sys 0m0.400s

Interesting result. AFAIK, I didn't get significant difference when I
made
the patch.
What is your system? It seems your PRNG is significantly slow.

Core i7-5600U 2.60GHz
Linux version 4.8.10, gcc version 4.9.3, gentoo

Thanks.
I don't see such difference on my
Fedora 25 Corei7-4770S
gcc version 6.3.1 20161221 (Red Hat 6.3.1-1) (GCC)

I'm curious because I don't see performance issue you have.
I'll send patch next week or so because I'm interested in how modified
patch will perform on your system.

The performance will be improved by reducing multiple PRNG calls to 1.
I'll modify patch later, could you test it with your system?

Sure. But as you said, Lauri's version would be optimal.

I wrote the same patch right after php_random_int() was implemented and
didn't find any problem. I think I've posted benchmark result in the
previous uniqid() discussion thread. So I checked my patch again and found
it should be

PHP-master]$ git diff
diff --git a/ext/standard/uniqid.c b/ext/standard/uniqid.c
index 22173ae..bbd0e0a 100644
--- a/ext/standard/uniqid.c
+++ b/ext/standard/uniqid.c
@@ -35,7 +35,7 @@
#include <sys/time.h>
#endif

-#include "php_lcg.h"
+#include "php_random.h"
#include "uniqid.h"

/* {{{ proto string uniqid([string prefix [, bool more_entropy]])
@@ -78,7 +78,9 @@ PHP_FUNCTION(uniqid)
* digits for usecs.
*/
if (more_entropy) {

          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec,

php_combined_lcg() * 10);

```
          zend_long rand;
```

          php_random_int(0, 999999999, &rand, 1);

          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec,

(double)rand/100000000);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}

Notice that int values are less than a billion which is inside of signed 32
bit int range. This version is as fast as php_combined_lcg() version on my
system. Both versions executes a million uniqid() calls about 0.36 sec.

$ php -r '$s = microtime(TRUE);for($i=0;$i<1000000;$i++) uniqid("", TRUE);
echo microtime(TRUE) - $s;'
0.36102104187012

So above patch would be the final patch. I don't expect issues but if there
is performace issue on some systems, we may consider Lauri's integer
computation version.

I should have been disturbed by something when I wrote the silly patch.
Sorry for confusions.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by lauri.kentta@gmail.com — view source

unread

Hi Kazuo,

[original uniqid() using php_combined_lcg]
$ time ./php_uniqid_orig -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m0.366s
user 0m0.350s
sys 0m0.010s

[your php_random_bytes_throw version (commit
48f1a17886d874dc90867c669481804de90509e8)]
$ time ./php_uniqid_yohgaki -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m4.509s
user 0m0.430s
sys 0m4.070s

[Lauri's php_random_int version]
$ time ./php_uniqid_lauri -r 'for($i=0; $i<1000000;$i++)
uniqid("",true);'
real 0m0.664s
user 0m0.260s
sys 0m0.400s

Interesting result. AFAIK, I didn't get significant difference
when I made
the patch.
What is your system? It seems your PRNG is significantly slow.

Core i7-5600U 2.60GHz
Linux version 4.8.10, gcc version 4.9.3, gentoo

Thanks.
I don't see such difference on my
Fedora 25 Corei7-4770S
gcc version 6.3.1 20161221 (Red Hat 6.3.1-1) (GCC)

I'm curious because I don't see performance issue you have.
I'll send patch next week or so because I'm interested in how modified
patch will perform on your system.

The performance will be improved by reducing multiple PRNG calls
to 1.
I'll modify patch later, could you test it with your system?

Sure. But as you said, Lauri's version would be optimal.

I wrote the same patch right after php_random_int() was implemented
and didn't find any problem. I think I've posted benchmark result in
the previous uniqid() discussion thread. So I checked my patch again
and found it should be

PHP-master]$ git diff
diff --git a/ext/standard/uniqid.c b/ext/standard/uniqid.c
index 22173ae..bbd0e0a 100644
--- a/ext/standard/uniqid.c
+++ b/ext/standard/uniqid.c
@@ -35,7 +35,7 @@
#include <sys/time.h>
#endif

-#include "php_lcg.h"
+#include "php_random.h"
#include "uniqid.h"

/* {{{ proto string uniqid([string prefix [, bool more_entropy]])
@@ -78,7 +78,9 @@ PHP_FUNCTION(uniqid)
* digits for usecs.
*/
if (more_entropy) {
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec, php_combined_lcg() * 10);
          zend_long rand;
          php_random_int(0, 999999999, &rand, 1);
          uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec,
usec, (double)rand/100000000);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec,
usec);
}

Notice that int values are less than a billion which is inside of
signed 32 bit int range. This version is as fast as php_combined_lcg()
version on my system. Both versions executes a million uniqid() calls
about 0.36 sec.

$ php -r '$s = microtime(TRUE);for($i=0;$i<1000000;$i++) uniqid("",
TRUE); echo microtime(TRUE) - $s;'

0.36102104187012

So above patch would be the final patch. I don't expect issues but if
there is performace issue on some systems, we may consider Lauri's
integer computation version.

I can confirm that the integer version is faster (2,6 seconds vs 3,6
seconds for 1000000 uniqid calls). Testing on ARM (Raspberry Pi) gives
similar results. It's not surprising, since converting to float/double
is not free and formatting a floating-point number is also slower than
formatting an integer.

Using floating-point numbers has exactly zero benefits, while now at
least two people have reported that integers are faster. I think it's
also much easier to read and understand integers, and your bugs tell the
same tale. So do you have some actual arguments for your version, or is
this just ”not invented here”?

Also, I must say that I'm neither for nor agains this change in general;
I'm discussing only the implementation.

--
Lauri Kenttä

8 years ago by Kazuo Oishi — view source

unread

Lauri Kenttä lauri.kentta@gmail.com writes:

signed 32 bit int range. This version is as fast as php_combined_lcg()
version on my system. Both versions executes a million uniqid() calls
about 0.36 sec.

$ php -r '$s = microtime(TRUE);for($i=0;$i<1000000;$i++) uniqid("",
TRUE); echo microtime(TRUE) - $s;'

0.36102104187012

So above patch would be the final patch. I don't expect issues but if
there is performace issue on some systems, we may consider Lauri's
integer computation version.

I can confirm that the integer version is faster (2,6 seconds vs 3,6 seconds for
1000000 uniqid calls). Testing on ARM (Raspberry Pi) gives similar results. It's
not surprising, since converting to float/double is not free and formatting a
floating-point number is also slower than formatting an integer.

I have same result; integer version is 0.65 seconds and double version
is 0.82 seconds.

--
Kazuo Oishi

8 years ago by Yasuo Ohgaki — view source

unread

Hi Lauri,

On Mon, Jan 9, 2017 at 11:07 PM, Lauri Kenttä lauri.kentta@gmail.com
wrote:

I can confirm that the integer version is faster (2,6 seconds vs 3,6
seconds for 1000000 uniqid calls). Testing on ARM (Raspberry Pi) gives
similar results. It's not surprising, since converting to float/double is
not free and formatting a floating-point number is also slower than
formatting an integer.

Using floating-point numbers has exactly zero benefits, while now at least
two people have reported that integers are faster. I think it's also much
easier to read and understand integers, and your bugs tell the same tale.
So do you have some actual arguments for your version, or is this just ”not
invented here”?

Also, I must say that I'm neither for nor agains this change in general;
I'm discussing only the implementation.

Thank you for testing.
Both versions yield the same result and code readability is the same.
Faster is better. Let's use integer version.

BTW, it's surprising that I got roughly the same performance for both
php_combined_lcg() and php_random_int() with current system (Fedora 25). I
used to get little slower result for php_random_int() with Fedora 24. It
seems PRNG gets better. There is no advantage to use php_combined_lcg() on
newer systems.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Yasuo Ohgaki — view source

unread

BTW, it's surprising that I got roughly the same performance for both
php_combined_lcg() and php_random_int() with current system (Fedora 25). I
used to get little slower result for php_random_int() with Fedora 24. It
seems PRNG gets better. There is no advantage to use php_combined_lcg() on
newer systems.

Correction.
I cannot understand the result so checked to see if something wrong.
Sym link to php binary is broken :(
I still get slower result with php_random_int(). It's cost of better
entropy should be ignored.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Niklas Keller — view source

unread

2016-12-31 0:20 GMT+01:00 Yasuo Ohgaki yohgaki@ohgaki.net:

I'll merge the patch to master (7.2) if there is no comment.

Hi Yasuo,

has this been committed? It's just the same BC issue as seeding mt_rand
with a CSPRNG by default.

Regards, Niklas

8 years ago by Yasuo Ohgaki — view source

unread

Hi Niklas,

has this been committed? It's just the same BC issue as seeding mt_rand
with a CSPRNG by default.

Not yet.

I really don't see any pros for caring about failing CSPRNG and fallback to
weak behavior.

BC is extremely unlikely. Basically, no BC on healthy hardware/OS.
Then things failed, programs should fail properly. i.e. Shouldn't
fallback to weaker/problematic code.

Broken CSPRNG is like BUS error, i.e. hardware error, why should we care so
much about it?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Niklas Keller — view source

unread

I really don't see any pros for caring about failing CSPRNG and fallback
to weak behavior.

BC is extremely unlikely. Basically, no BC on healthy hardware/OS.

Then things failed, programs should fail properly. i.e. Shouldn't
fallback to weaker/problematic code.

Failing closed on a missing CSPRNG isn't really important for uniqid().
There's no guarantee that uniqid() produces ungessable output. It tries to
guarantee uniqueness and fails at the single one job it has for distributed
systems. I still think it's better to just leave it as is and deprecate it,
maybe while moving a UUID ext to core.

Regards, Niklas

Broken CSPRNG is like BUS error, i.e. hardware error, why should we care
so much about it?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net