Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:95859
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 74.125.82.48 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CANUQDCjRj1Z98tYFP0W+FWKroRrs9geh5_w8F61i8FfdVN6H1A@mail.gmail.com>
References: <CAGa2bXaM8q0mwhkz+XL6jbBaVLHDZ_-Jfj0aOPUq+te=fV=wow@mail.gmail.com>
 <CANUQDCgcmLozTs4xcTTXfqd_UZC9oEKubmr_iZfYLwBcg3sxfQ@mail.gmail.com>
 <CAGa2bXb-ZD9ea+o4xqqsFtQJw7iM0Pzq1XTgof_2UPSTaY2PSg@mail.gmail.com>
 <CAL0xaBE-HzFz+rbzBJ_bEM1fuCKjcQ1K5yAokQra+CFMP_8x3w@mail.gmail.com>
 <CANUQDCgFzzCNCihP7DnsZRv8k7WkOT7htB_tKpoSp1wqpizmCQ@mail.gmail.com>
 <CAL0xaBGf=iTNTU51p2XniX84REykwg-D8v0VWxkSKAvnYdmZCA@mail.gmail.com> <CANUQDCjRj1Z98tYFP0W+FWKroRrs9geh5_w8F61i8FfdVN6H1A@mail.gmail.com>
Date: Fri, 9 Sep 2016 16:02:03 +0300
Message-ID: <CAL0xaBFgL8CKhJ+m6dSXGSVCHjzvi-Y+Hvrp3Ory20ezBjrPEg@mail.gmail.com>
To: Niklas Keller <me@kelunik.com>
Cc: Yasuo Ohgaki <yohgaki@ohgaki.net>, "internals@lists.php.net" <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=001a114b3dce7ec300053c12c21a
Subject: Re: [PHP-DEV] [RFC] Make uniqid() more unique
From: arvids.godjuks@gmail.com (Arvids Godjuks)

--001a114b3dce7ec300053c12c21a
Content-Type: text/plain; charset=UTF-8

2016-09-09 15:46 GMT+03:00 Niklas Keller <me@kelunik.com>:

> 2016-09-09 13:18 GMT+02:00 Arvids Godjuks <arvids.godjuks@gmail.com>:
>
>>
>>
>> 2016-09-09 13:37 GMT+03:00 Niklas Keller <me@kelunik.com>:
>>
>>> 2016-09-09 10:36 GMT+02:00 Arvids Godjuks <arvids.godjuks@gmail.com>:
>>>
>>>> 2016-09-09 11:07 GMT+03:00 Yasuo Ohgaki <yohgaki@ohgaki.net>:
>>>>
>>>>> On Fri, Sep 9, 2016 at 4:40 PM, Niklas Keller <me@kelunik.com> wrote:
>>>>> > I think it's better to leave it as is and deprecate and discourage
>>>>> > its use. There's already a big warning there. Dunno whether there
>>>>> > are really valid use cases for it.
>>>>>
>>>>> uniqid() is handy when a developer would like to sort something by a
>>>>> "time" prefix/postfix in strings. For example, a session ID
>>>>> prefixed/postfixed by "time".
>>>>>
>>>>> So E_DEPRECATED might be too much.
>>>>>
>>>>> Regards,
>>>>>
>>>>> --
>>>>> Yasuo Ohgaki
>>>>> yohgaki@ohgaki.net
>>>>>
>>>>
>>>> It's also useful in other cases, where using a full blown true random
>>>> source is just overkill.
>>>>
>>>
>>> Most people think getting true random is overkill and implement things
>>> insecurely.
>>>
>>
>> I just don't need true random here, just some form of replacing an
>> integer ID with a value, that cannot be changed just by "+1"
>>
>
> If you know the time something was created you can still easily retrieve
> it. It's not "+1" anymore, but only slightly better.
>
>
>>
>>>
>>>> For example, I recently used the result of uniqid('', true) for a few
>>>> URL parameters instead of plain numeric IDs. The client just wanted to
>>>> make sure users can't do a +1 and see someone else's result page, which
>>>> might have different text or even a different campaign.
>>>>
>>>
>>> That's exactly where uniqid SHOULD NOT be used. It's predictable. Anyone
>>> can easily guess these URLs. If you want to prevent that, you should use
>>> non-predictable secure random, also called cryptographically secure random:
>>> CSPRNG. See random_bytes and random_int.
>>>
>>
>> Given the way the system works, and that this is a semi-closed tool for
>> business purposes, the only real reason we need these IDs is to track
>> people. Before this, plain numeric IDs from the DB records were used. With
>> the rewrite, the client asked for IDs where you can't just do a +1 and see
>> something different. No one will ever want to try and break the uniqid algo
>> just to get the other page (probably the same text). I also use the
>> extended version of uniqid.
>>
>
> uniqid() is just the current time. It's unique enough in some scenarios,
> but it's never unpredictable. If you don't want people to be able to guess
> these URLs (by adding +1 or whatever), you need unpredictable random.
> uniqid() doesn't fit here. uniqid("", true) doesn't fit either, as it's
> just adding more time resolution and ONLY ONE SINGLE CHAR of random data,
> which may be predictable, too, dunno what the generator here really does.
>
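
For reference, the suggestion quoted above amounts to roughly the following.
This is only a sketch in Python rather than PHP: secrets.token_hex() stands in
for random_bytes(), and random_ids() and the 10-byte length are names and
values assumed here for illustration, not anything from the project. Note the
explicit duplicate check, which a random scheme would want in principle and a
time-based one does not.

```python
import secrets


def random_ids(count: int, nbytes: int = 10) -> list[str]:
    """Return `count` distinct hex tokens drawn from the OS CSPRNG.

    Hypothetical helper: nbytes=10 gives 20 hex characters per ID.
    """
    seen: set[str] = set()
    ids: list[str] = []
    while len(ids) < count:
        candidate = secrets.token_hex(nbytes)
        # Random values are not ordered, so uniqueness has to be checked.
        if candidate not in seen:
            seen.add(candidate)
            ids.append(candidate)
    return ids


# One burst of 600 IDs, the batch size mentioned in this thread.
batch = random_ids(600)
```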

No, I actually don't need unpredictable random :) It is not required for
the task, and it was not an issue before with the plain sequential IDs from
the DB. The request was simply to replace the IDs with a string where you
can't just do a +1 and see something.
Second, this is not a security feature; it is not intended to hide anything
from people. Other mechanisms are in place to make sure that simply
changing the uniqid to a different, existing value will not yield a page.
What I needed was a fast way to generate 600 semi-random strings in one go.
I'm not an expert on cryptography, but I do know that exhausting a random
source is a real issue, and the scope of the project included neither the
time nor the budget to deal with such complex issues and to make sure that
does not happen.
Moreover, the sequential nature of uniqid was a plus for me, because I do
not have to check whether such a value already exists. So, from my personal
perspective, it was the right tool for the job. The only thing I would like
is for it to produce a visually more random string. That's all.
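
To make the point about sequential values concrete, here is a minimal sketch
of the uniqid() layout as I understand it from the manual: 8 hex digits of
seconds followed by 5 hex digits of microseconds, 13 chars in total. It is
written in Python only as an illustration; uniqid_like() is a hypothetical
helper, and waiting for the clock to tick is my assumption about how to keep
values unique within one process, not a description of the PHP source.

```python
import time

_last_us = 0  # last microsecond timestamp handed out by this process


def uniqid_like(prefix: str = "") -> str:
    """Time-based ID: 8 hex digits of seconds + 5 hex digits of microseconds."""
    global _last_us
    now_us = time.time_ns() // 1000
    # Wait until the microsecond clock has advanced, so two calls in the
    # same process can never return the same value.
    while now_us <= _last_us:
        now_us = time.time_ns() // 1000
    _last_us = now_us
    sec, usec = divmod(now_us, 1_000_000)
    return f"{prefix}{sec:08x}{usec:05x}"


# One burst of 600 IDs, as in the CSV import case.
ids = [uniqid_like() for _ in range(600)]
```

Because every value is derived from a strictly increasing timestamp, the
burst comes out unique and already sorted, with no lookup against existing
records. It is also exactly why the values look sequential.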

>
>
>>>> And I do need to generate those IDs in bursts - 200 to 600 IDs in a
>>>> single action. I would imagine generating 600 random strings of ~20 char
>>>> length can be hard on the source of the randomness, may even deplete it.
>>>> And I expect the numbers to grow.
>>>>
>>>
>>> Could you outline why you need 200 - 600 IDs in a single action?
>>>
>>
>> Because it's a CSV import and I need to assign every record an ID at that
>> moment. Those ID's are then exported by admins to a 3rd party system.
>>
>>>
>>>
>>>> So, deprecating it I think is really an overreaction. It's a handy
>>>> tool. It can be used to generate filenames too, and a lot of other stuff.
>>>>
>>>
>>> Sure, but for that you can as well just use `microtime` or `time`. As
>>> shown, it's easily misused, you're the perfect example. :-)
>>>
>>
>> microtime and time are easier to guess. And time() is not an option,
>> because I will get 600 equal IDs then. Microtime is an option, but then
>> you get a digits-only string and it looks awfully sequential :) Hence the
>> uniqid usage, which is basically time + microtime if I understand the
>> manual correctly, but it generates a slightly more random-looking result
>> and I'm sure I get a unique value on every call. Improving it so it does
>> not look awfully sequential would suffice for the use cases where it is
>> needed. In my case this was a clearly conscious choice with full
>> understanding of how it works.
>>
>>>> My thoughts are - improve it. Yes, the standard uniqid() is a bit too
>>>> short, I have never used it without the second "true" parameter and that
>>>> dot in the middle of the string is annoying - I had to strip it out in
>>>> every use case I had.
>>>>
>>>
>>> `true` gives you exactly one character more, pretty low entropy.
>>>
>>> Regards, Niklas
>>>
>>
>> Hm, without "true" you get 13 chars, with "true" - 20+.
>>
>
> You get more chars, it's still predictable.
>
> Regards, Niklas
>

And there is nothing wrong with being predictable in this situation.

--001a114b3dce7ec300053c12c21a--