Deprecating uniqid()

5 years ago by AllenJB

Hi all,

I'd like to discuss deprecating uniqid()

I believe it's dangerously bad at doing "what it says on the tin". New
developers still reach for it and do not read the warnings on the manual
page (or if they do, don't fully understand how bad it is).

For older codebases that still rely on it, a userland replacement can be
easily implemented (and could be published on Packagist).
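
A minimal sketch of what such a userland replacement might look like, assuming only the default 13-character format needs to be preserved. The name legacy_uniqid() is hypothetical, and unlike the built-in this version makes no attempt to avoid duplicates when called twice within the same microsecond:

```php
<?php
// Hypothetical userland stand-in; the name legacy_uniqid() is an assumption.
function legacy_uniqid(string $prefix = ''): string
{
    // microtime() returns "fraction seconds", e.g. "0.12345600 1588444380".
    [$fraction, $seconds] = explode(' ', microtime());
    $microseconds = (int) round((float) $fraction * 1000000);

    // 8 hex digits of seconds followed by 5 hex digits of microseconds.
    return sprintf('%s%08x%05x', $prefix, (int) $seconds, $microseconds);
}

echo legacy_uniqid('id_'), PHP_EOL;
```

The output has the same length and character set as uniqid(), so existing column sizes and validation rules should keep working.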

I noticed that an RFC [0][1] was brought up 2 years ago but was never
voted on. Does anyone know why this was?

[0] https://externals.io/message/102097
[1] https://wiki.php.net/rfc/deprecate-uniqid

Is there interest in deprecating this function?

If not deprecation, how could it be (further) "improved"? My first
thought is to enable the "more entropy" option by default (the
argument could remain, so that codebases that rely on the shorter
output and can accept the tradeoffs can still disable it).
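
To make the length tradeoff concrete, here is a small sketch comparing the two modes. The manual documents 13 characters for the default and 23 with more_entropy enabled:

```php
<?php
$short = uniqid();          // default: 13 hexadecimal characters
$long  = uniqid('', true);  // more_entropy: extra digits appended around a "."

printf("%s (%d chars)\n", $short, strlen($short));
printf("%s (%d chars)\n", $long, strlen($long));
```

Code that stores the result in a 13-character column, or validates it as pure hexadecimal, would be affected by flipping the default.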

AllenJB

5 years ago by Ben Ramsey

Instead of deprecating and removing it, would anyone be opposed to replacing the internals of the function so that it uses random_bytes() under the hood, while all other functionality remains the same?

Cheers,
Ben

5 years ago by Ben Ramsey

Of course, if we did this, it would break anyone’s ability to do this:

date('r', hexdec(substr(uniqid(), 0, 8)));

But I would argue that no one should be relying on these identifiers for date/time purposes.
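
The one-liner above works because the current output is built from the clock: the first 8 hex digits are the Unix timestamp in seconds and the last 5 are the microseconds. A short sketch that splits both parts (the variable names are only illustrative):

```php
<?php
$id = uniqid();

// First 8 hex digits: Unix timestamp in seconds; last 5: microseconds.
$seconds      = hexdec(substr($id, 0, 8));
$microseconds = hexdec(substr($id, 8, 5));

echo $id, ' => ', date('r', $seconds), ' + ', $microseconds, " microseconds\n";
```

Any code doing this would silently get meaningless dates if the internals switched to random bytes.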

Cheers,
Ben

5 years ago by Nikita Popov

Instead of deprecating and removing it, would anyone be opposed to
replacing the internals of the function so that it uses random_bytes()
under the hood, while all other functionality remains the same?

I believe this has been discussed in the past, and the basic problem is
that uniqid() currently only returns 13 hex characters, so we can encode at
most 52 bits of entropy without changing the output format. This is
insufficient. Changing the format could break assumptions, such as database
column sizes.
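
As a rough illustration of that limit, the arithmetic can be sketched as follows. The birthday-bound constant 1.1774 (the square root of 2 ln 2) is the usual approximation for a 50% collision chance, and the random ID here is only an assumption about how a format-preserving replacement might look:

```php
<?php
// 13 hex characters x 4 bits each = 52 bits at most.
$bits = 13 * 4;

// Seven random bytes give 14 hex characters; drop one to keep the 13-character shape.
$id = substr(bin2hex(random_bytes(7)), 0, 13);

// Birthday-bound estimate: about 1.1774 * sqrt(2^bits) IDs for a 50% collision chance.
$idsForEvenOdds = 1.1774 * sqrt(2 ** $bits);

printf("%s: %d bits, ~%.0f million IDs for a 50%% collision chance\n", $id, $bits, $idsForEvenOdds / 1e6);
```

With 52 bits that works out to roughly 79 million IDs, which a busy system could plausibly reach.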

Personally, I would be in favor of deprecating the function. I've run into
an issue caused by non-unique uniqid() somewhat recently myself as well.

Regards,
Nikita

5 years ago by Jakob Givoni

I use this function frequently, but I'm OK with deprecating it, as I
think the name is dangerously misleading: anything that claims to be
"unique" without saying unique within what is a misnomer.
However, it's useful to have a function in core that gives you a
fixed-length random string that is unique within some well-defined
boundaries, so I'd like to be sure there is an easy replacement when
the time comes to upgrade PHP. Ideally it would be guaranteed to be
unique within the current PHP process and take the same arguments as
uniqid().

Best,
Jakob

5 years ago by Rowan Tommins

I definitely think that all deprecations should come with clear guidance
of either "use this instead" or "what you're doing is fundamentally wrong".

I'm not sure it needs to retain the same arguments, or even the same
output format, though, just fit the same use cases. The prefix can be
added trivially, and the "hex, dot, numeric" output of the "more
entropy" version is not often particularly helpful.

A common suggestion is to use bin2hex(random_bytes($desired_length / 2)),
which isn't particularly elegant, and in my experience, the main
requirement is "a unique string of printable/alphanumeric characters", so
limiting to [0-9a-f] is just limiting entropy for no reason.

I wonder if we could add a parameter to random_bytes, or an accompanying
function, that would return only alphanumeric characters; or perhaps
accept a range of characters to allow in some form.
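
A userland sketch of what such an accompanying function might look like, assuming a caller-supplied alphabet. The name random_string_sketch() and its signature are hypothetical and do not exist in core:

```php
<?php
// Hypothetical helper; neither the name nor the signature exists in PHP core.
function random_string_sketch(int $length, string $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'): string
{
    $max = strlen($alphabet) - 1;
    if ($length < 1 || $max < 1) {
        throw new InvalidArgumentException('Need a positive length and at least two characters.');
    }

    $result = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is uniform over the range, so there is no modulo bias.
        $result .= $alphabet[random_int(0, $max)];
    }

    return $result;
}

echo random_string_sketch(13), PHP_EOL;                     // 13 characters from [0-9a-z]
echo random_string_sketch(13, '0123456789abcdef'), PHP_EOL; // same shape as uniqid()
```

With the 36-character default alphabet, 13 characters carry about 67 bits, compared with 52 bits for 13 hex characters.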

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]

5 years ago by Jakob Givoni

A common suggestion is to use bin2hex(random_bytes($desired_length / 2)),
which isn't particularly elegant, and in my experience, the main
requirement is "a unique string of printable/alphanumeric characters", so
limiting to [0-9a-f] is just limiting entropy for no reason.

Yes, a base_convert(..., 16, 32) around that would help, but I'd really
prefer a single function to a chain of 3 functions (even if we had
Larry's pipe operator :-p)
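
For reference, the three-function chain might be sketched like this. The byte count is an assumption chosen to stay well inside the range where base_convert() is exact, since the manual warns that it can lose precision on large numbers:

```php
<?php
// 5 random bytes = 40 bits = exactly 8 base-32 digits (5 bits each).
$hex = bin2hex(random_bytes(5));

// base_convert() drops leading zeros, so pad back to a fixed width.
$id = str_pad(base_convert($hex, 16, 32), 8, '0', STR_PAD_LEFT);

echo $hex, ' => ', $id, PHP_EOL;
```

The padding step is easy to forget, which is part of the argument for a single dedicated function.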

5 years ago by Andreas Heigl

Hey Ben, hey all

On 02.05.20 at 21:13, Ben Ramsey wrote:

Instead of deprecating and removing it, would anyone be opposed to replacing the internals of the function so that it uses random_bytes() under the hood, while all other functionality remains the same?

I'd rather deprecate it and give clear advice on what to use instead
(e.g. in the docs) than change the internal behaviour and break code.

As replacement I could think of showing people the way to UUIDs.

As the function itself was never intended for cryptographically secure
values I would not see random_* functions or the like as a replacement.

My 0.02 €

Cheers

Andreas

5 years ago by Rowan Tommins

As replacement I could think of showing people the way to UUIDs.

Although the name sounds similar, I don't think UUID would be a good
replacement for uniqid(). In my experience, it's used for things like
generating ID attributes for HTML elements, or suffixes for table names, or
even file names; applications that really just need a few alphanumeric
characters that are different each time.

As the function itself was never intended for cryptographically secure
values I would not see random_* functions or the like as a replacement.

Firstly, while everyone should understand the phrase "cryptographically
secure", I don't think most users do. Despite the warning in the manual, I
would put money on people using uniqid() for things that really should use
"strong" randomness.

Secondly, is there actually a disadvantage to using cryptographically
secure randomness when you don't need it? Speed? There's no advice in the
manual for random_int or random_bytes saying not to use them, and their
names seem deliberately chosen to imply they are the go-to functions for
randomness.

The only downside I can see suggesting something like random_string(13,
'0-9a-f') as a direct replacement for uniqid() is that without a time input
it might happen to generate the same string twice in a request. On the
other hand, uniqid actually disclaims any guarantee of uniqueness anyway.

Regards,

Rowan Tommins
[IMSoP]

5 years ago by Niklas Keller

On Mon, 4 May 2020 at 10:59, Rowan Tommins rowan.collins@gmail.com wrote:

As replacement I could think of showing people the way to UUIDs.

Although the name sounds similar, I don't think UUID would be a good
replacement for uniqid(). In my experience, it's used for things like
generating ID attributes for HTML elements, or suffixes for table names, or
even file names; applications that really just need a few alphanumeric
characters that are different each time.

Seems like UUIDs would be a good fit for all of these.

As the function itself was never intended for cryptographically secure
values I would not see random_* functions or the like as a replacement.

Firstly, while everyone should understand the phrase "cryptographically
secure", I don't think most users do. Despite the warning in the manual, I
would put money on people using uniqid() for things that really should use
"strong" randomness.

Secondly, is there actually a disadvantage to using cryptographically
secure randomness when you don't need it? Speed? There's no advice in the
manual for random_int or random_bytes saying not to use them, and their
names seem deliberately chosen to imply they are the go-to functions for
randomness.

The only downside I can see suggesting something like random_string(13,
'0-9a-f') as a direct replacement for uniqid() is that without a time input
it might happen to generate the same string twice in a request. On the
other hand, uniqid actually disclaims any guarantee of uniqueness anyway.

UUIDs have enough length to make collisions practically irrelevant, so
again, they seem to be the best replacement.

Best,
Niklas

5 years ago by Peter Bowyer

On Mon, 4 May 2020 at 10:59, Rowan Tommins rowan.collins@gmail.com wrote:

Although the name sounds similar, I don't think UUID would be a good
replacement for uniqid(). In my experience, it's used for things like
generating ID attributes for HTML elements, or suffixes for table names, or
even file names; applications that really just need a few alphanumeric
characters that are different each time.

Seems like UUIDs would be a good fit for all of these.

For file names, absolutely. In many cases they have to be unique across all
processes, and that's important. For the others I say a UUID is only a good
replacement if taking a substring of a UUID is going to be unique.

Take HTML element IDs. My experience is that UUIDs (random data) don't
compress well, so shorter unique strings are preferable (also for
reading the HTML when debugging). The number of elements you're adding IDs
to matters: if you add 10, the UUID overhead is negligible; if you're
adding them to thousands, it's different.

For table name suffixes (if needed), the maximum length of a table name is
64 characters in MySQL. A 36-character UUID suffix leaves only 28 characters
for the rest of the name (27 if you put a separator between the table name
and the suffix), and it's easier to cope with all systems if the name can be
longer than that.

For these reasons, I support adding a nice way to generate semi-unique
data, preferably of user-defined length, and that doesn't have the
drawbacks of uniqid().
And deprecating uniqid().

Peter

5 years ago by Rowan Tommins

For file names, absolutely. In many cases they have to be unique across
all processes, and that's important. For the others I say a UUID is only a
good replacement if taking a substring of a UUID is going to be unique.

As well as being nearly 3 times as long as the current uniqid() output, a
UUID is generally formatted with hyphens, which may be disallowed or
require careful quoting in various contexts. If you have to strip those
out, or otherwise manipulate the result to fit the use case, you've failed
at the original aim of having a single function that doesn't need further
processing. (Leaving aside the fact that we don't actually have any UUID
functions in core.)
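
For comparison, a userland sketch of a version 4 (random) UUID built on random_bytes(). The function name uuid4_sketch() is hypothetical, and this is only one common way to write it, not an API that core provides:

```php
<?php
// Userland sketch of a version 4 (random) UUID; uuid4_sketch() is a hypothetical name.
function uuid4_sketch(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set the version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set the RFC 4122 variant

    // Group the 32 hex digits as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$uuid = uuid4_sketch();
echo $uuid, ' (', strlen($uuid), " chars)\n";
echo str_replace('-', '', $uuid), " (32 chars without hyphens)\n";
```

At 36 characters against uniqid()'s 13, the result is about 2.8 times as long, and stripping the hyphens is exactly the kind of post-processing step described above.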

Regards,

Rowan Tommins
[IMSoP]

5 years ago by Arvids Godjuks

Same notion here. UUIDs and random_bytes() are sometimes overkill or too
slow, or you can simply exhaust the random source.
I have a use case where I needed exactly what uniqid() provides (with
more_entropy = true): a serially incrementing random value that I needed to
create for 20-30 thousand items in one request. It was fast and efficient, and
there was absolutely no need for a truly random value. It also needed to be
easily human-readable, because in some cases it was sent via SMS.

So in my opinion, a better replacement for uniqid() is needed: have it
generate a longer string with more entropy and a better underlying algorithm,
but it should still be time-based. And do not call it "random_string" or
something similar, because it's not that :)

Thanks!

--
Arvīds Godjuks

+371 26 851 664
arvids.godjuks@gmail.com
Skype: psihius
Telegram: @psihius https://t.me/psihius

5 years ago by Rowan Tommins

So in my opinion, a better replacement for uniqid is needed - have it
generate a bigger string with more entropy and a better underlying algorithm,
but it should still be time-based. And do not call it a
"random_string" or something, because it's not that :)

A question just got posted on Stack Overflow asking for pretty much exactly what we've been discussing: https://stackoverflow.com/q/61634022/157957

You're right that the requirements for "random" and "unique" are distinct. Perhaps what we need is a unique_string function that allows you to specify the format (length and some control over allowed characters) and uses a mix of randomness and time (perhaps using the same time source as hrtime()?).

Then uniqid() could be deprecated, and anyone relying on its exact format could write a polyfill, while people wanting other formats wouldn't need to mess around with bin2hex, hexdec, etc.

Regards,

--
Rowan Tommins
[IMSoP]

5 years ago by Dan Ackroyd

So in my opinion, a better replacement for uniqid is needed -

You're right that the requirements for "random" and "unique" are distinct. Perhaps what we need is a unique_string function that allows you to specify the format (length and some control over allowed characters) and uses

This is a problem that would be better solved in userland rather than
by trying to design and evolve something inside core PHP. And it already
has a very good solution:

https://hashids.org/php/

That library is, in my opinion, a much better solution for the vast
majority of people who are (mis)using uniqid.

cheers
Dan
Ack

5 years ago by Rowan Tommins

This is a problem that would be better solved in userland rather than
trying to design and evolve inside core PHP.

I think that's a major philosophical question: should the core of a
language provide only those most basic building blocks from which
everything else can be built; or should it include a standard library of
functions for the most common tasks? Obviously, the answer can have many
shades of grey, but PHP doesn't generally take the "minimalist" approach.

Unless we're actively trying to shrink the functionality of PHP's core, it
feels weird to say "this function is deprecated; there is no official
replacement, please write your own or find one on Packagist". Is there a
specific reason not to write a replacement, or do you just consider it
not a universal enough requirement to include in the standard library?

Regards,

Rowan Tommins
[IMSoP]

5 years ago by Dan Ackroyd

Unless we're actively trying to shrink the functionality of PHP's core,

I think we should.

There are things that were added to core rather than done in userland because:

- Distributing libraries in userland used to be a lot harder than it is now.
- Some stuff needed to be in core to give adequate performance. As
  userland PHP has had its relative performance increased, and
  computers have gotten a little faster since the project began*, that
  need has been greatly reduced.

So although the choice to provide some functionality in core was the
correct choice at the time, it would not be a correct choice to do
now.

The reason to try to reduce the amount of core code is that
maintaining core code is much more difficult than maintaining userland
libraries.

There are swathes of PHP core that are scary to fix bugs in, let alone
think about adding new features or refactoring their API.

Although each removal would need to be justified individually, I think
as a general aim 'more userland, less core' is good.

cheers
Dan
Ack

* https://github.com/php/php-src/blob/cdade2e35da528608e777d2f9766253726edb36c/ext/opcache/zend_accelerator_hash.c#L27

5 years ago by Larry Garfield

Between preloading, PHP 7's improvements, FFI funkiness, and the upcoming JIT, there's been on and off discussion about moving much of the standard library from C to "PHP code that is bundled and preloaded automatically." A real "standard library" in PHP, rather than a bunch of bridged C functions that exist for the legacy reasons Dan notes above.

Making that actually a thing would help obviate a lot of these issues, I think. It becomes no longer an implementation question but "just" a packaging question.

--Larry Garfield

5 years ago by Dik Takken

I share Larry's feeling that perhaps the time is right to discuss a PHP
library written in PHP once more. The possible advantages are stacking up.

Even if we only manage to get this working technically and ship PHP 8
with an empty stub library, it would be a major step. It could be
extended in subsequent 8.x releases.

We should not hijack this thread to discuss this though. Larry, since
you got this started again, I guess the honor is yours to write an
opening post and start a new thread? ;)

Regards,
Dik Takken

5 years ago by Ben Ramsey

Although each removal would need to be justified individually, I think
as a general aim 'more userland, less core' is good.

In many ways, I agree. In other ways, I see things like array_column() and str_contains()[1] as being obvious additions to the core, since they’re implemented in userland so often. However, we recently rejected the Server-side Request and Response Objects RFC[2], so what is the over-arching philosophy that drives our decisions?

I don’t know the answer to this, but I think it might be a good exercise to consider it. It’s probably different for each person here. I think Paul Jones attempted to call attention to this in the epilogue to the “Server-side Request and Response Objects” vote[3]:

The initial impression is that there is a strong desire for work
that can be done in userland to stay in userland. However, that
impression conflicts with the recent acceptance of str_contains(),
which is very easily done in userland.

Lesson: Of functionality that can be accomplished in userland,
only trivial/simple functionality is acceptable.

I reject the conclusion (“lesson”) here, but it’s difficult not to arrive at this conclusion based on the observations. It might be interesting to flip this around, though, in the case of the request/response objects vote. Maybe this specific lesson could be better stated as:

Lesson: Of functionality that can be accomplished in userland,
functionality viewed as having a high maintenance overhead is best
left in userland.

I don’t know whether that’s the correct lesson, either, but I think it helps illustrate that we’re all approaching these decisions from our own philosophical points of view for what should or shouldn’t be PHP, and what appears to be the guiding principle(s) to one may be the opposite (flipped) to another.

So, where am I going with this?

I really have no idea. I thought I had a point about trying to articulate some overarching philosophy for the direction of PHP, but now that I’ve gotten here, I think I’ve convinced myself that one of PHP’s strengths is that it has no guiding philosophy. Each person here brings with them their own philosophies of how the language should be shaped, and even those philosophies may change and contradict themselves from time-to-time, and that’s what’s made the language what it is today, and perhaps that’s what continues to make the language better.

While it would be nice to understand the rationale behind decisions for what goes into core and what comes out, the reality is people are making these decisions and we don’t get things right all of the time, but we do get things right much of the time. I think “Allow trailing comma in parameter list[4]” and “Add str_starts_with() and str_ends_with() functions[5]” might be good examples of this.

I’m done waxing philosophical, so what can I say about uniqid()?

This is one of those functions I think (without doing the research) is used a lot in CLI scripts and tooling. As a result, it may be impossible to do good research on this, since these scripts are probably not available in public repositories. Additionally, when I write scripts like this, I don’t reach for Composer packages, since I don’t need the overhead of a full-blown project or library, and I doubt others do, either. This means allowing userland to fill the void left by the removal of uniqid() may not be a good option.

I think we should err on the side of BC with uniqid(). While the manual says it’s been in the language since PHP 4, the research I did in the “museum” shows that it’s been in PHP since 2.0b9 (see the Changelog in the tarball for 2.0b10[6]).

Is there any way we can improve this function without deprecating/removing it?

Cheers,
Ben

5 years ago by Arvids Godjuks

Hello Ben,

snip

I’m done waxing philosophical, so what can I say about uniqid()?

This is one of those functions I think (without doing the research) is
used a lot in CLI scripts and tooling. As a result, it may be impossible to
do good research on this, since these scripts are probably not available in
public repositories. Additionally, when I write scripts like this, I don’t
reach for Composer packages, since I don’t need the overhead of a
full-blown project or library, and I doubt others do, either. This means
allowing userland to fill the void left by the removal of uniqid() may
not be a good option.

I think we should err on the side of BC with uniqid(). While the manual
says it’s been in the language since PHP 4, the research I did in the
“museum” shows that it’s been in PHP since 2.0b9 (see the Changelog in the
tarball for 2.0b10[6]).

Is there any way we can improve this function without deprecating/removing
it?

Exactly my case: a background processing script that lives in the CLI without
any 3rd-party libraries, parsing a CSV, running some SQL queries to insert
data, and using uniqid(more_entropy=true) (and removing the dot from its
output).
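
For illustration, the usage described above might look like the following minimal sketch. It assumes the extra part added by more_entropy contains a single '.', as on current PHP versions; the variable name is only illustrative.

```php
<?php
// uniqid('', true) appends extra digits that include a '.',
// which is stripped so the result is a plain alphanumeric string.
$id = str_replace('.', '', uniqid('', true));

echo $id, PHP_EOL;
```

The result is longer than the 13 characters of plain uniqid(), but it is still derived from the clock plus a weak pseudo-random value, so it should not be treated as unpredictable.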

In my opinion, this is one of those simple and widely used
functions that do not need constant updating; just making its random
part more robust and up to date will be sufficient for its use. Yes, we
can deprecate uniqid() and replace it with a new function that is much
better at its job to facilitate proper migration. But removing a tool like
this from the core does seem somewhat on the extreme end of things.

--
Arvīds Godjuks

5 years ago by Nikita Popov

On Wed, May 6, 2020 at 2:34 PM Rowan Tommins rowan.collins@gmail.com
wrote:

On 5 May 2020 09:42:19 BST, Arvids Godjuks arvids.godjuks@gmail.com
wrote:

So in my opinion, a better replacement for uniqid is needed - have it
generate a bigger string with more entropy and a better underlying algorithm,
but it being time-based should still be a thing. And do not call it a
"random_string" or something, cause it's not that :)

A question just got posted on Stack Overflow asking for pretty much
exactly what we've been discussing:
https://stackoverflow.com/q/61634022/157957

You're right that the requirements for "random" and "unique" are distinct.
Perhaps what we need is a unique_string function that allows you to specify
the format (length and some control over allowed characters) and uses a mix
of randomness and time (perhaps using the same time source as hrtime()?).

Then uniqid() could be deprecated, and anyone relying on its exact format
could write a polyfill, while people wanting other formats wouldn't need to
mess around with bin2hex, hexdec, etc.
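
A polyfill of the kind mentioned above could be sketched roughly as follows. The function name uniqid_polyfill is hypothetical, and the sketch assumes the built-in format without more_entropy is the prefix followed by 8 hex digits of seconds and 5 hex digits of microseconds.

```php
<?php
// Hypothetical userland stand-in for uniqid() without more_entropy.
// Assumed format: prefix + 8 hex digits (seconds) + 5 hex digits (microseconds).
function uniqid_polyfill(string $prefix = ''): string
{
    // microtime() returns a string like "0.12345600 1588430000".
    [$usec, $sec] = explode(' ', microtime());

    return sprintf(
        '%s%08x%05x',
        $prefix,
        (int) $sec,
        (int) round((float) $usec * 1000000)
    );
}

echo uniqid_polyfill('job_'), PHP_EOL;
```

This sketch does nothing to keep two calls within the same microsecond from returning the same value, so it only reproduces the format, not any stronger guarantee.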

A possible candidate for this would be ULID (https://github.com/ulid/spec),
which is basically timestamp + random + base32 encoding. The timestamp part
makes ULIDs approximately lexicographically orderable, the random part
makes sure things are unique when generated in parallel and the base32
encoding avoids people having to deal with raw binary data.
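
To make the idea concrete, here is a rough sketch of a ULID-style generator. It is not a complete implementation of the spec (for example, it has no monotonic ordering within the same millisecond), the function name ulid_sketch is hypothetical, and it assumes 64-bit integers and PHP 7.4 or later.

```php
<?php
// ULID-style identifier: 48-bit millisecond timestamp + 80 random bits,
// written as 26 Crockford base32 characters (10 for time, 16 for randomness).
function ulid_sketch(?int $timeMs = null): string
{
    $alphabet = '0123456789ABCDEFGHJKMNPQRSTVWXYZ'; // Crockford base32: no I, L, O, U
    $timeMs ??= (int) floor(microtime(true) * 1000);

    // Timestamp part, most significant character first, so strings sort by time.
    $time = '';
    for ($i = 0; $i < 10; $i++) {
        $time = $alphabet[$timeMs % 32] . $time;
        $timeMs = intdiv($timeMs, 32);
    }

    // Random part: 16 characters of 5 bits each.
    $random = '';
    for ($i = 0; $i < 16; $i++) {
        $random .= $alphabet[random_int(0, 31)];
    }

    return $time . $random;
}

echo ulid_sketch(), PHP_EOL;
```

Because the alphabet is in ASCII order, comparing two such strings compares their timestamps first, which is what makes them approximately sortable by creation time.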

Regards,
Nikita

5 years ago by Aleksander Machniak

A possible candidate for this would be ULID (https://github.com/ulid/spec),
which is basically timestamp + random + base32 encoding. The timestamp part
makes ULIDs approximately lexicographically orderable, the random part
makes sure things are unique when generated in parallel and the base32
encoding avoids people having to deal with raw binary data.

Something like that. Here are some points that, imho, are important for the
implementation:

- The random part does not exhaust system entropy (based on mt_rand()?).
- It's fast.
- It's simple, i.e. a simple function that returns a base32 string.
- The only argument might be the output string length.

My 2 cents.

--
Aleksander Machniak
Kolab Groupware Developer [https://kolab.org]
Roundcube Webmail Developer [https://roundcube.net]

PGP: 19359DC1 # Blog: https://kolabian.wordpress.com

5 years ago by Niklas Keller

Hey Allen,

there's been discussion on whether we should deprecate or replace its
functionality. Without changing the output format, it's impossible to
have enough entropy.
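
The point about the output format can be illustrated by decoding a value back into the time it was generated. This is a sketch that assumes the default 13-character output is 8 hex digits of seconds followed by 5 hex digits of microseconds.

```php
<?php
// Split a default uniqid() value into its two assumed fields.
$id = uniqid();
$seconds = (int) hexdec(substr($id, 0, 8));
$microseconds = (int) hexdec(substr($id, 8, 5));

echo $id, ' was generated at ', date('Y-m-d H:i:s', $seconds),
    ' plus ', $microseconds, " microseconds\n";
```

If all 13 characters are spent on the clock, there is no room left for randomness unless the string gets longer or its layout changes.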

Without consensus on the best way forward, I've just never cared to
put this to a vote.

I'll happily collaborate on moving this RFC forward for PHP 8.

Best,
Niklas

On Sat, May 2, 2020 at 20:57, AllenJB php.lists@allenjb.me.uk wrote:

Hi all,

I'd like to discuss deprecating uniqid()

I believe it's dangerously bad at doing "what it says on the tin". New
developers still reach for it and do not read the warnings on the manual
page (or if they do, don't fully understand how bad it is).

For older codebases that still rely on it, a userland replacement can be
easily implemented (and could be published on Packagist).

I noticed there was an RFC [0][1] brought up 2 years ago, but was never
voted on. Does anyone know why this was?

[0] https://externals.io/message/102097
[1] https://wiki.php.net/rfc/deprecate-uniqid

Is there interest in deprecating this function?

If not deprecation, how could it be (further) "improved"? My first
thought is to make the "more entropy" option enabled by default (the
argument could remain so that it can be disabled by codebases that rely
on the lower length and can take the tradeoffs).

AllenJB
