Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
Best regards,
Gina P. Banyard
Hi,
> Hello internals,
> I have opened the vote for the mega deprecation RFC:
> https://wiki.php.net/rfc/deprecations_php_8_4
> Reminder, each vote must be submitted individually.
Just wanted to send some reasoning of my no votes.
I voted no on those output handlers as there might be potentially better
solutions. The whole output stuff needs a closer look so I think we should
wait on this until the review is done.
Otherwise I also voted no for the mysqli_kill and mysqli_refresh functions
as I feel that it's not a big deal to keep them (zero maintenance
basically) and there will likely be users who use them. I think it would
make sense not to add them, but if they are already there I don't see a
point in removing them.
I think we should also keep file_put_contents array argument as it might
actually be used with iovec in the future which could be a significant
optimization - need to check details if that would work but if it does, it
could be a pretty good optimization.
The CSV one is also a bit weird because the default is non empty parameter
so I'm not sure what this actually brings except some inconsistency. People
that explicitly set it, do that probably for some reason. I would really
prefer not to try to change this functionality as the BC breaks will cause
more issues.
All my other no votes are mainly about the BC concerns that I have.
Regards
Jakub
> I think we should also keep file_put_contents array argument as it might
> actually be used with iovec in the future which could be a significant
> optimization - need to check details if that would work but if it does, it
> could be a pretty good optimization.
I had a closer look at this one and it should be possible to optimize
it for some cases. We could basically introduce something like
php_stream_writev. It would need logic to do the same sort of
concatenation if filters are used or for stream wrappers not supporting
iovec. But for the plain wrapper, we should be able to support it, and it
could be a good optimization for some users and a clean way to expose it.
So I would suggest removing this from the list, as there seems to be a
good use case for this functionality.
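
For readers unfamiliar with the array form under discussion, here is a minimal sketch of what it does today. It assumes a writable system temp directory; php_stream_writev is only a proposed name from the message above and is not used here.

```php
<?php
// Sketch: file_put_contents() accepts an array of chunks and writes them
// back to back, as if they had been joined with implode('', $chunks).
$chunks = ["header\n", "row 1\n", "row 2\n"];
$path = tempnam(sys_get_temp_dir(), 'fpc'); // temp file; the prefix is arbitrary

$written = file_put_contents($path, $chunks); // total number of bytes written
$contents = file_get_contents($path);
unlink($path);

var_dump($written === strlen(implode('', $chunks))); // same byte count
var_dump($contents === implode('', $chunks));        // same bytes on disk
```

The caller never builds the joined string itself, which is why a vectored write could in principle be slotted in underneath.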
Regards
Jakub
Hi Jakub!
> I have opened the vote for the mega deprecation RFC:
> https://wiki.php.net/rfc/deprecations_php_8_4
> Just wanted to send some reasoning of my no votes.
> The CSV one is also a bit weird because the default is non empty parameter
> so I'm not sure what this actually brings except some inconsistency. People
> that explicitly set it, do that probably for some reason. I would really
> prefer not to try to change this functionality as the BC breaks will cause
> more issues.
The default "\" likely causes more harm than good for almost anybody.
It basically enables a proprietary extension to CSV (something like
DSV), but there are a couple of issues where it is totally unclear what
should happen, and there still might be unresolved (because
unresolvable) tickets lying around about that.
I do have to agree, though, that this deprecation is somewhat
unfortunate, since an empty string is only accepted as of PHP 7.4.0, so
there is likely some code around which passes e.g. "\0" which also
disables the proprietary extension if there are no NUL bytes in the CSV
file (or to be written to a CSV file).
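
A small sketch of the difference being described, using str_getcsv() with an explicit escape argument in both calls. The sample record is made up for illustration.

```php
<?php
// One CSV record whose first field ends in a backslash.
// Literal text of the line: "path\","next"
$line = '"path\\","next"';

// Empty escape (accepted as of PHP 7.4.0): only quote doubling is special,
// so the backslash is ordinary data and the record has two fields.
$plain = str_getcsv($line, ',', '"', '');

// Backslash escape (the historical default): \" does not close the field,
// so the same record is split differently.
$escaped = str_getcsv($line, ',', '"', '\\');

var_dump($plain);
var_dump($escaped);
```

Passing "\0" instead of "" behaves like the first call only as long as the data contains no NUL bytes, which is the workaround mentioned above.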
For that reason I didn't vote on that deprecation, although I would not
like to keep that proprietary extension forever.
Cheers,
Christoph
Hi,
> Hello internals,
> I have opened the vote for the mega deprecation RFC:
> https://wiki.php.net/rfc/deprecations_php_8_4
> Reminder, each vote must be submitted individually.
> I voted no on those output handlers as there might be potentially better
> solutions. The whole output stuff needs a closer look so I think we should
> wait on this until the review is done.
I just had a closer look at how output handlers work, and the RFC text is
actually not correct and does not exactly reflect how things work.
Interestingly, those two suggested deprecations have associated
functionality that can be seen in the following example:
https://3v4l.org/X91eu
This simple example shows that returning false from the handler has a
special behaviour that can in no way be replaced by throwing an exception.
What it does is flush all buffers without triggering any error, as far
as I can see. It also shows that output in the output handler is not
actually always discarded and can be used to append text, which might be
useful functionality for some users.
This is just what I found from looking at and testing things for around an
hour and a half, so I might have missed other bits. I really think we
should first try to properly understand how the whole output handling works
before doing this sort of deprecation. The RFC should then contain all
details about the edge cases so voters can make an informed decision. I
would suggest taking this part out and at least delaying it till the next
release.
Apologies for not taking a look sooner, but I have been pretty busy until
now...
Regards
Jakub
> Hello internals,
> I have opened the vote for the mega deprecation RFC:
> https://wiki.php.net/rfc/deprecations_php_8_4
> Reminder, each vote must be submitted individually.
I have voted no for a few, as they had no impact assessment at all:
- Deprecate returning non-string values from a user output handler
- Deprecate lcg_value()
- Deprecate md5(), sha1(), md5_file(), and sha1_file() (just says "large
  impact")
- Deprecate SOAP_FUNCTIONS_ALL constant and passing it to
  SoapServer::addFunction()
And no on a few others:
- Deprecate using a single underscore "_" as a class name (it breaks
  some of my ... old slides, but I also don't really see the problem with
  this).
- Remove the E_STRICT Error Level and Deprecate the E_STRICT constant?
  (Because I added it :-) )
cheers,
Derick
> - Deprecate md5(), sha1(), md5_file(), and sha1_file() (just says "large
>   impact")
About 1.2 million.
https://github.com/search?q=%28md5+OR+md5_file+OR+sha1+OR+sha1_file%29+language%3APHP+&type=code
The proposed deprecation of these functions in PHP due to their
cryptographic insecurities seems to overlook their valid non-cryptographic
applications. If we consider the context, the scope of cryptographic usage
is already quite specific. We're talking about end users who are rolling
their own security implementations and are unaware of the security risks
but somehow know how to use these functions without reading the
documentation and warnings.
The number of people who fall into this specific category is quite small.
Yet, this change is being proposed for their sake. It's important to note
that these same users could/will easily make other security mistakes
regardless of this deprecation.
On the other hand, who will be impacted by these deprecations? Potentially
everyone, as these are included in many projects and in many vendor
packages. It's busy work for the people who aren't affected. Sure,
eventually, it will all be sorted out as CI warnings slowly subside because
of this.
Reasons such as GIT and most cloud storages using these functions should be
enough to spare them. Example: https://rclone.org/overview/
The point is that there are several reasons in 2024 to use md5 and sha1.
Granted hashing passwords isn't one, but we're past that as a community
already. And for the few that aren't, I'd argue there is no saving.
Thanks,
Peter
On Mon, Jul 22, 2024 at 9:06 AM Derick Rethans <derick@php.net> wrote:
> - Deprecate `md5()`, `sha1()`, `md5_file()`, and `sha1_file()` (just says "large impact")
> About 1.2 million.
> https://github.com/search?q=%28md5+OR+md5_file+OR+sha1+OR+sha1_file%29+language%3APHP+&type=code
> On the other hand, who will be impacted by these deprecations?
> Potentially everyone, as these are included in many projects and in many
> vendor packages. It's busy work for the people who aren't affected.
> Sure, eventually, it will all be sorted out as CI warnings slowly
> subside because of this.
> Reasons such as GIT and most cloud storages using these functions should
> be enough to spare them. Example: https://rclone.org/overview/
> The point is that there are several reasons in 2024 to use md5 and sha1.
> Granted hashing passwords isn't one, but we're past that as a community
> already. And for the few that aren't, I'd argue there is no saving.
And they would still be available as hash("md5") and hash("sha1"); the
only reason they're called out as their own distinct functions today is
historical inertia.
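
A quick sketch of that equivalence, assuming a writable system temp directory for the file variants. The sample data is arbitrary.

```php
<?php
$data = 'The quick brown fox';

// String variants: the standalone functions and hash() agree.
$sameMd5  = md5($data)  === hash('md5', $data);
$sameSha1 = sha1($data) === hash('sha1', $data);

// File variants: md5_file()/sha1_file() correspond to hash_file().
$path = tempnam(sys_get_temp_dir(), 'sum'); // temp file; the prefix is arbitrary
file_put_contents($path, $data);
$sameMd5File  = md5_file($path)  === hash_file('md5', $path);
$sameSha1File = sha1_file($path) === hash_file('sha1', $path);
unlink($path);

var_dump($sameMd5, $sameSha1, $sameMd5File, $sameSha1File);
```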
> And they would still be available as hash("md5") and hash("sha1"); the
> only reason they're called out as their own distinct functions today is
> historical inertia.
Yes, I am aware of that, it's covered in the RFC and has been discussed.
My issue is that I think the positive effect this will have is minimal,
while the impact is very extensive. I also disagree with the notion that
there is no longer a use for these algos in the present day, as there are
many technologies and systems that still use these for basic checksumming.
To make everyone go through and update these seems ridiculous to me, as
it's basically just renaming functions. If it goes through, I
foresee a composer package called md5-sha1-shim being a popular package.
It won't stop the people this intends to save.
Lots of effect with little gain. The warning in the documentation should
be sufficient.
Thanks,
Peter
> And they would still be available as hash("md5") and hash("sha1"); the
> only reason they're called out as their own distinct functions today
> is historical inertia.
I don't agree that the reasons for including standalone functions are
"historical". The RFC itself gives a good reason for having such functions:
> Unfortunately these cryptographically secure hash functions are only
> available by means of the generic hash() function ... making using them
> more verbose and thus seemingly more complicated
Rather than force people to use functions that we acknowledge are hard
to use, surely the logical thing is to make the "right" code easy to use?
Which means if we want people to use SHA-256, let's add a sha256()
function to make it easy.
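
As a rough illustration of how small such a function would be, here is a userland sketch. The name sha256() and its signature are assumptions made for this example, mirroring sha1(); no such built-in exists in PHP at the time of this thread.

```php
<?php
// Hypothetical helper: sha256() is NOT a built-in PHP function.
// The guard avoids a redeclaration error should one ever be added.
if (!function_exists('sha256')) {
    function sha256(string $data, bool $binary = false): string
    {
        // Thin wrapper over the generic interface.
        return hash('sha256', $data, $binary);
    }
}

echo sha256('abc'), "\n"; // 64 hex characters
```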
This is what password_hash() and password_verify() did right: the
functionality was already there in crypt(), but it's hard to use, and
harder to use correctly. Providing clearer functions, even though they
do the same thing, helps new developers "fall into the pit of success".
The hash() function isn't quite as confusing as crypt(), but according
to the manual, it currently supports 60 different algorithms, most of
which I have never heard of. I'm aware that "sha256" is better than
"sha1", but should I be aiming higher, and using "sha384", or maybe one
of the four flavours of "sha3"? Then there's the fun-sounding
"whirlpool", the faintly rude-sounding "snefru", and a bewildering
fifteen flavours of "haval".
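
To see the list being described on a given installation, a short sketch using hash_algos() (requires PHP 8.0 or later for str_starts_with()). The total count varies between PHP versions, so it is printed rather than assumed.

```php
<?php
$algos = hash_algos(); // every algorithm name hash() accepts on this build

// Pick out the families mentioned above by name prefix.
$sha3 = array_values(array_filter(
    $algos,
    fn (string $name): bool => str_starts_with($name, 'sha3-')
));
$haval = array_values(array_filter(
    $algos,
    fn (string $name): bool => str_starts_with($name, 'haval')
));

printf(
    "%d algorithms, %d sha3 variants, %d haval variants\n",
    count($algos),
    count($sha3),
    count($haval)
);
```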
A new user being told "don't use sha1(), use hash() and pick from this
list" is more likely to say "ah, there's sha1, jolly good" than spend an
afternoon reading cryptography journals. There's no pit of success to
fall into.
Regards,
--
Rowan Tommins
[IMSoP]
That's a good point. What if there were crypto functions that worked
like password_hash() in that they had one generic function name, but
magically used the new/better "best practice" algorithms as time went
by without the need to update any calling code?
Maybe there should be three generic-named functions:
fast_hash() // not secure, makes UIDs quickly
secure_hash() // uses best practice one-way hash algo
secure_crypt() // uses best practice reversible encryption.
Then the developer signals their intent by choosing a function name,
and the algorithm magically works underneath (perhaps with the option
of an ini override to make those functions work in different
environments).
If those were added, I would bikeshed their names to make sure their intent was 100% clear:
insecure_hash() // not secure, makes UIDs quickly
secure_oneway_hash() // uses best practice one-way hash algo
secure_reversible_hash() // uses best practice reversible encryption.
-Mike
While I like the idea, this sounds like a huge nightmare in the waiting
when data is stored somewhere and later compared.
Example:
- Let's say these functions get introduced in PHP 8.5.
- secure_hash() is used in an application running on PHP 8.5 to secure
  some data before storing it in a database. This data is used in
  comparisons - stored vs user provided.
- Now in PHP 9.1, the hash algorithm is changed.
- The production environment gets updated to PHP 9.1 and suddenly the
  application breaks as the data verification will no longer work as the
  new algo is used on the user provided data, but the database stored
  version of the same data was created with the old algo....
Doesn't password_hash() handle this automatically? The result of the
password_hash() function includes the hash and the algorithm used to
hash it. That way password_verify() magically works with the string
that came from password_hash().
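
A minimal sketch of that behaviour. The low bcrypt cost is an assumption made only to keep the example fast; real code would normally use the default.

```php
<?php
// Low cost only to keep the sketch fast.
$hash = password_hash('correct horse', PASSWORD_BCRYPT, ['cost' => 4]);

// The algorithm and its options are encoded in the hash string itself.
$info = password_get_info($hash);
echo $info['algoName'], ' cost=', $info['options']['cost'], "\n";

// password_verify() reads them back from the string; no algorithm argument.
$ok = password_verify('correct horse', $hash);

// When the preferred settings change, stored hashes can be detected as stale
// and re-hashed at the next successful login.
$stale = password_needs_rehash($hash, PASSWORD_BCRYPT, ['cost' => 12]);

var_dump($ok, $stale);
```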
For password hashing, you are always retrieving the hash for a specific user, and then making a yes/no decision about it. Indeed, it's an explicit aim that an attacker can't take a password and quickly scan a captured database for matching hashes.
For other uses of hashes, though, the opposite is true: you want to search for matching hashes. For instance, when you store a file in git, it calculates the SHA1 hash of its content to use as a lookup key. If that key already exists in the local database, it assumes the content is the same.
That also demonstrates another difference: hashes are often shared between applications, where they need to be using an agreed algorithm. If a package manager requires SHA1 hashes of each file, you can't just substitute SHA256 hashes without some other agreed changes.
Tempting though a "secure_hash" function is, I don't think it's practical for a lot of the places hashing is used.
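
The contrast can be sketched as follows. store_blob() is a hypothetical helper written for this example, and the keying is simplified: it is not git's actual object format.

```php
<?php
// Hypothetical helper for illustration; not a PHP or git API.
function store_blob(array &$store, string $content): string
{
    $key = sha1($content);      // same content always yields the same key
    $store[$key] ??= $content;  // already present: nothing new is stored
    return $key;
}

$store = [];
$first  = store_blob($store, "hello\n");
$second = store_blob($store, "hello\n");
var_dump($first === $second, count($store)); // lookup by hash works

// Password hashes are salted, so equal inputs give different strings and
// cannot be used as lookup keys; each one can only be verified individually.
$a = password_hash('hunter2', PASSWORD_BCRYPT, ['cost' => 4]);
$b = password_hash('hunter2', PASSWORD_BCRYPT, ['cost' => 4]);
var_dump($a === $b);
var_dump(password_verify('hunter2', $a) && password_verify('hunter2', $b));
```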
Regards,
Rowan Tommins
[IMSoP]
> Doesn't password_hash() handle this automatically? The result of the
> password_hash() function includes the hash and the algorithm used to
> hash it. That way password_verify() magically works with the string
> that came from password_hash().
> For password hashing, you are always retrieving the hash for a specific user, and then making a yes/no decision about it. Indeed, it's an explicit aim that an attacker can't take a password and quickly scan a captured database for matching hashes.
You’d be surprised how many projects get this wrong and claim it isn't a security issue. If you can get the hashes, you likely have the ability to run arbitrary sql commands and since password_hash stores the salt right in the hash, you just need to crack one easy to guess password -- or just run password_hash on your machine ... then copy it to whatever user you want to login as. Very few php projects salt the passwords with something application/user specific (see: symfony's legacy password implementation which does, and new one which does not; and yes I reported it, and yes, it "isn't a security issue") to prevent this from happening.
There are other bad defaults, such as pdo_mysql allowing more than one sql statement (but all other drivers not -- and mysqli is also not)... making it even easier to open yourself up to getting hacked if you use pdo with mysql; allowing a single injection to be used to insert/update or even drop tables.
Security is something hard to get right, for any language and framework. PHP isn't an exception here; you have to pay attention to what you are doing and think like an attacker, every step of the way.
> For other uses of hashes, though, the opposite is true: you want to search for matching hashes. For instance, when you store a file in git, it calculates the SHA1 hash of its content to use as a lookup key. If that key already exists in the local database, it assumes the content is the same.
> That also demonstrates another difference: hashes are often shared between applications, where they need to be using an agreed algorithm. If a package manager requires SHA1 hashes of each file, you can't just substitute SHA256 hashes without some other agreed changes.
> Tempting though a "secure_hash" function is, I don't think it's practical for a lot of the places hashing is used.
I think we can borrow from a recent RFC to return more than one thing:
secure_hash($data, $algorithm = null): [$algorithm, $hash, $updated_algorithm, $updated_hash];
if you pass in an algorithm, it has to have been considered "secure" within the last two major versions*, it also returns an optional "updated" part, where it can be used to update the hash in your database, if needed.
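
One possible reading of that signature, as a userland sketch. Everything here is hypothetical: the function name, the constants, and in particular which algorithms count as "current" or "still accepted" are assumptions chosen only to make the example run.

```php
<?php
// Hypothetical sketch; none of these names exist in PHP.
const SKETCH_CURRENT_ALGO = 'sha256';              // assumed current choice
const SKETCH_ACCEPTED_ALGOS = ['sha1', 'sha256'];  // assumed still-accepted list

function secure_hash_sketch(string $data, ?string $algorithm = null): array
{
    $algorithm ??= SKETCH_CURRENT_ALGO;
    if (!in_array($algorithm, SKETCH_ACCEPTED_ALGOS, true)) {
        throw new ValueError("Algorithm $algorithm is no longer accepted");
    }

    $hash = hash($algorithm, $data);
    if ($algorithm === SKETCH_CURRENT_ALGO) {
        return [$algorithm, $hash, null, null]; // nothing to update
    }

    // Older algorithm requested: also return the hash under the current one,
    // so the caller can update what is stored.
    return [$algorithm, $hash, SKETCH_CURRENT_ALGO, hash(SKETCH_CURRENT_ALGO, $data)];
}

[$algo, $hash, $newAlgo, $newHash] = secure_hash_sketch('payload', 'sha1');
var_dump($algo, $newAlgo);
```

Note that the caller still has to store the algorithm name next to each hash for this to work across upgrades.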
— Rob
> Rather than force people to use functions that we acknowledge are hard
> to use, surely the logical thing is to make the "right" code easy to use? Which means if we want people to use SHA-256, let's add a sha256() function to make it easy. This is what password_hash() and password_verify() did right: the functionality was already there in crypt(), but it's hard to use, and harder to use correctly. Providing clearer functions, even though they do the same thing, helps new developers "fall into the pit of success".
Yes! 1000% THIS.
-Mike
> And they would still be available as hash("md5") and hash("sha1"); the
> only reason they're called out as their own distinct functions today
> is historical inertia.
> I don't agree that the reasons for including standalone functions are
> "historical". The RFC itself gives a good reason for having such functions:
By "historical" I just mean that md5() was in PHP in version 3, sha1() was
added in 4.3, and hash() (via PECL) in 5.1.2. md5(), md5_file(), sha1(),
and sha1_file() could have been deprecated when hash() became a core PHP
extension in version 7, and now (or when looking at targeting 9) would
have been about when we'd be discussing removing them.
I'm not talking about the MD5 or SHA1 algorithms or whether they should
or shouldn't be used. I'm just talking about the functions themselves:
md5(), md5_file(), sha1(), and sha1_file(). They only exist because
there wasn't the generic hash algorithm extension when they were created.
Why do they get this special treatment today?
(PS: crc32b is also implemented via hash() as well as having its own
function.)
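
A short sketch of that overlap. The sample string is arbitrary.

```php
<?php
$data = 'The quick brown fox jumped over the lazy dog.';

// crc32() returns an integer; hash('crc32b', ...) returns 8 hex digits.
$viaFunction = sprintf('%08x', crc32($data));
$viaHash = hash('crc32b', $data);

var_dump($viaFunction === $viaHash);
```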
> A new user being told "don't use sha1(), use hash() and pick from this
> list" is more likely to say "ah, there's sha1, jolly good" than spend an
> afternoon reading cryptography journals. There's no pit of success to
> fall into.
A new user skimming through the list of string functions is likely to
see "md5()" there and think "ah, there's a hash function, jolly good".
> I'm not talking about the MD5 or SHA1 algorithms or whether they should or shouldn't be used. I'm just talking about the functions themselves. md5(), md5_file(), sha1(), and sha1_file(). They only exist because there wasn't the generic hash algorithm extension when they were created.
I understand what is being claimed (and you're not the only one claiming it), I'm just not convinced it's true. I think they have standalone functions for the same reason we added str_contains and str_starts_with - because it's convenient to have straightforward functions for common use cases.
The hash() function is like a 60-piece set of interchangeable screwdriver heads, which only professionals and enthusiasts need; md5() and sha1() are like the flat-head and Phillips screwdrivers that everyone has in a drawer somewhere.
The thing that always surprises me is that PHP doesn't have a standalone function for SHA-256, which is the only other I've ever used.
To continue the analogy, we're missing a Pozidriv screwdriver, so people are misusing the Phillips one. The RFC is suggesting that we take away their flat-head and Phillips screwdrivers, and leave them with the 60-piece set, and no instructions.
My suggestion is we instead give them a Pozidriv screwdriver, and write some tips on how to use it correctly.
Regards,
Rowan Tommins
[IMSoP]
> I'm not talking about the MD5 or SHA1 algorithms or whether they should or shouldn't be used. I'm just talking about the functions themselves. md5(), md5_file(), sha1(), and sha1_file(). They only exist because there wasn't the generic hash algorithm extension when they were created.
> I understand what is being claimed (and you're not the only one claiming it), I'm just not convinced it's true.
I'm just looking at the manual's version information about when the
functions were introduced. Seems pretty unambiguous: md5, sha1, hash:
versions 3, 4, and 5 (via PECL).
> I think they have standalone functions for the same reason we added
> str_contains and str_starts_with - because it's convenient to have
> straightforward functions for common use cases.
Because there weren't any purpose-built functions that did the job,
forcing users to use other functions in expensive ways for what is
internally a pretty simple task. There is a purpose-built function for
hashing.
> The hash() function is like a 60-piece set of interchangeable screwdriver heads, which only professionals and enthusiasts need; md5() and sha1() are like the flat-head and Phillips screwdrivers that everyone has in a drawer somewhere.
> The thing that always surprises me is that PHP doesn't have a standalone function for SHA-256, which is the only other I've ever used.
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions
for both, and then when SHA4 comes along (as it inevitably will) another
standalone function for one of its variants?
> To continue the analogy, we're missing a Pozidriv screwdriver, so people are misusing the Phillips one. The RFC is suggesting that we take away their flat-head and Phillips screwdrivers, and leave them with the 60-piece set, and no instructions.
> My suggestion is we instead give them a Pozidriv screwdriver, and write some tips on how to use it correctly.
Or leave them the 60-piece set (which includes flat-head and
Phillips screwdrivers, so they're not being taken away), and write some
tips on how to use it correctly.
> Regards,
> Rowan Tommins
> [IMSoP]
I'd love to see a "hashing" namespace and all of these given their own functions with docblocks and manual pages instead of the current generic "god of hash" page which doesn't even list the hash functions available; you have to click on hash_algos and then look at the var_dump of hash algorithms. From there, you can google each one and try to understand what each one is good at and why you would use murmur3a over murmur3f, then try to figure out which one is the version that is compatible with JavaScript but not compatible with C# or maybe the other way around... (I recently got to go on that ride).
If we are going to deprecate the standalone functions (see the sha1 page, which at least links to a page about the sha1 algorithm, or the md5 page, which links to the md5 RFC) we should seriously invest in documenting these hashing algorithms and explaining them. At the very least, link to their respective RFCs.
— Rob
Hi
The problem with adding standalone functions for every algorithm is that
it would result in a combinatorial explosion of available functions.
Unless we want to make some of them "second class citizens" with reduced
functionality or generally want to remove everything except for
incremental hashing, we would need the following for each of them (with
hash standing for the algorithm):
- hash()
- hash_hmac()
- hash_file()
- hash_init()
- hash_hmac_init()
- hash_update() (could also be a method)
- hash_update_file() (could also be a method)
- hash_update_stream() (could also be a method)
- hash_final() (could also be a method)
- hash_hkdf()
- hash_pbkdf2()
That clearly does not scale. See also my previous reply to Rowan
regarding the documentation.
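
For comparison, a sketch of how the existing generic interface covers those operations with a single algorithm name. The inputs and the choice of 'sha256' are arbitrary; note that in current PHP the HMAC form of incremental hashing goes through hash_init() with HASH_HMAC rather than a separate init function.

```php
<?php
$algo = 'sha256';          // swap this one string to change algorithm everywhere
$message = 'hello world';
$key = 'secret-key';

// One-shot and incremental hashing give the same digest.
$oneShot = hash($algo, $message);
$ctx = hash_init($algo);
hash_update($ctx, 'hello ');
hash_update($ctx, 'world');
$incremental = hash_final($ctx);

// Same for HMAC: one-shot versus incremental with HASH_HMAC.
$mac = hash_hmac($algo, $message, $key);
$macCtx = hash_init($algo, HASH_HMAC, $key);
hash_update($macCtx, $message);
$incrementalMac = hash_final($macCtx);

// Key derivation helpers take the same algorithm name.
$derived = hash_hkdf($algo, $key, 32);                          // 32 raw bytes
$stretched = hash_pbkdf2($algo, 'password', 'salt', 1000, 64);  // 64 hex chars

var_dump($oneShot === $incremental, hash_equals($mac, $incrementalMac));
```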
It's clearly a topic for the more distant future, but if and when tagged
unions [1] make it into the language, I plan to propose an update that
allows passing an algorithm enum, including all the necessary options, to
the hash functions. This would make the algorithms more discoverable,
easier to document independently, and would remove the footgun of needing
to provide a matching untyped options array (which incidentally is also
listed in the deprecation RFC as a footgun).
[1] https://wiki.php.net/rfc/tagged_unions
I'd also like to note that providing the hash algorithms by a generic
interface is not particularly unusual. Here are two examples:
- node.js: https://nodejs.org/api/crypto.html#class-hash
- Python: https://docs.python.org/3/library/hashlib.html (which also
takes a named constructor, which would be reasonably similar to my
tagged-union proposal above).
Other languages, such as Ruby or Golang, appear to use a Hash interface
with appropriate methods, but I am not sure if this is a good fit for
PHP given the way the documentation is structured, with a dedicated page
for each and every function or method, and the fact that PHP does not
provide for extension methods, which would require implementing the
convenience functionality for each algorithm separately - or as
standalone functions.
Best regards
Tim Düsterhus
The problem with adding standalone functions for every algorithm is that it would result in a combinatorial explosion of available functions.
I commented on this, but as it was probably missed in a longer reply, I will repeat it.
There is no overwhelming benefit to having a dedicated function for every combination. However, I can see a benefit to having a dedicated function for the most commonly used functions.
To decide which, follow the Pareto principle: the ~20% of functions that are used ~80% of the time get their own function. Which ones make up that 20% can be determined by statistical analysis of public code bases or, where there is no existing evidence for new functionality, by voters' opinions.
-Mike
P.S. BTW, I do acknowledge your earlier point about more functions == more documentation (the first I've heard anyone mention that, so kudos for shining a light on it), but I wonder if that effort might not be reduced by finding ways to share duplicated information from a single source?
Maybe one way to address the effort required for docs is to provide incentive that could shift some of the burden of documentation onto those who want to see more added to core? Maybe:
1.) Encourage docs to be prewritten for RFCs calling for more functions, to motivate people to do the docs work up-front by having a policy to vote against RFCs without pre-written docs,
2.) Provide a prominent TODO queue of doc items needed with good-first-task tags that you can point people to on the list who argue for things requiring a lot of documentation while saying "Well, if you can't be bothered to update the current docs how do we know you'll help maintain the docs for what you are asking for?" (admittedly this would probably take a bit of coding to implement),
3.) And finally host one or more "How to write docs for PHP" Zoom sessions that get recorded and posted to YouTube to empower more people to know how to do it and to provide links for discussions where you call for help
(I, for one, really have no idea how to get started helping to clean up the docs, nor do I have any idea where the most need is.)
#jmtcw
The problem with adding standalone functions for every algorithm is that it would result in a combinatorial explosion of available functions.
I commented on this, but as it was probably missed in a longer reply, I will repeat it.
There is no overwhelming benefit to having a dedicated function for every combination. However, I can see a benefit to having a dedicated function for the most commonly used functions.
A problem is that MD5 should not be one of the most commonly used
algorithms.
Also, providing a dedicated function for an algorithm over and above
others that don't get such special treatment inflates use of that
algorithm, making it more commonly used. It becomes self-reinforcing.
A problem is that MD5 should not be one of the most commonly used algorithms.
Tell that to:
- RFC 2846 - Internet Message Format (IMF) Extensions for Internet Mail
- RFC 3229 - Delta-Encoding in HTTP
- RFC 4151 - The Internet IP Traffic Archive, and
- RFC 4964 - Use of MD5 for IP Version 4 and Version 6 Address Identification
Also, providing a dedicated function for an algorithm over and above others that don't get such special treatment inflates use of that algorithm, making it more commonly used. It becomes self-reinforcing.
Well that would be an easy fix: Provide special treatment for the preferred algorithms, i.e. their own specific function(s).
-Mike
Also, providing a dedicated function for an algorithm over and above others that don't get such special treatment inflates use of that algorithm, making it more commonly used. It becomes self-reinforcing.
Well that would be an easy fix: Provide special treatment for the preferred algorithms, i.e. their own specific function(s).
And, it seems, keep those specific functions around even after they are
no longer preferred.
Good morning all:
When calling functions from the global namespace, the PHP parser
creates opcodes that use those functions directly. When those functions
are certain built-in functions, the parser can use special opcodes that
are optimized for those function calls.
When calling functions from within a namespace, the parser does not
know at compile time if the function is defined in the current
namespace or if the global function will be used.
Because unqualified function calls are ambiguous, the parser adds an NS
Lookup opcode which performs a runtime check for the function name in
the current namespace, and then falls back to the global namespace if
the function is not defined locally.
This also prevents the parser from using dedicated opcodes for built-in
functions.
This incurs a performance penalty.
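To make the lookup described above concrete, here is a minimal sketch. The Demo namespace and the function names are made up for illustration, and the shadowing strlen() is defined only so that the two resolution paths give visibly different results.

```php
<?php
namespace Demo;

// Hypothetical shadow of the built-in, defined only so the lookup is observable.
function strlen(string $s): int
{
    return -1;
}

// Unqualified call: resolved at runtime, \Demo\strlen() first, then \strlen().
function unqualified(string $s): int
{
    return strlen($s);
}

// Fully qualified call: always the global function, no namespace lookup needed.
function qualified(string $s): int
{
    return \strlen($s);
}
```

With the shadow in place, unqualified('abc') goes to the namespaced function, while qualified('abc') still reaches the built-in.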
In the past, RFCs that aimed to address this issue got too hung up on a
syntax discussion, rather than a simple “should we solve this” question
proposed to the community.
I would like to discuss and then vote on this proposal as a feature,
without getting into any specifics of syntax.
I propose that we vote yes/no on if there should be some way, whatever
that way ends up being, to tell the parser to always treat unqualified
function names as global.
I think it's important to get a “clean vote” on this issue, to separate
out if past objections were objections to the concept, or if only the
syntax proposed in past discussions was disfavored.
To that end, I have created the following RFC:
https://wiki.php.net/rfc/global_function_parser_directive
I am asking that we discuss and vote on the following question:
“Should there be some way for developers to signal to the parser at
compile time that all unqualified function names found in a namespace
context are global, without a namespace lookup?”
Yes: We should do this, let's discuss syntax possibilities.
No: This should not be a feature at all.
Thank you for your consideration.
To that end, I have created the following RFC:
https://wiki.php.net/rfc/global_function_parser_directive
I am asking that we discuss and vote on the following question:
Please open a new thread about your RFC (i.e. don't reply to some other
message, but rather send a new message to the mailing list). Otherwise
it might easily be overlooked. See also https://wiki.php.net/rfc/howto.
Thanks,
Christoph
To that end, I have created the following RFC:
https://wiki.php.net/rfc/global_function_parser_directive
I am asking that we discuss and vote on the following question:
Please open a new thread about your RFC (i.e. don't reply to some other
message, but rather send a new message to the mailing list). Otherwise
it might easily be overlooked. See also https://wiki.php.net/rfc/howto.
Please can people stop replying on this sub-thread until a proper new thread is created.
I know GMail and some other mail clients have a heuristic definition of "conversation", but many clients and archives - including the widely used externals.io - are based on the standardised In-Reply-To and References headers, which are not reset based on a new subject line.
In those UIs, this is still part of the completely unrelated deprecation voting thread https://externals.io/message/124506, which was already becoming unmanageably long due to the protracted debate about hashing functions.
Regards,
Rowan Tommins
[IMSoP]
Hey Nick,
This also prevents the parser from using dedicated opcodes for built-in
functions.
This incurs a performance penalty.
For many, many years, tools like doctrine/coding-standard have imported
global functions.
The performance benefit is minimal and opt-in at very minimal effort for
userland (literally "run a decent coding standards tool").
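As a sketch of what such a coding-standard fix produces: the `use function` imports below bind the names to the global functions at compile time, so no namespace fallback is involved. The App namespace and the describe() function are hypothetical.

```php
<?php
namespace App;

use function array_key_exists;
use function strlen;

// Hypothetical function for illustration.
function describe(array $row): string
{
    // Both calls below refer to the global functions because of the imports above.
    if (!array_key_exists('name', $row)) {
        return 'anonymous';
    }

    return $row['name'] . ' (' . strlen($row['name']) . ')';
}
```

The trade-off is the one raised in this thread: an imported name can no longer be shadowed by a function of the same name in the App namespace.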
Changing this behavior breaks use-cases like shadowing internal functions,
which are useful for testing / stubbing / etc. I disagree with the
technique, but it is a valid and relied-upon use-case nonetheless.
Better ways forward:
- endorse userland to use a CS tool
- namespace performance-sensitive PHP functions, so we can exclude the
entire problem, by having people use the namespaced variant :P
Marco Pivetta
Hi Nick
I find it a bit unfortunate that you gave my thread barely any time to
be discussed.
I would like to discuss and then vote on this proposal as a feature,
without getting into any specifics of syntax.
I propose that we vote yes/no on if there should be some way, whatever
that way ends up being, to tell the parser to always treat unqualified
function names as global.
This can be achieved in various ways. For example:
- Per-file, via declare(), new use syntax, or whatnot.
- Globally, through an INI setting.
- Via hard cut in a new PHP version, where relative calls just stop
looking up local scope.
- By flipping lookup order, as proposed in my last thread.
One might be in favor of one approach but not others. Of course,
everybody will be in favor of a "free" 2-4% speedup. Hence, a yes vote
won't mean much.
Ilija
Hi Nick
I find it a bit unfortunate that you gave my thread barely any time to
be discussed.
My intent was to start a formal discussion on this very topic.
This can be achieved in various ways. For example:
- Per-file, via declare(), new use syntax, or whatnot.
- Globally, through an INI setting.
- Via hard cut in a new PHP version, where relative calls just stop
looking up local scope.
- By flipping lookup order, as proposed in my last thread.
One might be in favor of one approach but not others. Of course,
everybody will be in favor of a "free" 2-4% speedup. Hence, a yes vote
won't mean much.
Ilija
From reviewing past RFCs over the past few weeks, I think the issue was
too big and went in too many different directions to get consensus.
So I thought the best approach would be to break it into different
parts.
If no one wants to implement this feature, it doesn't matter how or at
what level.
So I wanted to get a "yes" from the people who need to say yes, then
discuss all of those things you mentioned:
- File level vs global
- Syntax
- Alternative options
But I think we need a "yes" for the concept first. Otherwise the vote
will fail on the details, as they have in the past.
Hi Nick
So I wanted to get a "yes" from the people who need to say yes, then
discuss all of those things you mentioned:
- File level vs global
- Syntax
- Alternative options
But I think we need a "yes" for the concept first. Otherwise the vote
will fail on the details, as they have in the past.
But that's not quite what the RFC says:
I am asking that we discuss and vote on the following question:
“Should there be some way for developers to signal to the parser at compile time that all unqualified function names found in a namespace context are global, without a namespace lookup?”
Which implies:
- There is some change to syntax.
- All unqualified calls become global.
So, it doesn't seem like there's room for alternative approaches
within the definition of your RFC. Both of these points are not
compatible with my proposal.
Ilija
But that's not quite what the RFC says:
I am asking that we discuss and vote on the following question:
“Should there be some way for developers to signal to the parser at
compile time that all unqualified function names found in a
namespace context are global, without a namespace lookup?”
Which implies:
- There is some change to syntax.
- All unqualified calls become global.
I did not intend for all unqualified calls to become global, unless the
new directive is present.
There would be some new syntax for the directive, but behavior would
remain unchanged without a new directive present.
The issue is, when a question asks: "Should we do thing Y with syntax
X", and the vote is only yes/no, then you have:
Yes.
No, because I don't think we should do it at all.
No, because I don't like the syntax.
But you can't tell the "no" votes apart.
If the people who are allowed to vote are opposed to the idea, for
whatever reason, extended syntax discussions aren't going to be a good
use of time.
So I wanted to ask, "Will you do it if we can agree on a syntax?".
Then if they will, we can discuss.
I actually have several ideas in mind for things that aren't
necessarily mutually exclusive, but I need to know that core is willing
to consider/implement along these lines.
What I'm trying to get out of this RFC is a clear yes or no from the
people who are allowed to vote.
But that's not quite what the RFC says:
I am asking that we discuss and vote on the following question:
“Should there be some way for developers to signal to the parser at
compile time that all unqualified function names found in a
namespace context are global, without a namespace lookup?”
Which implies:
- There is some change to syntax.
- All unqualified calls become global.
I did not intend for all unqualified calls to become global, unless the
new directive is present.
Sorry, my language was not precise enough. Your proposal suggests
making unqualified calls global when the directive is present, whereas
my proposal suggests keeping local scope as a fallback, hence the two
not being compatible.
What I'm saying is that:
- If the vote fails, it might have been because some people don't
want opt-in behavior. Niels just voiced this opinion.
- If the vote is accepted, it would void my suggestion, because it is
not compatible with the conditions laid out in your proposal. If this
were a straw poll, rather than an RFC vote, it would be clearer that
there is no mandatory approach or policy being accepted.
A vote with multiple options to pick an approach (opt-in through some
directive, hard BC break, flipping lookup order, nothing at all) might
be more appropriate.
Ilija
I did not intend for all unqualified calls to become global, unless
the new directive is present.
Sorry, my language was not precise enough. Your proposal suggests
making unqualified calls global when the directive is present,
whereas my proposal suggests keeping local scope as a fallback, hence
the two not being compatible.
Keeping local scope as a fallback would still require an NS lookup
opcode.
My proposal is that there still be an NS lookup as the default
behavior, but allow an optional override.
That is, users would have three options:
(1) resolve all unqualified function names to global (no NS lookup)
(2) resolve all unqualified function names to local (no NS lookup)
(3) Use the default behavior, which at present, is local first, then
global, with a namespace lookup. (no change to user code, keeps BC)
Your proposal reverses the order of the default NS lookup from local
first to global first.
That's not mutually exclusive.
If my proposal and yours were both accepted, users would have these
options:
(1) resolve all unqualified function names to global (no NS lookup)
(2) resolve all unqualified function names to local (no NS lookup)
(3) Use the default behavior, which would now be global first, then
local, with a namespace lookup. (no change to user code, keeps BC
working, but changes the default automatic lookup order)
I believe that my proposal also helps yours be more viable, too. Because
if your proposal was accepted alone, and the NS lookup order changed,
then it would no longer be possible to do unit testing with stubs that
replace built-in functions, UNLESS there was also an option for
developers to instruct the compiler to use local functions over global
ones.
What I'm saying is that:
- If the vote fails, it might have been because some people don't
want opt-in behavior. Niels just voiced this opinion.
Every token added to a PHP file is opt-in behavior that changes how the
parser works.
- If the vote is accepted, it would void my suggestion, because it
is not compatible with the conditions laid out in your proposal.
I don't see them as being incompatible at all, they would actually work
together. An automatic performance improvement for everyone without any
BC breaks or code changes, plus a new option that gives advanced
developers a new tool to write cleaner class code, and enable advanced
unit testing tools to work.
A vote with multiple options to pick an approach (opt-in through some
directive, hard BC break, flipping lookup order, nothing at all)
might be more appropriate.
A vote has not been called, and an RFC is supposed to be amended in
response to comments and discussion.
I think something needs to be at least as formal as an RFC or people
won't take it seriously enough to contribute feedback.
I'm trying to start a discussion on this and get some momentum so that
we can hopefully implement something in a near future version.
Please, let us discuss.
Sorry, my language was not precise enough. Your proposal suggests
making unqualified calls global when the directive is present,
whereas my proposal suggests keeping local scope as a fallback, hence
the two not being compatible.
Keeping local scope as a fallback would still require an NS lookup
opcode.
In the general sense: Yes, we would need to keep the NS lookup opcode.
However, functions that are known to exist in global scope would not
compile to it. The same goes for your solution, as code that doesn't
opt-into disambiguated lookups will still require the
ZEND_INIT_NS_FCALL_BY_NAME opcode.
My proposal is that there still be an NS lookup as the default
behavior, but allow an optional override.
That is, users would have three options:
(1) resolve all unqualified function names to global (no NS lookup)
(2) resolve all unqualified function names to local (no NS lookup)
(3) Use the default behavior, which at present, is local first, then
global, with a namespace lookup. (no change to user code, keeps BC)
Your proposal reverses the order of the default NS lookup from local
first to global first.
That's not mutually exclusive.
Sure. But it's questionable whether the motivation of introducing one
when the other already exists is still there.
If my proposal and yours were both accepted, users would have these
options:
(1) resolve all unqualified function names to global (no NS lookup)
(2) resolve all unqualified function names to local (no NS lookup)
(3) Use the default behavior, which would now be global first, then
local, with a namespace lookup. (no change to user code, keeps BC
working, but changes the default automatic lookup order)
I believe that my proposal also helps yours be more viable, too. Because
if your proposal was accepted alone, and the NS lookup order changed,
then it would no longer be possible to do unit testing with stubs that
replace built-in functions, UNLESS there was also an option for
developers to instruct the compiler to use local functions over global
ones.
I'm not sure your proposal solves the mocking problem. If the engine
is to interpret all non-fq calls as global or local, how would a
library include your file while switching this configuration, when it
is implemented as some directive in the file? Also, how would only
singular functions be mocked when there is no fallback to the global
scope for the rest of the functions used within the file? That would
necessitate mocking all functions, even the unmodified ones.
What I'm saying is that:
- If the vote fails, it might have been because some people don't
want opt-in behavior. Niels just voiced this opinion.
Every token added to a PHP file is opt-in behavior that changes how the
parser works.
I don't understand what you mean. What Niels was suggesting is that
adding more context dependency is undesired. I.e. language semantics
should not be dependent on configuration.
I think something needs to be at least as formal as an RFC or people
won't take it seriously enough to contribute feedback.
I'm trying to start a discussion on this and get some momentum so that
we can hopefully implement something in a near future version.
Fair enough. But it should be made clearer what is being voted on.
From reading the RFC, it is not clear that there even are alternative
options, which ones have been proposed before, what points have been
raised in the discussion, etc.
Ilija
I'm not sure your proposal solves the mocking problem. If the engine
is to interpret all non-fq calls as global or local, how would a
library include your file while switching this configuration, when it
is implemented as some directive in the file?
I'm not sure I understand this question.
Really, all we are doing here is telling the parser to pretend that any
function call without a namespace was coded with a \ in front of it.
In other words:
"Parser, if you see array_key_exists(), pretend you saw
\array_key_exists() and use the dedicated opcode".
The engine wouldn't change at all. The only thing changing is the
opcode being generated. And the opcodes aren't changing either.
The parser would do the same thing as having a backslash does now. You
just wouldn't need to explicitly put the backslash there. It assumes
the backslash.
Or, to put it another way, we are simply specifying what the default
namespace is for unqualified calls.
Also, how would only singular functions be mocked when there is no
fallback to the global scope for the rest of the functions used
within the file? That would necessitate mocking all functions, even
the unmodified ones.
If a developer needed to override built-in functions (in a specific
file) with local ones, and still use some built-in functions, they
would then need to use the fully-qualified \function(); to call the
built-in.
Consider this very brief example:
// MockFileFunctions.php
namespace test;
function fopen(string $filename) {
    echo 'pretending to write files!';
}
// TestClass.php
// "using local functions" is the proposed syntax, not valid PHP today
namespace test using local functions;
class TestClass {
    function TestMethod() {
        // array_key_exists uses the built-in due to the leading \
        if (\array_key_exists(...)) {
            // fopen uses \test\fopen because of "using local functions"
            fopen('file.txt');
        }
    }
}
In the example above, we have specified "using local functions", which
prevents namespace lookup of functions, and defaults to local if the
function name is unqualified.
The backslash before \array_key_exists() triggers the global version,
even though this file specifies "using local functions". That's because
"using local functions" only applies to unqualified function calls
and the backslash qualifies it as global.
"Using local functions" would only ever be used in very special use
cases like unit testing.
The most common use case would be for the companion declaration: "using
global functions".
When "using global functions" is declared, the backslash is omitted
from all function calls where the global function is desired, and the
parser pretends a backslash is there (if there isn't a namespace
specified for that function call).
This boosts performance by skipping the ns lookup and by using
dedicated opcodes, while also keeping code clean in the vast majority
of use cases where only global functions are ever used by namespaced
classes.
I'm not sure your proposal solves the mocking problem. If the engine
is to interpret all non-fq calls as global or local, how would a
library include your file while switching this configuration, when it
is implemented as some directive in the file?
I'm not sure I understand this question.
Consider this example:
<?php
namespace Foo;
echo time();
With my proposal, this would now always call the global time()
function. You were suggesting that "using local functions" would help
mitigate this, but I don't think it does.
- The user can't add "using local functions" to this file, because in
a production environment where ClockMock isn't used, there is no local
time() function. With "using local functions" having no fallback, the
global function would not be called, and hence the code would error.
Your proposal would only help if the semantics were "use local
function if available, or global function otherwise", which is what we
have currently.
- Symfony can't enable the "using local functions" option either when
including your file, because it would require modifying the source
code. That leaves eval(), which would really already solve this
problem if you replace calls to time() with something else.
Also, how would only singular functions be mocked when there is no
fallback to the global scope for the rest of the functions used
within the file? That would necessitate mocking all functions, even
the unmodified ones.
If a developer needed to override built-in functions (in a specific
file) with local ones, and still use some built-in functions, they
would then need to use the fully-qualified \function(); to call the
built-in.
Well, ok. But then we're back to prefixing global calls, which defeats
the purpose of the proposal. I think it's much preferable to look for
a different, more robust mocking solution that also works for
unnamespaced code and fully qualified calls.
Ilija
Consider this example:
<?php
namespace Foo;
echo time();
With my proposal, this would now always call the global time()
function. You were suggesting that "using local functions" would help
mitigate this, but I don't think it does.
- The user can't add "using local functions" to this file, because in
a production environment where ClockMock isn't used, there is no local
time() function. With "using local functions" having no fallback, the
global function would not be called, and hence the code would error.
Your proposal would only help if the semantics were "use local
function if available, or global function otherwise", which is what we
have currently.
- Symfony can't enable the "using local functions" option either when
including your file, because it would require modifying the source
code. That leaves eval(), which would really already solve this
problem if you replace calls to time() with something else.
This is a different problem that could be solved by a sandbox API.
Well, ok. But then we're back to prefixing global calls, which
defeats the purpose of the proposal.
Global functions would only need a prefix \ in the very rare cases
where local functions are set as the default. For most people, the \
would be omitted, as globals would be set as default for unqualified
function names.
This is a different problem that could be solved by a sandbox API.
Not sure which case we were talking about then. ClockMock is what I've
been referencing all along.
Well, ok. But then we're back to prefixing global calls, which
defeats the purpose of the proposal.
Global functions would only need a prefix \ in the very rare cases
where local functions are set as the default. For most people, the \
would be omitted, as globals would be set as default for unqualified
function names.
Right. But apart from mocking, what are these cases? If performance
were no longer an issue, "using global functions" just makes the
language harder to use, removing the local fallback. "using local
functions" may be useful in namespaced code making many calls to
functions within the same namespace. In that case, it would probably
be more useful to switch the lookup order back instead. If you want to
pay zero performance penalty, you can prefix global calls with \.
You'd need to do that with "using local functions" anyway.
As for mocking: If the code needs to change either way, why not make
it testable in the first place, e.g. through dependency injection for
time()? At least this only requires changing the calls that are
mocked, instead of all the calls that aren't.
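A minimal sketch of what dependency injection for time() could look like. The SessionToken class and its method are hypothetical; the only assumption is that the clock is any callable returning a Unix timestamp.

```php
<?php
// Hypothetical class for illustration; not taken from any library.
final class SessionToken
{
    /** @var callable(): int */
    private $clock;

    public function __construct(?callable $clock = null)
    {
        // Default to the real global time() when no clock is injected.
        $this->clock = $clock ?? 'time';
    }

    public function expiresAt(int $ttlSeconds): int
    {
        return ($this->clock)() + $ttlSeconds;
    }
}
```

A test would pass a fixed clock such as `static fn (): int => 1000`, while production code constructs the object without arguments and gets the real time().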
The main benefit of the approach from ClockMock is that your code
(probably) doesn't need to change. I do think that the entire approach
is hacky, and probably worth solving on a language-level, at least if
possible without adding limitations to the engine. A good first start
would be to know what functions are commonly mocked with this
approach.
Ilija
This is a different problem that could be solved by a sandbox API.
Not sure which case we were talking about then. ClockMock is what I've
been referencing all along.
Well, ok. But then we're back to prefixing global calls, which
defeats the purpose of the proposal.
Global functions would only need a prefix \ in the very rare cases
where local functions are set as the default. For most people, the \
would be omitted, as globals would be set as default for unqualified
function names.
Right. But apart from mocking, what are these cases? If performance
were no longer an issue, "using global functions" just makes the
language harder to use, removing the local fallback. "using local
functions" may be useful in namespaced code making many calls to
functions within the same namespace. In that case, it would probably
be more useful to switch the lookup order back instead. If you want to
pay zero performance penalty, you can prefix global calls with \.
You'd need to do that with "using local functions" anyway.
So. Fun story. I’ve seen this technique used to patch out fgetcsv due to a memory leak, with a pure php polyfill, in at least four unrelated codebases. I believe the leak is still there too and now that I know so much more about zend strings, I can probably guess what the issue is as well.
I digress. The point is, there are code bases that use this technique to get around php issues or even “implement older versions” of core functions to retain backwards compatibility until the code can be updated to deal with the new core version.
As for mocking: If the code needs to change either way, why not make
it testable in the first place, e.g. through dependency injection for
time()? At least this only requires changing the calls that are
mocked, instead of all the calls that aren't.
Have you ever worked on some legacy code where you aren’t really sure how it is working in the first place? Even something as simple as shimming out time() could cause race conditions in the overall system. Refactoring these systems is an art form all by itself and you attempt to add tests to understand the system end-to-end, long before you ever change a line of production code.
The main benefit of the approach from ClockMock is that your code
(probably) doesn't need to change. I do think that the entire approach
is hacky, and probably worth solving on a language-level, at least if
possible without adding limitations to the engine. A good first start
would be to know what functions are commonly mocked with this
approach.
Ilija
Time functions are the most obvious ones, and then any function that changes between versions and breaks something would be the non-obvious ones. (Ex: counting null: https://3v4l.org/hmNiL) This allows you to upgrade php / support higher versions while slowly upgrading core function calls.
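A sketch of that kind of shim, assuming the "counting null" case refers to count(null), which throws a TypeError on PHP 8 but returned 0 on older versions (with a warning from 7.2 on). The Legacy namespace and the itemsIn() caller are hypothetical.

```php
<?php
namespace Legacy;

// Namespaced shim that keeps the pre-PHP 8 result for null.
function count($value, int $mode = \COUNT_NORMAL): int
{
    if ($value === null) {
        return 0;
    }

    return \count($value, $mode);
}

// Existing code keeps its unqualified call, which now resolves to \Legacy\count().
function itemsIn($maybeList): int
{
    return count($maybeList);
}
```

This only works because unqualified calls check the current namespace first, which is the behaviour under discussion.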
— Rob
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?
You tell me. As I have repeatedly said, I don't actually know anything about these algorithms. SHA-256 is the only one on the list which I've heard of, and I'm aware it's newer than SHA-1. I don't know why SHA-512 isn't "better", I don't know why nobody talks about SHA-3, and I don't know if one of the others in the list is absolutely amazing and should be everyone's default forever.
As far as I can see, nobody, in this whole discussion, has actually stepped up and explained what users should be using, once we have taught them that MD5 and SHA-1 are bad.
Or leave them the 60-piece set (which includes flat-head and Phillips screwdrivers, so they're not being taken away), and write some tips on how to use it correctly.
So go ahead and write those tips. You don't need an RFC vote to improve the documentation.
Here is my offer to those arguing in favour of this deprecation: If you show me a draft of a comprehensive improvement to the manual to explain how users should be choosing a hashing algorithm, I will consider changing my vote.
I am also happy to help with proofreading, and working out how to format it into DocBook that fits nicely in the manual.
As long as the deprecation rests on "somebody in the next 10 years might get round to improving the manual", my vote remains a firm No.
Regards,
Rowan Tommins
[IMSoP]
On Sun, Jul 28, 2024, 08:42 Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions
for both, and then when SHA4 comes along (as it inevitably will) another
standalone function for one of its variants?
You tell me. As I have repeatedly said, I don't actually know anything
about these algorithms. SHA-256 is the only one on the list which I've
heard of, and I'm aware it's newer than SHA-1. I don't know why SHA-512
isn't "better", I don't know why nobody talks about SHA-3, and I don't know
if one of the others in the list is absolutely amazing and should be
everyone's default forever.
As far as I can see, nobody, in this whole discussion, has actually
stepped up and explained what users should be using, once we have taught
them that MD5 and SHA-1 are bad.
Or leave them the 60-piece set (which includes flat-head and
Phillips screwdrivers, so they're not being taken away), and write some
tips on how to use it correctly.
So go ahead and write those tips. You don't need an RFC vote to improve
the documentation.
Here is my offer to those arguing in favour of this deprecation: If you
show me a draft of a comprehensive improvement to the manual to explain how
users should be choosing a hashing algorithm, I will consider changing my
vote.
I am also happy to help with proofreading, and working out how to format
it into DocBook that fits nicely in the manual.
As long as the deprecation rests on "somebody in the next 10 years might
get round to improving the manual", my vote remains a firm No.
Regards,
Rowan Tommins
[IMSoP]
I have voted yes only because I thought it's about removing inconsistent
function alias. I can't see anything wrong with these hashing algorithms and
I don't consider them unsafe. However, as someone pointed out this doesn't
seem to be correct as the crc32 function isn't part of the deprecation
proposal. I am confused now as to why we are trying to deprecate these
functions at all. If it's about people confusing the hashing algorithms
with password key stretching algorithms then that's not a valid reason. A
red warning in the documentation should aid people in clearing this
confusion.
Hi
I have voted yes only because I thought it's about removing inconsistent
function alias. I can't see anything wrong with these hashing algorithms and
I don't consider them unsafe. However, as someone pointed out this doesn't
seem to be correct as the crc32 function isn't part of the deprecation
proposal. I am confused now as to why we are trying to deprecate these
crc32() is different, because the hash() function is not a direct
drop-in replacement. The crc32() function returns an integer, whereas
the hash() function returns raw bytes / hex encoded bytes (as with all
hash functions it provides). Unfortunately one also needs to remember to
use the 'crc32b' algorithm value, because 'crc32' is taken up by the
bit-reversed bzip2 variant of CRC32. The 'crc32b' naming is something
PHP-specific and non-standard :-/
The standalone crc32() could probably also be deprecated, given the big
red warning in documentation. But it's sufficiently different from the
standalone md5() and sha1() functions to not bundle it with them.
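As a rough illustration of the difference described above (an editorial sketch, not part of the original message): crc32() returns an integer, while hash() with the PHP-specific 'crc32b' name returns the same checksum as an 8-character hex string. The input string is an arbitrary example.

```php
<?php
// Arbitrary example input.
$data = 'The quick brown fox jumped over the lazy dog.';

// Standalone function: returns an integer.
$asInt = crc32($data);

// Generic function: returns lowercase hex by default.
// Note the algorithm name is 'crc32b', not 'crc32'.
$asHex = hash('crc32b', $data);

// Formatting the integer as 8 hex digits yields the same value.
$intAsHex = sprintf('%08x', $asInt);

var_dump(is_int($asInt));       // bool(true)
var_dump($asHex === $intAsHex); // bool(true)
```

This is why replacing crc32() with hash() takes more than a rename: callers that store or compare the integer need a conversion step as well.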
functions at all. If it's about people confusing the hashing algorithms
with password key stretching algorithms then that's not a valid reason. A
red warning in the documentation should aid people in clearing this
confusion.
No, it's about MD5 and SHA-1 being a bad choice nowadays and
nevertheless being more prominently available than the alternatives.
Best regards
Tim Düsterhus
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?
You tell me. As I have repeatedly said, I don't actually know anything about these algorithms. SHA-256 is the only one on the list which I've heard of, and I'm aware it's newer than SHA-1. I don't know why SHA-512 isn't "better", I don't know why nobody talks about SHA-3, and I don't know if one of the others in the list is absolutely amazing and should be everyone's default forever.
As far as I can see, nobody, in this whole discussion, has actually stepped up and explained what users should be using, once we have taught them that MD5 and SHA-1 are bad.
Or leave them the 60-piece set (which includes flat-head and Phillips screwdrivers, so they're not being taken away), and write some tips on how to use it correctly.
So go ahead and write those tips. You don't need an RFC vote to improve the documentation.
Here is my offer to those arguing in favour of this deprecation: If you show me a draft of a comprehensive improvement to the manual to explain how users should be choosing a hashing algorithm, I will consider changing my vote.
I am also happy to help with proofreading, and working out how to format it into DocBook that fits nicely in the manual.
As long as the deprecation rests on "somebody in the next 10 years might get round to improving the manual", my vote remains a firm No.
Regards,
Rowan Tommins
[IMSoP]
Hey, all I'm doing is pointing out that the only reason those functions
were standalone to start with is because when they were added they were
the only ones around; they weren't introduced as "easier to use"
alternatives to the more generic case. If hash() had been added in PHP
with half a dozen different algorithms right at the beginning, would
md5() and sha1() have been given special treatment? Possibly: MD5 (and
later SHA1) got all the publicity at the time.
Whether they are "bad" or "should not be used" has nothing to do with
that. I understand that the RFC is hard on them because they are broken
algorithms that don't have any advantages over others that have been
added since and therefore the language shouldn't be encouraging their
use by providing dedicated functions for them, I'm just pointing out
that those dedicated functions are historical artefacts.
I haven't seen an explanation of what makes them "easier to use": if you
want to use md5()
(for whatever reason: I don't care) it's not that hard
to write hash("md5") instead. I just went through a file deduplication
utility of mine and did exactly that. Yes, I am using MD5 as a message
digest algorithm.
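For readers following along, the substitution mentioned above can be sketched like this. The group_by_digest() helper and the temporary files are hypothetical illustrations, not the poster's actual utility.

```php
<?php
// Both spellings produce the same 32-character hex digest.
$payload = 'example file contents';
var_dump(md5($payload) === hash('md5', $payload)); // bool(true)

// Hypothetical helper: group paths by content digest, so that files with
// identical contents end up under the same key.
function group_by_digest(array $paths, string $algo = 'md5'): array
{
    $groups = [];
    foreach ($paths as $path) {
        // hash_file('md5', $path) returns the same value as md5_file($path).
        $groups[hash_file($algo, $path)][] = $path;
    }
    return $groups;
}

// Usage with three temporary files, two of which share their contents.
$files = [];
foreach (['same', 'same', 'different'] as $contents) {
    $file = tempnam(sys_get_temp_dir(), 'dup');
    file_put_contents($file, $contents);
    $files[] = $file;
}

$groups = group_by_digest($files);
$duplicates = array_filter($groups, fn(array $group): bool => count($group) > 1);

// Clean up the temporary files.
array_map('unlink', $files);
```

Passing the algorithm as a parameter also makes it a one-argument change to switch the digest later, for example to 'sha256'.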
Hi
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?
You tell me. As I have repeatedly said, I don't actually know anything about these algorithms. SHA-256 is the only one on the list which I've heard of, and I'm aware it's newer than SHA-1. I don't know why SHA-512 isn't "better", I don't know why nobody talks about SHA-3, and I don't know if one of the others in the list is absolutely amazing and should be everyone's default forever.
As far as I can see, nobody, in this whole discussion, has actually stepped up and explained what users should be using, once we have taught them that MD5 and SHA-1 are bad.
Let me attempt to give an explanation. As of today users should use in
order of priority:
1. The hash function they need for interoperability: If a service
provides a SHA-1 checksum, then there is no choice and SHA-1 needs to be
used.
2. The hash function their security team requests them to use.
3. A function from the SHA-2 family, with SHA-256 being a good default
choice, because that's the secure default choice across the industry.
See also: https://news.ycombinator.com/item?id=14469614 and specifically
https://news.ycombinator.com/item?id=14469730 ("there are hash
cryptographers who think SHA-2 may never be broken").
To expand on (3):
- SHA-256 and SHA-224 are literally the same, except for the initial
values and the fact that SHA-224 returns fewer bits.
- SHA-512, SHA-384, SHA-512/224 and SHA-512/256 are literally the same,
except for the initial values and the fact that the latter 3 return
fewer bits.
- The main structure of SHA-512 and SHA-256 is the same, SHA-512 just
uses 64-bit operations and larger chunks. Wikipedia explains this in
detail: https://en.wikipedia.org/wiki/SHA-2#Pseudocode
- SHA-512 and its variants are faster than SHA-256 and its variants, the
reason is that SHA-256 is restricted to 32-bit operations. But: See below.
- The truncated variants are immune to so-called length-extension
attacks, but using a HMAC protects against that and thus is the
recommended usage.
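The points above can be checked from PHP itself. The following is an editorial sketch: it prints the digest size of each SHA-2 variant as exposed by hash(), then shows keyed use through hash_hmac(). The message and the randomly generated key are placeholders.

```php
<?php
$message = 'example message';

// Digest sizes in bits; the truncated variants return fewer bits.
$algos = ['sha224', 'sha256', 'sha384', 'sha512', 'sha512/224', 'sha512/256'];
$bits = [];
foreach ($algos as $algo) {
    // Third argument true = raw bytes, so strlen() counts bytes.
    $bits[$algo] = strlen(hash($algo, $message, true)) * 8;
    printf("%-10s %3d bits\n", $algo, $bits[$algo]);
}

// Keyed use: an HMAC instead of hashing a key concatenated with the message.
$key = random_bytes(32); // placeholder secret for the example
$tag = hash_hmac('sha256', $message, $key);

// Verification recomputes the tag and compares in constant time.
$valid = hash_equals($tag, hash_hmac('sha256', $message, $key));
var_dump($valid); // bool(true)
```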
As for the speed difference, I've created a (pending) PR to improve the
speed of SHA-256 2x to 5x (depending on the input length), by leveraging
the SHA-NI instruction set when available. When it's not available, the
SSE2 implementation improves the speed by 1.3x:
https://github.com/php/php-src/pull/15152
(Credit where credit is due: The implementation was written by Dr. Colin
Percival, I just did the PHP integration).
Or leave them the 60-piece set (which includes flat-head and Phillips screwdrivers, so they're not being taken away), and write some tips on how to use it correctly.
So go ahead and write those tips. You don't need an RFC vote to improve the documentation.
Here is my offer to those arguing in favour of this deprecation: If you show me a draft of a comprehensive improvement to the manual to explain how users should be choosing a hashing algorithm, I will consider changing my vote.
I am also happy to help with proofreading, and working out how to format it into DocBook that fits nicely in the manual.
As long as the deprecation rests on "somebody in the next 10 years might get round to improving the manual", my vote remains a firm No.
I'm seeing that you already found the issue discussing improvements to
the documentation, but for reference for readers following along:
https://github.com/php/doc-en/issues/3616
Please also see my previous email regarding the docs improvements I've
already made: The examples for the hash() functions should now all use
sha256 (matching the explanation above), please point out if I missed any.
Best regards
Tim Düsterhus
PS: I know that life can get in the way, but as it fits the topic of
your last paragraph I'd like to note that I don't believe you followed
up regarding the documentation feedback back when the PHP 8.3
deprecation RFC (https://externals.io/message/120422#120601) happened.
Let me attempt to give an explanation. As of today users should use in
order of priority:
1. The hash function they need for interoperability: If a service
provides a SHA-1 checksum, then there is no choice and SHA-1 needs to
be used.
2. The hash function their security team requests them to use.
3. A function from the SHA-2 family, with SHA-256 being a good default
choice, because that's the secure default choice across the industry.
Thanks, this is a good concise explanation, and exactly the kind of
thing I think should be in the documentation before we start telling
people they're "doing it wrong" by using md5() or sha1().
It also strengthens my conviction that we should add sha256() and
sha256_file() as standalone functions: a "good default choice" is really
all most users want or need. There seems to be no risk of us needing to
add a new function every other year, or provide a dozen functions to
cover different use cases.
Users in scenarios 1 or 2 will know what to look up in the hash_algos()
list and pass to hash(). Users who have complex use cases like
incremental hashing, or whatever hash_hkdf() does, can keep using the
complex but flexible functions that provide those facilities.
Note that if SHA-256 ever stops being the recommendation, having a
sha256() function will give us the same opportunity we have now with
md5() and sha1(): to issue a message to users of the standalone
function, but keep the algorithm fully available in hash().
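To make the suggestion concrete, here is an editorial sketch of what such functions might look like if written in userland today. PHP does not ship sha256() or sha256_file(); the names and signatures below are assumptions modelled on md5() and md5_file().

```php
<?php
// Hypothetical userland stand-ins; guarded in case a future PHP version
// defines functions with these names.
if (!function_exists('sha256')) {
    function sha256(string $string, bool $binary = false): string
    {
        // Delegates to the generic hash() function.
        return hash('sha256', $string, $binary);
    }
}

if (!function_exists('sha256_file')) {
    function sha256_file(string $filename, bool $binary = false): string|false
    {
        return hash_file('sha256', $filename, $binary);
    }
}

echo sha256('abc'), "\n";
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```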
Regards,
--
Rowan Tommins
[IMSoP]
I'm not talking about the MD5 or SHA1 algorithms or whether they should or shouldn't be used. I'm just talking about the functions themselves. md5(), md5_file(), sha1(), and sha1_file(). They only exist because there wasn't the generic hash algorithm extension when they were created.
I understand what is being claimed (and you're not the only one claiming it), I'm just not convinced it's true. I think they have standalone functions for the same reason we added str_contains and str_starts_with - because it's convenient to have straightforward functions for common use cases.
The hash() function is like a 60-piece set of interchangeable screwdriver heads, which only professionals and enthusiasts need; md5() and sha1() are like the flat-head and Phillips screwdrivers that everyone has in a drawer somewhere.
The thing that always surprises me is that PHP doesn't have a standalone function for SHA-256, which is the only other I've ever used.
To continue the analogy, we're missing a Pozidriv screwdriver, so people are misusing the Phillips one. The RFC is suggesting that we take away their flat-head and Phillips screwdrivers, and leave them with the 60-piece set, and no instructions.
My suggestion is we instead give them a Pozidriv screwdriver, and write some tips on how to use it correctly.
I rise in support of this mindset.
Some of us like to draw inspiration from other languages, and in that vein one of the things that makes Go such a joy to program in is the fact the Go team continues to add "convenience" functions with every new 6 month release.
Many (all?) of the functions the Go team adds could have been written in "userland" but they represent such common use-cases that the Go team decided to make them easy and obvious. They even soft deprecate functions and structs that are not ideal and replace them with ones with better names and better signatures. If Go had started with the string and array functions PHP has today they would almost certainly have replaced them by now, ~15 years into Go's tenure.
It is a shame that PHP's culture is so hostile towards adding functionality that could also be added in userland, especially when that functionality would simplify and standardize algorithms that are non-obvious and/or too easy to implement incorrectly. If the PHP culture embraced moving common use-cases into core it would make PHP much more pleasurable to program in and make it much less likely that PHP programs would have bugs and/or security vulnerabilities.
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?
Yes. Yes, And Yes.
And ideally within a \PHP namespace.
-Mike
P.S. But as we know a standardized \PHP namespace is apparently never going to happen although for the life of me I still cannot understand why not - and I was here during the voting down of that RFC ~4 years ago - given how so many other languages had done the equivalent.
Many (all?) of the functions the Go team adds could have been written in "userland" but they represent such common use-cases that the Go team decided to make them easy and obvious. They even soft deprecate functions and structs that are not ideal and replace them with ones with better names and better signatures. If Go had started with the string and array functions PHP has today they would almost certainly have replaced them by now, ~15 years into Go's tenure.
It is a shame that PHP's culture is so hostile towards adding functionality that could also be added in userland, especially when that functionality would simplify and standardize algorithms that are non-obvious and/or too easy to implement incorrectly. If the PHP culture embraced moving common use-cases into core it would make PHP much more pleasurable to program in and make it much less likely that PHP programs would have bugs and/or security vulnerabilities.
I, too, wish there was more willingness to add useful functions to core.
Just saying "they can be implemented in userland" is a bit of a cop-out
because, duh, PHP is Turing-complete. A lot of the existing array
functions could be replicated by userland (ab)use of array_reduce, and
yet no-one would suggest removing them, and if they'd been absent a lot
of people would be asking for them.
Anyone else wish that sort() took its argument by value instead of by
reference? (Solvable in userland.) Or how about a named argument that
allowed you to provide a key function to sort on instead of a
comparator? (Solvable in userland.) Okay, the first change would break a
lot, but an alternate sorted() function that did behave that way could
be added.
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?
Yes. Yes, And Yes.
And ideally within a \PHP namespace.
At that point you've got \PHP\sha3() instead of hash("sha3-?"), and now
you've (a) lost the word "hash" indicator of what's going on, and (b)
hidden the choice of "?" from the user. I'm not really seeing an
improvement.
I, too, wish there was more willingness to add useful functions to core.
:-)
Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for both, and then when SHA4 comes along (as it inevitably will) another standalone function for one of its variants?
Yes. Yes, And Yes.
And ideally within a \PHP namespace.
At that point you've got \PHP\sha3() instead of hash("sha3-?"), and now you've (a) lost the word "hash" indicator of what's going on, and (b) hidden the choice of "?" from the user. I'm not really seeing an improvement.
Well, your comments are based on assumptions that they would have to be implemented as you were envisioning when you wrote your reply.
My "Yes. Yes. And Yes" was not intended to be full RFC that fleshed out all the considerations and proposed a specific implementation. IOW, there are definitely ways to address your criticisms if we are open-minded in what could be considered. :-)
At that point you've got \PHP\sha3()
I'm sure you will find it ironic in hindsight like I do that you chose sha3 (vs. md5) as the function to illustrate your argument about not having the word "hash" given how SHA is an acronym for "Secure Hash Algorithm." :-)
By the same token, we could complain about how parse_url(), urlencode() and urldecode() all lost the word "resource." :-o
Seriously though, some acronyms are well-known enough — or easily discovered enough — that we should be able to use them as function names without lamenting they are not spelled out.
But if the concern is they are not grouped together as hashing functions then, had we had a \PHP namespace as an option, we could easily have:
- \PHP\Hashing\md5()
- \PHP\Hashing\sha1()
- \PHP\Hashing\sha256()
- \PHP\Hashing\sha3()
- etc.
Also, there is no reason we have to be exhaustive. The pareto principle is always one we should consider when deciding when anything should be elevated to having its own dedicated function.
instead of hash("sha3-?")
The problem here is semantic information is encoded in a string rather than in a named symbol and thus is not recognized in the AST when parsing and requires a hack of diving into the string in order to validate.
So typically, no type checking, no auto-complete, and potentially delayed error detection.
Using strings where symbols would be better is a common wart in PHP — such as PHP not having a first-class type for class, interface or function — so we have to pass around names as non-typesafe strings instead.
BTW, I asked ChatGPT to opine on the problems caused with strings-as-symbols from computer science and software engineering perspectives, and this is what it gave me:
https://chatgpt.com/share/17d57881-c411-4b64-863a-d0692b4a4577
and (b) hidden the choice of "?" from the user. I'm not really seeing an improvement.
What's wrong with something like?
use PHP\Hashing\sha3;
use PHP\Bits;
...
$hash_224 = sha3($data,Bits::224);
$hash_256 = sha3($data,Bits::256);
$hash_384 = sha3($data,Bits::384);
$hash_512 = sha3($data,Bits::512);
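As written, the snippet above is pseudocode: no PHP\Hashing namespace exists, and constant or enum case names cannot start with a digit, so Bits::256 would not parse. One way the idea could be approximated in userland on PHP 8.1+ is sketched below; the Sha3Bits enum and the sha3() wrapper are hypothetical names chosen for this illustration.

```php
<?php
// Hypothetical enum standing in for "Bits"; each case carries the digest
// size that selects the SHA-3 variant.
enum Sha3Bits: int
{
    case B224 = 224;
    case B256 = 256;
    case B384 = 384;
    case B512 = 512;
}

// Hypothetical wrapper: the variant is a typed symbol instead of a string,
// and the work is delegated to the existing hash() function.
function sha3(string $data, Sha3Bits $bits = Sha3Bits::B256): string
{
    return hash('sha3-' . $bits->value, $data);
}

$data = 'example';
$hash_224 = sha3($data, Sha3Bits::B224);
$hash_256 = sha3($data, Sha3Bits::B256);
$hash_384 = sha3($data, Sha3Bits::B384);
$hash_512 = sha3($data, Sha3Bits::B512);
```

With this shape, passing an unsupported size is a type error at the call site rather than an unknown algorithm string discovered at runtime.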
The point I am trying to get across is that improving the developer experience is not a binary true or false endeavor. There are many ways to improve DX, but they all must start with an openness to consider doing it.
Hey, all I'm doing is pointing out that the only reason those functions were standalone to start with is because when they were added they were the only ones around; they weren't introduced as "easier to use" alternatives to the more generic case. If hash() had been added in PHP with half a dozen different algorithms right at the beginning, would md5() and sha1() have been given special treatment? Possibly: MD5 (and later SHA1) got all the publicity at the time.
I haven't seen an explanation of what makes them "easier to use": if you want to use md5() (for whatever reason: I don't care) it's not that hard to write hash("md5") instead. I just went through a file deduplication utility of mine and did exactly that. Yes, I am using MD5 as a message digest algorithm.
But just because they were historical artifacts doesn't mean that they should be frowned on, or removed. echo is also a historical artifact, but no one is arguing we should get rid of this:
echo "Hello World";
And then require developers to use this instead:
fprintf(STDOUT, "Hello World");
¯\_(ツ)_/¯
-Mike
At that point you've got \PHP\sha3() instead of hash("sha3-?"), and now you've (a) lost the word "hash" indicator of what's going on, and (b) hidden the choice of "?" from the user. I'm not really seeing an improvement.
Once again, you're assuming users have any idea a) what the numbers in the SHA3 variants mean, and b) how to choose between them.
I've seen plenty of uses of SHA-256 in the wild, and none of the other SHA2 variants. I don't know why, I presume people with far more knowledge than me have decided that is a good choice of variant. So when I'm looking for "something better than sha1()", I look for sha256(), remember it doesn't exist, and write hash('sha256', ...)
If I'm doing it wrong, and should be making some calculation to choose SHA-384 or SHA-512, please let me know. But don't assume that just forcing me to put the algorithm name in quote marks is going to make me know, or care, what the name actually means.
Regards,
Rowan Tommins
[IMSoP]
At that point you've got \PHP\sha3() instead of hash("sha3-?"), and now you've (a) lost the word "hash" indicator of what's going on, and (b) hidden the choice of "?" from the user. I'm not really seeing an improvement.
Once again, you're assuming users have any idea a) what the numbers in the SHA3 variants mean, and b) how to choose between them.
I've seen plenty of uses of SHA-256 in the wild, and none of the other SHA2 variants. I don't know why, I presume people with far more knowledge than me have decided that is a good choice of variant. So when I'm looking for "something better than sha1()", I look for sha256(), remember it doesn't exist, and write hash('sha256', ...)
If I'm doing it wrong, and should be making some calculation to choose SHA-384 or SHA-512, please let me know. But don't assume that just forcing me to put the algorithm name in quote marks is going to make me know, or care, what the name actually means.
Regards,
Rowan Tommins
[IMSoP]
It sounds like the argument for retaining md5() and sha1(), and adding
to them isn't that they're easier to use in themselves, but that hash()
offers too many alternatives. If PHP were to offer one specific hash
function (that's "one on top of those that it already offers") that can
be used without thinking, and leave hash() to those who may have to deal
with those alternatives - presumably they already know what they're
doing or they wouldn't be dealing with them.
That still doesn't protect md5() and sha1() from deprecation; if there
is a PHP-mandated default hash algorithm that gets its own name, then
users should be encouraged to use that one, which means not leaving the
others lying around for it to hide among. Anyone who needs to
continue to support the old algorithms can ... use hash().
When it comes to advice about which to use, that seems less the purview
of a PHP reference manual for the function, and more something like
https://csrc.nist.gov/projects/hash-functions
That still doesn't protect md5() and sha1() from deprecation; if there is a PHP-mandated default hash algorithm that gets its own name, then users should be encouraged to use that one, which means not leaving the others lying around for it to hide among. Anyone who needs to continue to support the old algorithms can ... use hash().
Absolutely. If someone wants to write a proposal to do that, I'd probably be willing to support it. Until then, there's no reason to disrupt users of the existing functions, when there's no clear message of what they are doing wrong.
When it comes to advice about which to use, that seems less the purview of a PHP reference manual for the function, and more something like
https://csrc.nist.gov/projects/hash-functions
If hash() exists only as a function for power users who already know something about the subject, then yes, maybe. If we're telling everyone to look it up when they thought they were going to use sha1(), then we need to give them something to read when they get there.
Regards,
Rowan Tommins
[IMSoP]
Hi
is already quite specific. We're talking about end users who are rolling
their own security implementations and are unaware of the security risks
but somehow know how to use these functions without reading the
documentation and warnings.
No, we are talking about end users who are following tutorials that were
written when PHP 4 was the most recent PHP version.
We are also talking about end users who look at existing code bases for
"inspiration", see md5() used, notice that the output looks random and
use it, believing they know what they are doing, but in that process use
it in a way that is insecure.
As an example, using md5_file() to implement a cache buster is fine, but
a less-experienced developer may believe that md5_file() uniquely
identifies the file contents and use it in a way where strong
collision-resistance against an adversary is required.
On the other hand, who will be impacted by these deprecations? Potentially
everyone, as these are included in many projects and in many vendor
packages. It's busy work for the people who aren't affected. Sure,
eventually, it will all be sorted out as CI warnings slowly subside because
of this.
I'm positive that even existing projects written by experienced
developers would benefit from re-checking if their use of MD5 and SHA-1
is actually safe instead of assuming that this is the case, when the
specific functionality has been untouched for the last 10 years.
Looking back at my own code, I'm seeing places where using SHA-1 is not
strictly insecure, but where a stronger hash function nevertheless would
have been more appropriate, if only to simplify code audits. I just used
sha1(), because it was temptingly convenient compared to hash('sha256', …).
Best regards
Tim Düsterhus
As an example, using md5_file() to implement a cache buster is fine,
but a less-experienced developer may believe that md5_file() uniquely
identifies the file contents and use it in a way where strong
collision-resistance against an adversary is required.
I'm positive that even existing projects written by experienced
developers would benefit from re-checking if their use of MD5 and
SHA-1 is actually safe instead of assuming that this is the case,
when the specific functionality has been untouched for the last 10
years.
Isn't the philosophy of open source software "tools, not policy"?
I'm in the process of refactoring an old framework and I just found a
use of sha1(). It's being used to generate a unique resource lock. It
doesn't need to be secure, just a fast and random UID.
Hi
I'm in the process of refactoring an old framework and I just found a
use of sha1(). It's being used to generate a unique resource lock. It
doesn't need to be secure, just a fast and random UID.
SHA-1 is a deterministic algorithm, thus it is unable to generate a
random UID. Whatever this code is doing can most likely be more reliably
achieved in a different way.
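A short editorial sketch of the distinction: hashing the same input always gives the same digest, so any uniqueness has to come from the input, whereas random_bytes() draws from the system CSPRNG. The random_lock_id() helper and the 'resource-42' input are hypothetical, and the framework code under discussion is not shown in the thread.

```php
<?php
// Same input, same digest: the hash adds no randomness of its own.
var_dump(sha1('resource-42') === sha1('resource-42')); // bool(true)

// Hypothetical helper: a random identifier taken directly from the CSPRNG,
// hex-encoded so it is safe to use in file names or cache keys.
function random_lock_id(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));
}

$first = random_lock_id();
$second = random_lock_id();
```

If the lock is meant to be derived from the resource (so that two processes agree on it), a deterministic digest of the resource name is the right tool and no randomness is involved at all.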
Best regards
Tim Düsterhus
No, we are talking about end users who are following tutorials that were
written when PHP 4 was the most recent PHP version.
We are also talking about end users who look at existing code bases for
"inspiration", see md5() used, notice that the output looks random and
use it, believing they know what they are doing, but in that process use
it in a way that is insecure.
Hi Tim,
How prevalent is this exactly? PHP 4 ended support in 2008. I think
putting warning labels on these things in the docs is enough, but we can't
go around locking up every kitchen knife just because there are some idiots
out there who read a book from the 50s about the war.
And like I said previously, this change isn't what is going to determine if
those people will write good, reliable, secure code. If their learning
insticast can't see past a blog tutorial from 20 years ago, not even to
look up the function in the manual, they will not ever achieve that.
I'm positive that even existing projects written by experienced
developers would benefit from re-checking if their use of MD5 and SHA-1
is actually safe instead of assuming that this is the case, when the
specific functionality has been untouched for the last 10 years.
You can say this about pretty much every software project in existence,
regarding anything. I just don't think it's up to PHP to mandate these
checks. If you want to create a fund for developers to go review their
code on the clock, fine, but don't force it on them. Might as well
deprecate everything each major version to force people to rewrite their
projects to "current best practices". If I wanted to do that, I'd just use
the JS framework of the month.
Looking back at my own code, I'm seeing places where using SHA-1 is not
strictly insecure, but where a stronger hash function nevertheless would
have been more appropriate, if only to simplify code audits. I just used
sha1(), because it was temptingly convenient compared to hash('sha256', …).
sha1 was the "proper" alternative to md5, until it wasn't. md5
superseded crc32, which btw, why isn't that on the hit-list?
You're using sha256? It's soooo outdated, use sha512 and key it with hmac,
you casual /s
SHA-1 is a deterministic algorithm, thus it is unable to generate a
random UID. Whatever this code is doing can most likely be more reliably
achieved in a different way.
ALL hashing functions are deterministic. That's the whole point, and
applies to sha256 just the same. You want to be able to hash the same
content and get the same hash. Just the complexity and chance of
collision changes. The reliability and security you are concerned with in
this scenario really depends on what randomness you feed it.
Thanks,
Peter
If their learning insticast
*instincts.
I should also clarify, I'm not against deprecations in general. However,
the benefits should outweigh the costs. If something is getting
unmaintainable, no longer supported, inherently insecure etc, those are all
good reasons. password_hash as mentioned was a great addition, and
should/did solve this very issue. Even someone reading a blog tutorial
from 11 years ago would be able to see this used properly.
But md5/sha1 are not bad functions, they do exactly what they say on the
box. Being able to do the exact same thing by spelling the function
slightly differently isn't even deprecating them, just deprecating an
alias. They're only bad if used in a bad way, and that to me is not
enough of a reason.
Thanks,
Peter
If their learning insticast
*instincts.
I should also clarify, I'm not against deprecations in general. However, the benefits should outweigh the costs. If something is getting unmaintainable, no longer supported, inherently insecure etc, those are all good reasons. password_hash as mentioned was a great addition, and should/did solve this very issue. Even someone reading a blog tutorial from 11 years ago would be able to see this used properly.
But md5/sha1 are not bad functions, they do exactly what they say on the box. Being able to do the exact same thing by spelling the function slightly differently isn't even deprecating them, just deprecating an alias. They're only bad if used in a bad way, and that to me is not enough of a reason.
Stephen Rees-Carter, a security expert that has performed countless security audits on Wordpress and Laravel websites, would like to disagree with the fact that it is not enough of a good reason. [1]
A warning on a documentation page is useless, as nobody is forced to read it.
Yet again the PHP community doesn't care about security of its users, current and future, and just prefers the convenience of needing to type less characters and not go back fix some code for better design.
I am not sure why I was expecting something else, but I guess I am just disappointed.
I suppose we are truly becoming Oracle.
Sincerely,
Gina P. Banyard
Yet again the PHP community doesn't care about security of its users, current and future, and just prefers the convenience of needing to type less characters and not go back fix some code for better design.
Gina P. Banyard
If you describe it in such a dramatic fashion, then there is no reason to keep sha/md5 functionality in hash too?
One could come up also with a different statement - "the PHP community doesn't care about backwards compatibility (in favor of questionable deprecations/removals)" (which at some point even borders with some "Karma farming" [1])
[1] https://socket.dev/blog/openssf-warns-of-reputation-farming-using-closed-github-issues-and-prs
rr
On 26.07.2024 at 12:03:53, Gina P. Banyard internals@gpb.moe wrote:
On Friday, 26 July 2024 at 08:09, Peter Stalman sarkedev@gmail.com wrote:
If their learning insticast
*instincts.
I should also clarify, I'm not against deprecations in general. However,
the benefits should outweigh the costs. If something is getting
unmaintainable, no longer supported, inherently insecure etc, those are all
good reasons. password_hash as mentioned was a great addition, and
should/did solve this very issue. Even someone reading a blog tutorial from
11 years ago would be able to see this used properly.
But md5/sha1 are not bad functions, they do exactly what they say on the
box. Being able to do the exact same thing by spelling the function
slightly differently isn't even deprecating them, just deprecating an
alias. They're only bad if used in a bad way, and that to me is not
enough of a reason.
Stephen Rees-Carter, a security expert that has performed countless
security audits on Wordpress and Laravel websites, would like to disagree
with the fact that it is not enough of a good reason. [1]
A warning on a documentation page is useless, as nobody is forced to read
it.
Yet again the PHP community doesn't care about security of its users,
current and future, and just prefers the convenience of needing to type
less characters and not go back fix some code for better design.
I am not sure why I was expecting something else, but I guess I am just
disappointed.
I suppose we are truly becoming Oracle.
Sincerely,
Gina P. Banyard
The only thing that removal of these functions would cause is a.) make
people rant about php unnecessarily b.) 99.9% would counter the removal of
these functions by adding this kind of code in their bootstrap, maybe
include a polyfill library via composer.
if (!function_exists('md5')) { function md5($data) { return hash('md5', $data); } }
Stephen Rees-Carter, a security expert that has performed countless security audits on Wordpress and Laravel websites, would like to disagree with the fact that it is not enough of a good reason. [1]
A warning on a documentation page is useless, as nobody is forced to read it.
Right, but even a deprecation notice is likely to be ignored by those
(either use the shut-up operator, or use hash("md5"), or maybe a polyfill
to support old PHP versions), so the deprecation wouldn't help in such
cases.
(I've recently seen a new release of a software which still uses
https://www.openwall.com/phpass/. Apparently, the notice to prefer
the password_*() API has been ignored or overlooked.)
On the other hand, I'm quite confident that a deprecation could be
useful for some developers, who would at least reconsider the use of
md5/sha1 hashes, but just have overlooked this; although some static
analysis should report respective issues. However, there is certainly
code without any static analysis, where at least this discussion appears
to be helpful, e.g. our php-sdk-binary-tools might reconsider their use
of md5() and md5(uniqid())[2].
Note that I'm not against these deprecations, but I'm also not strongly
in favor. I see valid arguments from both proponents and opponents.
[2] https://github.com/php/php-sdk-binary-tools/issues/21
Cheers,
Christoph
One thing to remind people about, the deprecations for md5(), sha1(), and uniqid() explicitly say they cannot be outright removed before PHP 10. That's at least 6 years away. That gives a loooooong time for documentation, tutorials, instructions, and code to be updated.
That long deprecation period is the reason why I was comfortable voting yes. This isn't something that would happen tomorrow. It would be in at least two presidential elections from now.
--Larry Garfield
That long deprecation period is the reason why I was comfortable voting
yes. This isn't something that would happen tomorrow. It would be in at
least two presidential elections from now.
Real elections or rigged elections? 😁
One thing to remind people about, the deprecations for md5(), sha1(), and
uniqid() explicitly say they cannot be outright removed before PHP 10.
That's at least 6 years away. That gives a loooooong time for
documentation, tutorials, instructions, and code to be updated.
It also gives a loooooong time for us to update that documentation before we start raising deprecation notices, so that there's a chance for someone to actually know what they're supposed to do about it.
When I formally proposed deprecation of utf8_encode and utf8_decode, I didn't even post the RFC for discussion before I had written two documentation PRs, one to improve documentation even if the RFC failed; and another proposing the wording if it passed.
In contrast, I voted against the deprecation of strftime() because no effort had been made to explain how users should replace it. Surprise surprise, nobody has spent any more effort in the 3.5 years since the deprecation passed, and the only advice in the documentation remains:
Instead use the IntlDateFormatter::format() method.
Well, you are supposed to also check the hash_hmac() documentation...
Why would I, if I'm not using that function? For that matter, when should I be using that function? I'm not even being facetious here, I am genuinely lacking in relevant expertise, and the summary for hash_hmac() is meaningless unless you already know what it does:
Generate a keyed hash value using the HMAC method
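For readers wondering what that summary means in practice, here is a hedged sketch of one common use of hash_hmac(): producing a tag that only someone holding a secret key can compute, so that a message can be checked for tampering. The key and message values are hypothetical and are not taken from the manual:

```php
<?php
// Hypothetical shared secret; in real code it would come from configuration.
$secretKey = random_bytes(32);
$message   = 'order=1234&amount=50';

// The tag depends on both the message and the key.
$tag = hash_hmac('sha256', $message, $secretKey);

// Verification recomputes the tag and compares it in constant time.
$valid    = hash_equals($tag, hash_hmac('sha256', $message, $secretKey));
$tampered = hash_equals($tag, hash_hmac('sha256', 'order=1234&amount=5000', $secretKey));

var_dump($valid, $tampered);
```

A plain hash() call cannot serve this purpose, because anyone who alters the message can simply recompute the digest.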
If the problem is that the web is full of bad documentation, find or write some GOOD documentation. Then, work out how best to signpost users to that documentation. Deprecating md5() and sha1() does neither.
Regards,
Rowan Tommins
[IMSoP]
One thing to remind people about, the deprecations for md5(), sha1(), and
uniqid() explicitly say they cannot be outright removed before PHP 10.
That's at least 6 years away. That gives a loooooong time for
documentation, tutorials, instructions, and code to be updated.
It also gives a loooooong time for us to update that documentation before we start raising deprecation notices, so that there's a chance for someone to actually know what they're supposed to do about it.
Hmm, such soft deprecations should be a good thing, but I'm afraid they
are not really reaching much of the user base. Remember ext/mysql?
That was soft deprecated for "centuries", but still support channels
were burning when it actually had been deprecated, and even after it had
been removed. (interestingly https://pecl.php.net/package/mysql still
says the package would have been moved to http://php.net/mysql)
Maybe, just maybe, it might be a good idea to repurpose E_STRICT for
such things. Basically a three step deprecation: first document that a
feature is obsolete, then trigger E_STRICT, and only then E_DEPRECATED.
I haven't really thought this through, though.
In contrast, I voted against the deprecation of strftime() because no
effort had been made to explain how users should replace it. Surprise
surprise, nobody has spent any more effort in the 3.5 years since the
deprecation passed, and the only advice in the documentation remains:
Instead use the IntlDateFormatter::format() method.
Yeah, the documentation should certainly be improved, but if there is
more work to do than time to do it – what can you do? If there was only
the need to cater to PHP core and the bundled extensions, there might be
sufficient time to keep the documentation in a good state, but there are
also so many PECL extensions documented there, and at least some of them
appear even unmaintained, and many of them probably nobody working on
the documentation has ever used; see e.g.
https://github.com/php/doc-en/pull/3360.
Well, you are supposed to also check the hash_hmac() documentation...
Why would I, if I'm not using that function? […]
I should have explicitly marked my comment as irony. Of course, readers
of the documentation are not supposed to check some other functions,
unless told to do so.
Cheers,
Christoph
Hmm, such soft deprecations should be a good thing, but I'm afraid they
are not really reaching much of the user base. Remember ext/mysql?
That was soft deprecated for "centuries", but still support channels
were burning when it actually had been deprecated, and even after it had
been removed. (interestingly https://pecl.php.net/package/mysql still
says the package would have been moved to http://php.net/mysql)
Maybe, just maybe, it might be a good idea to repurpose E_STRICT for
such things. Basically a three step deprecation: first document that a
feature is obsolete, then trigger E_STRICT, and only then E_DEPRECATED.
I haven't really thought this through, though.
Reading this I pondered why long soft deprecations do not really work and why there is still a crisis when the hard deprecation happens. Seems to me that as long as those who prioritize spend can put off doing things with no short term benefit then there is no tangible incentive to update. People will (almost?) always prioritize addressing a current crisis, or adding features that benefit them in the near term, over remediating something that is not causing them a current problem.
I wondered if it would not be possible to give code owners an incentive to remediate without actually forcing them to? The one thing that I came up with is reduced performance over time.
Somehow I expect to get a firestorm of negativity for even suggesting this, but please hear me out.
Imagine we had another round of deprecation voting for md5(), sha1(), etc. and instead of it just being soft deprecated until PHP 10 then hard deprecated, what if we ADDED a sleep duration in each of those functions, and we escalate for each minor release. Start with 100 milliseconds delay per function call, and then add another 100 milliseconds delay each point release of PHP.
This would allow all code to continue functioning but over time any code that uses the functions will get slower. The code owners (not the developers) will then be incented to prioritize a remediation sooner rather than later. And the longer they wait the worse performance will get, assuming they keep upgrading their version of PHP. OTOH their code will continue to work no matter what, so they can put off remediating until it becomes their priority.
This would certainly get lots of libraries to be motivated to remediate as their users would get annoyed with the delays, and commonly used libraries can affect large numbers of installations. And since performance topics drive eyeballs, lots of developer websites would be motivated to write articles about how and why people should remediate those functions.
Something to consider?
-Mike
P.S. Frankly, I really would not want to see md5() nor sha1() removed because there are valid use-cases for them. I would at least like to see them kept in some form, maybe in an \Insecure namespace, or renamed insecure_md5() and insecure_sha1() or maybe add a third optional bool parameter $insecure_ok that defaults to false (or an ?enum flag parameter accepting Hashing::INSECURE_OK as its only value), thus allowing developers to explicitly opt-in to insecure use.
Hi
P.S. Frankly, I really would not want to see md5() nor sha1() removed because there are valid use-cases for them. I would at least like to see them kept in some form, maybe in an \Insecure namespace, or renamed insecure_md5() and insecure_sha1() or maybe add a third optional bool parameter $insecure_ok that defaults to false (or an ?enum flag parameter accepting Hashing::INSECURE_OK as its only value), thus allowing developers to explicitly opt-in to insecure use.
Renaming the functions would do nothing but make this a backwards
compatibility break, whereas a deprecation does not.
Remember: The algorithms are also available by means of the hash()
function (and the related functions), without emitting a deprecation,
warning, error, or Exception.
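As a small illustration of that equivalence (a sketch only; the input string is arbitrary), the dedicated functions and the hash() function produce identical digests:

```php
<?php
$data = 'The quick brown fox'; // arbitrary example input

// Both spellings compute the same algorithm and return the same hex string.
$sameMd5  = md5($data) === hash('md5', $data);
$sameSha1 = sha1($data) === hash('sha1', $data);

var_dump($sameMd5, $sameSha1);
```

This is also why a mechanical find-and-replace from one spelling to the other changes nothing about the security of the calling code.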
Best regards
Tim Düsterhus
Hi
Hmm, such soft deprecations should be a good thing, but I'm afraid they
are not really reaching much of the user base. Remember ext/mysql?
That was soft deprecated for "centuries", but still support channels
were burning when it actually had been deprecated, and even after it had
been removed. (interestingly https://pecl.php.net/package/mysql still
says the package would have been moved to http://php.net/mysql)
What prevented me personally from adding a "soft deprecation" to the
documentation is, that the function is not actually deprecated and not
slated for deprecation. It is not up to me to decide to soft deprecate
something.
And regarding your remark of not reaching folks, I agree. uniqid()'s
documentation is already full of warnings, but folks still reach for it.
Maybe, just maybe, it might be a good idea to repurpose E_STRICT for
such things. Basically a three step deprecation: first document that a
feature is obsolete, then trigger E_STRICT, and only then E_DEPRECATED.
I haven't really thought this through, though.
I've proposed a deprecation instead of any other error level, because it
is the least severe error level we have available: Libraries and
Frameworks nowadays generally understand that deprecations are not hard
errors and thus do not convert these to exceptions, but instead just
direct them to a different log file / show them in the framework's
debugging toolkit.
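A rough sketch of that kind of routing, assuming PHP 8 and a custom handler; real frameworks use their own logging stack, and the in-memory array and the legacy_helper() name here are purely illustrative:

```php
<?php
// Stand-in for a separate deprecation log file or a debug toolbar panel.
$deprecationLog = [];

// The second argument restricts the handler to deprecation-level messages,
// so warnings and errors keep their normal handling.
set_error_handler(
    function (int $errno, string $errstr, string $errfile, int $errline) use (&$deprecationLog): bool {
        $deprecationLog[] = sprintf('%s in %s:%d', $errstr, $errfile, $errline);
        return true; // handled: do not convert to an exception or print it
    },
    E_DEPRECATED | E_USER_DEPRECATED
);

// Simulate a deprecation; legacy_helper() is a hypothetical function name.
trigger_error('legacy_helper() is deprecated', E_USER_DEPRECATED);

restore_error_handler();
print_r($deprecationLog);
```

The script keeps running after the deprecation is recorded, which is the property the paragraph above relies on.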
Best regards
Tim Düsterhus
What prevented me personally from adding a "soft deprecation" to the
documentation is, that the function is not actually deprecated and not
slated for deprecation. It is not up to me to decide to soft deprecate
something.
Well, yeah, that needs some discussion at least.
And regarding your remark of not reaching folks, I agree. uniqid()'s
documentation is already full of warnings, but folks still reach for it.
"I haven't looked at the documentation, because I know what uniqid()
does: it gives me a unique ID. I've read that 20 years ago in a great
tutorial." ;)
I've proposed a deprecation instead of any other error level, because it
is the least severe error level we have available: Libraries and
Frameworks nowadays generally understand that deprecations are not hard
errors and thus do not convert these to exceptions, but instead just
direct them to a different log file / show them in the framework's
debugging toolkit.
Thinking about it, deprecations are not really a problem, so no need for
E_STRICT, that would be unlikely to help. The actual problem is that some
projects try to be strict about deprecations (and would likely be about
E_STRICT either), and others wait until the functionality is no longer
available, anyway. There is probably no way to offer users a smoother
upgrade path.
Cheers,
Christoph
Hi
One thing to remind people about, the deprecations for md5(), sha1(), and
uniqid() explicitly say they cannot be outright removed before PHP 10.
That's at least 6 years away. That gives a loooooong time for
documentation, tutorials, instructions, and code to be updated.
It also gives a loooooong time for us to update that documentation before we start raising deprecation notices, so that there's a chance for someone to actually know what they're supposed to do about it.
Part of the motivation of the deprecation (and my argument against the
addition of a standalone sha256() function) is simplifying the
documentation: Everything needs to be written down in multiple different
places, any changes to hash_file() will likely also need to be applied
to md5_file() and sha1_file() - and then it will need to be translated.
Given that the md5(), sha1(), md5_file(), and sha1_file() functions are
not part of the hash extension, it's also much harder for the user to
discover the incremental hashing functionality provided by hash_init().
It's much much easier to keep the documentation in a good shape if there
is a single place.
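For readers who have not met the incremental API mentioned above, a minimal sketch follows. The chunk contents are made up; the point is only that feeding data piece by piece yields the same digest as hashing it in one call:

```php
<?php
// Hypothetical chunks, e.g. pieces of a large upload read one at a time.
$chunks = ['first chunk, ', 'second chunk, ', 'third chunk'];

// Incremental hashing: initialise a context, feed it data, then finalise.
$context = hash_init('sha256');
foreach ($chunks as $chunk) {
    hash_update($context, $chunk);
}
$incremental = hash_final($context);

// One-shot hashing of the concatenated data gives the same result.
$oneShot = hash('sha256', implode('', $chunks));

var_dump($incremental === $oneShot);
```

The incremental form avoids holding the whole input in memory, which has no counterpart among the standalone md5() and sha1() functions.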
I did some updates to the documentation before this RFC went to vote,
though (and did additional ones in response to this discussion). Here
are my PRs:
https://github.com/php/doc-en/pulls?q=is%3Apr+author%3ATimWolla+hash+is%3Aclosed
To summarize the changes:
- I've completely rewritten the documentation of hash_equals().
- I updated the examples for the hash_*() functions to use 'sha256' and
  to be synchronized across the different functions to showcase how the
  different functions all result in the same output, given the same input.
- I cleaned up the "See Also" section to make the references from
  md5()/sha1() to hash() a "one-way street". Once you discovered the
  hash() functions, you shouldn't need md5() and sha1().
- I've removed the broken algorithms from the $algo parameter
  explanation, leaving only 'sha256' as the opinionated example (already
  merged, but not yet deployed).
Best regards
Tim Düsterhus
Part of the motivation of the deprecation (and my argument against the
addition of a standalone sha256() function) is simplifying the
documentation: Everything needs to be written down in multiple
different places, any changes to hash_file() will likely also need to
be applied to md5_file() and sha1_file() - and then it will need to be
translated.
We already have a solution for this: named snippets (implemented as XML
entities), which exist in the source once, are translated once, and then
inserted into every page that applies when it's rendered to HTML.
For instance, the yellow "Caution" box on pages like
https://www.php.net/rand is defined once in "language-snippets.ent", and
can be inserted into any page with &caution.cryptographically-insecure;
Given that the md5(), sha1(), md5_file(), and sha1_file() functions
are not part of the hash extension, it's also much harder for the user
to discover the incremental hashing functionality provided by
hash_init().
Again, that seems easily solved: ext/hash is now always-on, so moving
the functions there from ext/standard would have no effect on users.
--
Rowan Tommins
[IMSoP]
Part of the motivation of the deprecation (and my argument against the
addition of a standalone sha256() function) is simplifying the
documentation: Everything needs to be written down in multiple
different places, any changes to hash_file() will likely also need to
be applied to md5_file() and sha1_file() - and then it will need to be
translated.
We already have a solution for this: named snippets (implemented as XML
entities), which exist in the source once, are translated once, and then
inserted into every page that applies when it's rendered to HTML.
These entities are of limited use, though, since they are not
parametrizable, so you always get exactly the same text, and may need
ugly workarounds to even use entities for reusable text snippets.
And lots of entities can make the sources of the documentation pretty
hard to read.
Cheers,
Christoph
One thing to remind people about, the deprecations for md5(), sha1(), and uniqid() explicitly say they cannot be outright removed before PHP 10. That's at least 6 years away. That gives a loooooong time for documentation, tutorials, instructions, and code to be updated.
Considering that the hash() function was introduced in PHP 5.1.2
(January 2006) and password_*() in PHP 5.5 (June 2013), I don't share
your optimism about tutorials being updated within six years ....
Hi
Considering that the hash() function was introduced in PHP 5.1.2
(January 2006) and password_*() in PHP 5.5 (June 2013), I don't share
your optimism about tutorials being updated within six years ....
I attribute that to the fact that there was no indication that the
tutorials would need updating: The code continued to work without
showing any warnings or errors.
As long as users click the tutorial and thus see the embedded ads and
advertising revenue comes in, there is no need to change anything ;-)
Best regards
Tim Düsterhus
Yet again the PHP community doesn't care about security of its users, current and future, and just prefers the convenience of needing to type less characters and not go back fix some code for better design.
This is a gross misrepresentation of what people are saying. I am in favour of the aim of educating users to use better hashing functions, but I don't agree that the proposed deprecation is the right way to achieve that aim.
Maybe some people who already know SHA1 is outdated will be prompted to say "huh, I hadn't realised we used it there, let's add a backlog task to migrate to something else". But just as likely they'll do that during a security audit anyway.
The people you really want to reach, those who don't know much about it, will do a find-and-replace from "sha1(" to "hash('sha1', " and gain nothing.
The deprecation might make sense alongside introducing some new functions that we want people to discover instead, but on its own, I don't think the benefits outweigh the costs.
Regards,
Rowan Tommins
[IMSoP]
Stephen Rees-Carter, a security expert that has performed countless security audits on Wordpress and Laravel websites, would like to disagree with the fact that it is not enough of a good reason. [1]
People who work in emergency rooms think that motorcycles are the ultimate evil and should be banned, because emergency room workers are the ones who see all of the carnage of the small percent who wreck their motorcycles, and they see none of motorcycling's upsides.
Similarly, security experts see everything through the lens of security issues, because they see the problems FAR more often than everyone else. And as security experts, they don't see code through other lenses where security is not an issue.
Not saying the input of a security expert is not useful, but one man's input is only one side of the story, just like emergency room workers vs. motorcycles.
Yet again the PHP community doesn't care about security of its users, current and future, and just prefers the convenience of needing to type less characters and not go back fix some code for better design.
Explicitly stated, that is a straw man argument, which Rowan already called out.
Different people weight risks, costs, and benefits differently, and just because you might feel your approach for addressing security concerns should eclipse anyone else's approach and all other concerns does not mean your approach exists at the peak of the moral high ground.
Every time PHP deprecates software it places the burden and the cost of remediation on anyone and everyone who continues to use the software that requires the deprecated items. Those who are zealously security-first generally dismiss those burdens and costs of remediation, because they do not have to be burdened by them nor pay the costs, and so they shift them to everyone, including those who are using functions properly.
Those more pragmatic balance that burden and cost with the potential burden and costs that deprecating can impose. And in the case of md5(), where public code on GitHub shows almost 1 million uses, that imposed burden and cost is pretty large.
But ignoring the burden and cost, it is strongly arguable that deprecating md5() wouldn't even fix the security problems in most cases, as those you most want to force to fix things will be the ones more likely to just create a polyfill and move on, as many have already stated on this thread.
Kudos to Tim Düsterhus for identifying https://www.phptutorial.net/php-tutorial/php-csrf/ and https://www.php-einfach.de/php-tutorial/die-wichtigsten-php-funktionen/ but his takeaway for an action item was less inspiring. He argued those articles support deprecations when it seems to me the more obvious takeaway after finding those articles would be to reach out to those websites, as well as others publishing insecure information, and provide them with updated content that promotes secure practices to replace what they are currently publishing. Getting those websites updated is likely to have far more positive impact for new PHP developers learning to do things "the right way" than forcing them to update their code where they'll likely just use hash("md5").
Further, rather than shift the burden of remediation to everyone else, why not write a crawler that can automatically and proactively submit PRs to all the code out there using md5(), etc. so that most people only need to accept the PR to update their code, and make it available as a CLI for internal use? I know it is not that simple to remediate, but who do you expect will know how to do that better than those on PHP internals? Certainly not most GitHub repo owners. Besides, the PR could say "Review the code we are proposing to change, and if you are confident that your uses are secure then do not apply this PR. But if you are not sure they are secure then just apply the PR, test it, and then you'll certainly be safer."
Rather than just take a low-effort, feel-good action for security theater, if the PHP community REALLY cares about security for its users it would take a pro-active, higher-effort approach to addressing the concern. The WordPress community implemented at least one successful technology-supported "marketing" campaign to move its user base in the past, one of which was the "Serve Happy" campaign to get people to update their version of PHP (how ironic!):
https://make.wordpress.org/core/features/servehappy/
Why not create a working group to promote a "SERVE SECURELY" campaign modeled after WordPress's "Serve Happy" campaign, and do your best to help people remediate their security issues? Hell, imagine the free press and industry-wide exposure that such a campaign would provide as a way to educate PHP programmers on the dangers of misusing md5() and other insecure approaches?
It is also strongly possible you could even get significant sponsorship for such a campaign to pay for some more developer time to address the problem. It almost certainly could be seen as a feel-good thing for big industry players to support.
Frankly, if the pro-deprecation voters in the PHP community are not willing to pursue an initiative that proactively seeks to help users remediate and educate users about security concerns then I would argue they do not really care about security of PHP users but instead are only willing to pay lip service to it. #fwiw
TLDR;? Use a carrot, not a stick.
-Mike
Kudos to Tim Düsterhus for identifying https://www.phptutorial.net/php-tutorial/php-csrf/ and https://www.php-einfach.de/php-tutorial/die-wichtigsten-php-funktionen/ but his takeaway for an action item was less inspiring. He argued those articles support deprecations when it seems to me the more obvious takeaway after finding those articles would be to reach out to those websites, as well as others publishing insecure information, and provide them with updated content that promotes secure practices to replace what they are currently publishing. Getting those websites updated is likely to have far more positive impact for new PHP developers learning to do things "the right way" than forcing them to update their code where they'll likely just use hash("md5").
As a quick follow up:
https://www.phptutorial.net/contact/
And:
https://www.php-einfach.de/author/nils/
https://www.nils-reimers.de/contact/
-Mike
Frankly, if the pro-deprecation voters in the PHP community are not
willing to pursue an initiative that proactively seeks to help users
remediate and educate users about security concerns then I would argue
they do not really care about security of PHP users but instead are only
willing to pay lip service to it. #fwiw
TLDR;? Use a carrot, not a stick.
-Mike
Thanks Mike, I see you have already made a very similar point to the one I
just sent out, but quite a bit more eloquently!
The deprecation arguments seem almost academic to me.
Thanks,
Peter
Hi
How prevalent is this exactly? PHP 4 ended support in 2008. I think
putting warning labels on these things in the docs is enough, but we can't
go around locking up every kitchen knife just because there are some idiots
out there who read a book from the 50s about the war.
I just Googled "PHP tutorial" and found https://www.phptutorial.net/ as
the second search result, which considers itself to be "the modern PHP
tutorial".
I've clicked at the CSRF section
(https://www.phptutorial.net/php-tutorial/php-csrf/) and what do I find:
$_SESSION['token'] = md5(uniqid(mt_rand(), true));
Exactly the md5-uniqid construction that is called out as unsafe in
the RFC and used in a security context.
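For contrast, a hedged sketch of how such a token is commonly generated with the CSPRNG instead. This is not the tutorial's code; a plain array stands in for $_SESSION so the snippet runs on the command line, and isValidCsrfToken() is a hypothetical helper name:

```php
<?php
// A plain array stands in for $_SESSION so the sketch runs without a web server.
$session = [];

// 32 bytes from the CSPRNG, hex-encoded; no md5() or uniqid() involved.
$session['token'] = bin2hex(random_bytes(32));

// Hypothetical helper: compare the submitted token in constant time.
function isValidCsrfToken(array $session, string $submitted): bool
{
    return isset($session['token'])
        && hash_equals($session['token'], $submitted);
}

var_dump(isValidCsrfToken($session, $session['token']));
var_dump(isValidCsrfToken($session, 'forged-token'));
```

The token is unpredictable because of random_bytes(); hashing a guessable value such as the output of uniqid() would not make it so.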
Further down on the first page I find
https://www.tutorialspoint.com/php/php_mysql_login.htm, which does not
even hash the passwords that are stored within the database. At least
it's using mysqli_real_escape_string().
Then I have the German php-einfach.de, which on
https://www.php-einfach.de/php-tutorial/die-wichtigsten-php-funktionen/
("the most important PHP functions") lists md5() and sha1() as
important functions, but does not mention hash() at all.
I'm sure I would find quite a few more, but I believe those already
support the point I was trying to make.
And like I said previously, this change isn't what is going to determine if
those people will write good, reliable, secure code. If their learning
instincts can't see past a blog tutorial from 20 years ago, not even to
look up the function in the manual, they will not ever achieve that.
I think you are expecting a little too much from a beginner that is
following "the modern PHP tutorial" if you expect them to critically
question whether the tutorial is actually good or not. They are likely
already struggling with syntax and explaining the difference between
"if" and "while". You wouldn't believe how often I've heard the term
"if-Schleife" (if loop) in German.
I'm positive that even existing projects written by experienced
developers would benefit from re-checking if their use of MD5 and SHA-1
is actually safe instead of assuming that this is the case, when the
specific functionality has been untouched for the last 10 years.
You can say this about pretty much every software project in existence,
regarding anything. I just don't think it's up to PHP to mandate these
checks. If you want to create a fund for developers to go review their
code on the clock, fine, but don't force it on them. Might as well
A deprecation is not forcing anything. It's an indicator that whatever
you are doing might not be the best current practice. You are free to
ignore it and in this specific case you are not even at the risk of
removal, because the RFC does not propose the removal.
Looking back at my own code, I'm seeing places where using SHA-1 is not
strictly insecure, but where a stronger hash function nevertheless would
have been more appropriate, if only to simplify code audits. I just used
sha1(), because it was temptingly convenient compared to hash('sha256', …).
sha1 was the "proper" alternative to md5, until it wasn't. md5
Right, technology advances and security is a moving target. What is the
point you are trying to make?
superseded crc32, which btw, why isn't that on the hit-list?
CRC32 does not claim to be a cryptographically secure hash algorithm.
Its use case is completely different.
You're using sha256? It's soooo outdated, use sha512 and key it with hmac,
you casual /s
I'm seeing the sarcasm indicator, but I'm compelled to point out that
SHA-256 and SHA-512 are both SHA-2. If one is broken, it is likely that
the other is as well.
SHA-1 is a deterministic algorithm, thus it is unable to generate a
random UID. Whatever this code is doing can most likely be more reliably
achieved in a different way.
ALL hashing functions are deterministic. That's the whole point, and
Yes, that's why I asked how they are using a hash function to get a
random result.
applies to sha256 just the same. You want to be able to hash the same
content and get the same hash. Just the complexity and chance of
collision changes. The reliability and security you are concerned with in
this scenario really depends on what randomness you feed it.
My point was that if you already have randomness then you don't need to
pair it with a hash function. You don't gain any randomness by passing
it through a hash function. Just convert the randomness to a readable
representation using bin2hex, base64_encode or by using
Randomizer::getBytesFromString().
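A minimal sketch of that last point, assuming PHP 8.3 or later for Randomizer::getBytesFromString(). The variable names, the 16-byte length, and the alphabet are illustrative choices, not something prescribed in this thread:

```php
<?php
// Take bytes straight from the CSPRNG; no hash function is involved.
$bytes = random_bytes(16);

// Option 1: hexadecimal representation (32 characters for 16 bytes).
$hexToken = bin2hex($bytes);

// Option 2: Base64 representation of the same bytes.
$base64Token = base64_encode($bytes);

// Option 3: draw characters from a custom alphabet (PHP 8.3+).
$randomizer = new \Random\Randomizer();
$alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
$customToken = $randomizer->getBytesFromString($alphabet, 32);

echo $hexToken, "\n", $base64Token, "\n", $customToken, "\n";
```

All three tokens carry only the randomness that the generator produced; the encoding step changes the representation, not the strength.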
Best regards
Tim Düsterhus
Hi
How prevalent is this exactly? PHP 4 ended support in 2008. I think
putting warning labels on these things in the docs is enough, but we can't
go around locking up every kitchen knife just because there are some idiots
out there who read a book from the 50s about the war.
I just Googled "PHP tutorial" and found https://www.phptutorial.net/ as
the second search result, which considers itself to be "the modern PHP
tutorial".
I clicked on the CSRF section
(https://www.phptutorial.net/php-tutorial/php-csrf/) and what do I find:
$_SESSION['token'] = md5(uniqid(mt_rand(), true));
Exactly the md5-uniqid construction that is called out as unsafe in
the RFC and used in a security context.
In regards to hashing, this is likely fine; for now. There still isn't an arbitrary pre-image attack on md5 (that I'm aware of). Can you create a random file with a matching hash? Yes, in a few seconds, on modern hardware. But you cannot yet make it have arbitrary contents in our lifetime. The NSA probably has something like this though, but if so, this isn't widely known.
That being said, this is just randomly creating a random id without leaking its internal construction, no different than putting an md5 in a UUID-v8. The real issue here is the use of uniqid() and rand(), making it quite likely (at scale, at least) that a session id will overlap with another session id.
— Rob
Hi
$_SESSION['token'] = md5(uniqid(mt_rand(), true));
Exactly the md5-uniqid construction that is called out as unsafe in
the RFC and used in a security context.
In regards to hashing, this is likely fine; for now. There still isn't an arbitrary pre-image attack on md5 (that I'm aware of). Can you create a random file with a matching hash? Yes, in a few seconds, on modern hardware. But you cannot yet make it have arbitrary contents in our lifetime. The NSA probably has something like this though, but if so, this isn't widely known.
Neither collision-, nor pre-image resistance is relevant here. The
attack vector is a brute force attack / an attacker guessing the token
rather than the token's contents.
That being said, this is just randomly creating a random id without leaking its internal construction, no different than putting an md5 in a UUID-v8. The real issue here is the use of uniqid() and rand(), making it quite likely (at scale, at least) that a session id will overlap with another session id.
The point is that it showcases a fundamental misunderstanding of what
MD5 (or really any other hash algorithm) does for you. The application
of the MD5 does not make the token more random or more unique or
whatever positive adjective you would like to use. It would be equally
strong (or rather weak) if the output of uniqid(mt_rand(), true) was
used directly.
As per Kerckhoffs's principle, the security of the algorithm must not
rely on the attacker not knowing how it's implemented. Given how
prevalent constructions like the above are, an attacker could make an
educated guess about what it looks like and match their own token against
a precomputed table to find out if it matches.
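A toy sketch of that argument. It assumes, purely for demonstration, that the attacker knows the token is md5() of one of 1,000 candidate seeds; the seed range and the helper name buildTable() are hypothetical, and the real uniqid(mt_rand(), true) input space is much larger than this:

```php
<?php
// Hypothetical helper: precompute md5() for every candidate seed.
function buildTable(int $candidates): array
{
    $table = [];
    for ($seed = 0; $seed < $candidates; $seed++) {
        $table[md5((string) $seed)] = $seed;
    }
    return $table;
}

$table = buildTable(1000);

// The "victim" derives a token from a seed inside the same small space.
$secretSeed = 421;
$token = md5((string) $secretSeed);

// Hashing added no entropy: 1,000 inputs give at most 1,000 tokens,
// so a single table lookup recovers the seed.
$recovered = $table[$token] ?? null;
var_dump($recovered === $secretSeed);
```

The number of possible tokens is bounded by the number of possible inputs, regardless of how long the hash output is.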
Best regards
Tim Düsterhus
Hi
$_SESSION['token'] = md5(uniqid(mt_rand(), true));
Exactly the md5-uniqid construction that is called out as unsafe in
the RFC and used in a security context.
In regards to hashing, this is likely fine; for now. There still isn't an arbitrary pre-image attack on md5 (that I'm aware of). Can you create a random file with a matching hash? Yes, in a few seconds, on modern hardware. But you cannot yet make it have arbitrary contents in our lifetime. The NSA probably has something like this though, but if so, this isn't widely known.
Neither collision-, nor pre-image resistance is relevant here. The
attack vector is a brute force attack / an attacker guessing the token
rather than the token's contents.
You do realize that GUID and md5 hashes are the same size? One does not simply "guess" a GUID or an md5 hash. Gravatar used md5 until a couple of years ago, and had millions? billions? of email addresses and zero collisions.
That being said, this is just randomly creating a random id without leaking its internal construction, no different than putting an md5 in a UUID-v8. The real issue here is the use of uniqid() and rand(), making it quite likely (at scale, at least) that a session id will overlap with another session id.
The point is that it showcases a fundamental misunderstanding of what
MD5 (or really any other hash algorithm) does for you. The application
of the MD5 does not make the token more random or more unique or
whatever positive adjective you would like to use. It would be equally
strong (or rather weak) if the output of uniqid(mt_rand(), true) was
used directly.
Yes, it does, but probably not how you think. It would be much weaker to leak the internal construction (uniqid(mt_rand(), true)) because then someone could literally guess a working id if they knew when the id was generated (depending on the size of mt_rand, rate limits, etc).
By wrapping it in an md5, it is literally unguessable how it is constructed, but the construction is still crap in this case.
As per Kerckhoffs's principle, the security of the algorithm must not
rely on the attacker not knowing how it's implemented. Given how
prevalent constructions like the above are, an attacker could make an
educated guess about what it looks like and match their own token against
a precomputed table to find out if it matches.
In this example, an ID is being constructed. If it needs uniqueness, the ID is being constructed incorrectly, but if you could argue that a GUID would fit the bill here, md5 has more "entropy" than a GUIDv4. But due to how the md5 is constructed, it actually has less entropy. So, I think we both can agree that the construction is crap. However, the usage of md5 doesn't matter here. If it really bothers you, craft a GUIDv8 from it.
But to Kerckhoffs's principle, that is in regards to encryption ... this is not encryption.
— Rob
In regards to hashing, this is likely fine; for now. There still
isn't an arbitrary pre-image attack on md5 (that I'm aware of). Can
you create a random file with a matching hash? Yes, in a few seconds,
on modern hardware. But you cannot yet make it have arbitrary
contents in our lifetime. The NSA probably has something like this
though, but if so, this isn't widely known.
The NSA likely owns "Let's Encrypt" and can therefore MitM every TLS
site on the internet.
If the problem is that the web is full of bad documentation, find or
write some GOOD documentation. Then, work out how best to signpost
users to that documentation. Deprecating md5() and sha1() does
neither.
This. I'm not going to quote everything, but I read through the
comments from today and would say this:
- This seems very much like the people in support of these
  deprecations are trying to push PHP to enforce policy on developers,
  rather than simply providing tools.
- PHP should provide good documentation, but should not try to force
  every user to do something "best practice" by renaming functions.
- If a web server/host updates the PHP version and the code breaks, the
  last thing a dev is looking for is "what's the best practice to
  refactor this code".
  The dev is thinking, "our site is down, the boss/client is angry,
  what's the fastest band-aid I can slap on this to get the site up
  again".
Thus:
Provide tools, not policy.
Provide good documentation.
--
Nick
I think you are expecting a little too much from a beginner that is
following "the modern PHP tutorial" if you expect them to critically
question whether the tutorial is actually good or not. They are likely
already struggling with syntax and explaining the difference between
"if" and "while". You wouldn't believe how often I've heard the term
"if-Schleife" (if loop) in German.
I think you are expecting a little too much from a beginner if you think they will see the message "md5() is deprecated", and research up to date advice on hashing algorithms, rather than asking ChatGPT how to make the code work, and replacing it with "hash('md5', ...)".
CRC32 does not claim to be a cryptographically secure hash algorithm.
Its use case is completely different.
As an inexperienced user looking at the PHP manual for hash() and hash_algos(), how would I know that? It's right there in the list, just after something called "adler32".
I'm seeing the sarcasm indicator, but I'm compelled to point out that
SHA-256 and SHA-512 are both SHA-2. If one is broken, it is likely that
the other is as well.
Again, you know that, but do the users you're trying to help by deprecating sha1()? I'm a reasonably experienced developer, and I have no idea why SHA-512 would exist if it's not in some way "better" than SHA-256.
Regards,
Rowan Tommins
[IMSoP]
CRC32 does not claim to be a cryptographically secure hash algorithm.
Its use case is completely different.
As an inexperienced user looking at the PHP manual for hash() and hash_algos(), how would I know that? It's right there in the list, just after something called "adler32".
Well, you are supposed to also check the hash_hmac() documentation,
where a changelog entry for 7.2.0 states:
| Usage of non-cryptographic hash functions (adler32, crc32, crc32b,
| fnv132, fnv1a32, fnv164, fnv1a64, joaat) was disabled.
Or maybe we should fix https://github.com/php/doc-en/issues/3616.
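The distinction can also be checked at runtime. A small sketch, assuming PHP 7.2 or later (where hash_hmac_algos() is available); the variable names are illustrative:

```php
<?php
// Algorithms known to hash(), including non-cryptographic checksums.
$all = hash_algos();

// Algorithms accepted by hash_hmac(); non-cryptographic ones are excluded.
$hmacSafe = hash_hmac_algos();

// Everything in the first list that hash_hmac() refuses to use.
$nonCryptographic = array_values(array_diff($all, $hmacSafe));

var_dump(in_array('crc32', $all, true));      // offered by hash()
var_dump(in_array('crc32', $hmacSafe, true)); // rejected by hash_hmac()
print_r($nonCryptographic);
```

Note that md5 and sha1 still appear in hash_hmac_algos(), so this comparison separates checksums from cryptographic hashes; it says nothing about which cryptographic hashes are still considered sound.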
Cheers,
Christoph
Hi
I think you are expecting a little too much from a beginner that is
following "the modern PHP tutorial" if you expect them to critically
question whether the tutorial is actually good or not. They are likely
already struggling with syntax and explaining the difference between
"if" and "while". You wouldn't believe how often I've heard the term
"if-Schleife" (if loop) in German.
I think you are expecting a little too much from a beginner if you think they will see the message "md5() is deprecated", and research up to date advice on hashing algorithms, rather than asking ChatGPT how to make the code work, and replacing it with "hash('md5', ...)".
I am not expecting that from a beginner. I am expecting two things:
1. That the beginner switches to a tutorial that does not emit any error
   messages or warnings, because they realize that the tutorial is not as
   good as it claims to be.
2. That (1) leads to the outdated tutorials falling out of favor with
   regards to search engines or alternatively that the outdated tutorials
   are updated to no longer be outdated.
CRC32 does not claim to be a cryptographically secure hash algorithm.
Its use case is completely different.
As an inexperienced user looking at the PHP manual for hash() and hash_algos(), how would I know that? It's right there in the list, just after something called "adler32".
I expect the inexperienced user to look at existing tutorials or code
snippets, rather than the reference documentation: If they are
inexperienced, they would not even know what to look for in the
documentation.
That also ties into (1): I hope that the deprecation results in better
tutorials / making bad tutorials less attractive.
Of course that doesn't mean we shouldn't improve the documentation, and
I'm seeing that Christoph and Jim already started doing so.
I'm seeing the sarcasm indicator, but I'm compelled to point out that
SHA-256 and SHA-512 are both SHA-2. If one is broken, it is likely that
the other is as well.
Again, you know that, but do the users you're trying to help by deprecating sha1()? I'm a reasonably experienced developer, and I have no idea why SHA-512 would exist if it's not in some way "better" than SHA-256.
See above. Also: Really any choice from the SHA-2 or SHA-3 family is
better than both MD5 and SHA-1 and I would expect users to generally
gravitate towards things they have heard about before and you really
need to try hard not to have heard about SHA-256 before. Wikipedia is
also helpful regarding this topic.
Best regards
Tim Düsterhus
I just Googled "PHP tutorial" and found https://www.phptutorial.net/ as
the second search result, which considers itself to be "the modern PHP
tutorial".
I clicked on the CSRF section
(https://www.phptutorial.net/php-tutorial/php-csrf/) and what do I find:
$_SESSION['token'] = md5(uniqid(mt_rand(), true));
Exactly the md5-uniqid construction that is called out as unsafe in
the RFC and used in a security context.
Further down on the first page I find
https://www.tutorialspoint.com/php/php_mysql_login.htm, which does not
even hash the passwords that are stored within the database. At least
it's using mysqli_real_escape_string().
Then I have the German php-einfach.de, which on
https://www.php-einfach.de/php-tutorial/die-wichtigsten-php-funktionen/
("the most important PHP functions") lists md5() and sha1() as
important functions, but does not mention hash() at all.
I'm sure I would find quite a few more, but I believe those already
support the point I was trying to make.
I don't think the examples you provided support the argument for
deprecating these functions. If anything, they highlight the real problem:
outdated tutorials being prominently featured in search results. As you
mentioned, the MySQL login one doesn't even use a hashing function, so
deprecating md5 and sha1 functions would do nothing to fix that!
And how are these the top results? Are you telling me that the PHP
community can't create better websites and SEO than these ancient tutorials?
If someone encounters a problem because they can't use the md5() function,
they're likely to Google it and find a simple workaround along the lines of
"just paste this code and it'll work again", as mentioned above. That would
be just like this deprecation proposal: identifying the wrong solution to
the actual problem.
The real question is, why aren't there better, more up-to-date resources
easily available for someone wanting to learn PHP in 2024? We're the PHP
community, we should be leading the web and SEO. Yet most people looking to
get into webdev today aren't reaching for PHP. I've seen recent videos
where developers are positively surprised by PHP's modern features. But
can we blame them for being surprised if these are the top tutorials out
there?
Deprecating these functions isn't addressing the core issue. The focus
should be on making it easy for new learners to access up-to-date tutorials.
Thanks,
Peter
Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
Best regards,
Gina P. Banyard
The section "Deprecate using a single underscore ''_'' as a class name"
indicates that probably the primary reason to deprecate it is a potential
future conflict in the pattern matching RFC, where it can be used as a
wildcard.
However, I see no mention of this character as a wildcard anywhere in that
RFC.
Can somebody clarify?
--
Matthew Weier O'Phinney
mweierophinney@gmail.com
https://mwop.net/
he/him
Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
Best regards,
Gina P. Banyard
The section "Deprecate using a single underscore ''_'' as a class name"
indicates that probably the primary reason to deprecate it is a
potential future conflict in the pattern matching RFC, where it can be
used as a wildcard.
However, I see no mention of this character as a wildcard anywhere in that RFC.
Can somebody clarify?
The pattern matching RFC previously listed _ as a wildcard character.
In the discussion a month ago, someone pointed out that mixed
already serves that exact purpose, so having an extra wildcard was removed.
However, a few people indicated a desire to have an explicit wildcard _ anyway, even if it's redundant, as it's a more common and standard approach in other languages. We've indicated that we are open to making that an optional secondary vote in the pattern matching RFC if there's enough interest (it would be trivial), though I haven't bothered to add it to the RFC text yet.
Having _ available could also be used in other "wildcard" or "ignore this" cases, like exploding into a list assignment or similar, though I don't believe that has been fully explored.
That's the context/background here. Whether that encourages you to vote for or against that section I leave as an exercise for the reader.
--Larry Garfield
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
The section "Deprecate using a single underscore ''_'' as a class name"
indicates that probably the primary reason to deprecate it is a
potential future conflict in the pattern matching RFC, where it can be
used as a wildcard.
However, I see no mention of this character as a wildcard anywhere in that RFC.
Can somebody clarify?
The pattern matching RFC previously listed _ as a wildcard character.
In the discussion a month ago, someone pointed out that mixed
already serves that exact purpose, so having an extra wildcard was removed.
However, a few people indicated a desire to have an explicit wildcard _ anyway, even if it's redundant, as it's a more common and standard approach in other languages. We've indicated that we are open to making that an optional secondary vote in the pattern matching RFC if there's enough interest (it would be trivial), though I haven't bothered to add it to the RFC text yet.
Having _ available could also be used in other "wildcard" or "ignore this" cases, like exploding into a list assignment or similar, though I don't believe that has been fully explored.
That's the context/background here. Whether that encourages you to vote for or against that section I leave as an exercise for the reader.
Well, I wonder how that is supposed to work. Assuming the underscore
would be used as wildcard in a class name context, that could only be
done after using that character as class name is no longer allowed. So
that would have to wait for the next major PHP version (at least).
Note that I'm not worried about no longer being able to use an
underscore as class name, but rather that this introduces another
inconsistency to our identifiers. Disallowing an underscore as
function name is obviously off the table, thanks to gettext.
Christoph
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
The section "Deprecate using a single underscore ''_'' as a class name"
indicates that probably the primary reason to deprecate it is a
potential future conflict in the pattern matching RFC, where it can be
used as a wildcard.
However, I see no mention of this character as a wildcard anywhere in that RFC.
Can somebody clarify?
The pattern matching RFC previously listed _ as a wildcard character.
In the discussion a month ago, someone pointed out that mixed
already serves that exact purpose, so having an extra wildcard was removed.
However, a few people indicated a desire to have an explicit wildcard _ anyway, even if it's redundant, as it's a more common and standard approach in other languages. We've indicated that we are open to making that an optional secondary vote in the pattern matching RFC if there's enough interest (it would be trivial), though I haven't bothered to add it to the RFC text yet.
Having _ available could also be used in other "wildcard" or "ignore this" cases, like exploding into a list assignment or similar, though I don't believe that has been fully explored.
That's the context/background here. Whether that encourages you to vote for or against that section I leave as an exercise for the reader.
Well, I wonder how that is supposed to work. Assuming the underscore
would be used as wildcard in a class name context, that could only be
done after using that character as class name is no longer allowed. So
that would have to wait for the next major PHP version (at least).
Note that I'm not worried about no longer being able to use an
underscore as class name, but rather that this introduces another
inconsistency to our identifiers. Disallowing an underscore as
function name is obviously off the table, thanks to gettext.
Christoph
I think someone checked and found no examples of someone using _ as a class name, so the impact of removing it and/or using it for something else would be nearly nil. That may still push _ as a wildcard out to a future version, but I leave that up to others. As I said, I don't have strong feelings either way.
--Larry Garfield
On Tue, Jul 23, 2024 at 9:06 AM Larry Garfield larry@garfieldtech.com
wrote:
On Fri, Jul 19, 2024 at 12:41 PM Gina P. Banyard internals@gpb.moe
wrote:
Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
Best regards,
Gina P. Banyard
The section "Deprecate using a single underscore ''_'' as a class name"
indicates that probably the primary reason to deprecate it is a
potential future conflict in the pattern matching RFC, where it can be
used as a wildcard.
However, I see no mention of this character as a wildcard anywhere in
that RFC.
Can somebody clarify?
The pattern matching RFC previously listed _ as a wildcard character.
In the discussion a month ago, someone pointed out that mixed already
serves that exact purpose, so having an extra wildcard was removed.
However, a few people indicated a desire to have an explicit wildcard _
anyway, even if it's redundant, as it's a more common and standard approach
in other languages. We've indicated that we are open to making that an
optional secondary vote in the pattern matching RFC if there's enough
interest (it would be trivial), though I haven't bothered to add it to the
RFC text yet.
Having _ available could also be used in other "wildcard" or "ignore this"
cases, like exploding into a list assignment or similar, though I don't
believe that has been fully explored.
Can you provide examples of what that usage would look like? And the
question I have really is, does this actually require using "_", or could
another token be used for such matches?
--
Matthew Weier O'Phinney
mweierophinney@gmail.com
https://mwop.net/
he/him
However, a few people indicated a desire to have an explicit wildcard _ anyway, even if it's redundant, as it's a more common and standard approach in other languages. We've indicated that we are open to making that an optional secondary vote in the pattern matching RFC if there's enough interest (it would be trivial), though I haven't bothered to add it to the RFC text yet.
Having _ available could also be used in other "wildcard" or "ignore this" cases, like exploding into a list assignment or similar, though I don't believe that has been fully explored.
Can you provide examples of what that usage would look like? And the
question I have really is, does this actually require using "_", or
could another token be used for such matches?
Hypothetical pattern matching example:
$foo is ['a' => int, 'b' => $b, 'c' => mixed];
That would assert that there's 3 keys. "a" may be any integer (but only an integer), "b" can be anything and will be captured to a variable, and "c" must be defined but we don't care what it is.
The suggestion is to basically alias _ to "mixed" for pattern purposes:
$foo is ['a' => int, 'b' => $b, 'c' => _];
As "there's a var here but I don't care what it is, ignore it" is a common meaning of _ in other languages. But that would need to be disambiguated from a pattern saying "c must be an instance of the class _". Technically any symbol/set of symbols could be used there (as it's just an alias to mixed, which has the exact same effect), but _ is a common choice in other languages.
In theory, that could be expanded in the future to something like (note: this hasn't been seriously discussed that I know of, I'm just spitballing randomly):
[$a, $b, _] = explode(':', 'foo:bar:baz');
To assign $a = "foo", $b to "bar", and just ignore "baz". Which might cause parser issues if _ is a legal class name, I'm not sure. There's probably other "ignore this" cases we could come up with, but I haven't actually thought about it.
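For comparison, a sketch of what list destructuring already allows today without any _ placeholder; the variable names are illustrative. Trailing elements can simply be left out, and an element in the middle can be skipped with an empty slot:

```php
<?php
$parts = explode(':', 'foo:bar:baz');

// Ignore the trailing element by not listing it.
[$a, $b] = $parts;

// Skip the middle element with an empty slot between the commas.
[$first, , $third] = $parts;

var_dump($a, $b, $first, $third);
```

A dedicated _ placeholder would make the "ignore this" intent explicit rather than relying on an empty slot, which is easy to overlook when reading.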
Again, whether any of the above is a compelling argument or not I leave as an exercise for the reader.
--Larry Garfield
Hypothetical pattern matching example:
$foo is ['a' => int, 'b' => $b, 'c' => mixed];
That would assert that there's 3 keys. "a" may be any integer (but only an integer), "b" can be anything and will be captured to a variable, and "c" must be defined but we don't care what it is.
The suggestion is to basically alias _ to "mixed" for pattern purposes:
$foo is ['a' => int, 'b' => $b, 'c' => _];
As "there's a var here but I don't care what it is, ignore it" is a common meaning of _ in other languages. But that would need to be disambiguated from a pattern saying "c must be an instance of the class _". Technically any symbol/set of symbols could be used there (as it's just an alias to mixed, which has the exact same effect), but _ is a common choice in other languages.
I do not see this use-case as compelling. mixed is perfectly sufficient, and using _ for a data type just gives two ways to do the same thing. Not that multiple ways to do the same thing is necessarily wrong, but I think it needs a better justification than just saving characters. Besides, it has the potential to confuse people as to its exact meaning, whereas mixed does not.
OTOH, if you really want to save characters, albeit not as many, then choose any, which is certainly less likely to be confusing and has an analog in Go, TypeScript, and Python, at least.
Also, AFAIK, few (no?) other languages actually allow using _ for types; they only allow it for variables. I know that to be the case for Go, and if I understand the docs correctly it is also true for Rust, Zig, Haskell and Swift, with caveats for Rust.
- Rust allows underscore for type inference, e.g.: let x: _ = "Hello, world!";
- Also for a Generics' type placeholder, e.g.: let vec: Vec<_> = vec![1, 2, 3];
- But as for Rust pattern matching, the underscore is only used for values, not types.
For any other languages, I cannot say.
In theory, that could be expanded in the future to something like (note: this hasn't been seriously discussed that I know of, I'm just spitballing randomly):
[$a, $b, _] = explode(':', 'foo:bar:baz');
This is where a "blank" variable represented by _ actually makes a lot of sense. It is also how Go and Zig use them and effectively also how Rust, Haskell, and Swift use them.
Unlike for types where we have mixed, there is no current globally consistent alternative to using a blank variable in PHP. The only option is to use an arbitrary name that other developers won't know the intention of unless the developer adds comments to the effect.
In summary, although I don't have strong feelings about deprecating classes named _, I do not think the arguments made for disallowing them actually have any analog in any other languages so I question if there is valid justification for the deprecation. #jmtcw #fwiw
-Mike
Hello internals,
I have opened the vote for the mega deprecation RFC:
https://wiki.php.net/rfc/deprecations_php_8_4
Reminder, each vote must be submitted individually.
2 days late but I have now closed the vote for all the RFC proposals.
The ones that have been accepted will be implemented in due course.
Best regards,
Gina P. Banyard