1 year ago by Marco Pivetta

Hey Gina, Tim,

I agree with most of these deprecations, except:

* uniqid(), in my case (XKCD 1172), is largely used for quickly
  generating a semi-random string for test purposes: a suitable replacement
  PRNG implementation would be welcome. Even refactoring with tools like
  Rector will lead to quite messy code, or added dependencies. IMO fine to
  get rid of this specific implementation, if a safe function is
  provided, such as random_ascii_string() or such (dunno, just a hint)
* md5(), sha1() - OK-ish with moving to hash('<algo>', ...), but
  while these are insecure for most use-cases, they are part of the domain of
  many tools, including GIT itself. I can Rector my way out of it, just not
  sure these should be hidden into hash(...)

That said, welcome changes :-)

Marco Pivetta

https://mastodon.social/@ocramius

https://ocramius.github.io/

Hello internals,

It is this time of year again where we propose a list of deprecations to
add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past
year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions,
they are free to add them to the RFC.

Some should be non-controversial, others a bit more.
If such, they might warrant their own dedicated RFC, or be dropped from
the proposal altogether.

Best regards,

Gina P. Banyard

1 year ago by Derick Rethans

Please, no top posting.

I agree with most of these deprecations, except:

uniqid(), in my case (XKCD 1172) is largely used for quickly
generating a semi-random string for test purposes: a suitable replacement
PRNG implementation would be welcome. Even refactoring with tools like
Rector will lead to quite messy code, or added dependencies. IMO fine to
get rid of this specific implementation, if a safe function is
provided, such as random_ascii_string() or such (dunno, just a hint)

md5(), sha1() - OK-ish with moving to hash('<algo>', ...), but
while these are insecure for most use-cases, they are part of the domain of
many tools, including GIT itself. I can Rector my way out of it, just not
sure these should be hidden into hash(...)

I agree on both points.

These functions are indeed often used. The documentation for uniqid has
some nice big warnings too. It can be used in situations, where it is
OK.

Replacing the algorithm underneath uniqid can (and perhaps should) be
looked at.

As for the md5 and sha1 functions: although these algorithms are not useful
for "verifying the authenticity of a payload", that doesn't mean they
can't be used outside of that use case.

I understand the reasons why you want to "nudge" people to no longer use
these "unsafe" functions, but IMO, in this case, it's not worth the
BC issues.

cheers,
Derick

1 year ago by tim@bastelstu.be

Hi

These functions are indeed often used. The documentation for uniqid has
some nice big warnings too. It can be used in situations, where it is
OK.

Yes, the RFC acknowledges that.

Replacing the algorithm underneath uniqid can (and perhaps should) be
looked at.

As explained in the RFC: The behavior is unfixable, because uniqid() is
documented to be time-based. Changing the output format would be a much
bigger breaking change than deprecating the function and letting users
make an educated decision when choosing an alternative.

As for the md5 and sha1 functions: although these algorithms are not useful
for "verifying the authenticity of a payload", that doesn't mean they
can't be used outside of that use case.

See my reply to Robert. The algorithms themselves are not going away. At
least not as part of this RFC.

I understand the reasons why you want to "nudge" people to no longer use
these "unsafe" functions, but IMO, in this case, it's not worth the
BC issues.

For both uniqid() and the md5()/sha1() section, the RFC specifically
acknowledges that there is a vast collection of code using it and for
this reason specifically defines an indefinite deprecation period (with
a guarantee that indefinite is at least 5 years / until PHP 10).

Please keep in mind that a deprecation is not an error, not a removal
and not a change in behavior. Users will be able to migrate at their own
pace and for both uniqid() and md5()/sha1() there is a drop-in
replacement provided in the RFC that is guaranteed to work with PHP 7.4+.

Best regards
Tim Düsterhus

1 year ago by Marco Pivetta

uniqid(), in my case (XKCD 1172) is largely used for quickly
generating a semi-random string for test purposes: a suitable replacement
PRNG implementation would be welcome. Even refactoring with tools like
Rector will lead to quite messy code, or added dependencies. IMO fine to
get rid of this specific implementation, if a safe function is
provided, such as random_ascii_string() or such (dunno, just a hint)

Update: Tim gave me a decent alternative that I can live with.

uniqid() becomes bin2hex(random_bytes(16)).

I can live with that :-)

Marco Pivetta

https://mastodon.social/@ocramius

https://ocramius.github.io/

1 year ago by tim@bastelstu.be

Hi

Update: Tim gave me a decent alternative that I can live with.

uniqid() becomes bin2hex(random_bytes(16)).

For context: Marco also pinged me on Roave Discord and I sent a quick
reply from my phone.

I've now added the bin2hex(random_bytes(16)) alternative to the RFC
text:
https://wiki.php.net/rfc/deprecations_php_8_4?do=diff&rev2%5B0%5D=1719336981&rev2%5B1%5D=1719337102&difftype=sidebyside

Best regards
Tim Düsterhus

1 year ago by Rowan Tommins [IMSoP]

* uniqid(), in my case (XKCD 1172) is largely used for quickly
generating a semi-random string for test purposes: a suitable
replacement PRNG implementation would be welcome. Even refactoring
with tools like Rector will lead to quite messy code, or added
dependencies. IMO fine to get rid of this specific implementation,
if a safe function is provided, such as random_ascii_string() or
such (dunno, just a hint)

Agreed, the implementation is weird, but nothing else matches the
convenience to just get "some random printable bytes". As of PHP 8.3, we
finally have Random\Randomizer::getBytesFromString, but the comparison
is pretty stark:

$foo = uniqid();
$foo = (new \Random\Randomizer)->getBytesFromString('abcdefghijklmnopqrstuvwxyz0123456789', 13);

Alternatively, you have the shorter but slightly cryptic:

$foo = bin2hex(random_bytes(6));

Then again, if you actually want it to be unique, rather than random,
those aren't the right replacements anyway.

I'd love to replace uniqid() with something, but I don't think we have
that thing yet.

--
Rowan Tommins
[IMSoP]

1 year ago by tim@bastelstu.be

Hi

Then again, if you actually want it to be unique, rather than random,
those aren't the right replacements anyway.

They are for all intents and purposes if the generated string is long
enough. By the pigeonhole principle you can't guarantee uniqueness for a
fixed-length string, but when you have 128 bits of entropy you are
statistically all but guaranteed to receive a unique string. I've made
an example calculation for the "session.sid_length and
session.sid_bits_per_character" bit of this very RFC.

The replacement I suggested to Marco bin2hex(random_bytes(16)) does
use exactly 128 bits (16 bytes) of secure randomness for that reason.
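As a minimal sketch of that replacement (the helper name random_id() is made up for illustration and is not part of the RFC):

```php
<?php
// Hypothetical helper: 16 bytes (128 bits) from the CSPRNG, hex-encoded.
function random_id(): string
{
    return bin2hex(random_bytes(16));
}

$id = random_id();
var_dump(strlen($id)); // 16 bytes become 32 hex characters
```

Unlike uniqid(), the result carries no timestamp, so it should not be relied on for ordering.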

For Randomizer::getBytesFromString() you can calculate the entropy as
follows:

 var_dump(log(strlen($string) ** $length, 2));

You can calculate the minimum length to have 128 bits of entropy for a
given alphabet string as follows:

 var_dump(ceil(log(2**128, strlen($string))));

For some example alphabets, the minimum length for 128 bits of entropy
would be:

[0-9] : 39
[0-9a-f] : 32
[a-z] : 28
[a-z0-9] : 25
[a-zA-Z] : 23
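The table above can be reproduced with a small helper. This is only a sketch: the function name min_length_for_entropy() is hypothetical, and it assumes each character is drawn uniformly from an alphabet of at least two symbols. It divides the target bit count by the bits per character, which is the same calculation as above written in a form that stays exact for power-of-two alphabets:

```php
<?php
// Hypothetical helper: smallest string length that reaches $bits of entropy
// when every character is picked uniformly from $alphabetSize symbols.
function min_length_for_entropy(int $alphabetSize, int $bits = 128): int
{
    // Each character contributes log2($alphabetSize) bits.
    return (int) ceil($bits / log($alphabetSize, 2));
}

$alphabets = ['0-9' => 10, '0-9a-f' => 16, 'a-z' => 26, 'a-z0-9' => 36, 'a-zA-Z' => 52];
foreach ($alphabets as $label => $size) {
    printf("[%s] : %d\n", $label, min_length_for_entropy($size));
}
```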

Best regards
Tim Düsterhus

1 year ago by Rob Landers

Hello internals,

It is this time of year again where we propose a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions, they are free to add them to the RFC.

Some should be non-controversial, others a bit more.
If such, they might warrant their own dedicated RFC, or be dropped from the proposal altogether.

Best regards,

Gina P. Banyard

My only issue is md5, sha1, etc. There are many uses for them besides secure contexts. Sha1 is used by git, md5 fits snugly in many data structures (uuidv5, for example, though some implementations also use the first 128 bits of a sha1).

Even though these may be cryptographically weak, they are quite useful and well understood.

— Rob

1 year ago by tim@bastelstu.be

Hi

My only issue is md5, sha1, etc. There are many uses for them besides secure contexts. Sha1 is used by git, md5 fits snugly in many data structures (uuidv5, for example, though some implementations also use the first 128 bits of a sha1).

I believe you might have misunderstood the proposal. The RFC is not
proposing to deprecate the MD5 and SHA-1 algorithms. It's just proposing
to deprecate the standalone functions.

The algorithms will still be available by means of the hash() function,
which - besides MD5 and SHA-1 - provides access to a multitude of other
hash algorithms. The RFC explicitly lists the necessary replacement.

The RFC is just proposing phasing out the special treatment of having
dedicated functions of MD5 and SHA-1.
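For illustration, a short sketch of what that replacement looks like in practice. The sample input and the temporary file are assumptions made for the example; hash() and hash_file() are the existing functions:

```php
<?php
$data = 'abc';

// Same lowercase hex digests that the dedicated md5()/sha1() functions return.
$md5  = hash('md5', $data);
$sha1 = hash('sha1', $data);

// File variants: hash_file() covers what md5_file()/sha1_file() do.
$path = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($path, $data);
$fileMd5 = hash_file('md5', $path);
unlink($path);

var_dump($md5, $sha1, $fileMd5 === $md5);
```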

Best regards
Tim Düsterhus

PS: git is planning to move away from SHA-1.

1 year ago by Mike Schinkel

Hello internals,

It is this time of year again where we propose a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions, they are free to add them to the RFC.

Some should be non-controversial, others a bit more.

`strtok()`

strtok() is found 35k times in GitHub:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

It is commonly used as a "left part of string up to a character" in addition to its intended use for tokenizing.

I would prefer it not be deprecated because of BC breakage, but IF it is deprecated I would suggest adding a one-for-one replacement function for the "left part of string up to a character" use-case; maybe str_left("abc.txt",".") returning "abc".

`md5()`/md5_file()

Just FYI, md5() is found 868k times and md5_file() 29.7k times in GitHub:

https://github.com/search?q=md5%28+language%3APHP+&type=code
https://github.com/search?q=md5_file%28+language%3APHP+&type=code

That is a lot of broken code.

However, if deprecated, I would suggest adding insecure_md5() and insecure_md5_file() as drop-in replacements, which would be more obvious and easier than using hash(), so people would be more apt to use them. That would also signal they are obviously using an insecure function, which increases the likelihood that developers go to the effort to actually fix the security issues in their code and/or not use md5 for security-sensitive code to begin with.

`sha1()`/sha1_file()

sha1() is found 167k times and sha1_file() 6.8k times in GitHub:

https://github.com/search?q=sha1%28+language%3APHP+&type=code
https://github.com/search?q=sha1_file%28+language%3APHP+&type=code

Same arguments as for md5()/md5_file(), i.e. if deprecated, add insecure_sha1() and insecure_sha1_file().

#jmtcw

-Mike

1 year ago by Gina P. Banyard

strtok()

strtok() is found 35k times in GitHub:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

It is commonly used as a "left part of string up to a character" in addition to its intended use for tokenizing.

I would prefer it not be deprecated because of BC breakage, but IF it is deprecated I would suggest adding a one-for-one replacement function for the "left part of string up to a character" use-case; maybe str_left("abc.txt",".") returning "abc".

For this exact case of extracting a file name without an extension, you should really just use:

pathinfo($filepath, PATHINFO_FILENAME);

But for something more generic, you can just do:
explode($delimiter, $str)[0];

So I really don't see why we would need an "str_left()" function.

Best regards,
Gina P. Banyard

1 year ago by Mike Schinkel

strtok()

strtok() is found 35k times in GitHub:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

It is commonly used as a "left part of string up to a character" in addition to its intended use for tokenizing.

I would prefer it not be deprecated because of BC breakage, but IF it is deprecated I would suggest adding a one-for-one replacement function for the "left part of string up to a character" use-case; maybe str_left("abc.txt",".") returning "abc".

For this exact case of extracting a file name without an extension, you should really just use:
pathinfo($filepath, PATHINFO_FILENAME);
But for something more generic, you can just do:
explode($delimiter, $str)[0];

So I really don't see why we would need an "str_left()" function.

Ah, the dangers of providing a specific example of a broader use-case is that someone will invariably discredit the specific example instead of focusing on the applicability for the broader use-case. 🤦‍♂️

To wit, here are seven (7) use-cases for which pathinfo() is not a viable alternative:

https://3v4l.org/RDYFs#v8.3.8

Note those seven use-cases are found in around the first 25 results when searching GitHub for "strtok(". I could probably find more if I kept looking:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

Regarding explode($delimiter, $str)[0]: unless it is to be special-cased during compilation, it is a really inefficient way to find the substring up to the first character, especially for large strings and/or when in a tight loop where the explode is contained in a called function.

Here is a benchmark (https://onlinephp.io/c/87341) showing that, on average over the runs I performed, fully processing a 3972 byte file with 359 commas took right at 90 times longer using explode($delimiter, $str)[0] vs. strtok($str,$delimiter). Imagine if the file were 39,720 bytes, or larger, instead.

Size of file: 3972
Number of commas: 359
Time taken for strtok: 0.0034 seconds
Time taken for explode: 0.3036 seconds
Times strtok() faster: 89.1

Yes the above processes the entire file using explode()[0] each time rather than first using explode(",") once — because of the equivalent of the N+1 problem[1] where the explode() is buried in a function. This illustrates why strtok() is so good for its primary use-case of parsing text files. strtok() is fast and does not use heaps of memory on every token.

This leads me to think strtok() should not be deprecated given how inefficient string handling in PHP can otherwise be, at least not without a much more efficient object for string parsing.

-Mike
[1] https://www.baeldung.com/cs/orm-n-plus-one-select-problem

1 year ago by Dusk

This leads me to think strtok() should not be deprecated given how inefficient string handling in PHP can otherwise be, at least not without a much more efficient object for string parsing.

What would be really useful as a replacement for strtok() - among other things - would be a function analogous to MySQL's SUBSTRING_INDEX():

https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_substring-index

Where SUBSTRING_INDEX($a, $b, $c) is functionally equivalent to explode($a, $b)[$c], but with the added ability to use negative indices to count from the end of the input.
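For illustration, a userland sketch of such a function, following the semantics described in the linked MySQL manual: everything before the count-th delimiter, or everything after it when counting from the end with a negative count. The name substring_index(), its signature, and the handling of an empty delimiter are assumptions of this sketch, not an existing PHP function:

```php
<?php
// Hypothetical userland take on MySQL's SUBSTRING_INDEX(str, delim, count).
function substring_index(string $str, string $delim, int $count): string
{
    // Assumption of this sketch: a zero count or empty delimiter yields ''.
    if ($count === 0 || $delim === '') {
        return '';
    }

    $parts = explode($delim, $str);

    // Positive: keep the first $count segments; negative: keep the last ones.
    $kept = $count > 0
        ? array_slice($parts, 0, $count)
        : array_slice($parts, $count);

    return implode($delim, $kept);
}

var_dump(substring_index('www.mysql.com', '.', 2));  // "www.mysql"
var_dump(substring_index('www.mysql.com', '.', -2)); // "mysql.com"
```

A native version could scan for delimiters directly instead of building the intermediate array, which is the part a userland sketch like this cannot avoid cheaply.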

1 year ago by Mike Schinkel

This leads me to think strtok() should not be deprecated given how inefficient string handling in PHP can otherwise be, at least not without a much more efficient object for string parsing.

What would be really useful as a replacement for strtok() - among other things - would be a function analogous to MySQL's SUBSTRING_INDEX():

https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_substring-index

Where SUBSTRING_INDEX($a, $b, $c) is functionally equivalent to explode($a, $b)[$c], but with the added ability to use negative indices to count from the end of the input.

Yes. There are numerous quality-of-life functions like that which would improve PHP DX, performance, and likely security if incorporated into the standard library.

Unfortunately there is a general antipathy on this list towards adding functions that "can be written in userland" even though relegating them to userland means many people writing, writing about and publishing many different named functions doing similar and often incompatible things, and doing them less efficiently than if the one-time bullet was bitten and they were written in C, added to the docs, and included in core PHP.

#fwiw

-Mike

P.S. And no, SUBSTRING_INDEX($a, $b, $c) would not add a significant maintenance burden. Simple functions are an order of magnitude easier to maintain than, for example, adding new syntax for new language features, or adding a library feature that needs to be upgraded in response to an evolution orthogonal to PHP, such as supporting a file format, a protocol or database connector.

1 year ago by Gina P. Banyard

This leads me to think strtok() should not be deprecated given how inefficient string handling in PHP can otherwise be, at least not without a much more efficient object for string parsing.

What would be really useful as a replacement for strtok() - among other things - would be a function analogous to MySQL's SUBSTRING_INDEX():

https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_substring-index

Where SUBSTRING_INDEX($a, $b, $c) is functionally equivalent to explode($a, $b)[$c], but with the added ability to use negative indices to count from the end of the input.

That is a rather interesting function that I did not know existed in MySQL.
I agree this would be useful, and probably should be its own RFC/thread.

Best regards,

Gina P. Banyard

1 year ago by Gina P. Banyard

https://3v4l.org/RDYFs#v8.3.8

Note those seven use-cases are found in around the first 25 results when searching GitHub for "strtok(". I could probably find more if I kept looking:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

Regarding explode($delimiter, $str)[0] — unless it is to be special-cased during compilation —it is a really inefficient way to find the substring up to the first character, especially for large strings and/or when in a tight loop where the explode is contained in a called function

Then use a regex: https://3v4l.org/SGWL5

Or a combination of strpos and substr.

I'd bet that both of these solutions would use less memory, and I would guess the PCRE one should also be better for performance (although not benchmarked) as it is highly specialized in that task.

There are plenty of solutions to the specific problem you pose here, and thus many different solutions more or less appropriate.
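The strpos() and substr() combination mentioned above could look roughly like this. The helper name str_before() is hypothetical, and returning the whole string when the delimiter is missing is a choice made for this sketch:

```php
<?php
// Hypothetical helper: the part of $haystack before the first $needle.
function str_before(string $haystack, string $needle): string
{
    $pos = strpos($haystack, $needle);

    // Assumption: without a match, hand back the input unchanged.
    return $pos === false ? $haystack : substr($haystack, 0, $pos);
}

var_dump(str_before('abc.txt', '.')); // "abc"
```

For a single substring delimiter, strstr($haystack, $needle, true) gives the same prefix, but returns false when the delimiter is missing.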

Best regards,
Gina P. Banyard

1 year ago by Mike Schinkel

https://3v4l.org/RDYFs#v8.3.8

Note those seven use-cases are found in around the first 25 results when searching GitHub for "strtok(". I could probably find more if I kept looking:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

Regarding explode($delimiter, $str)[0] — unless it is to be special-cased during compilation —it is a really inefficient way to find the substring up to the first character, especially for large strings and/or when in a tight loop where the explode is contained in a called function

Then use a regex: https://3v4l.org/SGWL5

Using preg_match() instead of strtok() to process the ~4k file of commas is, on average, the same as using explode()[0], or 10x as long as using strtok() (at times it got as low as 4.4x, but that was rare):

https://onlinephp.io/c/e1fad

Size of file: 3972
Number of commas: 359
Time taken for strtok: 0.003 seconds
Time taken for regex: 0.0307 seconds
Times strtok() faster: 10.25

Or a combination of strpos and substr.

Using strpos() + substr() instead of strtok() to process the ~4k file of commas took, on average, ~3x as long as using strtok(). I implemented a class for this and tried to optimize it by using only string positions and not copying the string repeatedly. It also took about 1/2 hour to get the code working vs. about 15 seconds to get the code working with strtok(); which will most programmers prefer?

https://onlinephp.io/c/2a09f

Size of file: 3972
Number of commas: 359
Time for strtok: 0.0027 seconds
Time for strpos/substr: 0.0089 seconds
Times strtok() faster: 3.31

There are plenty of solutions to the specific problem you pose here, and thus many different solutions more or less appropriate.

Yes, and in all cases the existing solutions are significantly slower, except one.

And that one solution that is not significantly slower is to not deprecate strtok(). Not to mention not deprecating would keep from causing lots of BC breakage.

-Mike

1 year ago by Stephen Reay

https://3v4l.org/RDYFs#v8.3.8

Note those seven use-cases are found in around the first 25 results when searching GitHub for "strtok(". I could probably find more if I kept looking:

https://github.com/search?q=strtok%28+language%3APHP+&type=code

Regarding explode($delimiter, $str)[0] — unless it is to be special-cased during compilation —it is a really inefficient way to find the substring up to the first character, especially for large strings and/or when in a tight loop where the explode is contained in a called function

Then use a regex: https://3v4l.org/SGWL5

Using preg_match() instead of strtok() to process the ~4k file of commas is, on average, the same as using explode()[0], or 10x as long as using strtok() (at times it got as low as 4.4x, but that was rare):

https://onlinephp.io/c/e1fad

Size of file: 3972
Number of commas: 359
Time taken for strtok: 0.003 seconds
Time taken for regex: 0.0307 seconds
Times strtok() faster: 10.25

Or a combination of strpos and substr.

Using strpos() + substr() instead of strtok() to process the ~4k file of commas took, on average, ~3x as long as using strtok(). I implemented a class for this and tried to optimize it by using only string positions and not copying the string repeatedly. It also took about 1/2 hour to get the code working vs. about 15 seconds to get the code working with strtok(); which will most programmers prefer?

https://onlinephp.io/c/2a09f

Size of file: 3972
Number of commas: 359
Time for strtok: 0.0027 seconds
Time for strpos/substr: 0.0089 seconds
Times strtok() faster: 3.31

There are plenty of solutions to the specific problem you pose here, and thus many different solutions more or less appropriate.

Yes, and in all cases the existing solutions are significantly slower, except one.

And that one solution that is not significantly slower is to not deprecate strtok(). Not to mention not deprecating would keep from causing lots of BC breakage.

-Mike

Hi All,

I do appreciate that strtok has a kind of bizarre signature/use pattern and potential for confusion due to how subsequent calls work, but to me it sounds like a better result, for uses that need the repeated-call functionality, would be to introduce a builtin StringTokenizer class that wraps the underlying strtok_r C call and uses internal state to keep track of the string being tokenized.
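To make the idea concrete, here is a rough userland sketch of a tokenizer that keeps its state per instance. It is only an illustration under assumptions: the class and method names are hypothetical, it is built on strspn()/strcspn() rather than any C call, and it requires PHP 8.0+ for constructor promotion. Like strtok(), it treats every character of the delimiter string as a delimiter and skips empty tokens:

```php
<?php
// Hypothetical userland tokenizer: state lives in the object, not globally.
final class StringTokenizer
{
    private int $offset = 0;

    public function __construct(
        private string $input,
        private string $delimiters
    ) {}

    // Returns the next token, or null once the input is exhausted.
    public function next(): ?string
    {
        // Skip any run of delimiter characters, as strtok() does.
        $this->offset += strspn($this->input, $this->delimiters, $this->offset);
        if ($this->offset >= strlen($this->input)) {
            return null;
        }

        // The token is the run of non-delimiter characters that follows.
        $length = strcspn($this->input, $this->delimiters, $this->offset);
        $token = substr($this->input, $this->offset, $length);
        $this->offset += $length;

        return $token;
    }
}

// Two tokenizers can be nested without disturbing each other.
$lines = new StringTokenizer("foo,bar\na,b", "\n");
while (($line = $lines->next()) !== null) {
    $fields = new StringTokenizer($line, ",");
    while (($field = $fields->next()) !== null) {
        echo "Entry=$field\n";
    }
}
```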

As a "works the same" solution for grabbing the first segment of a string up to any of the delimiter chars, could the strpbrk function be expanded with a $before_needle arg like strstr has? (strstr matches on an exact substring, not on any of a list of characters)
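Until something like that exists, one way to get the segment before any of a set of characters with current functions is strcspn() plus substr(). The helper name str_before_any() is hypothetical. Note one difference from strtok(): this sketch returns an empty string when the input starts with a delimiter, whereas strtok() skips leading delimiters:

```php
<?php
// Hypothetical helper: prefix of $haystack up to the first character
// that appears anywhere in $chars.
function str_before_any(string $haystack, string $chars): string
{
    // strcspn() gives the length of the leading run containing none of $chars.
    return substr($haystack, 0, strcspn($haystack, $chars));
}

var_dump(str_before_any('key=value;rest', '=;')); // "key"
```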

Cheers

Stephen

1 year ago by Chuck Adams

I do appreciate that strtok has a kind of bizarre signature/use pattern and potential for confusion due to how subsequent calls work, but to me it sounds like a better result, for uses that need the repeated-call functionality, would be to introduce a builtin StringTokenizer class that wraps the underlying strtok_r C call and uses internal state to keep track of the string being tokenized.

strtok is weird, but it’s not actually dangerous. An OO wrapper doesn’t sound worth it unless it comes with compelling extra features or at least a reusable abstraction in the form of a concrete StrTokTokenizer (or hopefully a less ugly name) implementing a StringTokenizer interface.

Personally I say let strtok be and just admit in the documentation that it’s weird because C.

[Resent to the list. Sorry for any previously cc'd dups, macOS Mail keeps guessing the wrong sender address.]
—c (weird just because)

1 year ago by tim@bastelstu.be

Hi

Personally I say let strtok be and just admit in the documentation that it’s weird because C.

strtok() is not weird because C. It does not rely on the libc strtok()
function and did not since at least 1999 (and likely never did):
https://github.com/php/php-src/commit/257de2baded9330ff392f33fd5a7cc0ba271e18d#diff-fcf8a2a38ee4a0e3e2cb7c47251c9920ba8c5886d85969f676f9ddbee7aba503R332

strtok() is weird, because someone believed that relying on global state
was good API design. I find that excusable, because it happened more
than a quarter of a century ago.

Best regards
Tim Düsterhus

1 year ago by Chuck Adams

Hi

Personally I say let strtok be and just admit in the documentation that it’s weird because C.

strtok() is not weird because C. It does not rely on the libc strtok()
function and did not since at least 1999 (and likely never did):
https://github.com/php/php-src/commit/257de2baded9330ff392f33fd5a7cc0ba271e18d#diff-fcf8a2a38ee4a0e3e2cb7c47251c9920ba8c5886d85969f676f9ddbee7aba503R332

Okay, how about “weird because POSIX”? The API gets the blame then, not any implementation.

strtok() is weird, because someone believed that relying on global state
was good API design. I find that excusable, because it happened more
than a quarter of a century ago.

Oh I think strtok is awful, but it also looks like it beats the pants off the alternatives performance-wise. I imagine it’s also completely ignorant about unicode, so yeah…

I guess it should at least be deprecated until being replaced by a userspace version with similar performance.

—c

1 year ago by tim@bastelstu.be

Hi

Regarding explode($delimiter, $str)[0] — unless it is to be special-cased during compilation —it is a really inefficient way to find the substring up to the first character, especially for large strings and/or when in a tight loop where the explode is contained in a called function

Then use a regex: https://3v4l.org/SGWL5

Or a combination of strpos and substr.

I'd bet that both of these solutions would use less memory, and I would guess the PCRE one should also be better for performance (although not benchmarked) as it is highly specialized in that task.

There are plenty of solutions to the specific problem you pose here, and thus many different solutions more or less appropriate.

FWIW: explode() also has a third parameter to limit the number of
segments to return. If you always want just one, you can set the
parameter to 2 to prevent further processing from happening after
encountering the first delimiter.
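A small sketch of that, with a made-up sample string:

```php
<?php
$str = 'abc.tar.gz';

// With a limit of 2, explode() stops splitting after the first delimiter:
// the second element holds the untouched remainder of the string.
$parts = explode('.', $str, 2);
$first = $parts[0];

var_dump($first); // "abc"
var_dump($parts); // ["abc", "tar.gz"]
```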

Best regards
Tim Düsterhus

1 year ago by Markus Podar

Hi,

On Jun 25, 2024, at 4:51 PM, Gina P. Banyard <internals@gpb.moe> wrote:

On Tuesday, 25 June 2024 at 19:06, Mike Schinkel <mike@newclarity.net> wrote:
strtok()

strtok() is found 35k times in GitHub:
https://github.com/search?q=strtok%28+language%3APHP+&type=code
It is commonly used as a "left part of string up to a character" in
addition to its intended use for tokenizing.

I would prefer it not be deprecated because of BC breakage, but IF it is
deprecated I would suggest adding a one-for-one replacement function
for the "left part of string up to a character" use-case; maybe
str_left("abc.txt",".") returning "abc".
For this exact case of extracting a file name without an extension,
you should really just use:
pathinfo($filepath, PATHINFO_FILENAME);
But for something more generic, you can just do:
explode($delimiter, $str)[0];

So I really don't see why we would need an "str_left()" function.
Ah, the dangers of providing a specific example of a broader use-case
is that someone will invariably discredit the specific example instead
of focusing on the applicability for the broader use-case. 🤦‍♂️

To wit, here are seven (7) use-cases for which pathinfo() is not a
viable alternative:
https://3v4l.org/RDYFs#v8.3.8
Note those seven use-cases are found in around the first 25 results when
searching GitHub for "strtok(". I could probably find more if I kept
looking:
https://github.com/search?q=strtok%28+language%3APHP+&type=code
Regarding explode($delimiter, $str)[0] — unless it is to be
special-cased during compilation —it is a really inefficient way to find
the substring up to the first character, especially for large strings
and/or when in a tight loop where the explode is contained in a called
function.

Here is a benchmark (https://onlinephp.io/c/87341) showing that, on
average over the runs I performed, fully processing a 3972 byte file
with 359 commas took right at /90 times/ longer using
explode($delimiter, $str)[0] vs. strtok($str,$delimiter). Imagine if the
file were 39,720 bytes, or larger, instead.
Size of file:                3972
Number of commas:            359
Time taken for strtok:       0.0034 seconds
Time taken for explode:      0.3036 seconds
*Times `strtok()` faster:     89.1*
Yes the above processes the entire file using explode()[0] each time
rather than first using explode(",") once — because of the equivalent of
the N+1 problem[1] where the explode() is buried in a function. This
illustrates why strtok() is so good for its primary use-case of parsing
text files. strtok() is fast and does not use heaps of memory on every
token.

This leads me to think strtok() /should not/ be deprecated given how
inefficient string handling in PHP can otherwise be, at least not
without a much more efficient object for string parsing.

I'm with Mike on strtok() and don't understand why it would be on a
deprecation list.

I see nothing inherently "wrong" or "dangerous" with it: it's a case of
"works as intended" and if you know how to use it, it works perfectly
the way it is designed.

The variety of suggestions in other replies on how to handle certain use
cases of strtok() already shows there's no clear migration path and that
it depends on the situation, which is the worst.

Compare this with suggestions like sha1() or similar, where the
deprecation is about "the function, but not the functionality", because
SHA1 is available by other means.
But there's no clear alternative to strtok(), as it is its own kind.

👎 on deprecating it; if a gotcha with it is not clear (e.g. using it in
different scopes, as this was brought up), I see this rather as a
"documentation problem".

cheers,

Markus

1 year ago by tim@bastelstu.be — view source

Hi

👎 on deprecating it; if a gotcha with it is not clear (e.g. using it in
different scopes, as this was brought up), I see this rather as a
"documentation problem".

If one can easily use a function incorrectly in a way that is not
immediately apparent, then I consider the function to be badly
designed. In my book this includes all functions that rely on global
state, because that will lead to spooky action from a distance sooner
rather than later.

Here's an example (https://3v4l.org/XNl3X):

 <?php

 function processInner($line) {
     $tok = strtok($line, ",");

     while ($tok !== false) {
         echo "Entry=$tok\n";
         $tok = strtok(",");
     }
 }

 function processOuter($csv) {
     $line = strtok($csv, "\n");

     while ($line !== false) {
         processInner($line);
         $line = strtok("\n");
     }
 }

 processOuter("foo,bar,baz\na,b,c\n1,2,3");

Each of the functions individually is "fine" (for an appropriate
definition of fine), but combined they are buggy. This becomes worse
when the processInner() function is part of a third party library you
don't control. Do you check each update of that library to see if it
added or removed any strtok() calls anywhere?
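
One way to sidestep the shared state, sketched here under the assumption that every field is non-empty (explode() keeps empty fields, whereas strtok() skips them): each function splits its own input and nothing is stored between calls. The function names are hypothetical and are not from the linked 3v4l snippet.

```php
<?php
// Splits one line into its entries; no state survives the call.
function entriesOfLine(string $line): array
{
    return explode(",", $line);
}

// Splits the input into lines and collects the entries of every line.
function entriesOfCsv(string $csv): array
{
    $entries = [];
    foreach (explode("\n", $csv) as $line) {
        foreach (entriesOfLine($line) as $entry) {
            $entries[] = $entry;
        }
    }
    return $entries;
}

$entries = entriesOfCsv("foo,bar,baz\na,b,c\n1,2,3");
foreach ($entries as $entry) {
    echo "Entry=$entry\n";
}
```

Because the inner function no longer touches a global tokenizer, the outer loop visits all three lines regardless of what the inner function does.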

Pointing towards the documentation is not an excuse for bad API design.

Best regards
Tim Düsterhus

1 year ago by mickmackusa — view source

If one can easily use a function incorrectly in a way that is not
immediately apparent, then I consider the function to be badly
designed.

Does that philosophy also cover preg_quote()? I've lost count of the
number of times that I've seen it used in Stack Overflow answers without a
second parameter (including array_map('preg_quote', $array)) and its
returned value used in a regex that has forward slashes as delimiters.
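
To make the pitfall concrete, here is a minimal sketch (the input string is made up): without the second argument, preg_quote() leaves the forward slash alone, so the result is not safe to drop between "/" delimiters.

```php
<?php
// Hypothetical user input containing the delimiter character.
$needle = "a/b";

// Default call: "/" is not in the list of escaped characters.
$defaultQuoted = preg_quote($needle);

// Passing the delimiter makes preg_quote() escape it as well.
$delimiterQuoted = preg_quote($needle, "/");

// Only the delimiter-aware version is safe inside a /.../ pattern.
$pattern = '/^' . $delimiterQuoted . '$/';
$matched = preg_match($pattern, "a/b");

var_dump($defaultQuoted, $delimiterQuoted, $matched);
```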

Additionally, it is an unintuitively named function; it doesn't actually
"quote" anything -- it \e\s\c\a\p\e\s characters. This makes life
unnecessarily harder for devs who are new to PHP who need to find the regex
escaping function.

Would it be reasonable to create preg_escape() which also (sometimes
unnecessarily) includes the (de facto default delimiter) forward slash in
its default list of escaped characters so that preg_quote() could
eventually be deprecated? As far as I know this would do no harm, will
prevent holes in code, and make PHP more intuitive.

Mick

1 year ago by Gina P. Banyard — view source

If one can easily use a function incorrectly in a way that is not
immediately apparent, then I consider the function to be badly
designed.

Does that philosophy also cover preg_quote()? I've lost count of the number of times that I've seen it used in Stack Overflow answers without a second parameter (including array_map('preg_quote', $array)) and its returned value used in a regex that has forward slashes as delimiters.

Additionally, it is an unintuitively named function; it doesn't actually "quote" anything -- it \e\s\c\a\p\e\s characters. This makes life unnecessarily harder for devs who are new to PHP who need to find the regex escaping function.

Would it be reasonable to create preg_escape() which also (sometimes unnecessarily) includes the (de facto default delimiter) forward slash in its default list of escaped characters so that preg_quote() could eventually be deprecated? As far as I know this would do no harm, will prevent holes in code, and make PHP more intuitive.

It would possibly be reasonable, but this is a separate discussion from this one.
Arguably a lot of functions/methods named "quote" do escaping, so this feels like a more general problem than just ext/pcre.

Moreover, I really don't think people use a forward slash as a de facto default delimiter, I have always used # as this is what the first tutorial about regexes that I read used.

Best regards,
Gina P. Banyard

1 year ago by tim@bastelstu.be — view source

Hi

If one can easily use a function incorrectly in a way that is not
immediately apparent, then I consider the function to be badly
designed.

Does that philosophy also cover preg_quote()? I've lost count of the

Yes, it does.

Would it be reasonable to create preg_escape() which also (sometimes
unnecessarily) includes the (de facto default delimiter) forward slash in
its default list of escaped characters so that preg_quote() could
eventually be deprecated? As far as I know this would do no harm, will

I'd rather see the delimiter being a required parameter or a
well-designed (object-oriented) API, such as the one provided by T-Regx:
https://github.com/t-regx/T-Regx?tab=readme-ov-file#prepared-patterns

But as Gina said, this is something for another discussion.

Best regards
Tim Düsterhus

1 year ago by Bruce Weirdan — view source

Is there a reason to keep crc32?

--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com

1 year ago by Gina P. Banyard — view source

Is there a reason to keep crc32?

Good question, I had a chat with Tim as I thought it was similar to the md5()/sha1() functions.
Moreover, the crc32() function returns an int, whereas the equivalent of the hash extension
returns a string, so to get the same behaviour one needs to do:

hexdec(hash('crc32b', $str));

It might still make sense to add it to the RFC, but it would need to be its own section with
its own rationale.
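
A quick sketch of the equivalence mentioned above, assuming a 64-bit PHP build (on 32-bit builds hexdec() may return a float for large values, so the strict comparison would not hold). The sample string is arbitrary.

```php
<?php
// Arbitrary sample input.
$str = "The quick brown fox jumped over the lazy dog.";

// Dedicated function: returns the checksum as an integer.
$native = crc32($str);

// Hash extension: returns a hex string, converted back to a number here.
$viaHash = hexdec(hash('crc32b', $str));

var_dump($native === $viaHash);
```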

Best regards,
Gina P. Banyard

1 year ago by Kamil Tekiela — view source

I think the "Deprecate passing E_USER_ERROR to trigger_error()" should
be better explained. Why is using this constant a problem? There is a
link to another RFC, but I can't see an explanation as to why
E_USER_ERROR suffers the same problem as fatal errors do. From an
average Joe's perspective, it looks fine and does the job
https://3v4l.org/e97TO

1 year ago by Gina P. Banyard — view source

I think the "Deprecate passing E_USER_ERROR to trigger_error()" should
be better explained. Why is using this constant a problem? There is a
link to another RFC, but I can't see an explanation as to why
E_USER_ERROR suffers the same problem as fatal errors do. From an
average Joe's perspective, it looks fine and does the job
https://3v4l.org/e97TO

Returning control after an E_USER_ERROR seems problematic to me in
the first place, as the condition which led to the trigger surely
implies the current code is unable to handle the situation.
See: https://3v4l.org/7pdvO

But the issues with fatal errors are the same as explained in the
linked RFC, in that destructors (and finally blocks, etc.) are not
called. See: https://3v4l.org/J5NXF

Using exceptions instead is more robust.
Is this explanation clear enough?
If so, I will incorporate it into the RFC.
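
For readers who want to see the difference in code, here is a minimal sketch of the exception-based approach (the function name and message are hypothetical, and this is not one of the linked 3v4l snippets). Unlike a fatal error, the exception unwinds the stack, so the finally block gets a chance to run.

```php
<?php
// Records what happened, in order, so the sequence can be inspected.
$log = [];

// Hypothetical operation that hits a state it cannot handle.
function riskyOperation(array &$log): void
{
    try {
        throw new RuntimeException("unrecoverable state");
    } finally {
        // Cleanup still runs while the exception propagates.
        $log[] = "cleanup ran";
    }
}

try {
    riskyOperation($log);
} catch (RuntimeException $e) {
    $log[] = "caught: " . $e->getMessage();
}

print_r($log);
```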

Best regards,

Gina P. Banyard

1 year ago by Kamil Tekiela — view source

I think the "Deprecate passing E_USER_ERROR to trigger_error()" should
be better explained. Why is using this constant a problem? There is a
link to another RFC, but I can't see an explanation as to why
E_USER_ERROR suffers the same problem as fatal errors do. From an
average Joe's perspective, it looks fine and does the job
https://3v4l.org/e97TO

Returning control after an E_USER_ERROR seems problematic to me in
the first place, as the condition which led to the trigger surely
implies the current code is unable to handle the situation.
See: https://3v4l.org/7pdvO

But the issues with fatal errors are the same as explained in the
linked RFC, in that destructors (and finally blocks, etc.) are not
called. See: https://3v4l.org/J5NXF

Using exceptions instead is more robust.
Is this explanation clear enough?
If so, I will incorporate it into the RFC.

Best regards,

Gina P. Banyard

Yes, that is a better description.

1 year ago by Gina P. Banyard — view source

Yes, that is a better description.

I have updated this section of the RFC.

Best regards,

Gina P. Banyard

1 year ago by Morgan — view source

I do not believe it is appropriate to deprecate strtok() without a proper
replacement.

While I agree that its signature is undesirable, the suggested replacement
functions or “just write a parser” are not very pleasant solutions to fill
the void it would leave.

The stateful functionality it exhibits is incredibly useful, though I will
admit confusing. Would it not be better to change how the functionality is
accessed to reflect the fact that state is preserved rather than remove it
entirely and force a performance burden on developers?
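
To illustrate what "reflecting the preserved state" could look like, here is a purely hypothetical sketch: the StringTokenizer class below does not exist in PHP and is not proposed in the RFC. It keeps the scan position on an object instead of in global state, so nested tokenizers do not disturb each other. It assumes PHP 8.0 or later for the union return type.

```php
<?php
// Hypothetical class: per-instance state instead of strtok()'s global state.
final class StringTokenizer
{
    private string $subject;
    private int $offset = 0;

    public function __construct(string $subject)
    {
        $this->subject = $subject;
    }

    // Returns the next token, or false when the subject is exhausted.
    public function next(string $delimiters): string|false
    {
        // Skip any delimiters at the current position, as strtok() does.
        $this->offset += strspn($this->subject, $delimiters, $this->offset);
        if ($this->offset >= strlen($this->subject)) {
            return false;
        }
        // Measure the run of non-delimiter bytes and cut it out.
        $length = strcspn($this->subject, $delimiters, $this->offset);
        $token = substr($this->subject, $this->offset, $length);
        $this->offset += $length;
        return $token;
    }
}

// Nested use: the inner tokenizer has its own position.
$entries = [];
$lines = new StringTokenizer("foo,bar,baz\na,b,c\n1,2,3");
while (($line = $lines->next("\n")) !== false) {
    $fields = new StringTokenizer($line);
    while (($field = $fields->next(",")) !== false) {
        $entries[] = $field;
    }
}
print_r($entries);
```

Whether something like this would perform comparably to strtok() is an open question that the sketch does not answer.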

Hello internals,

It is this time of year again where we proposed a list of deprecations to
add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past
year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions,
they are free to add them to the RFC.

Some should be non-controversial, others a bit more.
If such, they might warrant their own dedicated RFC, or be dropped from
the proposal altogether.

Best regards,

Gina P. Banyard

1 year ago by Gina P. Banyard — view source

I do not believe it is appropriate to deprecate strtok() without a proper replacement.

While I agree that its signature is undesirable, the suggested replacement functions or “just write a parser” are not very pleasant solutions to fill the void it would leave.

The stateful functionality it exhibits is incredibly useful, though I will admit confusing. Would it not be better to change how the functionality is accessed to reflect the fact that state is preserved rather than remove it entirely and force a performance burden on developers?

First of all, please do not top post on the mailing list.

Secondly, please explain how you would provide the statefulness.
Thirdly, please provide an example of usage of strtok() where the suggestions I have given as replies in this thread are not applicable.
If the problem is indeed parsing some very complicated structure and you are using strtok for this, I would argue writing a proper parser is overall better.

Of note, I'm not the direct author of this section, I just cleaned it up because the initial wording was... draft state like.

Best regards,
Gina P. Banyard

1 year ago by Kamil Tekiela — view source

I have added one more deprecation

Deprecate the second parameter to mysqli_store_result().

1 year ago by Marc Bennewitz — view source

Hi Gina,

I would like to propose a deprecation of implicit cast to int of numeric
strings using bit shift operators.

For the following reasons:

In PHP strings are byte arrays and without context it's not possible
to know if "123" is actually a number or just the three bytes 0x313233

The other bitwise operators |, &, ~, ^ already take it as
byte array, only the bit shift operators try to be smart here

Non numeric strings already fail with "Unsupported operand types:
string >> int"

This makes working with byte arrays unnecessarily hard and forces you
to use limited and system-dependent ints.

https://3v4l.org/IBUDD
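
For reference, a minimal sketch of the behaviours listed above, assuming PHP 8 (this is not a copy of the linked 3v4l snippet, and the variable names are made up):

```php
<?php
// Bit shift on a numeric string: the string is cast to int 123 first.
$numeric = "123";
$shifted = $numeric >> 1;

// Bitwise OR on two strings: operates byte by byte and yields a string.
$ored = "12" | "3";

// Bit shift on a non-numeric string: rejected with a TypeError.
$nonNumeric = "abc";
try {
    $unused = $nonNumeric >> 1;
    $outcome = "no error";
} catch (TypeError $e) {
    $outcome = $e->getMessage();
}

var_dump($shifted, $ored, $outcome);
```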

While processing strings as byte arrays using bit shift operators needs
a separate RFC, I think, if there is an agreement on deprecating this
implicit cast it would already be beneficial to have this sooner than later.

What do you think?

Best,
Marc

1 year ago by Gina P. Banyard — view source

Hi Gina,

I would like to propose a deprecation of implicit cast to int of numeric strings using bit shift operators.

For the following reasons:

In PHP strings are byte arrays and without context it's not possible to know if "123" is actually a number or just the three bytes 0x313233

The other bitwise operators |, &, ~, ^ already take it as byte array, only the bit shift operators try to be smart here

Non numeric strings already fail with "Unsupported operand types: string >> int"

This makes working with byte arrays unnecessarily hard and forces you to use limited and system-dependent ints.

https://3v4l.org/IBUDD

While processing strings as byte arrays using bit shift operators needs a separate RFC, I think, if there is an agreement on deprecating this implicit cast it would already be beneficial to have this sooner than later.

What do you think?

I personally think the scope of this is too large as you have not accounted for all the details.

If you try to do a bitwise operator between a numeric string and an integer the numeric string will be implicitly converted to an int.
However, if you try to use a bitwise operator between an integer and a non-numeric string you also get a TypeError about unsupported operand types: https://3v4l.org/W582TN

Thus the current bit shift operators follow from the existing semantics, and curtailing them without doing it for the other bitwise operators doesn't make a lot of sense to me.

Moreover, I feel it makes more sense to create dedicated functions for byte array/string bitwise operations, deprecate using the native operators for this, and relegate them to integers only.
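
A short sketch of the mixed int/string cases described above, assuming PHP 8 (the values are arbitrary and this is not the linked 3v4l snippet):

```php
<?php
// int combined with a numeric string: the string is converted to int 3.
$numericString = "3";
$anded = 6 & $numericString;

// int combined with a non-numeric string: a TypeError is thrown.
$text = "abc";
try {
    $unused = 1 | $text;
    $threw = false;
} catch (TypeError $e) {
    $threw = true;
}

var_dump($anded, $threw);
```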

Best regards,
Gina P. Banyard

1 year ago by Juliette Reinders Folmer — view source

I've read through the complete set of proposals and have the following
observations:

While a number of proposals include an impact analysis (thank you!), a
significant number of the proposals don't.
It would be appreciated if for those proposals which aren't
removing unused/unusable functionality, some sort of impact analysis was
added.

DomDocument and DomEntity properties section: the text seems to
contradict itself - the proposal seems to suggest to soft-deprecate
something which is already soft deprecated. Some clarification of what
the actual proposal is, would be helpful.

xml_set_object() section: the mitigation path for this deprecation
is unclear and more so, it is unclear as of which PHP version the
mitigation path is available (if there are restrictions). It would be
helpful if some example code was added to show the mitigation path more
clearly.

CSV escaping section: please make it explicit which functions will be
affected by this proposal.

file_put_contents() section: please make the mitigation path
explicit (which I presume would be something along the lines of
file_put_contents( $filename, implode('', $data) ) ?)
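
The presumed mitigation for file_put_contents() could look like the sketch below. It only shows the implode() form mentioned in the message; the data and the temporary file are made up for the example, and whether this matches the RFC's final wording is not confirmed here.

```php
<?php
// Hypothetical data: chunks that should end up in one file.
$data = ["first line\n", "second line\n"];

// Write to a throwaway file in the system temp directory.
$filename = tempnam(sys_get_temp_dir(), "fpc");

// Join the chunks explicitly instead of passing the array itself.
$bytesWritten = file_put_contents($filename, implode('', $data));

// Read the file back to check the result, then clean up.
$contents = file_get_contents($filename);
unlink($filename);

var_dump($bytesWritten, $contents);
```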

Other than that, I join the previously voiced objections to the
deprecation of uniqid(), md5(), sha1(), md5_file(), sha1_file().
While I acknowledge that these functions can be used inappropriately
for security-sensitive code, which should use alternative methods, these
functions have perfectly valid use-cases for non-security-sensitive code
and the impact of the BC-break of deprecating and eventually removing
these methods can, IMO, not be justified.
Keep in mind that while "we" know and understand that deprecations are
not errors, end-users often don't and particularly for open source
projects, this means that in practice these deprecations will need to be
addressed anyway to reduce the noise of users opening issues about them,
which without a clear path to removal of the functions, will, in a lot
of cases, mean adding the @ operator to all uses.

Regarding the deprecation of using E_USER_ERROR in trigger_error():
there are errors which should never be caught and using
trigger_error() with E_USER_ERROR is appropriate for those.
The fact that execution can be returned to the code via
set_error_handler() returning true sounds to me like a bug which
should be fixed, rather than disabling the functionality for userland
code to hard exit with an error when deemed appropriate.

As for deprecating the E_USER_ERROR constant, this will lead to a lot
of guard code needing to be added for calls to error_reporting() as
well as in custom error handler functions, when the (open source) code
needs to be PHP cross version compatible.
In my opinion, this deprecation proposal should be moved to a later
major than a deprecation of using E_USER_ERROR in trigger_error().

Either way, these are two pennies.

Smile,
Juliette

1 year ago by Morgan — view source

Other than that, I join the previously voiced objections to the
deprecation of uniqid(), md5(), sha1(), md5_file(), sha1_file().
While I acknowledge that these functions can be used inappropriately
for security-sensitive code, which should use alternative methods, these
functions have perfectly valid use-cases for non-security-sensitive code
and the impact of the BC-break of deprecating and eventually removing
these methods can, IMO, not be justified.

md5(), sha1_file() et al. are duplicated by hash('md5'),
hash_file('sha1'), etc. These work in exactly the same way and produce
exactly the same output as the dedicated functions and can be used just
as appropriately or inappropriately.
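
A small sketch of that equivalence (the input string and the temporary file are arbitrary choices for the example):

```php
<?php
// Arbitrary sample input.
$input = "hello internals";

// Dedicated functions next to their hash() counterparts.
$md5Same = md5($input) === hash('md5', $input);
$sha1Same = sha1($input) === hash('sha1', $input);

// The same comparison for the *_file() variants, using a throwaway file.
$path = tempnam(sys_get_temp_dir(), "hash");
file_put_contents($path, $input);
$fileSame = sha1_file($path) === hash_file('sha1', $path);
unlink($path);

var_dump($md5Same, $sha1Same, $fileSame);
```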

Why, except in deference to their age, continue calling out MD5 and SHA1
in particular for special consideration?

uniqid() doesn't have such an obvious translation (of the half-dozen
alternatives in the RFC, the closest one is the first), but that's
because its output isn't very good anyway: it doesn't even guarantee
uniqueness.

The biggest effort in replacing it would come if you were relying on the
main part[1] of its output being strictly increasing (which would be a
mistake because it might not be).

[1] The main part is just a timestamp, extra entropy is provided by a
call to random_bytes(4).
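
For the "semi-random string for tests" use case raised earlier in the thread, one commonly used substitute is sketched below. Whether it corresponds to the first alternative listed in the RFC is an assumption not checked here, and unlike uniqid() its output carries no timestamp ordering at all.

```php
<?php
// 8 random bytes rendered as 16 hexadecimal characters.
$token = bin2hex(random_bytes(8));

var_dump($token);
```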

Keep in mind that while "we" know and understand that deprecations are
not errors, end-users often don't and particularly for open source
projects, this means that in practice these deprecations will need to be
addressed anyway to reduce the noise of users opening issues about them,
which without a clear path to removal of the functions, will, in a lot
of cases, mean adding the @ operator to all uses.

Well, that's true of any deprecation. Fortunately for these functions
at least, clear paths for their removal are just that - clear.

1 year ago by Gina P. Banyard — view source

I've read through the complete set of proposals and have the following observations:

While a number of proposals include an impact analysis (thank you!), a significant number of the proposals don't.

It would be appreciated if for those proposals which aren't removing unused/unusable functionality, some sort of impact analysis was added.

You will need to clarify which ones you are talking about.
These "bulk removal" RFCs are written by various authors over the course of a year, and might not have been looked at for 9+ months.

DomDocument and DomEntity properties section: the text seems to contradict itself - the proposal seems to suggest to soft-deprecate something which is already soft deprecated. Some clarification of what the actual proposal is, would be helpful.

Those properties have been soft deprecated for a long time, the proposal is to formally deprecate them.
I've clarified this.

xml_set_object() section: the mitigation path for this deprecation is unclear and more so, it is unclear as of which PHP version the mitigation path is available (if there are restrictions). It would be helpful if some example code was added to show the mitigation path more clearly.

The migration path is available since at least PHP 5.3, probably even longer.
Instead of calling xml_set_object() with an object $obj, and e.g. xml_set_default_handler() with a string corresponding to the name of a method of $obj,
you should set the handler with a proper callable, i.e. using [$obj, 'methodOfObj'] as the handler.
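
As a rough sketch of that migration path (the class, its method names and the sample XML are made up for illustration, and ext/xml is assumed to be loaded), the handlers are passed as [$obj, 'method'] callables and xml_set_object() is not called at all:

```php
<?php
// Hypothetical handler object that records the element names it sees.
class ElementCollector
{
    public array $names = [];

    public function startElement($parser, string $name, array $attributes): void
    {
        $this->names[] = $name;
    }

    public function endElement($parser, string $name): void
    {
        // Nothing to do for closing tags in this sketch.
    }
}

$collector = new ElementCollector();
$parser = xml_parser_create();

// Proper callables instead of xml_set_object() plus method-name strings.
xml_set_element_handler(
    $parser,
    [$collector, 'startElement'],
    [$collector, 'endElement']
);

$ok = xml_parse($parser, '<root><item/><item/></root>', true);

var_dump($ok, $collector->names);
```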

CSV escaping section: please make it explicit which functions will be affected by this proposal.

Done.

file_put_contents() section: please make the mitigation path explicit (which I presume would be something along the lines of file_put_contents( $filename, implode('', $data) ) ?)

Done.

Other than that, I join the previously voiced objections to the deprecation of uniqid(), md5(), sha1(), md5_file(), sha1_file().
While I acknowledge that these functions can be used inappropriately for security-sensitive code, which should use alternative methods, these functions have perfectly valid use-cases for non-security-sensitive code and the impact of the BC-break of deprecating and eventually removing these methods can, IMO, not be justified.

Keep in mind that while "we" know and understand that deprecations are not errors, end-users often don't and particularly for open source projects, this means that in practice these deprecations will need to be addressed anyway to reduce the noise of users opening issues about them, which without a clear path to removal of the functions, will, in a lot of cases, mean adding the @ operator to all uses.

If I may be a bit cheeky, if we consider that userland does not understand that deprecations are not errors, how can we trust them to use the 5 aforementioned functions correctly?
Especially as there are more appropriate replacements available.

Regarding the deprecation of using E_USER_ERROR in trigger_error(): there are errors which should never be caught and using trigger_error() with E_USER_ERROR is appropriate for those.

The fact that execution can be returned to the code via set_error_handler() returning true sounds to me like a bug which should be fixed, rather than disabling the functionality for userland code to hard exit with an error when deemed appropriate.

In that case, calling exit() with a string will provide you more consistent behaviour, and also run destructors and finally blocks.
The main motivation to remove it is to curtail the usage of the bailout mechanism, as it has various issues explained in the linked RFC.

I have added this to the section.

As for deprecating the E_USER_ERROR constant, this will lead to a lot of guard code needing to be added for calls to error_reporting() as well as in custom error handler functions, when the (open source) code needs to be PHP cross version compatible.

In my opinion, this deprecation proposal should be moved to a later major than a deprecation of using E_USER_ERROR in trigger_error().

I forgot about error_reporting, so I moved it out of the RFC as something to be tackled at a later date.

Best regards,
Gina P. Banyard

1 year ago by Juliette Reinders Folmer — view source

On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer
php-internals_nospam@adviesenzo.nl wrote:

Hello internals,

It is this time of year again where we proposed a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions, they are free to add them to the RFC.

Some should be non-controversial, others a bit more.
If such, they might warrant their own dedicated RFC, or be dropped from the proposal altogether.

Best regards,

Gina P. Banyard

Thanks for making those updates Gina!

While a number of proposals include an impact analysis (thank
you!), a significant number of the proposals don't.
It would be appreciated if for those proposals which aren't removing
unused/unusable functionality, some sort of impact analysis was added.

You will need to clarify which ones you are talking about.
These "bulk removal" RFCs are written by various authors over the
course of a year, and might not have been looked at for 9+ months.

I'd suggest for an impact analysis/expected impact statement to be added
to the following deprecation proposals:

session.sid_length and session.sid_bits_per_character
xml_set_object() and xml_set_*_handler() with string method names
Deprecate proprietary CSV escaping mechanism
Deprecate strtok() function
Deprecate returning non-strings values from a user output handler
Deprecate producing output in a user output handler
Deprecate mysqli_refresh()
Deprecate mysqli_kill()
Deprecate lcg_value()
Deprecate md5(), sha1(), md5_file(), and sha1_file() (add an actual
analysis, not just a statement as this is a high impact proposal)
Deprecate passing E_USER_ERROR to trigger_error()
Deprecate SOAP_FUNCTIONS_ALL constant and passing it to
SoapServer::addFunction()

And to a lesser degree for:

Formally deprecate Soft-deprecated DOMDocument and DOMEntity properties
Deprecate SplFixedArray::__wakeup()
Deprecate passing null and false to dba_key_split()
Deprecate passing incorrect data types for options to ext/hash functions
Constants SUNFUNCS_RET_STRING, SUNFUNCS_RET_DOUBLE, SUNFUNCS_RET_TIMESTAMP
Remove E_STRICT error level and deprecate E_STRICT constant
mysqli_ping() and mysqli::ping()

P.S.: typo in "xml_set_object() and xml_set_*_handler() with string
method names": "witch" => "which"

Other than that, I join the previously voiced objections to the
deprecation of uniqid(), md5(), sha1(), md5_file(),
sha1_file().
While I acknowledge that these functions can be used
inappropriately for security-sensitive code, which should use
alternative methods, these functions have perfectly valid use-cases
for non-security-sensitive code and the impact of the BC-break of
deprecating and eventually removing these methods can, IMO, not be
justified.
Keep in mind that while "we" know and understand that deprecations
are not errors, end-users often don't and particularly for open
source projects, this means that in practice these deprecations will
need to be addressed anyway to reduce the noise of users opening
issues about them, which without a clear path to removal of the
functions, will, in a lot of cases, mean adding the @ operator to
all uses.

If I may be a bit cheeky, if we consider that userland does not
understand that deprecations are not errors, how can we trust them to
use the 5 aforementioned functions correctly?
Especially as there are more appropriate replacements available.

There is a difference between "userland" (dev-users) and end-users. I
was talking about end-users, while based on your remark, you are talking
about dev-users.

I also don't agree that there are "more appropriate replacements available".
The suggested hash() replacements for the md5/sha1* functions have
the exact same functionality, which the RFC considers "incorrect use",
so what are we actually solving by this deprecation ? Devs not having
enough to do already ?
The problem (for open source) with "force-replacing" the uses of
md5/sha1* functions with the hash function calls, is that the hash
extension was not part of PHP core until PHP 7.4, which means that for a
significant number of open source projects, the replacement is not a
one-on-one function call replacement, but needs guard code for PHP < 7.4
in case the hash extension is not available.

Also, having read through the RFC a second time, I find the voting
choices inconsistent - in particular the first deprecation vote, which
makes the others ambiguous.
Could each voting choice please be explicitly one of the below to
prevent any confusion ?

Remove in PHP 8.4
Deprecate in PHP 8.4 and remove in PHP 9
Deprecate in PHP 8.4 and remove at a later date after a separate vote

Smile,
Juliette

1 year ago by Andreas Heigl — view source

Am 08.07.24 um 05:04 schrieb Juliette Reinders Folmer:
[...]

I also don't agree that there are "more appropriate replacements available".
The suggested hash() replacements for the md5/sha1* functions have
the exact same functionality, which the RFC considers "incorrect use",
so what are we actually solving by this deprecation ? Devs not having
enough to do already ?
The problem (for open source) with "force-replacing" the uses of
md5/sha1* functions with the hash function calls, is that the hash
extension was not part of PHP core until PHP 7.4, which means that for a
significant number of open source projects, the replacement is not a
one-on-one function call replacement, but needs guard code for PHP < 7.4
in case the hash extension is not available.

From the docs it looks like the hash function was part of the core
since php 5.1.2 but perhaps I read that wrongly from the docs.

Anyhow, a replacement could possibly be to declare a userland function
that then does the version check and either calls the respective
function directly or delegates to the hash-function.

The replacement could be a

function md5_userland(string $string, bool $binary = false): string {
     if (version_compare(PHP_VERSION, '7.4.0', '<')) {
         return md5($string, $binary);
     }
     return hash('md5', $string, $binary);
}

Replacing all occurrences of md5( with md5_userland( in code is then
a doable task.

Alternatively accepting the deprecation and adding a

if (! function_exists('md5')){
     function md5(string $string, bool $binary = false): string
     {
         return hash('md5', $string, $binary);
     }
}

would even skip the step of having to replace the function calls at the
cost of having the deprecations in the log as long as the function still
exists.

A way to mark specific deprecation messages as OK (so that they do not show
up in the logs) would be helpful here, but there are already userland
libraries that allow such things. So people who are concerned about
that already have the possibility to "fix" it.

So to me that looks like a solvable problem.

Yes! It needs to be addressed by people! But that is probably the cost
of supporting legacy infrastructure.

Another idea might be to allow overwriting deprecated language
functions with userland functions, so that it would immediately be possible
to replace the deprecated function with a userland one. But that is for
sure a different RFC.

Just my 0.02 €

Cheers

Andreas

--
,,,
(o o)
+---------------------------------------------------------ooO-(_)-Ooo-+
| Andreas Heigl |
| mailto:andreas@heigl.org N 50°22'59.5" E 08°23'58" |
| https://andreas.heigl.org |
+---------------------------------------------------------------------+
| https://hei.gl/appointmentwithandreas |
+---------------------------------------------------------------------+
| GPG-Key: https://hei.gl/keyandreasheiglorg |
+---------------------------------------------------------------------+

1 year ago by Juliette Reinders Folmer

On 08.07.24 at 05:04, Juliette Reinders Folmer wrote:
[...]

I also don't agree that there are "more appropriate replacements
available".
The suggested hash() replacements for the md5/sha1* functions have
the exact same functionality, which the RFC considers "incorrect
use", so what are we actually solving by this deprecation ? Devs not
having enough to do already ?
The problem (for open source) with "force-replacing" the uses of
md5/sha1* functions with the hash function calls, is that the
hash extension was not part of PHP core until PHP 7.4, which means
that for a significant number of open source projects, the
replacement is not a one-on-one function call replacement, but needs
guard code for PHP < 7.4 in case the hash extension is not available.

From the docs it looks like the hash function was part of the core
since php 5.1.2 but perhaps I read that wrongly from the docs.

Anyhow, a replacement could possibly be to declare a userland function
that then does the version check and either calls the respective
function directly or delegates to the hash-function.

Agreed, but the fact that it is solvable, is not a justification for
adding "busy-work" when the replacement for the deprecated function is,
by all accounts, just as bad/incorrect as the original....

I don't mind putting the work in when there is a good justification, but
I don't see one for this deprecation.

Smile,
Juliette

1 year ago by Andreas Heigl

Hey all

On 08.07.24 at 07:05, Juliette Reinders Folmer wrote:

On 08.07.24 at 05:04, Juliette Reinders Folmer wrote:
[...]

I also don't agree that there are "more appropriate replacements
available".
The suggested hash() replacements for the md5/sha1* functions have
the exact same functionality, which the RFC considers "incorrect
use", so what are we actually solving by this deprecation ? Devs not
having enough to do already ?
The problem (for open source) with "force-replacing" the uses of
md5/sha1* functions with the hash function calls, is that the
hash extension was not part of PHP core until PHP 7.4, which means
that for a significant number of open source projects, the
replacement is not a one-on-one function call replacement, but needs
guard code for PHP < 7.4 in case the hash extension is not available.

From the docs it looks like the hash function was part of the core
since php 5.1.2 but perhaps I read that wrongly from the docs.

Anyhow, a replacement could possibly be to declare a userland function
that then does the version check and either calls the respective
function directly or delegates to the hash-function.

Agreed, but the fact that it is solvable, is not a justification for
adding "busy-work" when the replacement for the deprecated function is,
by all accounts, just as bad/incorrect as the original....

I don't mind putting the work in when there is a good justification, but
I don't see one for this deprecation.
The only one I can see is cleaning up the codebase and removing
duplicate methods.

But the RFC definitely states that it is to "encourage users to use a
secure hash functions, instead of using an insecure algorithm"

Which is fine. But I am totally with you that deprecating a function by
encouraging users to use the same insecure algorithm via a different
function is ... an interesting take to say the least.

So with that argumentation I am also in the camp to say 'thanks, but
no thanks' to that part of the RFC.

Cheers

Andreas

1 year ago by tim@bastelstu.be

Hi

I don't mind putting the work in when there is a good justification, but
I don't see one for this deprecation.
The only one I can see is cleaning up the codebase and removing
duplicate methods.

But the RFC definitely states that it is to "encourage users to use a
secure hash functions, instead of using an insecure algorithm"

Which is fine. But I am totally with you that deprecating a function by
encouraging users to use the same insecure algorithm via a different
function is ... an interesting take to say the least.

Gina already mentioned it in the long email from earlier today, but for
reference:

The intention is that the users do not perform a mindless search and
replace, but instead use the opportunity to re-evaluate the choice on a
case by case basis.

Cleaning up the codebase is not a concern, because the implementation of
the functions is trivial.

However cleaning up the documentation and API surface is something
that is useful. As an example it is easier for the (inexperienced) user
to navigate the documentation, because all the hashing functionality is
available by the standard 'hash' functions. It also makes maintaining
the documentation easier. As an example a few months ago, I updated all
the examples to no longer showcase 'md5' and instead showcase the usage
of 'sha256':

https://github.com/php/doc-en/commit/20dcfbb0dd7150cbe5dfd7903a3001229295c549

Of course the functions still support MD5, but now the documentation
shows current best practices. Anyone whom I trust to use MD5 safely, I
also trust to understand how to use it by means of the hash() function
and for all the others the examples will be helpful in writing safer code.

Also, once users have migrated to the hash() function, they will be able
to switch out algorithms much more easily going forward, because the
algorithm choice can easily be stored in a central configuration and
passed as a string. (No, no one calls functions using a dynamic name.)
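
As a rough illustration of the central-configuration point, here is a minimal sketch. The constant and function names are hypothetical and do not come from the RFC; only hash() and hash_algos() are real ext/hash functions.

```php
<?php
// Hypothetical central setting; in a real application this would
// typically be read from a configuration file.
const CHECKSUM_ALGO = 'sha256';

// Hypothetical helper: hashes $data with the configured algorithm.
function checksum(string $data, string $algo = CHECKSUM_ALGO): string
{
    // hash_algos() lists the algorithms registered with ext/hash.
    if (!in_array($algo, hash_algos(), true)) {
        throw new InvalidArgumentException("Unknown hash algorithm: {$algo}");
    }

    return hash($algo, $data);
}

echo checksum('hello'), PHP_EOL;        // uses the configured default
echo checksum('hello', 'md5'), PHP_EOL; // data that is locked into MD5
```

Switching the default then means changing one constant rather than every call site.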

In other words, the goal of the proposal is the anticipated positive
downstream effects on overall ecosystem safety and a simplified learning
curve for new PHP developers.

Best regards
Tim Düsterhus

1 year ago by Niels Dossche

I'd suggest for an impact analysis/expected impact statement to be added to the following deprecation proposals:
(...)

And to a lesser degree for:
(...)

Deprecate passing incorrect data types for options to ext/hash functions

Since this is my proposal, I went ahead and performed a simple test and found that at least the top 2K packages are (likely) not impacted by this.
I added the full explanation for the impact of this in the RFC text.

Kind regards
Niels

1 year ago by Gina P. Banyard

While a number of proposals include an impact analysis (thank you!), a significant number of the proposals don't.
It would be appreciated if for those proposals which aren't removing unused/unusable functionality, some sort of impact analysis was added.

You will need to clarify which ones you are talking about.
These "bulk removal" RFCs are written by various authors over the course of a year, and might not have been looked at for 9+ months.

I'd suggest for an impact analysis/expected impact statement to be added to the following deprecation proposals:

I am going to start this reply with the following:
An impact analysis showing a large impact to userland, is not in itself, an argument against a deprecation.
What an impact analysis helps to determine is the length of the deprecation and the timeline for removal.

It is getting exhausting to have to provide this, when what it amounts to is me asking Damien to check usage on a corpus of over 3100 projects, some open source (such as WordPress, Drupal, OSS accounting software, etc.), the top 1000 Composer packages, and the private codebases he has access to via his company, using his Exakat static analysis tool. [1]
The corpus is 160 MLOC (1.2 billion tokens), 1.4 M files and, as already mentioned, over 3100 distinct projects.

But his tool will sometimes report duplicates, and has outdated versions which might not be affected by the issues anymore.
One reason is that some projects inline composer dependencies, and unless I do a painstaking manual review I cannot narrow this down.
Especially as it takes time to run the analysis on the corpus, and if I don't ask the precise question I don't get all the relevant stats.

So every stat is a conservative approximation.

We don't decide to deprecate and remove things for the fun of it.
But if something is misleading, badly designed, dangerous, has a security risk, or causes issues it should be deprecated.
It is my belief that it does not matter if this affects 10, 10 000, or 10 000 000 codebases.
However, how and when we remove this, yes this is affected by the usage.

session.sid_length and session.sid_bits_per_character

Auditing INI setting usages is effectively impossible with Exakat.

Misusing these settings can lead to security issues,
and the new values will match the existing defaults.

I would guess that the majority of users don't even know about this setting and thus are not affected.
Similarly, it seems likely that application developers are also not aware of it,
causing applications to break if a hosting provider would adjust these settings.
For example: if the application expects it to be a specific format, which is defined in the database schema.

Considering the above, these INI settings should be deprecated and removed, regardless of impact.

xml_set_object() and xml_set_*_handler() with string method names

This behaviour is unintuitive and breaks all usual language semantics.
This should be deprecated and removed regardless of impact.
But when I was working on this I had asked Damien to run some analysis with Exakat, which found 66 projects using it.
I have sent PRs to some of them to remove the usage, which is extremely simple to do.

To clarify the rationale, the following code is ambiguous:
xml_set_element_handler($p, 'strrev')

It either calls the \strrev() string function, or a method called strrev on the object provided by a call to xml_set_object().
This is going to be the logic as of PHP 8.4 after some refactoring I did last October.
In the current released versions it is even more ambiguous, as the object provided by xml_set_object() could be passed after setting the string callable.

This behaviour is totally unintuitive, so regardless of the impact it should be removed.
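
For comparison, a minimal sketch of the unambiguous form, assuming ext/xml is available. The handler class and its method names are hypothetical; the array callables name the target object explicitly, so the handlers depend neither on xml_set_object() nor on whether a global function of the same name exists.

```php
<?php
// Hypothetical handler class, for illustration only.
final class ElementLogger
{
    /** @var list<string> */
    public array $opened = [];

    public function startElement($parser, string $name, array $attributes): void
    {
        $this->opened[] = $name;
    }

    public function endElement($parser, string $name): void
    {
        // Nothing to do for closing tags in this sketch.
    }
}

$logger = new ElementLogger();
$parser = xml_parser_create();

// [$object, 'method'] can only ever refer to a method on $logger.
xml_set_element_handler($parser, [$logger, 'startElement'], [$logger, 'endElement']);
xml_parse($parser, '<root><item/></root>', true);

print_r($logger->opened);
```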

Deprecate proprietary CSV escaping mechanism

This is a follow-up on an RFC whose first step was implemented in PHP 7.4. (https://wiki.php.net/rfc/kill-csv-escaping)
The first step was implemented (https://github.com/php/php-src/pull/3515) without a vote being held following the discussion on internals: https://externals.io/message/103268

This routinely bites people, and we still get issues about people being confused about this parameter.
We really should address this, and not wait yet another 5 years for complaints to once again be raised before we take any action.
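
A minimal sketch of what opting out of the escaping mechanism looks like in user code, assuming PHP 7.4 or later, where an empty string is accepted as the escape argument. The sample field values are made up for illustration.

```php
<?php
// Passing '' as the escape argument disables the proprietary escaping;
// quotes inside a field are then handled only by doubling them.
$stream = fopen('php://memory', 'w+');
fputcsv($stream, ['id', 'say "hi"', 'back\\slash'], ',', '"', '');
rewind($stream);
$line = stream_get_contents($stream);
fclose($stream);

// Read the line back with the same explicit arguments.
$fields = str_getcsv(rtrim($line, "\n"), ',', '"', '');

var_dump($fields);
```

Writing and reading with the same explicit arguments keeps the round trip symmetric, including for fields that contain a backslash.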

Deprecate strtok() function

Symfony agrees with the rationale provided by the RFC and has banned the function from their project: https://github.com/symfony/symfony/issues/57542
This seems to indicate that the rationale around it is sound.
But just for the sake of it, I asked Damien, and I don't have the total number of usages, just that at least one call to strtok is made in 274 projects.
I have no idea which projects, and whether the majority of them are in a bunch of libraries or not.
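
For readers wondering what a replacement might look like: strtok() keeps hidden internal state between calls, and one possible stateless alternative is sketched below. The function name is hypothetical, and it only mirrors the common case where every character of the delimiter string separates tokens and empty tokens are skipped.

```php
<?php
// Hypothetical helper: splits $input on any character in $delimiters,
// dropping empty tokens, without relying on state kept between calls.
function tokenize(string $input, string $delimiters): array
{
    if ($delimiters === '') {
        throw new InvalidArgumentException('At least one delimiter is required.');
    }

    // Build a character class from the delimiter characters.
    $pattern = '/[' . preg_quote($delimiters, '/') . ']+/';

    return preg_split($pattern, $input, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(tokenize('a, b,,c', ', '));
```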

Deprecate returning non-strings values from a user output handler

Deprecate producing output in a user output handler

This is hard to analyse as it depends on runtime execution.
However, the current behaviour when doing one of these things is questionable and/or broken.
And I firmly believe this should be deprecated/changed regardless of impact.

Deprecate mysqli_refresh()

Deprecate mysqli_kill()

These are following upstream deprecations from MySQL.

Deprecate lcg_value()

This function is effectively broken.
Thus, I do not see what benefit we get from an impact analysis.

Deprecate md5(), sha1(), md5_file(), and sha1_file() (add an actual analysis, not just a statement as this is a high impact proposal)

To circle back to the beginning, what does a detailed analysis bring us here?
Tim is aware this has potential to impact a lot of code, which is why it is explicitly not being slated for removal in 9.0, and would require a follow-up RFC to remove it.

Moreover, this is in the same vein as when we deprecated utf8_decode() and utf8_encode() in PHP 8.2:
https://wiki.php.net/rfc/remove_utf8_decode_and_utf8_encode

Tim slightly adjusted the wording of the RFC to make it clearer that the suggested replacements are only intended for users that are locked into the algorithm choice.
I struggle to see a good reason to use MD5 in 2024, and I would hope that no-one uses MD5 to hash passwords in 2024, but somehow I doubt that.

I'm also trusting Tim to implement the deprecation message, and the changelog entry in the manual, in a way that prompts users to re-evaluate their choice of algorithm rather than blindly using the hash extension with MD5/SHA1.

But I did ask Damien, and he told me how many projects use each function at least once (listed below).
I don't have any idea if it stems from a library, if they only used it once or 10 000 times in the project, nor for what purpose.

md5: 862
sha1: 495
sha1_file: 85
md5_file: 245

Deprecate passing E_USER_ERROR to trigger_error()

This is to limit usage and access to the bailout mechanism, and better alternatives exist and should be used.
This is a prime example of deprecations being the correct tool.
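
As a sketch of what such an alternative can look like, the code below throws an exception where older code might have called trigger_error() with E_USER_ERROR. The function and its validation rule are hypothetical.

```php
<?php
// Hypothetical helper, named for illustration only.
function parsePort(string $value): int
{
    $port = filter_var($value, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 65535],
    ]);

    if ($port === false) {
        // Previously: trigger_error("Invalid port: $value", E_USER_ERROR);
        throw new InvalidArgumentException("Invalid port: {$value}");
    }

    return $port;
}

try {
    parsePort('not-a-port');
} catch (InvalidArgumentException $e) {
    // Callers can recover here, which a bailout does not allow.
    echo $e->getMessage(), PHP_EOL;
}
```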

Deprecate SOAP_FUNCTIONS_ALL constant and passing it to SoapServer::addFunction()

To me, this is a security issue first and foremost, and therefore we should discourage its use and remove it.
However, once again, I've asked Damien to run a quick analysis and 182 projects use it, mainly Symfony and Drupal.

And to a lesser degree for:

Formally deprecate Soft-deprecated DOMDocument and DOMEntity properties

We are following the DOM Spec here.
Thus I don't see how an impact analysis is useful.

Deprecate SplFixedArray::__wakeup()

This never worked properly.
Thus I don't see how an impact analysis is useful.

Deprecate passing null and false to dba_key_split()

This also never worked properly and is a bug.
Thus I don't see how an impact analysis is useful.

Deprecate passing incorrect data types for options to ext/hash functions

This is a potential security issue, and only possible to know at runtime, so regardless of impact this should be removed.
However, Niels did add such an analysis to the RFC.

Constants SUNFUNCS_RET_STRING, SUNFUNCS_RET_DOUBLE, SUNFUNCS_RET_TIMESTAMP

This is a follow-up on a deprecation enacted in PHP 8.1, and arguably should have been done at the same time,
cf. https://wiki.php.net/rfc/deprecations_php_8_1#date_sunrise_and_date_sunset

Remove E_STRICT error level and deprecate E_STRICT constant

This error level had only 2 strange uses in PHP 7, and has been completely removed in PHP 8.
I don't see what benefit an impact analysis would bring here, we are just deprecating/removing cruft at this point.

mysqli_ping() and mysqli::ping()

This is broken as of PHP 8.2.
Thus I don't see how an impact analysis is useful.

P.S.: typo in "xml_set_object() and xml_set_*_handler() with string method names": "witch" => "which"

Fixed

Other than that, I join the previously voiced objections to the deprecation of uniqid(), md5(), sha1(), md5_file(), sha1_file().
While I acknowledge that these functions can be used inappropriately for security-sensitive code, which should use alternative methods, these functions have perfectly valid use-cases for non-security-sensitive code and the impact of the BC-break of deprecating and eventually removing these methods can, IMO, not be justified.
Keep in mind that while "we" know and understand that deprecations are not errors, end-users often don't and particularly for open source projects, this means that in practice these deprecations will need to be addressed anyway to reduce the noise of users opening issues about them, which without a clear path to removal of the functions, will, in a lot of cases, mean adding the @ operator to all uses.

If I may be a bit cheeky, if we consider that userland does not understand that deprecations are not errors, how can we trust them to use the 5 aforementioned functions correctly?
Especially as there are more appropriate replacements available.

There is a difference between "userland" (dev-users) and end-users. I was talking about end-users, while based on your remark, you are talking about dev-users.

I am unsure what you mean by "end-users" here, I am going to assume you mean PHP developers that write PHP code using PHP libraries and/or frameworks.
Because if you refer to "end-users" as people that install WordPress (or whatever PHP application) via something like CPanel, this is a totally different conversation.

I sincerely appreciate that you are very much a tooling and library ecosystem developer, but from a core developer PoV, whether someone is an "end-user" or a "dev-user" is not a practical distinction.
We cannot make a function available only to "dev-users" that know what they do, we need to consider the whole userbase, and arguably end-users are the largest proportion of this.
Thus if end-users do not understand that deprecations are not errors, how should I expect end-users to read the documentation?
A deprecation is loud and clear, the documentation can be easily ignored, and there is no way to verify if the aforementioned functions are used correctly.

I will add, I very much dislike the argument "but it is clearly explained on the documentation, so it is not a problem".
I do not know about most people, but frankly, I do not look up the documentation every time I want to use a function, just as I have not consulted a dictionary to verify the meaning of the words I'm using before typing them.
In the same way that human languages will "deprecate" words by discouraging their usage, we ought to do the same for the language we write our code in.

If the issue is that you, as a maintainer, don't have a page on php.net to point to users where we clearly say "Promoting deprecations to Exceptions is wrong and bad" then we can make such a page.
Tim made a PR, which is now live [2], to the ErrorException page changing the example to a correct error handler promoting warnings and notices to exceptions but not deprecations.
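
A minimal sketch of an error handler in that spirit, promoting warnings and notices to ErrorException while leaving deprecations alone. This is an illustration of the idea, not a copy of the manual's example.

```php
<?php
set_error_handler(static function (int $severity, string $message, string $file, int $line): bool {
    // Respect the current error_reporting level (and the @ operator).
    if ((error_reporting() & $severity) === 0) {
        return false;
    }

    // Leave deprecations to PHP's normal reporting; they are not errors.
    if ($severity === E_DEPRECATED || $severity === E_USER_DEPRECATED) {
        return false;
    }

    throw new ErrorException($message, 0, $severity, $file, $line);
});
```

Returning false hands the diagnostic back to PHP's built-in handling, so deprecations are still logged or displayed according to the configuration.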

However, if even creating such a page is not enough, then maybe we need to do some engine level changes where we properly split out deprecations from the other diagnostics being emitted,
so that it becomes impossible for people to promote deprecations to exceptions (which somehow I'm thinking that people like Marco and Nicolas would appreciate).
These are all conversations that we can have.
However, "stop deprecating things" is not a "solution" to people not understanding what deprecations are.
This has come up again and again, and the answer has been constant, it is unacceptable to tell the project to not deprecate things.
It is one of our limited tools to actually make changes to the language, removing this is not an option.

Because if we do remove this option, I can definitely see people starting to create their own flavours of PHP to fix stuff that apparently must be set in stone in the official language.
Which I don't think many people want to see.
And, considering that most of the deprecations are in extensions this is, yet again, reinforcing my opinion that we should unbundle all extensions so that they can move at their own pace and that users can install whatever version of said extension they want.

Moreover, if you permit me an aside from another industry.
The construction industry in Europe is going through a massive overhaul via the second generation of Eurocodes. [3]
All of the final drafts were submitted prior to October of 2023, and will be finalized, translated into German/French, and voted on prior to 30 March 2026.
These new standards will then be implemented at a national level by all 34 members of CEN [4] via their national standard body (e.g. the British Standard Institute for the UK) by 30 September 2027.
Finally, the previous versions of the standard must be withdrawn by 30 March 2028.
Therefore, if the final version of a standard is published at the latest possible time, there is at most a two year transition period for a large part of an industry and, at minimum, 34 national standardization bodies to adopt the new standard, deprecate and withdraw the old one.
It is possible to use the new Eurocodes prior to them becoming mandated at the national level (which the member countries of CEN are obligated to do), but once that step is taken using the previous Eurocodes will require a legal exemption.

Meanwhile, PHP, a programming language that represents a fraction of the software industry, and does not need to deal with any legally binding system whatsoever, provides longer time frames, and yet this is not enough?
There is no legal requirement for any project to need to use the latest version of PHP, and if you fancy it, you could create a new project written in PHP 5.2 today if you wanted.
Maybe PHP and its users aren't able to cause the same level of damage as a bridge collapsing by letting potential security problems go unresolved, but that doesn't mean we can't learn a lesson from "real engineering" that benefits the project.

I also don't agree that there are "more appropriate replacements available".
The suggested hash() replacements for the md5/sha1* functions have the exact same functionality, which the RFC considers "incorrect use", so what are we actually solving by this deprecation ? Devs not having enough to do already ?
The problem (for open source) with "force-replacing" the uses of md5/sha1* functions with the hash function calls, is that the hash extension was not part of PHP core until PHP 7.4, which means that for a significant number of open source projects, the replacement is not a one-on-one function call replacement, but needs guard code for PHP < 7.4 in case the hash extension is not available.

Reiterating what I said previously, replacing it with the one-to-one equivalent should only be done if you truly need those specific algorithms.
Otherwise its usage should be reconsidered depending on the requirements and switched to something "safer".
Hopefully this is clearer now that Tim amended the RFC.

I can understand that userland projects and end-users work on a broad range of versions, but it is unreasonable to expect the PHP project to not do something because the situation used to be different over 5 years ago.
Moreover, according to Tim, who used to work on a PHP application that people would install on shared-hosting, most hosting companies actually have optional extensions enabled such as ext/hash, ext/intl, ext/mbstring, or even ext/gmp.

So with all due respect, I do not think that ext/hash not being mandatory prior to PHP 7.4 is a good counter argument.
Especially, as according to Packagist statistics, 93% of users use a version of PHP that is PHP 7.4 or above, and this percentage is only going to increase. [5]

Also, having read through the RFC a second time, I find the voting choices inconsistent - in particular the first deprecation vote, which makes the others ambiguous.
Could each voting choice please be explicitly one of the below to prevent any confusion ?

Remove in PHP 8.4

Deprecate in PHP 8.4 and remove in PHP 9

Deprecate in PHP 8.4 and remove at a later date after a separate vote

Unless specified otherwise, it is deprecate in PHP 8.4 and remove in PHP 9; the entries that specify something else do so for process efficiency.

Best regards,

Gina P. Banyard

[1] https://www.exakat.io/en/
[2] https://www.php.net/manual/en/class.errorexception.php
[3] https://eurocodes.jrc.ec.europa.eu/second-generation-eurocodes
[4] https://standards.cencenelec.eu/dyn/www/f?p=CEN:5
[5] https://stitcher.io/blog/php-version-stats-july-2024

1 year ago by Rob Landers

While a number of proposals include an impact analysis (thank you!), a significant number of the proposals don't.
It would be appreciated if for those proposals which aren't removing unused/unusable functionality, some sort of impact analysis was added.

You will need to clarify which ones you are talking about.
... snip big ...

I also don't agree that there are "more appropriate replacements available".
The suggested hash() replacements for the md5/sha1* functions have the exact same functionality, which the RFC considers "incorrect use", so what are we actually solving by this deprecation ? Devs not having enough to do already ?
The problem (for open source) with "force-replacing" the uses of md5/sha1* functions with the hash function calls, is that the hash extension was not part of PHP core until PHP 7.4, which means that for a significant number of open source projects, the replacement is not a one-on-one function call replacement, but needs guard code for PHP < 7.4 in case the hash extension is not available.

Reiterating what I said previously, replacing it with the one-to-one equivalent should only be done if you truly need those specific algorithms.
Otherwise its usage should be reconsidered depending on the requirements and switched to something "safer".
Hopefully this is clearer now that Tim amended the RFC.

This always gets me. "safer" doesn't have a consistent meaning. For example, if you were to want to create a "content addressable address" using a hash and it needs to fit inside a 128 bit number (such as a GUID), you may be tempted to take SHA-X and just truncate it. However, this biases the resulting numbers, and this bias may be considered unsafe (such as using it in an A/B testing tool). Just because you have a short hash, doesn't make it "unsafe" as longer hashes can also be considered "unsafe." What people usually mean by this is in the context of encryption, and in those cases it is unsafe, but in the context of non-encryption, usage of truncated larger hashes is just as unsafe.

— Rob

1 year ago by tim@bastelstu.be

Hi

This always gets me. "safer" doesn't have a consistent meaning. For

Yes it does. SHA-256 is safer than MD5. And on modern CPUs with sha_ni
extensions, it's also faster. The following is on an Intel i7-1365U:

$ openssl speed md5 sha1 sha256 sha512
snip
version: 3.0.10
built on: Wed Feb 21 10:45:39 2024 UTC
options: bn(64,64)
compiler: snip
CPUINFO: OPENSSL_ia32cap=0x7ffaf3ffffebffff:0x98c027bc239c27eb
The 'numbers' are in 1000s of bytes per second processed.
type         16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5        114683.10k   286174.51k   550288.90k   715171.50k   783611.22k   788556.46k
sha1       138578.57k   440607.38k  1082163.29k  1674088.45k  2017296.38k  2047377.41k
sha256     150670.11k   460483.71k  1054829.57k  1553830.57k  1807897.94k  1823981.57k
sha512      41246.76k   181566.07k   341457.66k   645468.50k   781042.81k   804296.02k

example, if you were to want to create a "content addressable
address" using a hash and it needs to fit inside a 128 bit number
(such as a GUID), you may be tempted to take SHA-X and just truncate
it. However, this biases the resulting numbers, which this bias may

This is false. For a hash algorithm to be considered cryptographically
secure (which I consider to be a reasonable definition of "safe"), it -
among other properties - needs to have the "avalanche effect" property,
which means that any change in the input is going to affect each output
bit with 50% probability.

This means that for a cryptographic hash algorithm - such as the SHA-2
family - the resulting hash is indistinguishable from uniformly selected
random bits. And this property also holds after truncation - you just
have fewer bits of course.
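
The truncation under discussion can be sketched as follows: deriving a 128-bit identifier from a SHA-256 digest. The function name is hypothetical, and the sketch only shows the mechanics; it does not settle the question of bias raised in this thread.

```php
<?php
// Hypothetical helper: derives a 128-bit identifier from arbitrary content.
function contentId128(string $content): string
{
    // true => raw binary output (32 bytes for SHA-256); keep the first 16 bytes.
    $truncated = substr(hash('sha256', $content, true), 0, 16);

    // Hex-encode for display or storage: 16 bytes become 32 characters.
    return bin2hex($truncated);
}

echo contentId128('example document'), PHP_EOL;
```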

be considered unsafe (such as using it in an A/B testing tool). Just
because you have a short hash, doesn't make it "unsafe" as longer
hashes can also be considered "unsafe." What people usually mean by
this is in the context of encryption, and in those cases it is
unsafe, but in the context of non-encryption, usage of truncated
larger hashes is just as unsafe.

I'm afraid I don't understand what you are attempting to say here.

Best regards
Tim Düsterhus

1 year ago by Brandon Jackson

Yes it does. SHA-256 is safer than MD5. And on modern CPUs with sha_ni
extensions, it's also faster. The following is on an Intel i7-1365U:

$ openssl speed md5 sha1 sha256 sha512
snip
version: 3.0.10
built on: Wed Feb 21 10:45:39 2024 UTC
options: bn(64,64)
compiler: snip
CPUINFO: OPENSSL_ia32cap=0x7ffaf3ffffebffff:0x98c027bc239c27eb
The 'numbers' are in 1000s of bytes per second processed.
type         16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5        114683.10k   286174.51k   550288.90k   715171.50k   783611.22k   788556.46k
sha1       138578.57k   440607.38k  1082163.29k  1674088.45k  2017296.38k  2047377.41k
sha256     150670.11k   460483.71k  1054829.57k  1553830.57k  1807897.94k  1823981.57k
sha512      41246.76k   181566.07k   341457.66k   645468.50k   781042.81k   804296.02k
Tim Düsterhus

Oh, that's interesting information. Blindly assuming that md5 was
faster than sha256, I did occasionally use md5 for non security
sensitive things like creating hashes used as cache keys or something
similar.

Consider something like:

$cache_key = md5(json_encode([
    'query' => "SELECT * FROM books WHERE author = ? LIMIT $offset,$limit",
    'params' => $params,
    'db' => 'kids_books',
]));

I think that would resolve my last possible reason for continuing to use md5.
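
A minimal sketch of the same cache key built with hash() and SHA-256 instead of md5(). The values assigned to $offset, $limit and $params are placeholders standing in for the variables in the snippet above.

```php
<?php
// Placeholder inputs, for illustration only.
$offset = 0;
$limit  = 20;
$params = ['Roald Dahl'];

// JSON_THROW_ON_ERROR makes an encoding failure raise an exception
// instead of silently hashing the value false.
$cache_key = hash('sha256', json_encode([
    'query'  => "SELECT * FROM books WHERE author = ? LIMIT $offset,$limit",
    'params' => $params,
    'db'     => 'kids_books',
], JSON_THROW_ON_ERROR));

echo $cache_key, PHP_EOL;
```

The resulting key is 64 hexadecimal characters rather than 32, which matters if the cache backend limits key length.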

1 year ago by Rob Landers

Hi

This always gets me. "safer" doesn't have a consistent meaning. For

Yes it does. SHA-256 is safer than MD5. And on modern CPUs with sha_ni
extensions, it's also faster. The following is on an Intel i7-1365U:

$ openssl speed md5 sha1 sha256 sha512
snip
version: 3.0.10
built on: Wed Feb 21 10:45:39 2024 UTC
options: bn(64,64)
compiler: snip
CPUINFO: OPENSSL_ia32cap=0x7ffaf3ffffebffff:0x98c027bc239c27eb
The 'numbers' are in 1000s of bytes per second processed.
type         16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5        114683.10k   286174.51k   550288.90k   715171.50k   783611.22k   788556.46k
sha1       138578.57k   440607.38k  1082163.29k  1674088.45k  2017296.38k  2047377.41k
sha256     150670.11k   460483.71k  1054829.57k  1553830.57k  1807897.94k  1823981.57k
sha512      41246.76k   181566.07k   341457.66k   645468.50k   781042.81k   804296.02k

example, if you were to want to create a "content addressable
address" using a hash and it needs to fit inside a 128 bit number
(such as a GUID), you may be tempted to take SHA-X and just truncate
it. However, this biases the resulting numbers, and this bias may

This is false. For a hash algorithm to be considered cryptographically
secure (which I consider to be a reasonable definition of "safe"), it -
among other properties - needs to have the "avalanche effect" property,
which means that any change in the input is going to affect each output
bit with 50% probability.

From a practical perspective, across hundreds of millions of hashes of unique IDs, I can say that there is a practical and detectable bias when truncating SHA-256 hashes. Enough that we were having to throw out A/B test results… I’m not going to write a paper on it and I’m not going to bother arguing the point that no hash function is perfect, but I will point out that “theory” and “reality” don’t always agree.

This means that for a cryptographic hash algorithm - such as the SHA-2
family - the resulting hash is indistinguishable from uniformly selected
random bits. And this property also holds after truncation - you just
have fewer bits of course.

See also: https://security.stackexchange.com/a/34797/21705
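
As an illustration of the truncation being discussed, a minimal sketch that derives a 128-bit value from SHA-256 by keeping its first 16 bytes. The helper name truncated_sha256_128() is an assumption made for this example:

```php
<?php
// Hypothetical helper: a 128-bit identifier from the first 16 bytes of SHA-256.
function truncated_sha256_128(string $data): string
{
    $raw = hash('sha256', $data, true);  // 32 raw bytes
    return bin2hex(substr($raw, 0, 16)); // keep 16 bytes = 128 bits, as 32 hex chars
}
```

Which 16 bytes are kept should not matter for uniformity, going by the argument above; taking the leading bytes is simply the most common convention.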

be considered unsafe (such as using it in an A/B testing tool). Just
because you have a short hash, doesn't make it "unsafe" as longer
hashes can also be considered "unsafe." What people usually mean by
this is in the context of encryption, and in those cases it is
unsafe, but in the context of non-encryption, usage of truncated
larger hashes is just as unsafe.

I'm afraid I don't understand what you are attempting to say here.

Best regards
Tim Düsterhus

— Rob

1 year ago by Rob Landers — view source


Hi

This always gets me. "safer" doesn't have a consistent meaning. For

Yes it does. SHA-256 is safer than MD5. And on modern CPUs with sha_ni
extensions, it's also faster. The following is on an Intel i7-1365U:

$ openssl speed md5 sha1 sha256 sha512
snip
version: 3.0.10
built on: Wed Feb 21 10:45:39 2024 UTC
options: bn(64,64)
compiler: snip
CPUINFO: OPENSSL_ia32cap=0x7ffaf3ffffebffff:0x98c027bc239c27eb
The 'numbers' are in 1000s of bytes per second processed.
type          16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5         114683.10k   286174.51k   550288.90k   715171.50k   783611.22k   788556.46k
sha1        138578.57k   440607.38k  1082163.29k  1674088.45k  2017296.38k  2047377.41k
sha256      150670.11k   460483.71k  1054829.57k  1553830.57k  1807897.94k  1823981.57k
sha512       41246.76k   181566.07k   341457.66k   645468.50k   781042.81k   804296.02k

example, if you were to want to create a "content addressable
address" using a hash and it needs to fit inside a 128 bit number
(such as a GUID), you may be tempted to take SHA-X and just truncate
it. However, this biases the resulting numbers, and this bias may

This is false. For a hash algorithm to be considered cryptographically
secure (which I consider to be a reasonable definition of "safe"), it -
among other properties - needs to have the "avalanche effect" property,
which means that any change in the input is going to affect each output
bit with 50% probability.

From a practical perspective, across hundreds of millions of hashes of unique IDs, I can say that there is a practical and detectable bias when truncating SHA-256 hashes. Enough that we were having to throw out A/B test results… I’m not going to write a paper on it and I’m not going to bother arguing the point that no hash function is perfect, but I will point out that “theory” and “reality” don’t always agree.

I have been corrected. The issue was due to a modulus causing the bias deeper in the code.
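
The message does not say what the modulus in that code looked like. As a generic illustration of how a modulus can introduce bias even when the input is uniform, this sketch (function name assumed) reduces every possible byte value modulo a bucket count that does not divide 256:

```php
<?php
// Count how often each bucket is hit when every byte value 0..255
// is reduced modulo $buckets. Illustrative only, not the code from the thread.
function bucket_counts(int $buckets): array
{
    $counts = array_fill(0, $buckets, 0);
    for ($value = 0; $value < 256; $value++) {
        $counts[$value % $buckets]++;
    }
    return $counts;
}

$counts = bucket_counts(100);
// Buckets 0..55 each receive three byte values, buckets 56..99 only two,
// because 256 = 2 * 100 + 56.
```

With a bucket count that divides 256 evenly, such as 64, every bucket receives the same number of values and the bias disappears.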

— Rob

1 year ago by Christoph M. Becker — view source


On Tuesday, 2 July 2024 at 10:52, Juliette Reinders Folmer
php-internals_nospam@adviesenzo.nl wrote:

Other than that, I join the previously voiced objections to the
deprecation of uniqid(), md5(), sha1(), md5_file(),
sha1_file().
While I acknowledge that these functions can be used
inappropriately for security-sensitive code, which should use
alternative methods, these functions have perfectly valid use-cases
for non-security-sensitive code and the impact of the BC-break of
deprecating and eventually removing these methods can, IMO, not be
justified.
Keep in mind that while "we" know and understand that deprecations
are not errors, end-users often don't and particularly for open
source projects, this means that in practice these deprecations will
need to be addressed anyway to reduce the noise of users opening
issues about them, which without a clear path to removal of the
functions, will, in a lot of cases, mean adding the @ operator to
all uses.

If I may be a bit cheeky, if we consider that userland does not
understand that deprecations are not errors, how can we trust them to
use the 5 aforementioned functions correctly?
Especially as there are more appropriate replacements available.

There is a difference between "userland" (dev-users) and end-users. I
was talking about end-users, while based on your remark, you are talking
about dev-users.

To clarify, by end-users you are referring to users who install and
"operate" (open-source) software on (shared) hosting. If so, indeed
they may stumble upon deprecation notices, and from my (limited)
experience, they will report that as an issue, unless the software
developers release a new version which does not trigger these
deprecation notices. This is unfortunate, but I really do hope that
these developers only use the shut-up operator when all else fails, and
that they remove it as soon as possible. Yeah, even more work, but that
is what you are sometimes not paid for. ;)

I also don't agree that there are "more appropriate replacements
available".
The suggested hash() replacements for the md5/sha1* functions have
the exact same functionality, which the RFC considers "incorrect use",
so what are we actually solving by this deprecation? Devs not having
enough to do already?
The problem (for open source) with "force-replacing" the uses of
md5/sha1* functions with the hash function calls, is that the hash
extension was not part of PHP core until PHP 7.4, which means that for a
significant number of open source projects, the replacement is not a
one-on-one function call replacement, but needs guard code for PHP < 7.4
in case the hash extension is not available.

Well, I don't think it's hard to deal with deprecations for which
alternatives are easily available. Just replace all e.g. md5() calls
with something namespaced, and define that function depending on the PHP
version.
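
A minimal sketch of such a wrapper, under stated assumptions: the class name HashCompat is hypothetical, a static method stands in for a namespaced function so the example fits in one file, and the choice is made with a runtime function_exists() check rather than a PHP version comparison:

```php
<?php
// Hypothetical compatibility wrapper: class and method names are assumptions.
final class HashCompat
{
    public static function md5(string $data): string
    {
        // Prefer hash() when the extension is available,
        // otherwise fall back to the legacy function.
        if (\function_exists('hash')) {
            return \hash('md5', $data);
        }
        return \md5($data);
    }
}
```

Call sites then use HashCompat::md5($value) everywhere, so only this one place touches the function that would be deprecated.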

With regard to md5() and sha1() my first thought was that we easily could
keep them as aliases. However, the RFC explains that it might be a good
idea to reconsider the use cases, and that is a good idea, in my opinion.

I do not, however, agree with the reasoning that a function (like
uniqid()) is often used in an unsafe way (i.e. for purposes it has not
been designed for), and therefore should be deprecated/removed. There are
likely a couple of developers who would readily roll their own
implementation, which can be way worse. I've seen "encryption" code
which was basically a Caesar cipher, spiced with some obscure function
calls to make it "even more safe". And I've seen obscure HTML escaping
code with a not-so-obvious back door, which was once available as a user
note on php.net.

That doesn't mean that I'm against the uniqid() deprecation, especially
if the deprecation message is clear on what to use instead.

Cheers,
Christoph

1 year ago by tim@bastelstu.be — view source


Hi

I do not, however, agree with the reasoning that a function (like
uniqid()) is often used in an unsafe way (i.e. for purposes it has not
been designed for), and therefore should be deprecated/removed. There are
likely a couple of developers who would readily roll their own
implementation, which can be way worse. I've seen "encryption" code

That's effectively what's already being done, with complex, but
meaningless, constructions such as sha1(uniqid(microtime(true), true).rand()). I probably did so myself in the past, back when I didn't
know better and followed tutorials that also didn't know better.

I agree that it helps no one if the cure is worse than the disease.

That's why I strongly believe in misuse-resistant APIs: The secure
choice should also be the default choice and also the easiest choice.
The 'password_*' API is a great example of that. It has a few
functions that do exactly what they say on the tin. You basically
cannot use it incorrectly (except by not using it).

The old procedural API to use randomness is not such an example [1]. The
uniqid() function is the most obvious choice to generate a unique
string and when using it, the output also looks "random". But in reality
it does not guarantee that the output is unique or unguessable - making
the function name a lie.

It's almost always the wrong choice and I expect the type of developer
that is able to use it safely to be able to write their own
domain-specific formatter for the current time. All other users would be
better suited by choosing something else, such as the
bin2hex(random_bytes(16)) construction that is supported since 7.0 and
even longer with ParagonIE's polyfill [2]. Yes, it is not as terse as a
call to uniqid(), but it is nicely explicit in what the output will
look like and what the security properties are.
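
The construction mentioned above, wrapped in a small helper for illustration. The function name random_hex_id() and its default length are assumptions:

```php
<?php
// Hypothetical helper around the bin2hex(random_bytes(16)) construction.
function random_hex_id(int $bytes = 16): string
{
    // random_bytes() draws from the CSPRNG; 16 bytes become 32 hex characters.
    return bin2hex(random_bytes($bytes));
}

$id = random_hex_id();
```

Unlike uniqid(), the output carries no timestamp, so it neither sorts by creation time nor leaks when it was generated.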

which was basically a Caesar cipher, spiced with some obscure function
calls to make it "even more safe". And I've seen obscure HTML escaping
code with a not-so-obvious back door, which was once available as a user
note on php.net.

That doesn't mean that I'm against the uniqid() deprecation, especially
if the deprecation message is clear on what to use instead.

I will make sure to write useful migration docs, helping users make an
educated choice of an alternative. Unfortunately, there is no
one-size-fits-all solution to the problem of generating a unique string.

Best regards
Tim Düsterhus

[1] This includes uniqid(), rand(), mt_rand(), lcg_value().
random_bytes() and random_int() are fine.
[2] https://github.com/paragonie/random_compat

1 year ago by Christoph M. Becker — view source


Hi Tim!

That doesn't mean that I'm against the uniqid() deprecation, especially
if the deprecation message is clear on what to use instead.

I will make sure to write useful migration docs, helping users make an
educated choice of an alternative. Unfortunately, there is no
one-size-fits-all solution to the problem of generating a unique string.

See also the related GH issue regarding deprecation messages:
https://github.com/php/php-src/issues/14320; it probably makes sense
to include a URL in the deprecation message, or maybe some code which
users can use to look up more thorough information about the deprecation.

Cheers,
Christoph

1 year ago by tim@bastelstu.be — view source


Hi

See also the related GH issue regarding deprecation messages:
https://github.com/php/php-src/issues/14320; it probably makes sense
to include a URL in the deprecation message, or maybe some code which
users can use to look up more thorough information about the deprecation.

To loop the list back in: I've replied to that issue that as part of the
#[\Deprecated] RFC implementation new deprecation messages were added to
(almost) all functions that are currently deprecated:

https://github.com/php/php-src/pull/14750

In the case of utf8_decode() / utf8_encode(), the message just points to
the documentation, which has an extensive explanation of the possible
alternatives. For the others, the message already points out the
alternatives or explains the deprecation in another way (e.g. for the
*_free() functions that became obsolete when moving from resources to
objects).

For uniqid() there are already various warnings in the documentation and
today a PR was merged to adjust the description text to avoid using the
word "unique", because it was factually untrue:

https://github.com/php/doc-en/pull/3571

If the deprecation is accepted, then the list of possible alternatives
that are already mentioned in the RFC can be included in the documentation.

Best regards
Tim Düsterhus

1 year ago by Gina P. Banyard — view source


Hello internals,

It is that time of year again when we propose a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions, they are free to add them to the RFC.

I have added a section to deprecate the SOAP_FUNCTIONS_ALL constant:
https://wiki.php.net/rfc/deprecations_php_8_4#deprecate_soap_functions_all_constant_and_passing_it_to_soapserveraddfunction

Best regards,

Gina P. Banyard

1 year ago by Claude Pache — view source


On 25 June 2024 at 16:36, Gina P. Banyard internals@gpb.moe wrote:

Hello internals,

It is that time of year again when we propose a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

Hi,

For each deprecation, it would be nice to provide explicitly the text of the deprecation notice so that we can guarantee that it will be helpful for users, see https://github.com/php/php-src/issues/14320
I don’t see the point of deprecating DOMImplementation::getFeature() instead of just removing it? “DOMImplementation::getFeature() is deprecated, throw manually an Error exception instead.”
About strtok(): An exact replacement of strtok() that is reasonably performant may be constructed with a sequence of strspn(...) and strcspn(...) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC
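
The code behind the 3v4l link is not reproduced in this thread. As a rough sketch of the general technique it describes (alternating strspn() and strcspn() inside a generator), assuming a fixed delimiter set, PHP 8, and a hypothetical function name tokens():

```php
<?php
// Sketch only: not the linked implementation. Assumes PHP 8, where an
// offset equal to the string length is accepted by strspn()/strcspn().
function tokens(string $string, string $delimiters): \Generator
{
    $pos = 0;
    $length = \strlen($string);
    while ($pos < $length) {
        // Skip any run of delimiter characters.
        $pos += \strspn($string, $delimiters, $pos);
        // Measure the run of non-delimiter characters that follows.
        $len = \strcspn($string, $delimiters, $pos);
        if ($len === 0) {
            return; // only delimiters were left
        }
        yield \substr($string, $pos, $len);
        $pos += $len;
    }
}

$parts = \iterator_to_array(tokens('a,b,,c', ','), false);
```

Here the position lives in the generator instead of in hidden global state, which is why calling code has to be restructured around a foreach loop.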

—Claude

1 year ago by Mike Schinkel — view source


On 25 June 2024 at 16:36, Gina P. Banyard internals@gpb.moe wrote:
https://wiki.php.net/rfc/deprecations_php_8_4

About strtok(): An exact replacement of strtok() that is reasonably performant may be constructed with a sequence of strspn(...) and strcspn(...) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC
Well your modern_strtok() function is not an exact replacement as it requires using a generator and thus forces the restructure of the code that calls strtok().

So not a drop-in — search-and-replace — replacement for strtok(). But it is a reasonable replacement for those who are motivated to do the restructure.

========

Out of curiosity about the performance of your modern_strtok() function, I benchmarked it and found that it takes, on rough average, about 2.5 times as long to run as strtok():

https://3v4l.org/AMECf#v8.3.9

That makes yours the fastest alternative I have benchmarked, but still significantly slower than strtok().

I was curious to see if I could improve its performance by avoiding the generator, but that just made it slightly worse, taking on rough average about 2.75 times as long to run as strtok():

https://3v4l.org/ZVS5Md#v8.3.9

#fwiw

-Mike

1 year ago by drealecs@gmail.com — view source


On 25 June 2024 at 16:36, Gina P. Banyard internals@gpb.moe wrote:
https://wiki.php.net/rfc/deprecations_php_8_4

About strtok(): An exact replacement of strtok() that is reasonably
performant may be constructed with a sequence of strspn(...) and
strcspn(...) calls; here is an implementation using a generator in order to
keep the state: https://3v4l.org/926tC

Well your modern_strtok() function is not an exact replacement as it
requires using a generator and thus forces the restructure of the code that
calls strtok().

So not a drop-in — search-and-replace — replacement for strtok(). But it
is a reasonable replacement for those who are motivated to do the
restructure.

I looked a bit into this and, taking the idea further, let's also consider
defining a StringTokenizer class:

class StringTokenizer {
    private \Generator $tokenGenerator;

    public function __construct(public readonly string $string) {
    }

    // Returns the next token, or null once the string is exhausted.
    public function nextToken(string $characters): string|null {
        if (!isset($this->tokenGenerator)) {
            // First call: start the generator with the initial delimiters.
            $this->tokenGenerator = $this->generator($characters);
            return $this->tokenGenerator->current();
        }
        // Later calls: pass the (possibly changed) delimiters to the generator.
        return $this->tokenGenerator->send($characters);
    }

    private function generator(string $characters): \Generator {
        $pos = 0;
        while (true) {
            // Skip leading delimiters, then measure the token that follows.
            $pos += \strspn($this->string, $characters, $pos);
            $len = \strcspn($this->string, $characters, $pos);
            if (!$len)
                return;
            $token = \substr($this->string, $pos, $len);
            $characters = yield $token;
            $pos += $len;
        }
    }
}

and if we define a wrapper function:

I think that this might be a perfect replacement.

If we want, we could implement the StringTokenizer in the core, so that it
would be a nice replacement.

If we don't want to do this at this stage, we can completely avoid the
class for now, using an anonymous class:

What do you think?
Mike, would you mind benchmarking this as well to make sure it's about as
fast as the initial suggestion from Claude?

I'm hoping this can be simplified further, but to get to the point, I also
think we should have a userland replacement suggestion in the RFC.
And, ideally, we should have a class that can replace it in PHP 9.0,
similar to the above StringTokenizer.

Regards,
Alex

1 year ago by drealecs@gmail.com — view source


On Mon, Jul 8, 2024 at 6:43 PM Alexandru Pătrănescu drealecs@gmail.com
wrote:

I'm hoping this can be simplified further, but to get to the point, I also
think we should have a userland replacement suggestion in the RFC.

Managed to simplify it like this and I find it reasonable enough:

Alex

1 year ago by Mike Schinkel — view source


Managed to simplify it like this and I find it reasonable enough:
function strtok2(string $string, ?string $token = null): string|false {
    static $tokenGenerator = null;
    // Two arguments: start tokenizing a new string.
    // ($token !== null rather than a truthiness check, so that "0" works as a delimiter.)
    if ($token !== null) {
        $tokenGenerator = (function(string $characters) use ($string): \Generator {
            $pos = 0;
            while (true) {
                $pos += \strspn($string, $characters, $pos);
                $len = \strcspn($string, $characters, $pos);
                if ($len === 0)
                    return;
                $token = \substr($string, $pos, $len);
                $characters = yield $token;
                $pos += $len;
            }
        })($token);
        return $tokenGenerator->current() ?? false;
    }
    // One argument: $string holds the delimiters for the next token.
    return $tokenGenerator?->send($string) ?? false;
}
Hi Alexandru,

Great attempt.

Unfortunately, however, it seems to be around 4.5 times slower than strtok():

https://3v4l.org/7lXlM#v8.3.9

On 6 July 2024 at 03:22, Mike Schinkel mike@newclarity.net wrote:

About strtok(): An exact replacement of strtok() that is reasonably performant may be constructed with a sequence of strspn(...) and strcspn(...) calls; here is an implementation using a generator in order to keep the state: https://3v4l.org/926tC
Well your modern_strtok() function is not an exact replacement as it requires using a generator and thus forces the restructure of the code that calls strtok().

Yes, of course, I meant: it has the exact same semantics. You cannot have the same API without keeping global state somewhere. If you use strtok() for what it was meant for, you must restructure your code if you want to eliminate hidden global state.

Hi Claude,

Agreed that semantics would have to change. Somewhat.

The reason I made the comment was that when I saw you state it was an "exact replacement", I was concerned that people not paying close attention to the thread might see it and think: "Oh, okay, there is an exact, drop-in replacement so I will vote to deprecate" when that same person might not vote to deprecate if they did not think there was an exact drop-in replacement. But I did my best to soften my words so that it did not come off as accusatory and instead just matter-of-fact. If I failed at that, I apologize.

Anyway, your comments about needing to change the semantics got me thinking that remediating code that uses strtok() could be much closer to a drop-in replacement than a generator, assuming there is a will to actually tackle this. And this is a small enough scope that I might even be able to learn enough C-for-PHP to create a pull request, if the idea were blessed.

Consider this simple code for using strtok():

$token = strtok($content, ',');
while ($token !== false) {
    $token = strtok(',');
}

Now compare to this potential enhancement:

$handle = strtok($content, ',', STRTOK_INIT);
do {
    $token = strtok($handle);
} while ($token !== false);
strtok($handle, STRTOK_RELEASE);

This would be much closer to a drop-in replacement, would allow PHP to keep the fast strtok() function, AND would address the reason for deprecation.
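
STRTOK_INIT and STRTOK_RELEASE above are proposed constants, not an existing API. To show what an explicit handle could look like in userland today, here is a rough sketch with a hypothetical class TokenCursor. It assumes PHP 8 and has not been checked against strtok() for the case where delimiters change between calls:

```php
<?php
// Hypothetical userland analogue of the "handle" idea: the tokenizer state
// lives in an object instead of hidden global state.
final class TokenCursor
{
    private int $pos = 0;

    public function __construct(private string $string)
    {
    }

    // Returns the next token, or false when no tokens remain.
    public function next(string $delimiters): string|false
    {
        // Skip leading delimiters, then measure the token that follows.
        $this->pos += \strspn($this->string, $delimiters, $this->pos);
        $len = \strcspn($this->string, $delimiters, $this->pos);
        if ($len === 0) {
            return false;
        }
        $token = \substr($this->string, $this->pos, $len);
        $this->pos += $len;
        return $token;
    }
}

$cursor = new TokenCursor('a,b,,c');
$found = [];
while (($token = $cursor->next(',')) !== false) {
    $found[] = $token;
}
```

No release step is needed here, because the state is freed when the object goes out of scope.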

See any reason this approach would not be viable?

-Mike

1 year ago by Claude Pache — view source


1 year ago by Gina P. Banyard — view source


For each deprecation, it would be nice to provide explicitly the text of the deprecation notice so that we can guarantee that it will be helpful for users, see https://github.com/php/php-src/issues/14320

Considering that until recently, [1] there was no way to provide a message for functions that were deprecated, this was not a consideration.
We are adding more useful messages for prior deprecations in a PR right now, [2] but the format of it hasn't been finalized yet, thus I don't think adding it to the RFC at this point is useful.
This is something to take into account for next year's RFC, but the implementation of those deprecations is expected to have more useful messages.

I don’t see the point of deprecating DOMImplementation::getFeature() instead of just removing it? “DOMImplementation::getFeature() is deprecated, throw manually an Error exception instead.”

Just removing it makes sense, I'll talk to Niels about it, to change what the vote is actually about.

Best regards,
Gina P. Banyard
[1] https://wiki.php.net/rfc/deprecated_attribute
[2] https://github.com/php/php-src/pull/14750

1 year ago by Niels Dossche — view source


I don’t see the point of deprecating DOMImplementation::getFeature() instead of just removing it? “DOMImplementation::getFeature() is deprecated, throw manually an Error exception instead.”

Just removing it makes sense, I'll talk to Niels about it, to change what the vote is actually about.

The reason I put it on deprecation instead of removal was because I thought we always had to go through deprecation first before we could remove things.
Gina and I talked and we concluded that we could just remove it indeed, as this method always threw an exception anyway due to it being unimplemented.
I'll update the RFC.

Kind regards
Niels

1 year ago by Gina P. Banyard — view source


Hello internals,

It is that time of year again when we propose a list of deprecations to add in PHP 8.4:

https://wiki.php.net/rfc/deprecations_php_8_4

As a reminder, this list has been compiled over the course of the past year by various different people.

And as usual, each deprecation will be voted in isolation.

We still have a bit of time buffer, so if anyone else has any suggestions, they are free to add them to the RFC.

Some should be non-controversial, others a bit more.
If such, they might warrant their own dedicated RFC, or be dropped from the proposal altogether.

Best regards,

Gina P. Banyard

Hello internals,

It's been a bit over 3 weeks since the discussion started, and I intend to open the vote tomorrow.

Best regards,

Gina P. Banyard

[RFC] Deprecations for PHP 8.4
