Adding in a case-insensitive version of str_contains

2 months ago by Adam Cable

As a PHP developer of 20 years, I'm somewhat accustomed to adding "i" to
the function name when I'm after a case-insensitive version.

Therefore, I found it a bit odd that I couldn't do so with the new
"str_contains" function, so I have built a basic case-insensitive version at
https://github.com/php/php-src/pull/18705

I appreciate that there may be a bit of negative reaction about adding such
a basic function that can so easily be written in user-land. But for me, as
an average PHP coder, I think it's useful to try and fill in gaps which can
cause head-scratching.

Thanks,
Adam

2 months ago by Kamil Tekiela

As I understand, it was a conscious decision not to add this function
when str_contains was created. The reason is that case sensitivity is
locale-dependent, and for such use cases, mbstring extension is better
[1] & [2]. Do you think that locale is a concern here, and if not,
why? Would it be a good idea to add mb_str_icontains instead?

If you're going to propose an RFC for this, it would be a good idea to
explain what the real life use case for it is. While str_contains is
very useful for checking the existence of a byte-string within another
byte-string, a case-insensitive check doesn't seem to have much use.

2 months ago by Derick Rethans

> As I understand, it was a conscious decision not to add this function
> when str_contains was created. The reason is that case sensitivity is
> locale-dependent, and for such use cases, mbstring extension is better
> [1] & [2]. Do you think that locale is a concern here, and if not,
> why? Would it be a good idea to add mb_str_icontains instead?

mbstring doesn't deal with, or handle, locales, only character sets. So that wouldn't be a good fit either.

There is grapheme_stripos, but that isn't locale-dependent either: https://www.php.net/manual/en/function.grapheme-stripos.php

PHP could really do with a locale-aware, grapheme-aware set of Text utilities.

I have a prototype at https://github.com/derickr/php-text where this suggested function would also fit in.

cheers
Derick

2 months ago by Aleksander Machniak

> As I understand, it was a conscious decision not to add this function
> when str_contains was created. The reason is that case sensitivity is
> locale-dependent, and for such use cases, mbstring extension is better
> [1] & [2]. Do you think that locale is a concern here, and if not,
> why? Would it be a good idea to add mb_str_icontains instead?

stripos is not locale-dependent. If str_icontains uses the same
case-folding rules, we're good and consistent. I'll vote Yes.
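
For illustration, here is a minimal userland sketch of a check that reuses stripos's case folding. The function name str_icontains mirrors the proposal, but the body below is only an assumption about how such a helper could behave; it is not the code from the pull request.

```php
<?php
// Hypothetical userland stand-in for the proposed function. The guard
// avoids a redeclaration error should a native version ever exist.
if (!function_exists('str_icontains')) {
    function str_icontains(string $haystack, string $needle): bool
    {
        // Like str_contains(), treat the empty needle as always found.
        if ($needle === '') {
            return true;
        }
        // stripos() returns an int offset or false; compare strictly so
        // that a match at offset 0 still counts as "found".
        return stripos($haystack, $needle) !== false;
    }
}

var_dump(str_icontains('Exclusive offer', 'exclusive')); // bool(true)
var_dump(str_icontains('Standard offer', 'exclusive'));  // bool(false)
```

Because it delegates to stripos, this sketch inherits whatever folding rules stripos applies, which is the consistency argument made above.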

--
Aleksander Machniak
Kolab Groupware Developer [https://kolab.org]
Roundcube Webmail Developer [https://roundcube.net]

PGP: 19359DC1 # Blog: https://kolabian.wordpress.com

2 months ago by Adam Cable

> As I understand, it was a conscious decision not to add this function
> when str_contains was created. The reason is that case sensitivity is
> locale-dependent, and for such use cases, mbstring extension is better
> [1] & [2]. Do you think that locale is a concern here, and if not,
> why? Would it be a good idea to add mb_str_icontains instead?
>
> If you're going to propose an RFC for this, it would be a good idea to
> explain what the real life use case for it is. While str_contains is
> very useful for checking the existence of a byte-string within another
> byte-string, a case-insensitive check doesn't seem to have much use.

Thanks for this.

In terms of real-life cases, we deal with a lot of data feeds and
user-contributed content.
We have lots of rules engines that categorise or display data depending on
the content contained (and these rules sometimes change).
For example, we have a rule that if the string contains the word
"exclusive", it's displayed in a certain way.
We can add a rule that says if (str_contains($text, "exclusive"))... and if
we want to allow sentences that start with the word "Exclusive", we
currently have to write it like this (as we like functions to be
truthy)... stripos(" ".$text, "exclusive") OR
str_contains(strtolower($text), "exclusive")

I'd just find it useful to have str_icontains/stri_contains available as
the same ASCII-folding variant that we have for other functions.
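
To make the "truthy" point concrete, here is a short sketch of the two workarounds next to the naive check. The sample text and variable names are made up for illustration, and str_contains assumes PHP 8.0 or later.

```php
<?php
$text = 'Exclusive: members-only preview';

// stripos() finds the word at offset 0, and int(0) is falsy, so a bare
// truthiness check misses a match at the very start of the string.
$naive = (bool) stripos($text, 'exclusive');

// Workaround 1: prepend a space so that any match lands at offset 1 or
// later and the result is truthy.
$padded = (bool) stripos(' ' . $text, 'exclusive');

// Workaround 2: lowercase the haystack, then use str_contains().
$lowered = str_contains(strtolower($text), 'exclusive');

// A strict comparison against false also works, without the padding trick.
$strict = stripos($text, 'exclusive') !== false;

var_dump($naive, $padded, $lowered, $strict);
// bool(false) bool(true) bool(true) bool(true)
```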

2 months ago by youkidearitai

Indeed, mbstring is not locale-dependent.
The grapheme functions cannot take a locale either (by default, no
locale is specified).
Perhaps the grapheme functions need a way to specify the locale?

I thought so when I implemented the grapheme function.

Regards
Yuya

--

Yuya Hamada (tekimen)

2 months ago by Nikita Popov

> As I understand, it was a conscious decision not to add this function
> when str_contains was created. The reason is that case sensitivity is
> locale-dependent, and for such use cases, mbstring extension is better
> [1] & [2]. Do you think that locale is a concern here, and if not,
> why? Would it be a good idea to add mb_str_icontains instead?
>
> If you're going to propose an RFC for this, it would be a good idea to
> explain what the real life use case for it is. While str_contains is
> very useful for checking the existence of a byte-string within another
> byte-string, a case-insensitive check doesn't seem to have much use.

To make it a bit more explicit: The proposed str_icontains function does not support UTF-8. It would only be case-insensitive on ASCII characters. Do we really want to add new functions that do not properly handle UTF-8?

I think that thanks to https://wiki.php.net/rfc/strtolower-ascii (which removed C locale support from this family of functions), there actually is a pretty viable way forward to make the non-mbstring case-insensitive string functions useful again: Make them work on UTF-8. (In the sense of using Unicode case folding and case mapping on UTF-8, while still returning code unit offsets. This would make them superior to both the current stri* functions, and the mb_stri* functions.)
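
A small sketch of the ASCII-only behaviour described above, assuming PHP 8.2 or later (where this family of functions no longer consults the locale). The sample strings are invented, and the \u{...} escapes are used so the bytes are UTF-8 regardless of how the file is saved.

```php
<?php
// "ÉCOLE PRIMAIRE" in UTF-8; "É" (U+00C9) and "é" (U+00E9) are two
// bytes each.
$haystack = "\u{C9}COLE PRIMAIRE";

// ASCII letters fold as expected.
$asciiMatch = stripos($haystack, 'primaire') !== false;

// The non-ASCII letter is not folded, so this lookup fails even though
// the two strings differ only by case.
$accentMatch = stripos($haystack, "\u{E9}cole") !== false;

var_dump($asciiMatch, $accentMatch);
// bool(true) bool(false)
```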

Regards,
Nikita

2 months ago by youkidearitai

Hi,

I agree that it's important to think about it in UTF-8.

I have been thinking about a case-folding function with UTF-8 support
for the past few days. Maybe something like the following?

grapheme_setlocale($locale);
grapheme_icontains($haystack, $needle);

First, the grapheme_* functions would support a locale.
Second, add a grapheme_icontains function as the case-insensitive
version of str_contains.

What do you think?

Regards
Yuya

--

Yuya Hamada (tekimen)

2 months ago by Derick Rethans

I don't think it's a good idea to rely on a global state containing the locale.

cheers
Derick

2 months ago by youkidearitai

Hi Derick (and Internals),

Thank you for your feedback.
In that case, I can see two ways.

First, add a $locale parameter to the grapheme_* functions. For example,
for grapheme_strpos:

 grapheme_strpos(string $haystack, string $needle, int $offset = 0, string $locale): int|false

Second, hold the locale in an object instance (but I can't find a
suitable object).

By the way, intl already has a Locale class.
https://www.php.net/manual/en/class.locale.php
But it does not seem to be used anymore.

Regards
Yuya

--

Yuya Hamada (tekimen)
