Date: Sat, 31 May 2025 17:04:06 +0100
From: derick@php.net (Derick Rethans)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Adding in a case-insensitive version of str_contains

On 31 May 2025 12:36:52 BST, youkidearitai wrote:
>On Sat, 31 May 2025 at 19:41, Nikita Popov wrote:
>>
>> On Thu, May 29, 2025, at 23:00, Kamil Tekiela wrote:
>>
>> As I understand, it was a conscious decision not to add this function
>> when str_contains was created. The reason is that case sensitivity is
>> locale-dependent, and for such use cases, mbstring extension is better
>> [1] & [2]. Do you think that locale is a concern here, and if not,
>> why? Would it be a good idea to add mb_str_icontains instead?
>>
>> If you're going to propose an RFC for this, it would be a good idea to
>> explain what the real life use case for it is. While str_contains is
>> very useful for checking the existence of a byte-string within another
>> byte-string, a case-sensitive check doesn't seem to have much use.
>>
>> [1]: https://stackoverflow.com/a/63121809/1839439
>> [2]: https://wiki.php.net/rfc/str_contains#case-insensitivity_and_multibyte_strings
>>
>>
>> To make it a bit more explicit: The proposed str_icontains function does not support UTF-8. It would only be case-insensitive on ASCII characters. Do we really want to add new functions that do not properly handle UTF-8?
>>
>> I think that thanks to https://wiki.php.net/rfc/strtolower-ascii (which removed C locale support from this family of functions), there actually is a pretty viable way forward to make the non-mbstring case-insensitive string functions useful again: Make them work on UTF-8. (In the sense of using Unicode case folding and case mapping on UTF-8, while still returning code unit offsets. This would make them superior to both the current stri* functions, and the mb_stri* functions.)
>>
>
>I agree that it's important to think about it in UTF-8.
>
>I think about UTF-8 support case folding function in past few days.
>Maybe... It is like below?
>
>```
>grapheme_setlocale($locale);
>grapheme_icontains($haystack, $needle);
>```
>
>First, grapheme_* function supports locale.
>Second, add grapheme_icontains function for case insensitive version
>for str_contains.

I don't think it's a good idea to rely on a global state containing the locale.

cheers
Derick
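
For illustration only, here is a minimal sketch of a case-insensitive "contains" check that does not read any global locale state. It assumes the mbstring extension is loaded and PHP 8.0 or later; the name utf8_icontains is hypothetical and is not an existing or proposed PHP function. It uses locale-independent Unicode case folding via mb_convert_case() with MB_CASE_FOLD, so the result depends only on the arguments passed in.

```php
<?php
// Hypothetical helper for illustration; not part of PHP or of any RFC.
// Folds both strings with Unicode case folding (locale-independent),
// then does a plain byte-wise substring check on the folded UTF-8.
function utf8_icontains(string $haystack, string $needle): bool
{
    $fold = static fn (string $s): string =>
        mb_convert_case($s, MB_CASE_FOLD, 'UTF-8');

    return str_contains($fold($haystack), $fold($needle));
}

var_dump(utf8_icontains('Déjà Vu', 'JÀ V')); // bool(true)
var_dump(utf8_icontains('abc', 'D'));        // bool(false)
```

Because nothing here depends on a prior setlocale-style call, two callers in the same process cannot change each other's results. A locale-sensitive variant would need the locale passed as an explicit argument instead.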