Newsgroups: php.internals
Date: Sat, 31 May 2025 12:34:58 +0200
From: php@npopov.com ("Nikita Popov")
To: "Kamil Tekiela", "Adam Cable"
Cc: "Levi Morrison"
Subject: Re: [PHP-DEV] Adding in a case-insensitive version of str_contains
Message-ID: <2a626b9f-292b-4fc2-a023-0b0db9a64ede@app.fastmail.com>

On Thu, May 29, 2025, at 23:00, Kamil Tekiela wrote:
> As I understand, it was a conscious decision not to add this function
> when str_contains was created. The reason is that case sensitivity is
> locale-dependent, and for such use cases, mbstring extension is better
> [1] & [2]. Do you think that locale is a concern here, and if not,
> why? Would it be a good idea to add mb_str_icontains instead?
>
> If you're going to propose an RFC for this, it would be a good idea to
> explain what the real life use case for it is. While str_contains is
> very useful for checking the existence of a byte-string within another
> byte-string, a case-sensitive check doesn't seem to have much use.
>
> [1]: https://stackoverflow.com/a/63121809/1839439
> [2]: https://wiki.php.net/rfc/str_contains#case-insensitivity_and_multibyte_strings

To make it a bit more explicit: The proposed str_icontains function does not support UTF-8. It would only be case-insensitive on ASCII characters. Do we really want to add new functions that do not properly handle UTF-8?

I think that thanks to https://wiki.php.net/rfc/strtolower-ascii (which removed locale sensitivity from this family of functions), there actually is a pretty viable way forward to make the non-mbstring case-insensitive string functions useful again: Make them work on UTF-8. (In the sense of using Unicode case folding and case mapping on UTF-8, while still returning code unit offsets. This would make them superior to both the current stri* functions and the mb_stri* functions.)

Regards,
Nikita
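
To illustrate the ASCII-only limitation and the offset distinction discussed above, here is a minimal sketch. It assumes PHP 8.2 or later (where the stri* functions fold ASCII letters only, regardless of locale), the mbstring extension, and a UTF-8 encoded source file. The helper str_icontains_ascii is a hypothetical userland stand-in for the proposed function, not its actual implementation.

```php
<?php
// Hypothetical stand-in for the proposed str_icontains(): it simply
// reuses stripos(), which folds case for ASCII letters only (PHP 8.2+).
function str_icontains_ascii(string $haystack, string $needle): bool
{
    return stripos($haystack, $needle) !== false;
}

$haystack = "é ÄPFEL"; // "é" and "Ä" take two bytes each in UTF-8

var_dump(str_icontains_ascii($haystack, "pfel"));     // bool(true): ASCII letters are folded
var_dump(str_icontains_ascii($haystack, "äpfel"));    // bool(false): "Ä" vs "ä" is not folded

// mbstring folds case on code points, but reports a character offset.
var_dump(mb_stripos($haystack, "äpfel", 0, "UTF-8")); // int(2)

// Byte (code unit) offset of the same match position, for comparison.
var_dump(strpos($haystack, "Ä"));                     // int(3)
```

The last two results differ because mb_stripos counts characters. A UTF-8-aware stripos that keeps returning code unit offsets, as suggested in the message, would presumably report the byte position (3 here), which is what substr() and the other byte-string functions expect.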