From: youkidearitai@gmail.com (youkidearitai)
Date: Sat, 31 May 2025 20:36:52 +0900
Subject: Re: [PHP-DEV] Adding in a case-insensitive version of str_contains
To: php internals
Cc: Nikita Popov, Kamil Tekiela, Adam Cable

On Sat, 31 May 2025 at 19:41, Nikita Popov wrote:
>
> On Thu, May 29, 2025, at 23:00, Kamil Tekiela wrote:
> > As I understand, it was a conscious decision not to add this function
> > when str_contains was created. The reason is that case sensitivity is
> > locale-dependent, and for such use cases, mbstring extension is better
> > [1] & [2]. Do you think that locale is a concern here, and if not,
> > why? Would it be a good idea to add mb_str_icontains instead?
> >
> > If you're going to propose an RFC for this, it would be a good idea to
> > explain what the real life use case for it is. While str_contains is
> > very useful for checking the existence of a byte-string within another
> > byte-string, a case-insensitive check doesn't seem to have much use.
> >
> > [1]: https://stackoverflow.com/a/63121809/1839439
> > [2]: https://wiki.php.net/rfc/str_contains#case-insensitivity_and_multibyte_strings
>
> To make it a bit more explicit: The proposed str_icontains function does
> not support UTF-8. It would only be case-insensitive on ASCII characters.
> Do we really want to add new functions that do not properly handle UTF-8?
>
> I think that thanks to https://wiki.php.net/rfc/strtolower-ascii (which
> removed locale support from this family of functions), there actually is
> a pretty viable way forward to make the non-mbstring case-insensitive
> string functions useful again: Make them work on UTF-8. (In the sense of
> using Unicode case folding and case mapping on UTF-8, while still
> returning code unit offsets. This would make them superior to both the
> current stri* functions, and the mb_stri* functions.)
>
> Regards,
> Nikita

Hi,

I agree that it's important to think about this in terms of UTF-8.
I have been thinking about a UTF-8-aware case-folding function for the
past few days. Maybe something like the following?

```
// Proposed: set the locale used by the grapheme_* functions
grapheme_setlocale($locale);
// Proposed: case-insensitive counterpart of str_contains
grapheme_icontains($haystack, $needle);
```

First, the grapheme_* functions would support a locale.
Second, a grapheme_icontains function would be added as the
case-insensitive version of str_contains.

What do you think?

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------