Date: Tue, 26 Mar 2024 17:15:25 +0000
From: derick@php.net (Derick Rethans)
To: internals@lists.php.net
Subject: Re: [PHP-DEV][RFC] grapheme cluster for str_split, grapheme_str_split function

On 26 March 2024 17:04:18 GMT, Casper Langemeijer wrote:
>I'd like to address an issue I have with this RFC.

Please don't top reply.

>I'm not sure it solves a problem by itself. If I understand all of this correctly this only does what already can be accomplished with preg_match_all('/\X/u', ...). The result of this method in my opinion is not very useful by itself. I've done some searching on various code platforms where I mostly find the use-case for counting the number of graphemes. I've used it to implement strrev() that correctly works multibyte.
>
>I'm very sad that mbstring works on codepoints instead of graphemes and I would very much like to see something happening in that area, but I think expanding a simple string to an array of as many elements to give developers a tool to do this in PHP-space is not good enough. Especially since it can already be achieved with a regexp that already works.
>
>In my opinion: This adds nothing, and tells the PHP developer that it is ok to do count(grapheme_str_split()) for a more accurate mb_strlen().
>
>I would like to see a family of functions that can do multibyte str_split(), strrev(), substr(). Ideally as bugfix in mb_* functions, because the edge-case of wanting to know the length in codepoints of a string is a weird edge-case. No developer wants to know that. mb_strlen() should have returned the number of graphemes from the start.

Many of these already exist, such as grapheme_substr. We can't simply change the behaviour of the already existing functions due to BC reasons.

The intl extension is also built on ICU, an actual Unicode text processing library.

The grapheme_str_split function, as well as other intl extension functions, is what should replace mbstring really.

cheers
Derick
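
To make the codepoint vs. grapheme distinction in this thread concrete, here is a minimal sketch. It assumes the mbstring and intl extensions are loaded. The sample string and the helper grapheme_reverse() are hypothetical, made up for illustration only. grapheme_str_split() is the function proposed in the RFC, so the sketch only calls it if it exists and otherwise falls back to the preg_match_all('/\X/u', ...) approach mentioned in the quoted text.

```php
<?php
// Assumption: ext/mbstring and ext/intl are available.

// "e" + U+0301 COMBINING ACUTE ACCENT, then "a":
// three code points, but only two grapheme clusters.
$s = "e\u{0301}a";

// mbstring counts code points, intl counts grapheme clusters.
$codepoints = mb_strlen($s, 'UTF-8');
$graphemes  = grapheme_strlen($s);

// Split into grapheme clusters. grapheme_str_split() is the function
// proposed in the RFC, so guard the call and fall back to PCRE's \X.
if (function_exists('grapheme_str_split')) {
    $clusters = grapheme_str_split($s);
} else {
    preg_match_all('/\X/u', $s, $matches);
    $clusters = $matches[0];
}

// Hypothetical helper (not part of intl): a strrev() that keeps
// combining marks attached to their base character.
function grapheme_reverse(string $text): string
{
    preg_match_all('/\X/u', $text, $matches);
    return implode('', array_reverse($matches[0]));
}

$reversed = grapheme_reverse($s);

// grapheme_substr() already exists in intl and works on clusters.
$first = grapheme_substr($s, 0, 1);
```

With this input, mb_strlen() reports 3 while grapheme_strlen() reports 2, and grapheme_substr($s, 0, 1) returns the "e" together with its combining accent rather than the bare "e".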