Newsgroups: php.internals
From: langemeijer@php.net ("Casper Langemeijer")
Date: Tue, 26 Mar 2024 18:04:18 +0100
To: internals@lists.php.net
Subject: Re: [PHP-DEV][RFC] grapheme cluster for str_split, grapheme_str_split function
Message-ID: <141e31f3-b7cf-4bd1-9bac-c9ec078767ed@app.fastmail.com>
List-Id: internals.lists.php.net

I'd like to address an issue I have with this RFC.

I'm not sure it solves a problem by itself. If I understand all of this correctly, it only does what can already be accomplished with preg_match_all('/\X/u', ...). The result of this function is, in my opinion, not very useful by itself. I've done some searching on various code platforms, where the use case I mostly find is counting the number of graphemes. I've used it myself to implement a strrev() that works correctly on multibyte strings.
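
To illustrate what I mean, here is a minimal sketch of the regexp approach, assuming valid UTF-8 input and a PCRE build with Unicode support. The helper names split_graphemes() and reverse_graphemes() are made up for this example; they are not part of the RFC or of any extension.

```php
<?php
// Split a UTF-8 string into grapheme clusters using PCRE's \X escape.
// Hypothetical helper, not an existing PHP function.
function split_graphemes(string $s): array
{
    // With the /u modifier, preg_match_all() returns false on invalid UTF-8.
    if (preg_match_all('/\X/u', $s, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $matches[0];
}

// Reverse a string cluster by cluster, so combining marks stay attached
// to their base character (plain strrev() reverses individual bytes).
function reverse_graphemes(string $s): string
{
    return implode('', array_reverse(split_graphemes($s)));
}

// "e" followed by U+0301 (combining acute accent), then "a".
$input = "e\u{0301}a";
$parts = split_graphemes($input);      // two clusters: "e"+accent and "a"
$reversed = reverse_graphemes($input); // "a" followed by "e"+accent
```

Note that the reversal only needs the array as an intermediate step, which is part of why I don't find the array itself very useful.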

I'm very sad that mbstring works on code points instead of graphemes, and I would very much like to see something happen in that area, but I think expanding a simple string into an array of as many elements, just to give developers a tool to do this in PHP space, is not good enough. Especially since it can already be achieved with a regexp that works today.

In my opinion, this adds nothing, and it tells the PHP developer that it is OK to do count(grapheme_str_split()) for a more accurate mb_strlen().
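
To make the counting case concrete, here is a rough sketch that compares the code point count with the grapheme count without keeping an array around. count_codepoints() and count_graphemes() are hypothetical helpers written for this example, and I'm assuming valid UTF-8 input. As far as I know, the intl extension's existing grapheme_strlen() already gives the second number directly.

```php
<?php
// Number of code points. For valid UTF-8 this should agree with
// mb_strlen($s, 'UTF-8'). Hypothetical helper for illustration only.
function count_codepoints(string $s): int
{
    $n = preg_match_all('/./su', $s);
    if ($n === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $n;
}

// Number of grapheme clusters, counted by the regexp engine.
// Hypothetical helper for illustration only.
function count_graphemes(string $s): int
{
    $n = preg_match_all('/\X/u', $s);
    if ($n === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $n;
}

// "e" followed by U+0301 (combining acute accent): one visible character.
$s = "e\u{0301}";
$codepoints = count_codepoints($s); // 2
$graphemes  = count_graphemes($s);  // 1
```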

I would like to see a family of functions that can do multibyte str_split(), strrev() and substr(). Ideally as a bugfix in the mb_* functions, because wanting to know the length of a string in code points is a weird edge case. No developer wants to know that. mb_strlen() should have returned the number of graphemes from the start.


On Tue, Mar 26, 2024, at 01:44, youkidearitai wrote:
> Tue, 26 Mar 2024, 5:43 David CARLIER <devnexen@gmail.com>:
> >
> > I second this, I think it is a good addition which makes a lot of sense.
> >
> > Cheers.
> >
> > On Mon, 25 Mar 2024 at 20:36, Ayesh Karunaratne <ayesh@php.watch> wrote:
> >>
> >> >
> >> > Sat, 9 Mar 2024, 15:26 youkidearitai <youkidearitai@gmail.com>:
> >> > >
> >> > > Hello, Internals
> >> > >
> >> > > I created a wiki for `grapheme_str_split` function.
> >> > > Please see:
> >> > > https://wiki.php.net/rfc/grapheme_str_split
> >> > >
> >> > > I would like to "Under Discussion" section.
> >> > >
> >> > > Best Regards
> >> > > Yuya
> >> > >
> >> > > --
> >> > > ---------------------------
> >> > > Yuya Hamada (tekimen)
> >> > > - https://tekitoh-memdhoi.info
> >> > > - https://github.com/youkidearitai
> >> > > -----------------------------
> >> >
> >> > Hello, Internals
> >> >
> >> > I want to go to "Voting" phase if nothing any comment.
> >> > I will start at tomorrow(26th) to "Voting" phase.
> >> >
> >> > Thank you
> >> > Yuya
> >> >
> >> > --
> >> > ---------------------------
> >> > Yuya Hamada (tekimen)
> >> > - https://tekitoh-memdhoi.info
> >> > - https://github.com/youkidearitai
> >> > -----------------------------
> >>
> >> I think it makes sense to add this function, and the PR worked well
> >> too; It correctly split individual graphemes for all complex Emojis,
> >> ZWJs, and those Cthulhu texts, and everything else I threw at it.
> >>
> >> Good luck for the RFC vote today, hope it passes 🤞.
>
>
> Hi, Internals
>
> grapheme_str_split going to "Voting" phase.
> Vote end is 10th April 00:00 GMT
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------