Subject: Re: [PHP-DEV] [RFC] Unicode Text Processing
From: derick@php.net (Derick Rethans)
Date: Fri, 16 Dec 2022 13:28:56 +0000 (GMT)
To: Sara Golemon
Cc: PHP Developers Mailing List
Newsgroups: php.internals

On Thu, 15 Dec 2022, Sara Golemon wrote:

> On Thu, Dec 15, 2022 at 9:34 AM Derick Rethans wrote:
>
> > I have just published an initial draft of the "Unicode Text
> > Processing" RFC, a proposal to have performant unicode text
> > processing always available to PHP users, by introducing a new
> > "Text" class.
> >
> > You can find it at: https://wiki.php.net/rfc/unicode_text_processing
> >
> > I'm looking forward to hearing your opinions, additions, and
> > suggestions; the RFC specifically asks for these in places.
>
> Question 2 is that class. I know folks have been clamoring for a
> `String` class for some time and this actually fills that niche quite
> well. A part of me wonders if we can overload it a little to provide
> a pseudo locale of "binary" so that users can, optionally, treat it
> like a more generalized String class in specific cases, storing a
> normal `char*` zend_string under the hood in that case. Possibly as a
> specialization tree.

An alternative could be to just have this as an implementation detail
when the associated locale/collation is C/root. Then nobody needs to
worry about it, *but* it would mean implementing everything twice,
which I am not too keen on, especially because we already have such a
wide array of operations on strings.

cheers,
Derick

-- 
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support
Host of PHP Internals News: https://phpinternals.news

mastodon: @derickr@phpc.social @xdebug@phpc.social