Newsgroups: php.internals
Xref: news.php.net php.internals:119167
Date: Fri, 16 Dec 2022 12:55:46 +0000 (GMT)
From: derick@php.net (Derick Rethans)
To: Jakub Zelenka
Cc: "Christoph M. Becker", PHP Developers Mailing List
Subject: Re: [PHP-DEV] Re: [RFC] Unicode Text Processing

On Thu, 15 Dec 2022, Jakub Zelenka wrote:

> On Thu, Dec 15, 2022 at 4:56 PM Christoph M. Becker wrote:
>
> > On 15.12.2022 at 16:34, Derick Rethans wrote:
> >
> > > I have just published an initial draft of the "Unicode Text
> > > Processing" RFC, a proposal to have performant unicode text
> > > processing always available to PHP users, by introducing a new
> > > "Text" class.
> > >
> > > You can find it at:
> > > https://wiki.php.net/rfc/unicode_text_processing
> > >
> > > I'm looking forwards to hearing your opinions, additions, and
> > > suggestions — the RFC specifically asks for these in places.
> >
> > | As the implementation requires ICU, this would also mean that PHP
> > | depend on the ICU library.
> >
> > Our current stance is that a minimal PHP should be buildable without
> > requiring any "non-standard" libraries; this is the reason why we
> > bundle PCRE. If we wanted to stick with that policy, we would need
> > to bundle ICU, what might not be the best idea – it's generally not
> > great to have bundled libraries which are still maintained outside
> > of php-src, and especially for such huge libraries.
> >
>
> I agree with this. Bundling ICU doesn't seem like a good idea.
> Wouldn't be better to base on something smaller that can be bundled
> and does the job? For example NJS and QuickJS use their own
> implementations which seem to be fine. Especially
> https://github.com/bellard/quickjs/blob/master/libunicode.c seems like
> something that we could fork and maintain potentially.

I have no intentions of bundling ICU. That'd be a crazy thing to do.
Instead, the current proposal is to make PHP depend on libicu. I realise
that this is against our current stance, but considering that 1. most
(if not all) Linux distributions ignore our bundled libraries anyway as
per their policies; 2. libicu is pretty much available everywhere; and
3. I am not proposing to require the latest and greatest, I believe we
can safely rely on it being available.

I'm not opposed to using something other than ICU. Most of the other
Unicode-related libraries that I had a quick look at provide only a
small subset: either just character properties, or graphemes. None of
them also takes care of collation/locales and transliteration. I am
also wary about the development and future-proofing of some of these
libraries. ICU won't have these problems.

cheers,
Derick

--
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support
Host of PHP Internals News: https://phpinternals.news

mastodon: @derickr@phpc.social @xdebug@phpc.social
twitter: @derickr and @xdebug