Newsgroups: php.internals
Xref: news.php.net php.internals:119156
Date: Thu, 15 Dec 2022 12:41:16 -0600
To: Derick Rethans
Cc: PHP Developers Mailing List
Subject: Re: [PHP-DEV] [RFC] Unicode Text Processing
From: pollita@php.net (Sara Golemon)

On Thu, Dec 15, 2022 at 9:34 AM Derick Rethans wrote:

> I have just published an initial draft of the "Unicode Text Processing"
> RFC, a proposal to have performant unicode text processing always
> available to PHP users, by introducing a new "Text" class.
>
> You can find it at:
> https://wiki.php.net/rfc/unicode_text_processing
>
> I'm looking forwards to hearing your opinions, additions, and
> suggestions - the RFC specifically asks for these in places.
>

Obviously, hurdle one is making the ICU library a requirement for building
PHP. I'd almost make that its own milestone in this project, with the
introduction of the Text class as a separate follow-on.

A very casual (IANAL) read of the ICU license doesn't suggest a problem
there, so it may be more a question of whether we want to impose this
requirement on people building PHP. ICU is pretty widely available and
used, so I don't see this as a major stumbling block either.

Question 2 is the class itself. I know folks have been clamoring for a
`String` class for some time, and this actually fills that niche quite well.
A part of me wonders if we can overload it a little to provide a pseudo
locale of "binary" so that users can, optionally, treat it like a more
generalized String class in specific cases, storing a normal `char*`
zend_string under the hood in that case. Possibly as a specialization tree:

/* names as examples only */
interface Stringy { /* define all those APIs */ }
class Text implements Stringy { /* ... */ }
class BinaryString implements Stringy { /* ... */ }

I think you'd get a lot more buy-in from the folks who worry that UTF-16 is
overhead they don't want, but who do like the idea of an OOPy string. It
also provides a migration path that avoids having to rethink byte vs
grapheme conversions up front, deferring that part of a migration until
later.

Overall, I'm more positive on this than negative, and I eagerly await the
rest of this thread.

-Sara