Newsgroups: php.internals
From: tim@bastelstu.be (Tim Düsterhus)
To: Andreas Heigl, internals@lists.php.net
Date: Thu, 15 Dec 2022 19:16:36 +0100
Subject: Re: [PHP-DEV] [RFC] Unicode Text Processing

Hi

On 12/15/22 17:05, Andreas Heigl wrote:
> I see a few challenges in the approach. My first question was: Why do we
> need a new implementation of the ICU library? Creating a userland
>
> […]
>
> I'm ambivalent about this. On the one hand it could make some things for
> sure easier. On the other hand it adds burden onto the core-developers
> that could be avoided by providing the intl (and mb-string) extension by
> default instead of having to add them separately. And then find a group
> if people willing to build a userland implementation.

Because a programming language needs a standard library; otherwise one could just use JavaScript and pull in a dependency for 'is-odd' or left-padding.

The biggest advantage this proposal has compared to ext/intl is that it *adds a new data type*. If you receive a 'Text' object then you are guaranteed to have valid Unicode/UTF-8 inside of it. It also provides an OO API around text/string processing functionality, which is something users have desired for quite some time already ("scalar objects").

The addition of a new data type is also a reason why this cannot usefully be implemented in userland alone: It would require every developer to standardize on a single userland implementation, as otherwise you need bridges to convert between the different representations of various userland libraries (or need to round-trip through the standard 'string' type), which I consider to be a non-starter for something as fundamental as text processing. Both because it adds complexity and because it will kill performance.

As the RFC notes, an explicit design goal is to keep the API simple and focused, so I don't expect much ongoing maintenance burden here, especially if all the heavy lifting is off-loaded to ICU.

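To illustrate the "guaranteed to have valid Unicode/UTF-8" property mentioned above, here is a minimal userland sketch. The class name 'ValidatedText' and its methods are hypothetical and are not the API proposed in the RFC; the sketch only assumes that validation happens once, at construction, so that consumers can rely on the type alone.

```php
<?php
declare(strict_types=1);

// Hypothetical userland stand-in; NOT the 'Text' API proposed in the RFC.
final class ValidatedText
{
    // Private constructor: the only way in is through fromBytes().
    private function __construct(private string $bytes)
    {
    }

    public static function fromBytes(string $bytes): self
    {
        // preg_match() with the /u modifier returns false on malformed UTF-8.
        if (preg_match('//u', $bytes) !== 1) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }

        return new self($bytes);
    }

    public function toBytes(): string
    {
        return $this->bytes;
    }
}

// A consumer that type-hints ValidatedText never has to re-validate its input.
function describe(ValidatedText $text): string
{
    return 'valid UTF-8, ' . strlen($text->toBytes()) . ' bytes';
}
```

Every userland library would have to ship its own variant of such a class, and passing values between two libraries would mean going back through toBytes() and validating again, which is the round-trip through 'string' described above.
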
Any convenience functionality can then be provided in userland based on the building blocks provided by PHP itself, with the benefit that userland libraries are going to be fully interoperable because they all use the standard 'Text' type that is guaranteed to be available [1].

Best regards
Tim Düsterhus

[1] The 'Text' class should likely be made final, because folks might otherwise rely on a specific userland extension of the class, preventing actual interoperability.