Newsgroups: php.internals
From: Tim Düsterhus (tim@bastelstu.be)
To: Andreas Heigl, internals@lists.php.net
Date: Fri, 16 Dec 2022 16:48:38 +0100
Subject: Re: [PHP-DEV] [RFC] Unicode Text Processing

Hi

On 12/16/22 16:27, Andreas Heigl wrote:
>> I rather not see this either, because if a 'Text' object may contain
>> binary data, the type safety is lost and users cannot rely on "'Text'
>> implies valid UTF-8" (see sibling thread).
>
> Does Text contain valid UTF-8? Or valid Unicode? As IIRC the idea was to
> internally use UTF-16 as encoding.
>
> In the end the internal encoding should be irrelevant to the user as
> long as we can assert that __toString() returns a Unicode-String in a
> valid encoding. And I'm with you that UTF-8 might be the best choice for
> that.

The RFC already specifies that the inputs (__construct()) and outputs
(__toString()) must/will be UTF-8 strings in
https://wiki.php.net/rfc/unicode_text_processing#basics.

So for all intents and purposes "'Text' implies valid UTF-8" is what
this guarantees, because the internal representation will not be visible
to the user.

Best regards
Tim Düsterhus
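
To illustrate the point about the internal representation being invisible, here is a minimal sketch. It is not the RFC's implementation: the class name `SketchText` is hypothetical, and storing UTF-16LE internally is only an assumption taken from the question quoted above. It relies on the mbstring extension (`mb_check_encoding()`, `mb_convert_encoding()`).

```php
<?php
declare(strict_types=1);

// Hypothetical illustration only; NOT the 'Text' class proposed in the RFC.
final class SketchText
{
    // Internal representation: UTF-16LE bytes (an assumption for this sketch).
    private string $utf16;

    public function __construct(string $utf8)
    {
        // Reject anything that is not well-formed UTF-8, so arbitrary
        // binary data can never end up inside the object.
        if (!mb_check_encoding($utf8, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        $this->utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
    }

    public function __toString(): string
    {
        // Always convert back, so callers only ever observe UTF-8.
        return mb_convert_encoding($this->utf16, 'UTF-8', 'UTF-16LE');
    }
}

$text = new SketchText('Düsterhus');
var_dump((string) $text === 'Düsterhus');
```

Because the only way in validates UTF-8 and the only way out produces UTF-8, code receiving such an object can rely on "implies valid UTF-8" without knowing how the data is stored.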