Newsgroups: php.internals
From: dusk@woofle.net (Dusk)
Date: Tue, 28 Nov 2023 11:47:51 -0800
To: Claude Pache
Cc: Hans Henrik Bergan, PHP internals
Subject: Re: [PHP-DEV] Deprecate declare(encoding='...') + zend.multibyte + zend.script_encoding + zend.detect_unicode ?
Message-ID: <0D8856BC-DDEE-47F8-8C59-7F4DC7A64237@woofle.net>
In-Reply-To: <1BA05C1A-AFAE-4E86-BAA2-420B22549519@gmail.com>

On Nov 28, 2023, at 11:12, Claude Pache wrote:

> On 28 Nov 2023, at 19:57, Hans Henrik Bergan wrote:
>
>> With the dominance of UTF-8 (a fixed-endian encoding), surely no new
>> code should utilize any of declare(encoding='...') / zend.multibyte /
>> zend.script_encoding / zend.detect_unicode.
>> I propose we deprecate all 4.
>
> What is the migration path for legacy code that uses those directives?

Convert your PHP source files to UTF-8. These directives are only
required for code written in legacy multibyte encodings like Shift-JIS,
Big5, or EUC-CN. (These encodings are primarily used for Chinese and
Japanese text.)

These directives are not required for scripts which *process* text in
these encodings. They're only required if the source code itself is in a
legacy multibyte encoding, as those encodings can contain octets in the
basic ASCII range (0x20-0x7F) within multibyte sequences. For example,
the character "ボ" (U+30DC KATAKANA LETTER BO) is encoded in Shift-JIS
as 83 7B, whose second octet would ordinarily represent the ASCII
character "{". If this character appeared in a variable name, for
instance, PHP would need to recognize that the "7B" is part of the
character and does not represent an open brace.

>> With the dominance of UTF-8 (a fixed-endian encoding)

I'll add that what's special about UTF-8 isn't that it's "fixed-endian".
It's that UTF-8 only uses octets above 0x7F for characters outside the
ASCII range, so no octet of a multibyte sequence can be mistaken for
ASCII syntax, and the parser doesn't have to be specifically aware of
UTF-8 encoding when processing text.
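
The byte-level difference is easy to check. Here's a minimal sketch in
Python, used only because it makes inspecting encoded bytes convenient;
the helper name ascii_range_octets is made up for illustration. It
encodes "ボ" in Shift-JIS and in UTF-8 and lists any octets that fall in
the ASCII range.

```python
# Sketch: compare the octets produced for U+30DC in Shift-JIS and UTF-8.
BO = "\u30dc"  # KATAKANA LETTER BO


def ascii_range_octets(text, encoding):
    """Return the octets below 0x80 in the encoded form of `text`.

    Hypothetical helper, for illustration only.
    """
    return [octet for octet in text.encode(encoding) if octet < 0x80]


sjis = BO.encode("shift_jis")
utf8 = BO.encode("utf-8")

print("Shift-JIS:", sjis.hex(" "))  # two octets; the second is in the ASCII range
print("UTF-8:    ", utf8.hex(" "))  # three octets, all above 0x7F

# In Shift-JIS the trailing octet is the same value as ASCII "{".
print("Shift-JIS ASCII-range octets:",
      [hex(o) for o in ascii_range_octets(BO, "shift_jis")])

# In UTF-8 no octet of a non-ASCII character falls below 0x80.
print("UTF-8 ASCII-range octets:    ",
      [hex(o) for o in ascii_range_octets(BO, "utf-8")])
```

A byte-oriented lexer scanning the Shift-JIS form would see a "{" it
has to be told to ignore, whereas the UTF-8 form contains nothing it
could confuse with ASCII syntax.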