Newsgroups: php.internals
From: Björn Larsson <bjorn.x.larsson@telia.com>
To: Rowan Tommins
Cc: PHP internals
Date: Mon, 22 Mar 2021 11:28:40 +0100
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?

On 2021-03-21 at 22:39, Rowan Tommins wrote:
> On 21/03/2021 21:00, Max Semenik wrote:
>> Just a quick reminder that it's possible to compile PHP without
>> mbstring and intl, which means that some hosts will provide PHP
>> without these extensions, and some packagers make them available as
>> separate packages that users can't or don't know how to install. Maybe
>> we've got an opportunity to think about making these extensions
>> mandatory?
>
> It's somewhat relevant that until PHP 7.2, it was also possible for
> utf8_encode and utf8_decode to be missing, because they were in ext/xml,
> which is also optional.
>
> Bundling mbstring sounds great, until you look into the details of
> what's in there and how it works. Its origin as a PHP 4 extension for
> handling Japanese-specific character encodings is visible in parts of
> its design - there's a lot of global state, and very little support for
> the nuances of Unicode.
>
> Bundling intl would be great, but it's a wrapper around ICU, which is
> huge (because Unicode is complicated). I have read that incorporating
> that into core was one of the icebergs that sunk PHP 6. It's also
> extremely sparsely documented (if someone's looking for a project, it
> would be great to fill in all the manual stubs with a few details from
> the corresponding ICU documentation).
>
> For what its worth, it seems these would be the relevant polyfills:
>
> function utf8_encode(string $string) { return
> UConverter::transcode($string, 'UTF8', 'ISO-8859-1'); }
> function utf8_decode(string $string) { return
> UConverter::transcode($string, 'ISO-8859-1', 'UTF8'); }
>
> Regards,
>

In our case we use the utf8_decode function to convert UTF-8 input from
the client to ISO-8859-1 on the server, since the site is encoded in
Latin-1.
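To make the migration question concrete, here is a rough sketch of what a replacement for that one use case might look like. The function name utf8_to_latin1 is hypothetical, not anything proposed in this thread; it only assumes that at least one of intl, mbstring or iconv is installed and tries them in that order.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): convert a UTF-8 string to
 * ISO-8859-1 with whichever conversion facility happens to be available.
 */
function utf8_to_latin1(string $utf8): string
{
    if (class_exists('UConverter')) {
        // ext/intl: argument order is (string, target encoding, source encoding).
        $out = UConverter::transcode($utf8, 'ISO-8859-1', 'UTF-8');
        if (is_string($out)) {
            return $out;
        }
    }

    if (function_exists('mb_convert_encoding')) {
        // ext/mbstring: argument order is (string, target encoding, source encoding).
        $out = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
        if (is_string($out)) {
            return $out;
        }
    }

    if (function_exists('iconv')) {
        // ext/iconv: argument order is (source encoding, target encoding, string).
        $out = @iconv('UTF-8', 'ISO-8859-1', $utf8);
        if (is_string($out)) {
            return $out;
        }
    }

    throw new RuntimeException('No usable UTF-8 to ISO-8859-1 converter found');
}
```

The backends may differ in how they treat characters that have no Latin-1 equivalent, so that case is worth testing separately before relying on a drop-in replacement.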
Our usage of that function is working flawlessly, so for us it's super
important to have a clear migration path with a good polyfill!

r//Björn L