Newsgroups: php.internals
Xref: news.php.net php.internals:113669
To: Sara Golemon, Rowan Tommins
Cc: PHP Internals
Message-ID: <267f5dd2-fb3f-26ff-3807-4ff40e6560cf@telia.com>
Date: Mon, 22 Mar 2021 14:52:07 +0100
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?
From: bjorn.x.larsson@telia.com (Björn Larsson)

On 2021-03-22 at 14:10, Sara Golemon wrote:
> On Mon, Mar 22, 2021 at 5:24 AM Rowan Tommins wrote:
>> I'm strongly against any concept of "indefinite deprecation". I consider
>> any deprecation notice a commitment to remove the feature in the future,
>> even if a specific timeline for that removal is not given.
>>
>
> I don't feel strongly about indefinite deprecation. If you wanna nuke it
> in 9.0, have fun. I'm just saying I don't necessarily see the need to do
> so. The problem being addressed here is that *some* users of this function
> are probably misusing it, so it's worth putting guardrails on. I'm
> hesitant to punish the ones who know exactly what they're doing as a result
> of that well-meaning intention.
>
>> * People who just want to replace calls to utf8_decode won't want to go
>> through every call and make it exception safe.
>>
>
> Then they shouldn't use these replacements, it's not for them. It's for
> people using iso-8859-1.
>
>> * People who want to write a polyfill couldn't use it, because they
>> wouldn't be able to recover the remainder of the string after an error
>> is thrown.
>>
>
> If you're writing a polyfill, then write a polyfill. The polyfill for the
> old functions is trivial, I could have written it a dozen times in the
> course of writing this email reply.
> So this replacement is also not for them.
>
>> * People who want transcoding without any optional extensions will be
>> disappointed to find only this one encoding supported.
>>
> This function isn't for them. It's for people using iso-8859-1.
>
> There's a theme in here. :)
>
>> You'd effectively be adding a completely new core function just for
>> those people who work with Latin1 text, and are confident that it's not
>> Windows-1252 in disguise.
>>
>
> Yes. I'm specifically addressing the people who have been using
> utf8_en/decode() correctly all this time. They shouldn't be punished for
> the stupidity of others.
>
>> It's tempting to make any C1 control characters an error as well -
>> although technically valid in Latin1, these are very rarely used, and
>> it's much more likely that any bytes in that range are intended as
>> characters in Windows-1252. But that would feel very odd without having
>> a corresponding utf8_from_windows1252 function to use instead, at which
>> point we're into designing a whole new conversion library. And of
>> course, once you've got that UTF-8 string, you can't do much with it,
>> because PHP's native string functions are all byte-based, so you've
>> basically got to re-invent large chunks of ext/mbstring...
>>
>
> I disagree that you'd need to add utf8_from/to_windows1252 "for
> completeness". The goal isn't to provide all possible conversion
> utilities. The goal is only to not punish users by taking away a valid API
> that they were using correctly (for those users who were using it
> correctly).
>
> -Sara

I think I'm one such user :-) So keeping them and improving them a little
would be fine with me!

r//Björn L
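
For readers following the thread, here is a rough sketch of the byte-level behaviour being argued about: a lenient Latin-1/UTF-8 round trip (roughly what the "trivial polyfill" would do), a strict variant that raises on bad input (the exception-throwing replacement discussed above), and a check for C1-range bytes that may really be Windows-1252. It is written in Python purely for illustration. The function names are hypothetical, none of this is PHP API, and the lenient mode only approximates how utf8_decode() substitutes "?".

```python
# Illustration only: hypothetical helper names, not PHP functions.


def latin1_to_utf8(data: bytes) -> bytes:
    """Treat every input byte as an ISO-8859-1 code point and emit UTF-8."""
    # Every byte value 0x00-0xFF is valid Latin-1, so this cannot fail.
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes, strict: bool = False) -> bytes:
    """Convert UTF-8 to ISO-8859-1.

    strict=False: anything that cannot be represented becomes "?".
    strict=True:  invalid UTF-8 or a code point above U+00FF raises
                  UnicodeError (assumed stand-in for the throwing variant).
    """
    errors = "strict" if strict else "replace"
    text = data.decode("utf-8", errors=errors)
    return text.encode("latin-1", errors=errors)


def has_c1_bytes(data: bytes) -> bool:
    """Report whether any byte falls in 0x80-0x9F.

    Those bytes are C1 control characters in Latin-1 but printable
    characters in Windows-1252, so their presence hints at mislabelled input.
    """
    return any(0x80 <= b <= 0x9F for b in data)
```

As an example of the "Windows-1252 in disguise" problem: the single byte 0x80 passed to latin1_to_utf8 comes out as the bytes C2 80 (the control character U+0080), whereas the same byte decoded as Windows-1252 is the euro sign, whose UTF-8 form is E2 82 AC.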