Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:113669
To: Sara Golemon <pollita@php.net>, Rowan Tommins <rowan.collins@gmail.com>
Cc: PHP Internals <internals@lists.php.net>
References: <f313c9c4-f8a2-0b39-7499-30620d80cecd@gmail.com>
 <CAESVnVoa2U1ZCsthWCt8Cu8Es-P8WWO=vky4TG5CmtaoumEmUA@mail.gmail.com>
 <e120bf6b-12cd-5b3e-b27c-b746d1a1661b@gmail.com>
 <CAESVnVqJh3oOM-5CkatiUctRupOdS7mce7QYzv3uxUb3QcpHWw@mail.gmail.com>
Message-ID: <267f5dd2-fb3f-26ff-3807-4ff40e6560cf@telia.com>
Date: Mon, 22 Mar 2021 14:52:07 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
 Thunderbird/78.8.1
MIME-Version: 1.0
In-Reply-To: <CAESVnVqJh3oOM-5CkatiUctRupOdS7mce7QYzv3uxUb3QcpHWw@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: sv
Content-Transfer-Encoding: 8bit
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?
From: bjorn.x.larsson@telia.com (Björn Larsson)

On 2021-03-22 at 14:10, Sara Golemon wrote:
> On Mon, Mar 22, 2021 at 5:24 AM Rowan Tommins <rowan.collins@gmail.com>
> wrote:
>> I'm strongly against any concept of "indefinite deprecation". I consider
>> any deprecation notice a commitment to remove the feature in the future,
>> even if a specific timeline for that removal is not given.
>>
> 
> I don't feel strongly about indefinite deprecation.  If you wanna nuke it
> in 9.0, have fun.  I'm just saying I don't necessarily see the need to do
> so.  The problem being addressed here is that *some* users of this function
> are probably misusing it, so it's worth putting guiderails on.  I'm
> hesitant to punish the ones who know exactly what they're doing as a result
> of that well-meaning intention.
> 
>> * People who just want to replace calls to utf8_decode won't want to go
>> through every call and make it exception safe.
>>
> 
> Then they shouldn't use these replacements, it's not for them. It's for
> people using iso-8859-1.
> 
>> * People who want to write a polyfill couldn't use it, because they
>> wouldn't be able to recover the remainder of the string after an error
>> is thrown.
>>
> 
> If you're writing a polyfill, then write a polyfill.   The polyfill for the
> old functions is trivial, I could have written it a dozen times in the
> course of writing this email reply.
> So this replacement is also not for them.
> 
>> * People who want transcoding without any optional extensions will be
>> disappointed to find only this one encoding supported.
>>
> This function isn't for them. It's for people using iso-8859-1.
> 
> There's a theme in here. :)
> 
>> You'd effectively be adding a completely new core function just for
>> those people who work with Latin1 text, and are confident that it's not
>> Windows-1252 in disguise.
>>
> 
> Yes.  I'm specifically addressing the people who have been using
> utf8_en/decode() correctly all this time.  They shouldn't be punished for
> the stupidity of others.
> 
>> It's tempting to make any C1 control characters an error as well -
>> although technically valid in Latin1, these are very rarely used, and
>> it's much more likely that any bytes in that range are intended as
>> characters in Windows-1252. But that would feel very odd without having
>> a corresponding utf8_from_windows1252 function to use instead, at which
>> point we're into designing a whole new conversion library. And of
>> course, once you've got that UTF-8 string, you can't do much with it,
>> because PHP's native string functions are all byte-based, so you've
>> basically got to re-invent large chunks of ext/mbstring...
>>
> 
> I disagree that you'd need to add utf8_from/to_windows1252 "for
> completeness".  The goal isn't to provide all possible conversion
> utilities.  The goal is only to not punish users by taking away a valid API
> that they were using correctly (for those users who were using it
> correctly).
> 
>> -Sara
> 
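For reference, the conversions argued about above fit in a few lines. The following is a hypothetical Python model, not PHP code. The function names are made up, and the lossy variant only approximates utf8_decode(): PHP may emit a different number of '?' characters for a malformed UTF-8 sequence. The sketch covers four pieces. The first is the always-lossless Latin-1 to UTF-8 direction. The second is the '?'-substituting reverse direction. The third is an error-raising variant of the kind of replacement being discussed. The fourth is the C1-range check (bytes 0x80-0x9F) that Rowan mentions as a hint that "Latin1" input is really Windows-1252.

```python
"""Hypothetical sketch of the Latin-1 <-> UTF-8 conversions discussed above.

None of these names are PHP APIs; they only model the behaviour in question.
"""


def latin1_to_utf8(data: bytes) -> bytes:
    # ISO-8859-1 maps every byte 0x00-0xFF to code point U+0000-U+00FF,
    # so this direction can never fail (the assumption behind utf8_encode()).
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1_lossy(data: bytes) -> bytes:
    # Approximates utf8_decode(): characters above U+00FF become '?'.
    # Malformed UTF-8 first becomes U+FFFD, which is then also encoded as '?';
    # the number of '?' per malformed sequence may differ from PHP's.
    text = data.decode("utf-8", errors="replace")
    return text.encode("latin-1", errors="replace")


def utf8_to_latin1_strict(data: bytes) -> bytes:
    # Error-raising variant: malformed UTF-8 or any character outside
    # Latin-1 raises UnicodeError instead of silently producing '?'.
    return data.decode("utf-8").encode("latin-1")


def has_c1_controls(data: bytes) -> bool:
    # Bytes 0x80-0x9F are C1 control characters in ISO-8859-1 but printable
    # characters in Windows-1252 (e.g. 0x80 is the euro sign), so their
    # presence hints that the input is really Windows-1252.
    return any(0x80 <= b <= 0x9F for b in data)


if __name__ == "__main__":
    print(latin1_to_utf8(b"caf\xe9"))            # b'caf\xc3\xa9'
    print(utf8_to_latin1_lossy("€10".encode()))  # b'?10'
    print(has_c1_controls(b"\x80 10"))           # True
    try:
        utf8_to_latin1_strict("€10".encode())
    except UnicodeError as exc:
        print("strict conversion failed:", type(exc).__name__)
```

The strict variant illustrates Rowan's polyfill objection: once it raises, the caller gets no partially converted string back, whereas the lossy variant always returns a full result.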

I think I'm one such user :-) So keeping them and improving them a little
would be fine with me!

r//Björn L