Newsgroups: php.internals
Date: Mon, 5 Sep 2016 19:38:09 +0100
From: ajf@ajf.me (Andrea Faulds)
To: internals@lists.php.net
Subject: Re: [RFC][DISCUSSION] Remove utf8_decode() and utf8_encode()

Hi Yasuo,

Yasuo Ohgaki wrote:
> utf8_decode() and utf8_encode() are not needed and causing problems
> than solving.
>
> https://wiki.php.net/rfc/remove_utf_8_decode_encode
>
> Proposal
> - Document deprecation them now
> - Remove them from 7.2
>
> I think only few users are using and they shouldn't have problem using
> mbstring/iconv/intl functions.
>
> Any comments?

I don't agree with this. utf8_decode() and _encode() are functions which you probably ought not to use in modern code, and the names are maybe unhelpful (decode to what? encode from what?). But the job they do is sometimes needed (if you're dealing with this specific legacy encoding), and I believe they work correctly. Plus, a lot of existing code uses them. This seems like a needless deprecation for this reason.

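For reference, the "specific legacy encoding" is ISO-8859-1: utf8_encode() converts ISO-8859-1 text to UTF-8, and utf8_decode() converts UTF-8 back to ISO-8859-1, turning anything it cannot represent into '?'. The Python sketch below is only an illustrative analogue, not PHP's implementation; the function names are hypothetical and borrow the iso88591_to_utf8()/utf8_to_iso88591() renaming suggested in this message. It shows why the mapping is so simple: every ISO-8859-1 byte is the Unicode code point of the same value, so bytes 0x80-0xFF only need the two-byte UTF-8 form, while the reverse direction is necessarily lossy.

```python
def iso88591_to_utf8(data: bytes) -> bytes:
    """Hypothetical analogue of utf8_encode(): ISO-8859-1 bytes in, UTF-8 bytes out."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)  # ASCII range is identical in both encodings
        else:
            # Byte value == code point U+0080..U+00FF; encode as 110xxxxx 10xxxxxx.
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


def utf8_to_iso88591(data: bytes) -> bytes:
    """Hypothetical analogue of utf8_decode(): UTF-8 bytes in, ISO-8859-1 bytes out.

    Code points above U+00FF cannot be represented and become '?'. Invalid
    UTF-8 is replaced with U+FFFD first, so it also ends up as '?' (PHP's exact
    handling of malformed input may differ in detail).
    """
    text = data.decode("utf-8", errors="replace")
    return bytes(ord(ch) if ord(ch) <= 0xFF else ord("?") for ch in text)


if __name__ == "__main__":
    latin1 = b"caf\xe9"  # "café" in ISO-8859-1
    utf8 = iso88591_to_utf8(latin1)
    print(utf8)  # b'caf\xc3\xa9'
    print(utf8_to_iso88591("caf\u00e9 \u20ac".encode("utf-8")))  # b'caf\xe9 ?'
```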
I would propose something else: remove them from the XML extension and move them somewhere more fitting, like ext/intl, ext/mbstring or maybe ext/standard. These are generic functions which work on any text, not just XML, and the two jobs are independent of each other: if you're decoding XML, you don't necessarily need to convert text to/from UTF-8, and if you're converting text to/from UTF-8, you don't necessarily need to deal with XML. Plus, given the names alone, you'd have no idea they're part of the XML extension.

Also, to avoid confusion, maybe they could be renamed to iso88591_to_utf8() and utf8_to_iso88591(), with the old names kept as aliases. I got this idea from this comment:
http://php.net/manual/en/function.utf8-encode.php#104906

Another thing to consider is that the manual perhaps ought to warn the user that ISO-8859-1 is not Windows-1252. A lot of text on the Internet marked as the former is actually the latter (thanks to the widespread use of Windows), and browsers treat text labelled ISO-8859-1 as Windows-1252. In the byte range 0x80-0x9F, Windows-1252 has extra printable characters where ISO-8859-1 has control characters, such as the Euro sign, curly quotes, the trademark sign, and the en and em dashes. So, interpreting Windows-1252 text as ISO-8859-1 will garble such characters.

Thanks.

--
Andrea Faulds
https://ajf.me/
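To make the Windows-1252 pitfall concrete, here is a small Python sketch (an illustration only; decode_windows_1252() is a hypothetical helper, not a PHP or Python API). It encodes curly quotes, the Euro sign, an en dash and the trademark sign as Windows-1252, then decodes the same bytes both ways: read as ISO-8859-1 they become invisible C1 control characters, while a Windows-1252 reading recovers them. The helper assumes the WHATWG Encoding Standard's treatment of the five bytes Windows-1252 leaves undefined, which Python's cp1252 codec would otherwise reject.

```python
# Characters that exist in Windows-1252 (bytes 0x80-0x9F) but not in ISO-8859-1.
TEXT = "\u201cPrice: \u20ac5 \u2013 brand\u2122\u201d"
RAW = TEXT.encode("cp1252")  # b'\x93Price: \x805 \x96 brand\x99\x94'

# Bytes that Windows-1252 leaves undefined; Python's cp1252 codec raises on them.
UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def decode_windows_1252(raw: bytes) -> str:
    """Hypothetical helper: decode as Windows-1252, mapping the undefined bytes
    to the same-valued C1 code points (as the WHATWG Encoding Standard does)."""
    return "".join(
        chr(b) if b in UNDEFINED_IN_CP1252 else bytes([b]).decode("cp1252")
        for b in raw
    )


def main() -> None:
    as_latin1 = RAW.decode("latin-1")  # what an ISO-8859-1 reading produces
    as_cp1252 = decode_windows_1252(RAW)
    # Every non-ASCII character in the ISO-8859-1 reading is a C1 control (U+0080-U+009F).
    garbled = [hex(ord(c)) for c in as_latin1 if ord(c) > 0x7F]
    print("ISO-8859-1 reading:", ascii(as_latin1))
    print("control characters:", garbled)
    print("Windows-1252 reading:", as_cp1252)
    # Converting the ISO-8859-1 reading to UTF-8 (what utf8_encode() does) keeps
    # the control characters, so the garbling survives into the UTF-8 output.
    print("UTF-8 of the ISO-8859-1 reading:", as_latin1.encode("utf-8"))


if __name__ == "__main__":
    main()
```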