Newsgroups: php.internals
From: lester@lsces.co.uk (Lester Caine)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC][DISCUSSION] Remove utf8_decode() and utf8_encode()
Date: Mon, 22 Aug 2016 12:16:18 +0100
Message-ID: <15bd49ce-a00d-f90f-4bb8-86576da547e8@lsces.co.uk>
In-Reply-To: <89dc4d22-7f4f-3dde-deae-33fdfc87324d@php.net>
References: <8f77da79-e843-aee0-e68d-e132ada5e782@gmail.com> <89dc4d22-7f4f-3dde-deae-33fdfc87324d@php.net>

On 22/08/16 11:55, Michael Wallner wrote:
>> As far as I can see, these functions exist because the XML parser
>> infrastructure needed them, and someone thought it might be handy to
>> expose them to users. Funnily enough, the internal versions actually
>> take a parameter for the target encoding, but only support US-ASCII and
>> 8859-1: https://github.com/php/php-src/blob/master/ext/xml/xml.c#L283
>>
>> If anything, they should probably have a "str_" prefix, and maybe even
>> be moved into the string section of the source, exposed in such a way
>> that the XML parser can still make use of them.
>
> Thanks for looking deeper. That makes even more sense now.

The original code pre-dates the move to ext/ in 1999, where utf8_decode() is
hard-coded as ISO-8859-1 but uses xml_utf8_decode() internally. At that time,
of course, there was no provision for multi-byte characters, and the decoding
of a string is hard-coded in the function. If you look closer you will see
that xml_utf8_decode() still expects strings of type XML_Char *, and so
utf8_decode() wraps that to hide the differences.
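
For illustration only, here is a minimal C sketch of what a UTF-8 to
ISO-8859-1 decode with a fixed target encoding involves. It is not the code in
ext/xml/xml.c: the function name sketch_utf8_to_latin1 is hypothetical, it
works on plain unsigned char buffers rather than XML_Char *, and it assumes
that anything malformed or outside U+0000..U+00FF should become '?'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from php-src): decode NUL-terminated UTF-8 into
 * ISO-8859-1. `out` needs room for strlen(in) + 1 bytes, since every output
 * byte consumes at least one input byte. Returns bytes written, without NUL. */
size_t sketch_utf8_to_latin1(const char *in, unsigned char *out)
{
    assert(in != NULL && out != NULL);

    const unsigned char *s = (const unsigned char *)in;
    size_t len = strlen(in), i = 0, o = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t need; /* number of continuation bytes expected */

        if (c < 0x80)                { cp = c;        need = 0; min = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; need = 1; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; need = 2; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; need = 3; min = 0x10000; }
        else { out[o++] = '?'; i++; continue; } /* invalid lead byte */

        i++;
        int ok = 1;
        for (size_t k = 0; k < need; k++) {
            /* Each continuation byte must look like 10xxxxxx. */
            if (i >= len || (s[i] & 0xC0) != 0x80) { ok = 0; break; }
            cp = (cp << 6) | (s[i] & 0x3F);
            i++;
        }

        /* Reject truncated and overlong sequences, and anything that
         * ISO-8859-1 cannot represent. */
        out[o++] = (ok && cp >= min && cp <= 0xFF) ? (unsigned char)cp : '?';
    }

    out[o] = '\0';
    return o;
}
```

A two-byte sequence such as C3 A9 (U+00E9) collapses to the single byte E9,
while a three-byte sequence such as E2 82 AC (U+20AC) has no ISO-8859-1
equivalent and comes out as '?'.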
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk