Newsgroups: php.internals
Date: Wed, 24 Aug 2016 10:21:45 +0900
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
To: Michael Wallner
Cc: "internals@lists.php.net"
Subject: Re: [PHP-DEV] [RFC][DISCUSSION] Remove utf8_decode() and utf8_encode()

Hi all,

On Mon, Aug 22, 2016 at 7:55 PM, Michael Wallner wrote:
> On 22/08/16 12:44, Rowan Collins wrote:
>
>> As far as I can see, these functions exist because the XML parser
>> infrastructure needed them, and someone thought it might be handy to
>> expose them to users. Funnily enough, the internal versions actually
>> take a parameter for the target encoding, but only support US-ASCII and
>> 8859-1: https://github.com/php/php-src/blob/master/ext/xml/xml.c#L283
>>
>> If anything, they should probably have a "str_" prefix, and maybe even
>> be moved into the string section of the source, exposed in such a way
>> that the XML parser can still make use of them.
>
> Thanks for looking deeper. That makes even more sense now.

Any more comments on the "str_" prefix?

  str_latin1_to_utf8() == utf8_encode()
  str_utf8_to_latin1() == utf8_decode()

I'm a little uncomfortable adding special-purpose encoding conversion
functions for ISO-8859-1 to ext/standard. However, that is better than
keeping utf8_decode/encode() as the primary function names forever.

Although the encoding parameter is not exposed to users, the XML module's
internal code behind utf8_encode/decode() supports both ISO-8859-1 and
US-ASCII (in ASCII mode, chars > 127 are converted to '?'). If this
resolution is adopted, I'll remove the ASCII support and make the
functions work only for ISO-8859-1.

No external library is used, so the new functions can be defined as
ext/standard functions. Users cannot specify the encoding, so there is no
BC break in userland. However, the internal APIs are exposed, so 3rd party
modules may see a BC break. The internal APIs are named
xml_utf8_encode/decode(); I would not like to keep them in ext/standard,
nor expose them to 3rd party module developers.

Alternatively, we could keep the XML module as it is now, add
"xml_"-prefixed functions

  xml_latin1_to_utf8() == utf8_encode()
  xml_utf8_to_latin1() == utf8_decode()

and then encourage users, in the manual, to use the general encoding
conversion functions instead. I prefer this option.
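To make the conversions under discussion concrete, here is a rough C sketch. It is not the php-src code: latin1_to_utf8() and utf8_to_single_byte() are hypothetical names, and the max_cp parameter (0xFF for ISO-8859-1, 0x7F for US-ASCII) is only an assumed way of modeling the internal target-encoding choice. Unrepresentable code points become '?', as described above for ASCII mode; the handling of malformed UTF-8 here is a simplification of my own, not necessarily what the XML module does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not the php-src API): ISO-8859-1 -> UTF-8.
 * Every Latin-1 byte is the code point of the same value (U+0000..U+00FF),
 * so this direction cannot fail; bytes >= 0x80 become two UTF-8 bytes. */
char *latin1_to_utf8(const unsigned char *in, size_t len, size_t *out_len)
{
    char *out = malloc(len * 2 + 1); /* worst case: every byte doubles */
    size_t j = 0;

    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[j++] = (char)c;
        } else {
            out[j++] = (char)(0xC0 | (c >> 6));
            out[j++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    *out_len = j;
    return out;
}

/* Hypothetical helper: UTF-8 -> single-byte target.
 * max_cp is 0xFF for ISO-8859-1 or 0x7F for US-ASCII (assumed modeling of
 * the internal target-encoding parameter). Code points above max_cp, and
 * malformed or overlong sequences, are replaced with a single '?'. */
char *utf8_to_single_byte(const unsigned char *in, size_t len,
                          unsigned max_cp, size_t *out_len)
{
    char *out;
    size_t i = 0, j = 0;

    assert(max_cp == 0x7F || max_cp == 0xFF); /* only two targets */
    out = malloc(len + 1); /* output never exceeds input length */
    if (out == NULL) {
        return NULL;
    }
    while (i < len) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[j++] = (char)c; /* ASCII passes through unchanged */
            i++;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence: U+0080..U+07FF (overlong if < 0x80). */
            unsigned cp = ((c & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            out[j++] = (cp >= 0x80 && cp <= max_cp) ? (char)cp : '?';
            i += 2;
        } else {
            /* Three/four-byte sequence (never fits a single-byte target)
             * or malformed input: emit one '?' and skip any trailing
             * continuation bytes. */
            out[j++] = '?';
            i++;
            while (i < len && (in[i] & 0xC0) == 0x80) {
                i++;
            }
        }
    }
    out[j] = '\0';
    *out_len = j;
    return out;
}
```

Dropping ASCII support, as proposed above, would amount to fixing max_cp at 0xFF, which is why the ISO-8859-1-only functions can be so small.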
May I offer both options as voting choices?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net