From: smalyshev@sugarcrm.com (Stas Malyshev)
To: Rasmus Lerdorf
CC: PHP internals
Date: Sun, 26 Aug 2012 12:57:01 -0700
Subject: Re: [PHP-DEV] Default input encoding for htmlspecialchars/htmlentities
Message-ID: <503A7F8D.5070200@sugarcrm.com>
In-Reply-To: <5036551E.1030804@lerdorf.com>

Hi!

> In PHP 6 we tried to introduce separate input, script and output
> encoding settings. Currently in 5.4 we don't have that, but we have
> those 3 separately for mbstring and for iconv:
>
> iconv.input_encoding
> iconv.internal_encoding
> iconv.output_encoding
> mbstring.http_input
> mbstring.internal_encoding
> mbstring.http_output
>
> Ideally we should be getting rid of the per-feature encoding settings
> and have a single set of them that we refer to when we need them. This

I agree, having a unified set of encodings would be a good thing.
However, I have a feeling most people won't really understand what
these three do and will never bother to set them. In my experience,
people don't even bother to set the PHP timezone, even though PHP
complains each time a date function is called. So these will be left at
their defaults in 99.999% of cases.

> So do we create a new default_input_encoding ini directive mid-stream in
> 5.4 for this? Of course with the longer-term in mind that this will be
> part of a unified set of encoding settings in 5.5 and beyond.

What happens to these 6 directives? Will we now have 9 directives for
setting the encoding? This reminds me of: http://xkcd.com/927/. Having
yet more settings is not really a solution to the problem of too many
different settings. So unless we deprecate all the others in 5.5 and
have people use only the generic ones, it's not very useful.

If we do deprecate them, we need some kind of migration path, i.e. if
you set iconv.input_encoding, what actually happens? If you set
default_input_encoding, will it also set mbstring.http_input, or will it
affect mbstring without actually setting it?

I guess we'd need a good detailed RFC on this :)

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
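
P.S. To make the migration question a bit more concrete, here is a rough sketch of the two behaviours an RFC would have to choose between. It is written in Python purely as an illustration and is not how PHP behaves. The mapping from default_input_encoding to iconv.input_encoding and mbstring.http_input, and the function names resolve_with_fallback and apply_as_alias, are assumptions made up for this example.

```python
# Hypothetical mapping: which per-extension directives a generic
# directive would stand in for. This is an assumption, not PHP behaviour.
GENERIC_TO_LEGACY = {
    "default_input_encoding": ["iconv.input_encoding", "mbstring.http_input"],
}


def resolve_with_fallback(ini, legacy_name):
    """Policy A: the generic directive "affects without setting".

    An explicitly set legacy directive wins; otherwise the value of the
    generic directive mapped to it is used. Returns None if neither is set.
    """
    if ini.get(legacy_name):
        return ini[legacy_name]
    for generic, legacy_names in GENERIC_TO_LEGACY.items():
        if legacy_name in legacy_names and ini.get(generic):
            return ini[generic]
    return None


def apply_as_alias(ini):
    """Policy B: setting the generic directive also sets the legacy ones.

    Returns a copy of the settings in which every legacy directive mapped
    to a set generic directive has been overwritten with its value.
    """
    result = dict(ini)
    for generic, legacy_names in GENERIC_TO_LEGACY.items():
        if ini.get(generic):
            for name in legacy_names:
                result[name] = ini[generic]
    return result


# Example settings: the generic directive and one legacy directive are set.
ini = {
    "default_input_encoding": "UTF-8",
    "iconv.input_encoding": "ISO-8859-1",
}

print(resolve_with_fallback(ini, "iconv.input_encoding"))
print(resolve_with_fallback(ini, "mbstring.http_input"))
print(apply_as_alias(ini))
```

Under policy A an existing iconv.input_encoding keeps working and the new directive only fills the gaps. Under policy B the new directive silently overrides whatever was set before. Which of these (if either) is right is exactly what would need to be spelled out.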