Newsgroups: php.internals
Date: Mon, 27 Aug 2012 06:57:04 +0900
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
To: Stas Malyshev
Cc: Rasmus Lerdorf, PHP internals
Subject: Re: [PHP-DEV] Default input encoding for htmlspecialchars/htmlentities
In-Reply-To: <503A7F8D.5070200@sugarcrm.com>
References: <5036551E.1030804@lerdorf.com> <503A7F8D.5070200@sugarcrm.com>

Hi,

2012/8/27 Stas Malyshev:
> Hi!
>
>> In PHP 6 we tried to introduce separate input, script and output
>> encoding settings. Currently in 5.4 we don't have that, but we have
>> those 3 separately for mbstring and for iconv:
>>
>> iconv.input_encoding
>> iconv.internal_encoding
>> iconv.output_encoding
>> mbstring.http_input
>> mbstring.internal_encoding
>> mbstring.http_output
>>
>> Ideally we should be getting rid of the per-feature encoding settings
>> and have a single set of them that we refer to when we need them. This
>
> I agree, having unified set of encodings would be a good thing. However,
> I have a feeling most of the people won't really understand what these
> three do, and would never bother to set them. From my experience, people
> don't even bother to set PHP timezone, even though PHP complains each
> time date function is accessed. So these will be left as default in
> 99.999% of cases.

I agree. Other than in applications written by native CJK developers, I
rarely see them set.

>
>> So do we create a new default_input_encoding ini directive mid-stream in
>> 5.4 for this? Of course with the longer-term in mind that this will be
>> part of a unified set of encoding settings in 5.5 and beyond.
>
> What happens to these 6 directives? Will we now have 9 directives for
> setting the encoding? This reminds me of: http://xkcd.com/927/. Having
> yet more settings is not really a solution to the problem of too many
> different settings. So unless we deprecate all others in 5.5 and have
> people use only generic ones it's not very useful. If we do deprecate
> them, we need some kind of migration path - i.e.
> if you set iconv.input_encoding what actually happens? If you set
> default_input_encoding will it also set mbstring.http_input - or will it
> affect mbstring without actually setting it?
> I guess we'd need a good detailed RFC on this :)

If I write a patch for it, I'll modify iconv.*/mbstring.* to use php.*
(or zend.*).

When default_charset is set and the other settings are null, it is used as
the default everywhere, including htmlentities(), mb_*(), etc.

default_charset will be the single encoding setting if the user uses a
single encoding for the application.

How to deal with iconv.*/mbstring.*:

master: remove iconv.*/mbstring.*
5.4: iconv.*/mbstring.* remain for compatibility and are used if they
are set.

We could also remove iconv.*/mbstring.* for 5.4. It's a big change for CJK
users, but they will be okay with it. Almost all users are using a single
encoding for their application anyway.

I think removing iconv.*/mbstring.* for both master and 5.4 would be nicer.

Any opinions?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net
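
For illustration, the fallback rule proposed above (use default_charset whenever a more specific setting is null, and keep honouring iconv.*/mbstring.* in 5.4 if they are set) could be modelled roughly as follows. This is only a sketch in Python, not PHP internals code. The function name resolve_encoding, the dict standing in for php.ini, and the hard-coded "UTF-8" last resort are all assumptions made for the example.

```python
from typing import Mapping, Optional


def resolve_encoding(
    settings: Mapping[str, Optional[str]],
    specific_key: str,
    last_resort: str = "UTF-8",  # assumption: not specified in the proposal
) -> str:
    """Pick the encoding for one feature-specific setting.

    Order modelled on the 5.4 compatibility variant of the proposal:
    1. the feature-specific setting (e.g. mbstring.internal_encoding), if set
    2. default_charset, if set
    3. a hard-coded last resort (hypothetical)
    """
    specific = settings.get(specific_key)
    if specific:  # None and "" both count as "not set"
        return specific
    default = settings.get("default_charset")
    if default:
        return default
    return last_resort


# Hypothetical php.ini contents, represented as a plain dict.
ini = {
    "default_charset": "EUC-JP",
    "mbstring.internal_encoding": None,  # not set: falls back
    "iconv.internal_encoding": "SJIS",   # set: kept for compatibility
}

print(resolve_encoding(ini, "mbstring.internal_encoding"))  # EUC-JP
print(resolve_encoding(ini, "iconv.internal_encoding"))     # SJIS
```

Under the "remove iconv.*/mbstring.* everywhere" variant, step 1 would simply disappear and every caller would read default_charset directly.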