Newsgroups: php.internals
From: krakjoe@php.net (Joe Watkins)
To: internals@lists.php.net,Yasuo Ohgaki
Date: Thu, 31 Oct 2013 08:20:57 +0000
Subject: Re: [RFC] Use default_charset As Default Character Encoding
Message-ID: <527212E9.7040201@php.net>
References: <526F98C1.4040607@php.net>

On 10/30/2013 11:05 PM, Yasuo Ohgaki wrote:
> Hi Joe,
>
> On Tue, Oct 29, 2013 at 8:15 PM, Joe Watkins wrote:
>
>> I'm not sure what it is you are actually trying to achieve here ??
>>
>
> I have 3 objectives in this RFC.
>
> 1. Setting charset in HTTP header is recommended since the first XSS
> advisory in 2000 Feb. by CERT and Microsoft.
> 2. There are too many encoding settings and it is better to consolidated.
> 3. If we have yet another multibyte string module in the future, the new
> settings can be used.
>
> I'll add these if I didn't write them in RFC later.
>
> I proposed "default_charset=UTF-8" years ago, but there were many users
> uses "ISO-8859-*"/"EUC-*"/etc at that time and we decided leave the setting
> to users.
>
> +1 on the 5.5 changes
>>
>> But the rest I don't really understand what the aim is, it would seem that
>> renaming settings, especially ones that are not actually anything to do
>> with the core, is just breaking compatibility for no good reason.
>>
>
> Encoding must be specified for proper operation. It's a security risk also.
>
>
>> What I could understand is a proposal to move the functionality provided
>> by mbstring/iconv into core and introduce dot script_encoding complementary
>> settings:
>>
>> zend.input_encoding
>> zend.output_encoding
>>
>> I could understand this kind of proposal being aimed at 6.
>>
>
> I don't think Zend engine will have multibyte char handling feature at
> least any time soon.
>
> Currently, Zend engine has zend multibyte option, but it's only for
> encoding that is not compatible ISO-8859-1. (e.g. SJIS, BIG5. These
> encodings has \ in chars and engine would not work script written by
> these encodings with zend multibyte off.)
>
> However, having encoding settings in the engine will work also even if it
> does not use them. It may be a good idea have these settings in the
> engine. I'm +1 for this idea.
>
> Regards,
>
> --
> Yasuo Ohgaki
> yohgaki@ohgaki.net
>

I don't see that it is possible to merge the settings from different
libraries. What if an application relies on mbstring and iconv having
different settings?

What I am trying to say is that applications may rely on those settings
being separate in order to function properly.

The only way you could possibly merge those configuration settings is by
also merging the functionality, and there is no backward-compatible way to
do that. I can, however, imagine those libraries being used at some point
in the future to support all of the required input/output/script encoding
features at the level of Zend.

I don't see how this can move forward without breaking stuff ...

Cheers
Joe
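
To make the concern concrete, here is a hypothetical php.ini fragment in which mbstring and iconv are deliberately configured differently. The directive names are the existing ones; the values and the "legacy system" scenario are assumptions made up for illustration, not taken from any real application. Collapsing both extensions onto a single default_charset would silently change the behaviour of one of them:

```ini
; Hypothetical configuration, for illustration only.

; Charset sent in the Content-Type HTTP header
default_charset = "UTF-8"

; mbstring handles application strings as UTF-8
mbstring.internal_encoding = UTF-8

; iconv is (in this made-up scenario) used to exchange data with a
; legacy system that expects ISO-8859-1
iconv.internal_encoding = ISO-8859-1
iconv.output_encoding = ISO-8859-1
```

An application set up like this has no single value that could replace both groups of settings without one of the two code paths changing encoding.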