Newsgroups: php.internals
Message-ID: <503A9CB5.2010506@lerdorf.com>
Date: Sun, 26 Aug 2012 15:01:25 -0700
To: Yasuo Ohgaki
CC: Stas Malyshev, PHP internals
References: <5036551E.1030804@lerdorf.com> <503A7F8D.5070200@sugarcrm.com>
Subject: Re: [PHP-DEV] Default input encoding for htmlspecialchars/htmlentities
From: rasmus@lerdorf.com (Rasmus Lerdorf)

On 08/26/2012 02:57 PM, Yasuo Ohgaki wrote:
> Hi,
>
> 2012/8/27 Stas Malyshev:
>> Hi!
>>
>>> In PHP 6 we tried to introduce separate input, script and output
>>> encoding settings. Currently in 5.4 we don't have that, but we have
>>> those 3 separately for mbstring and for iconv:
>>>
>>> iconv.input_encoding
>>> iconv.internal_encoding
>>> iconv.output_encoding
>>> mbstring.http_input
>>> mbstring.internal_encoding
>>> mbstring.http_output
>>>
>>> Ideally we should be getting rid of the per-feature encoding settings
>>> and have a single set of them that we refer to when we need them. This
>>
>> I agree, having unified set of encodings would be a good thing. However,
>> I have a feeling most of the people won't really understand what these
>> three do, and would never bother to set them. From my experience, people
>> don't even bother to set PHP timezone, even though PHP complains each
>> time date function is accessed. So these will be left as default in
>> 99.999% of cases.
>
> I agree. Other than applications that are made by CJK native, I rarely
> see them set.
>
>>
>>> So do we create a new default_input_encoding ini directive mid-stream in
>>> 5.4 for this?
>>> Of course with the longer-term in mind that this will be
>>> part of a unified set of encoding settings in 5.5 and beyond.
>>
>> What happens to these 6 directives? Will we now have 9 directives for
>> setting the encoding? This reminds me of: http://xkcd.com/927/. Having
>> yet more settings is not really a solution to the problem of too many
>> different settings. So unless we deprecate all others in 5.5 and have
>> people use only generic ones it's not very useful. If we do deprecate
>> them, we need some kind of migration path - i.e. if you set
>> iconv.input_encoding what actually happens? If you set
>> default_input_encoding will it also set mbstring.http_input - or will it
>> affect mbstring without actually setting it?
>> I guess we'd need a good detailed RFC on this :)
>
> If I write patch for it, I'll modify iconv.*/mbstring.* to use php.* (or zend.*)
> When default_charset is set and other settings are null, use it as
> default for all including htmlentities(), mb_*(), etc.
>
> default_charset will be single encoding configuration if user uses
> single encoding for application.
>
> How to deal with iconv.*/mbstring.*
> master: remove iconv.*/mbstring.*
> 5.4: iconv.*/mbstring.* remains for compatibility and use them if they are set.
>
> We could remove iconv.*/mbstring.* for 5.4. It's a big change for CJK
> users but they will be okay with it. Almost all users are using single
> encoding for application anyway.
>
> I think removing iconv.*/mbstring.* for master and 5.4 would be nicer.
> Any opinions?

We can't remove them in 5.4. We can add new ones without breaking
anything and we can make mbstring/iconv/html* use those if they are set
and then mark the mbstring/iconv settings as deprecated in master.

-Rasmus
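
For illustration only, the lookup order suggested in the reply above could be sketched roughly as follows. This is Python pseudologic, not PHP's implementation. It assumes one possible reading of the proposal: the new unified directive wins when it is set, the existing per-extension directive is used otherwise (and would be reported as deprecated), and a built-in default applies last. The function name `resolve_input_encoding`, the `ini` dictionary, and the `"UTF-8"` built-in default are all hypothetical; only the directive names come from the thread.

```python
# Hypothetical sketch of the proposed fallback; not PHP's actual code.

# Existing per-extension input directives named in the thread.
LEGACY_INPUT_DIRECTIVES = {
    "iconv": "iconv.input_encoding",
    "mbstring": "mbstring.http_input",
}


def resolve_input_encoding(ini, extension, builtin_default="UTF-8"):
    """Return (encoding, legacy_directive_used_or_None).

    ini: dict of ini directive name -> value (unset or empty means "not set").
    extension: "iconv", "mbstring", or anything else (e.g. "html").
    builtin_default: assumed last-resort value, chosen here for illustration.
    """
    # 1. The proposed unified directive takes effect whenever it is set.
    unified = ini.get("default_input_encoding")
    if unified:
        return unified, None

    # 2. Otherwise fall back to the per-extension directive, if one exists.
    #    The caller could emit a deprecation notice for the returned name.
    legacy_name = LEGACY_INPUT_DIRECTIVES.get(extension)
    legacy_value = ini.get(legacy_name) if legacy_name else None
    if legacy_value:
        return legacy_value, legacy_name

    # 3. Nothing configured: use the built-in default.
    return builtin_default, None
```

Under this reading, setting only `mbstring.http_input` keeps working in 5.4, which is the compatibility point of the reply. The alternative discussed earlier in the thread, where the per-extension value overrides the unified one, would swap steps 1 and 2.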