Newsgroups: php.internals
In-Reply-To: <4F5D9C77.3030000@lerdorf.com>
References:
 <4F5D9C77.3030000@lerdorf.com>
Date: Mon, 12 Mar 2012 14:53:17 +0800
To: Rasmus Lerdorf
Cc: PHP internals
Subject: Re: [PHP-DEV] default charset confusion
From: laruence@php.net (Laruence)

On Mon, Mar 12, 2012 at 2:49 PM, Rasmus Lerdorf wrote:
> I caused this situation myself by not explicitly differentiating between
> the default charset for the internal htmlspecialchars() and
> htmlentities() functions and the output charset ini directive,
> default_charset.
>
> The idea behind the default_charset ini directive was to act as the
> charset that gets specified in the HTTP Content-type header if you do
> not explicitly send your own Content-type header with the header()
> function. This has been muddied a bit by the fact that
> htmlspecialchars/htmlentities can take it into account when they are
> trying to choose which encoding to use when handling data passed to them.
> This isn't done by default since it actually makes little sense. It is only
> done if you pass an empty string as the encoding argument. If you don't
> pass anything at all the default is UTF-8 in 5.4. In 5.3 this was
> ISO-8859-1.
>
> And here is where the confusion comes in. We, myself included, have told
> people that they can get the 5.3 behaviour back by setting the
> default_charset ini directive to iso-8859-1. But, this is only true if
> they are forcing htmlspecialchars/htmlentities to check that setting
> with an empty string as the encoding arg. Most apps just do
> htmlspecialchars($str) and nothing else. Plus, it is really not a good
> idea to tie the internal encoding of data being passed to these
> functions to the output charset. You should be able to change the output
> charset without worrying about your runtime encoding at that level.
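
To make the failure mode described above concrete, here is a minimal sketch, assuming a legacy application whose strings are ISO-8859-1. The sample string and the legacy_escape() helper are hypothetical and not part of PHP or of this thread. The flags and encoding are passed explicitly so the result does not depend on per-version defaults.

```php
<?php
// Hypothetical legacy data: "café <b>" encoded as ISO-8859-1,
// so "é" is the single byte 0xE9.
$latin1 = "caf\xE9 <b>";

// With UTF-8 as the encoding (the 5.4 default when no encoding argument
// is given), the lone 0xE9 byte is an invalid sequence and, without
// ENT_IGNORE or ENT_SUBSTITUTE, the function returns an empty string.
$asUtf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// Naming the real encoding of the data escapes the markup and keeps
// the 0xE9 byte intact.
$asLatin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

// Hypothetical userland wrapper: the application's data encoding lives
// in one place, independent of the default_charset output directive.
function legacy_escape($str, $encoding = 'ISO-8859-1')
{
    return htmlspecialchars($str, ENT_COMPAT, $encoding);
}

var_dump($asUtf8 === '');                        // bool(true)
var_dump($asLatin1 === "caf\xE9 &lt;b&gt;");     // bool(true)
var_dump(legacy_escape($latin1) === $asLatin1);  // bool(true)
```

A wrapper like this still means touching every call site once, which is exactly the cost a single runtime-encoding setting would avoid.
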
>
> What this effectively means is that we are asking people to go through
> their code and add an explicit charset to all htmlspecialchars() and
> htmlentities() calls. I think this will be a hurdle for 5.4 adoption.
>
> What we really need is what we added in PHP 6. A runtime encoding ini
> setting that is distinct from the output charset which we can use here.
> That would allow people to fix all their legacy code to a specific
> runtime encoding with a single ini setting instead of changing thousands
> of lines of code. I propose that we add such a directive to 5.4.1 to
> ease migration.

+1, especially for non-utf8 applications.

thanks

>
> See https://bugs.php.net/61354 for the first signs of grumbling about
> this one. As more people migrate I have a feeling this will end up being
> the most difficult part of the migration.
>
> -Rasmus
>

-- 
Laruence  Xinchen Hui
http://www.laruence.com/