Newsgroups: php.internals
From: cschneid@cschneid.com ("Christian Schneider")
To: internals@lists.php.net
Date: Tue, 13 Mar 2012 15:21:33 +0100
Subject: Re: [PHP-DEV] default charset confusion

On 13.03.2012 at 02:34, Rasmus Lerdorf wrote:

> On 03/12/2012 05:52 PM, Yasuo Ohgaki wrote:
>> I always set all parameters for htmlentities/htmlspecialchars, therefore
>> I haven't noticed this was changed from 5.3. They may be migrating from
>> 5.2 or older. (RHEL5 uses 5.1)
>
> No, like I showed, moving from 5.3 to 5.4 breaks because the new default
> UTF-8 encoding validates the input and 8859-1 in 5.3 does not. So for
> charsets that are actually safe for the low-ascii chars that are
> significant to html htmlspecialchars() now returns false in 5.4 because
> their chars fail the UTF8 validity check. For people who explicitly set
> all the parameters nothing has changed, of course.

I second that.
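
To make the breakage concrete, here is a minimal sketch. It assumes a PHP 5.4+ CLI and an ISO-8859-1 byte string made up for illustration. 'UTF-8' is passed explicitly to mimic the 5.4 default, so the result does not depend on which defaults the running version uses. Note that the manual documents an empty string as the result for invalid input when neither ENT_IGNORE nor ENT_SUBSTITUTE is set. The helper h() is hypothetical and not part of PHP; it only illustrates the "honor default_charset" idea.

```php
<?php
// "Gruesse <b>" with u-umlaut (0xFC) and sharp s (0xDF) as single
// ISO-8859-1 bytes. Neither byte forms a valid UTF-8 sequence.
$latin1 = "Gr\xFC\xDFe <b>";

// Validating the input as UTF-8 (the 5.4 default encoding) rejects it.
$asUtf8 = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// To name the encoding, the flags argument has to be spelled out as well.
$asLatin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

// Hypothetical helper (an assumption, not a PHP function): use the
// default_charset ini setting whenever no encoding is passed.
function h($string, $flags = ENT_COMPAT, $encoding = null)
{
    if ($encoding === null) {
        $encoding = ini_get('default_charset');
        if ($encoding === false || $encoding === '') {
            $encoding = 'UTF-8'; // fallback when the setting is empty
        }
    }
    return htmlspecialchars($string, $flags, $encoding);
}

ini_set('default_charset', 'ISO-8859-1');
$viaHelper = h($latin1);

var_dump($asUtf8 === '');                          // input rejected as UTF-8
var_dump($asLatin1 === "Gr\xFC\xDFe &lt;b&gt;");   // high bytes pass through
var_dump($viaHelper === $asLatin1);                // helper follows the ini value
```

With a wrapper like this, existing call sites would only need the function name changed instead of gaining two extra arguments each.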

The change is a big PITA for us because we are still using ISO-8859-1 (shame on us). It is made even worse by the fact that the encoding parameter comes after the (optional) flags parameter, which now has to be given too.

From my naive point of view, the sane behaviour would be to honor default_charset if no encoding is given. That's what I expected when I read the migration guide.

- Chris