Newsgroups: php.internals
Xref: news.php.net php.internals:58884
Message-ID: <4F5E5148.4030106@sugarcrm.com>
Date: Mon, 12 Mar 2012 12:40:56 -0700
To: Rasmus Lerdorf
CC: PHP internals
In-Reply-To: <4F5DAFCE.8020600@lerdorf.com>
Subject: Re: [PHP-DEV] default charset confusion
From: 
smalyshev@sugarcrm.com (Stas Malyshev)

Hi!

> And yes, it may very well be dangerous to use the wrong charset and now
> that we have better support for GB2312 and other asian charsets in the
> entities functions in 5.4 it is even more prudent to choose the right
> one so we should provide some way to help people get it right short of
> changing every call.

I'm not sure "changing every call" is such a big problem - it's one grep and one replace, and can probably be done in one line of sed/awk/perl/php. The bigger issue here is that people insist on using wrong charsets and expect the language to have some magical external defaults that work for exactly their use case, instead of doing what they should have been doing all along - putting the charset right there in the argument. We need to get people off this mindset fast, since it is not a good one. Having tons of hidden defaults that modify the behavior of functions called with the same arguments in hundreds of different ways is a coding and maintenance nightmare. Now if I write htmlspecialchars() I can never be sure that it works right and uses UTF-8 - what if somebody messed with the INI setting because of some other broken library that required that to work?

-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
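
For illustration only, here is a minimal sketch of what "putting the charset right there in the argument" could look like. The wrapper name escape_html() is hypothetical and not part of PHP or of this thread; the only real API used is htmlspecialchars() with its flags and encoding parameters.

```php
<?php
// Hypothetical helper (name is an assumption, chosen for this example).
// The charset is passed explicitly on every call, so the result does not
// depend on whatever default_charset happens to be set in php.ini.
function escape_html($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// The same arguments give the same escaping regardless of INI settings.
echo escape_html('<a href="x">Tom & Jerry</a>'), "\n";
```

With a single wrapper like this, the grep-and-replace over existing htmlspecialchars() calls only has to be done once, and any later change of charset policy happens in one place.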