Newsgroups: php.internals
Date: Mon, 12 Mar 2012 00:52:41 -0700
From: smalyshev@sugarcrm.com (Stas Malyshev)
To: Rasmus Lerdorf
CC: PHP internals
Subject: Re: [PHP-DEV] default charset confusion

Hi!

> Ignoring 5.4 for a second, if you in 5.3 do this:
>
> echo htmlspecialchars($string);
> echo htmlspecialchars($string, NULL, "ISO-8859-1");
> echo htmlspecialchars($string, NULL, "UTF-8");
>
> You will see that the first two output the escaped string with the
> GB2312 bytes intact within it and the UTF-8 call returns false because
> it correctly recognizes that GB2312 is not UTF-8. We don't have any such
> check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
> htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.

So the difference is that ISO-8859-1 input is not validated, but UTF-8
input is? I'm not sure what the GB2312 encoding does, but isn't it
dangerous to call htmlspecialchars() with the wrong encoding? Wouldn't
htmlentities() also produce the wrong result when used with the wrong
encoding?
-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
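
A minimal Python sketch of the distinction asked about above (Python rather than PHP, and not PHP's actual implementation): ISO-8859-1 assigns a character to every byte 0x00-0xFF, so a decoder has nothing to reject, while UTF-8 has structural rules that GB2312 multibyte sequences violate. The sample string and the helper name `try_decode` are illustrative assumptions.

```python
import html


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if `data` is invalid in `encoding`.

    Hypothetical helper standing in for a validating decoder; not a PHP API.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Assumed sample input: "<b>中文</b>" encoded as GB2312 (bytes D6 D0 CE C4).
gb_bytes = "<b>中文</b>".encode("gb2312")

# ISO-8859-1 maps every byte to a character, so decoding never fails;
# the GB2312 bytes just become (meaningless) Latin-1 characters.
latin1_text = try_decode(gb_bytes, "iso-8859-1")

# Escaping and re-encoding as Latin-1 leaves the GB2312 bytes intact,
# with only the ASCII markup characters escaped.
escaped_bytes = html.escape(latin1_text).encode("iso-8859-1")

# In UTF-8, lead byte 0xD6 must be followed by a 0x80-0xBF continuation
# byte; 0xD0 is not one, so a validating decoder rejects the input.
utf8_text = try_decode(gb_bytes, "utf-8")

if __name__ == "__main__":
    print(escaped_bytes)
    print(utf8_text)
```

This mirrors the behavior described in the quoted message: the ISO-8859-1 path passes the foreign bytes through untouched, and only the UTF-8 path can detect that the input is not what it claims to be.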