From: adamjonr@gmail.com (Adam Jon Richardson)
To: PHP internals
Date: Mon, 12 Mar 2012 04:09:34 -0400
Subject: Re: [PHP-DEV] default charset confusion
Newsgroups: php.internals

On Mon, Mar 12, 2012 at 3:52 AM, Stas Malyshev wrote:

> Hi!
>
>> Ignoring 5.4 for a second, if you in 5.3 do this:
>>
>> echo htmlspecialchars($string);
>> echo htmlspecialchars($string, NULL, "ISO-8859-1");
>> echo htmlspecialchars($string, NULL, "UTF-8");
>>
>> You will see that the first two output the escaped string with the
>> GB2312 bytes intact within it and the UTF-8 calls returns false because
>> it correctly recognizes that GB2312 is not UTF-8. We don't have any such
>> check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
>> htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.
>
> So the difference is that ISO8859-1 does not validate but UTF-8 validates?
> I'm not sure what GB2312 encoding does but isn't it dangerous to do
> htmlspecialchars() with wrong encoding? Wouldn't htmlentities() also
> produce wrong result when used with wrong encoding?

The EUC-CN encoding (the usual byte encoding of GB2312) appears to stay
compatible with ASCII by keeping both bytes of every two-byte character
outside the ASCII range, so it seems that htmlspecialchars() should work OK
on such input, since none of those bytes can be mistaken for <, >, &, or a
quote:

http://en.wikipedia.org/wiki/GB_2312#EUC-CN
http://php.net/manual/en/mbstring.supported-encodings.php

Adam
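To make the byte-range argument concrete, here is a minimal sketch, not a statement about how the engine implements anything. The sample string and variable names are made up for illustration; the bytes D6 D0 CE C4 are "中文" in GB2312/EUC-CN. It assumes that a UTF-8 call on invalid input yields an empty string (which compares loosely equal to false) when neither ENT_IGNORE nor ENT_SUBSTITUTE is passed.

```php
<?php
// "中文" in GB2312/EUC-CN, spelled as raw bytes so the encoding of this
// source file is irrelevant. (Hypothetical sample input.)
$gb    = "\xD6\xD0\xCE\xC4";
$input = '<b>' . $gb . '</b> & more';

// Check the claim: every byte of the two-byte characters is 0xA1 or above,
// so none of them can collide with the ASCII specials < > & " '.
$allHigh = true;
foreach (str_split($gb) as $byte) {
    if (ord($byte) < 0xA1) {
        $allHigh = false;
    }
}

// ISO-8859-1 input is not validated: only the ASCII specials are rewritten
// and the GB2312 bytes pass through untouched.
$latin1 = htmlspecialchars($input, ENT_QUOTES, 'ISO-8859-1');

// UTF-8 input is validated: these bytes are not well-formed UTF-8, so the
// call gives back an empty string instead of an escaped copy.
$utf8 = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

var_dump($allHigh);
var_dump($latin1 === '&lt;b&gt;' . $gb . '&lt;/b&gt; &amp; more');
var_dump($utf8 === '');
```

This only shows that the escaping stays byte-safe for EUC-CN input. It does not cover multi-byte encodings whose trail bytes can fall inside the ASCII range, where treating the data as a single-byte charset would be a different story.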