Newsgroups: php.internals
In-Reply-To: <4F5DAFCE.8020600@lerdorf.com>
References: <4F5D9C77.3030000@lerdorf.com>
 <4F5DA152.10109@sugarcrm.com> <4F5DA894.8060606@lerdorf.com>
 <4F5DAB49.3030808@sugarcrm.com> <4F5DAFCE.8020600@lerdorf.com>
Date: Mon, 12 Mar 2012 19:05:03 +0900
To: Rasmus Lerdorf
Cc: Stas Malyshev, PHP internals
Subject: Re: [PHP-DEV] default charset confusion
From: yohgaki@ohgaki.net (Yasuo Ohgaki)

Hi,

I think the following PHP 5.4.0 NEWS entry is misleading.

  . Changed default value of "default_charset" php.ini option from ISO-8859-1 to
    UTF-8. (Rasmus)

I thought default_charset had become UTF-8, so I was expecting the following HTTP header:

    Content-Type: text/html; charset=UTF-8

However, the charset was empty (the 'charset=UTF-8' part was missing). So I looked at the source and found this at line 293 of SAPI.h:

    #define SAPI_DEFAULT_CHARSET ""

Shouldn't the empty string be "UTF-8"?

BTW, an empty charset in the HTTP header does not mean the default will be ISO-8859-1; it lets the browser guess which encoding is used. Guessing the encoding may cause XSS under certain conditions.

Anyway, I was curious, so I checked ext/standard/html.c and found:

    /* {{{ entity_charset determine_charset
     * returns the charset identifier based on current locale or a hint.
     * defaults to UTF-8 */
    static enum entity_charset determine_charset(char *charset_hint TSRMLS_DC)
    {
        int i;
        enum entity_charset charset = cs_utf_8;
        int len = 0;
        const zend_encoding *zenc;

        /* Default is now UTF-8 */
        if (charset_hint == NULL)
            return cs_utf_8;

There are 2 problems.

 - php.ini's default_charset should be UTF-8.
 - determine_charset() should not blindly default to UTF-8 when there is no hint.

The old htmlentities/htmlspecialchars actually determined the charset from default_charset/mbstring.internal_encoding/etc. I think the old behavior is better than the current one.

How about making determine_charset() behave as in 5.3 and setting SAPI_DEFAULT_CHARSET to "UTF-8"?
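
The proposed fallback could look roughly like this. It is only a sketch, not a patch: pick_charset() and its parameters are hypothetical names, the settings are passed in as plain strings instead of being read from the engine, and the lookup order simply follows the order listed above, which may differ from the real 5.3 code.

```c
#include <stddef.h>

/* Hypothetical helper, NOT code from ext/standard/html.c.
 * Returns the first candidate that is set, in this (assumed) order:
 *   1. the explicit hint passed by the caller
 *   2. the default_charset ini value
 *   3. the mbstring.internal_encoding ini value
 *   4. "UTF-8" as the last resort
 * A NULL pointer or an empty string counts as "not set". */
const char *pick_charset(const char *hint,
                         const char *default_charset,
                         const char *mb_internal_encoding)
{
    const char *candidates[3];
    size_t i;

    candidates[0] = hint;
    candidates[1] = default_charset;
    candidates[2] = mb_internal_encoding;

    for (i = 0; i < 3; i++) {
        if (candidates[i] != NULL && candidates[i][0] != '\0') {
            return candidates[i];
        }
    }

    /* Nothing configured: fall back to UTF-8 instead of guessing. */
    return "UTF-8";
}
```

In this sketch, pick_charset(NULL, "", NULL) returns "UTF-8", while pick_charset(NULL, "ISO-8859-1", NULL) returns "ISO-8859-1", so a site that sets default_charset still decides what the default is.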

With that change, PHP will behave as the NEWS entry says: the default encoding for htmlentities/htmlspecialchars becomes UTF-8, and users still have control over the default htmlentities/htmlspecialchars encoding.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net