Newsgroups: php.internals
From: rasmus@lerdorf.com (Rasmus Lerdorf)
To: Yasuo Ohgaki
CC: Stas Malyshev, PHP internals
Date: Mon, 12 Mar 2012 08:27:18 -0700
Subject: Re: [PHP-DEV] default charset confusion

On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:
> Hi
>
> I think the following PHP 5.4.0 NEWS entry is misleading:
>
>   . Changed default value of "default_charset" php.ini option from
>     ISO-8859-1 to UTF-8. (Rasmus)

Yes, I have fixed that now.

> I thought default_charset had become UTF-8, so I was expecting the
> following HTTP header:
>
>   Content-Type: text/html; charset=UTF-8
>
> However, I got an empty charset (the 'charset=UTF-8' part was missing).
> So I looked at the source and found this line (line 293) in SAPI.h:
>
>   #define SAPI_DEFAULT_CHARSET ""
>
> Shouldn't that empty string be "UTF-8"?

No, we can't force an output charset on people, since it would end up
breaking a lot of sites.

> - php.ini's default_charset should be UTF-8.
> - determine_charset() should not blindly default to UTF-8 when there
>   is no hint.
>
> The old htmlentities/htmlspecialchars actually determined the charset
> from default_charset/mbstring.internal_encoding/etc. I think the old
> behavior is better than the current one.
>
> How about making determine_charset() behave like 5.3 and setting
> SAPI_DEFAULT_CHARSET to "UTF-8"?

PHP 5.3's determine_charset() uses exactly the same logic as 5.4's.
In 5.3 we have:

    if (charset_hint == NULL)
        return cs_8859_1;

and in 5.4 we have:

    if (charset_hint == NULL)
        return cs_utf_8;

So there is no difference in how they guess when there is no hint. The
only difference is the fallback itself: in that case 5.4 chooses UTF-8,
while 5.3 chose ISO-8859-1.

-Rasmus