Newsgroups: php.internals
Xref: news.php.net php.internals:58880
MIME-Version: 1.0
In-Reply-To: <4F5E15D6.6080302@lerdorf.com>
References: <4F5D9C77.3030000@lerdorf.com>
 <4F5DA152.10109@sugarcrm.com> <4F5DA894.8060606@lerdorf.com> <4F5DAB49.3030808@sugarcrm.com> <4F5DAFCE.8020600@lerdorf.com> <4F5E15D6.6080302@lerdorf.com>
Date: Mon, 12 Mar 2012 10:39:54 -0500
Message-ID:
To: PHP internals
Content-Type: multipart/alternative; boundary=90e6ba6e86ea0eb43e04bb0d8f7f
Subject: Re: [PHP-DEV] default charset confusion
From: me@mikestowe.com (Michael Stowe)

--90e6ba6e86ea0eb43e04bb0d8f7f
Content-Type: text/plain; charset=ISO-8859-1

I think the ini directive, while adding another directive to the list, may
be the most unobtrusive way to address this issue, at least for developers.
I definitely agree with Rasmus that this could be one of the bigger
headaches in transitioning to 5.4 (for non-UTF-8 sites), and unless we can
come up with a better solution, I say let's move forward with it for 5.4.1.

- Mike

On Mon, Mar 12, 2012 at 10:27 AM, Rasmus Lerdorf wrote:

> On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:
> > Hi
> >
> > I think following PHP 5.4.0 NEWS entry is misleading.
> >
> > . Changed default value of "default_charset" php.ini option from ISO-8859-1 to
> > UTF-8. (Rasmus)
>
> Yes, I have fixed that now.
>
> > I thought default_charset became UTF-8, so I was expecting
> > following HTTP header.
> >
> > content-type text/html; charset=UTF-8
> >
> > However, I got empty charset (missing 'charset=UTF-8').
> > So I looked up to source and found the line in SAPI.h
> >
> > 293 #define SAPI_DEFAULT_CHARSET ""
> >
> > Empty string should be "UTF-8", isn't it?
>
> No, we can't force an output charset on people since it would end up
> breaking a lot of sites.
>
> > - php.ini's default_charset should be UTF-8.
> > - determine_charset() should not blindly default to UTF-8 when there
> > are no hint.
> >
> > Old htmlentities/htmlspecialchars actually determines charset from
> > default_charset/mbstring.internal_encoding/etc. I think old behavior
> > is better than now.
> >
> > How about make determine_charset() behaves like 5.3 and set the
> > SAPI_DEFAULT_CHARSET to "UTF-8"?
>
> PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:
>
>     if (charset_hint == NULL)
>         return cs_8859_1;
>
> and in 5.4 we have:
>
>     if (charset_hint == NULL)
>         return cs_utf_8;
>
> So there is no difference in their guessing when there is no hint, the
> only difference is that in 5.4 we choose utf8 and in 5.3 we choose
> 8859-1 in that case.
>
> -Rasmus
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

--
-----------------------
"My command is this: Love each other as I have loved you."
John 15:12
-----------------------

--90e6ba6e86ea0eb43e04bb0d8f7f--