Date: Tue, 13 Mar 2012 09:52:00 +0900
To: Rasmus Lerdorf
Cc: Stas Malyshev, PHP internals
Subject: Re: [PHP-DEV] default charset confusion
From: yohgaki@ohgaki.net (Yasuo Ohgaki)

2012/3/13 Rasmus Lerdorf:
> On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:
>> I thought default_charset became UTF-8, so I was expecting
>> following HTTP header.
>>
>> content-type  text/html; charset=UTF-8
>>
>> However, I got empty charset (missing 'charset=UTF-8').
>> So I looked up to source and found the line in SAPI.h
>>
>> 293   #define SAPI_DEFAULT_CHARSET        ""
>>
>> Empty string should be "UTF-8", isn't it?
>
> No, we can't force an output charset on people since it would end up
> breaking a lot of sites.

Right, so maybe for the next major release? 5.5.0?
As the first XSS advisory in 2000 states, explicitly setting the character
encoding will prevent certain XSS. Recent browsers have much better encoding
handling, but setting the encoding explicitly is still better for security.

> PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:
>
>    if (charset_hint == NULL)
>            return cs_8859_1;
>
> and in 5.4 we have:
>
>    if (charset_hint == NULL)
>            return cs_utf_8;
>
> So there is no difference in their guessing when there is no hint, the
> only difference is that in 5.4 we choose utf8 and in 5.3 we choose
> 8859-1 in that case.

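To make the effect of that fallback concrete, here is a minimal sketch in C. It is not the PHP source: `charset_t`, `fallback_charset`, its `php_minor` parameter, and `latin1_to_utf8` are hypothetical names for illustration, and only the `charset_hint == NULL` check and the `cs_8859_1` / `cs_utf_8` names come from the quoted snippet. It shows the no-hint choice and what happens when UTF-8 bytes are read as ISO-8859-1, one character per byte.

```c
#include <stddef.h>
#include <string.h>

/* Charset ids; cs_8859_1 and cs_utf_8 are the names in the quoted snippet. */
typedef enum { cs_8859_1, cs_utf_8 } charset_t;

/* Hypothetical helper: choose a charset for the given hint.
 * php_minor (3 or 4) selects the quoted 5.3 or 5.4 fallback.
 * The hint comparison is a simplification of the real lookup. */
static charset_t fallback_charset(const char *charset_hint, int php_minor)
{
    if (charset_hint == NULL)
        return php_minor >= 4 ? cs_utf_8 : cs_8859_1;
    return strcmp(charset_hint, "UTF-8") == 0 ? cs_utf_8 : cs_8859_1;
}

/* Treat each input byte as one ISO-8859-1 character and re-encode it
 * as UTF-8. out must hold at least 2 * n + 1 bytes.
 * Returns the number of bytes written, excluding the terminator. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[j++] = in[i];                       /* ASCII passes through */
        } else {
            out[j++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[j++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[j] = '\0';
    return j;
}
```

Feeding the three UTF-8 bytes of one Japanese character (E6 97 A5) to `latin1_to_utf8` yields six bytes, that is, three unrelated Latin-1 characters, which is the kind of garbling a multibyte string suffers when no charset is passed and the fallback is 8859-1.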
I got this with 5.3 ',ENT_QUOTES); echo htmlentities('<=E6=97=A5=E6=9C=AC=E8=AA=9EUTF-8>',ENT_QUOTES, 'UTF-8')= ; <æ=EF=BF=BD¥æ=EF=BF=BD¬èª=EF=BF=BDUTF8 ><=E6=97=A5=E6=9C=AC=E8=AA=9EUTF-8> So people migrating from 5.3 to 5.4 should not have problems. Migration older than 5.3 to 5.4 will be problematic. I always set all parameters for htmlentities/htmlspecialchars, therefore I haven't noticed this was changed from 5.3. They may be migrating from 5.2 or older. (RHEL5 uses 5.1) Since PHP does not have default multibyte module, it may be good for having input_encoding internal_encoding output_encoding php.ini settings and make multibyte modules use them when they are set. Or just make mbstring default, alternatively. Rather big change for released version, but this is simple easy change. Regards, -- Yasuo Ohgaki