Subject: Re: [PHP-DEV] default charset confusion
From: jpauli@php.net (jpauli)
Date: Wed, 14 Mar 2012 14:55:17 +0100
To: Yasuo Ohgaki
Cc: Rasmus Lerdorf, Stas Malyshev, PHP internals

On Tue, Mar 13, 2012 at 1:52 AM, Yasuo Ohgaki wrote:

> 2012/3/13 Rasmus Lerdorf:
> > On 03/12/2012 03:05 AM, Yasuo Ohgaki wrote:
> >> I thought default_charset became UTF-8, so I was expecting
> >> following HTTP header.
> >>
> >> content-type text/html; charset=UTF-8
> >>
> >> However, I got empty charset (missing 'charset=UTF-8').
> >> So I looked up to source and found the line in SAPI.h
> >>
> >> 293 #define SAPI_DEFAULT_CHARSET ""
> >>
> >> Empty string should be "UTF-8", isn't it?
> >
> > No, we can't force an output charset on people since it would end up
> > breaking a lot of sites.
>
> Right, so maybe for the next major release? 5.5.0?
>
> As the first XSS advisory in 2000 states, explicitly setting char coding
> will prevent certain XSS. Recent browsers have much better encoding handling,
> but setting encoding explicitly is better for security still.
>
> > PHP 5.3's determine_charset behaves exactly like 5.4's. In 5.3 we have:
> >
> > if (charset_hint == NULL)
> >     return cs_8859_1;
> >
> > and in 5.4 we have:
> >
> > if (charset_hint == NULL)
> >     return cs_utf_8;
> >
> > So there is no difference in their guessing when there is no hint, the
> > only difference is that in 5.4 we choose utf8 and in 5.3 we choose
> > 8859-1 in that case.
> > I got this with 5.3 > echo htmlentities('<=E6=97=A5=E6=9C=AC=E8=AA=9EUTF-8>',ENT_QUOTES); > echo htmlentities('<=E6=97=A5=E6=9C=AC=E8=AA=9EUTF-8>',ENT_QUOTES, 'UTF-8= '); > > <æ=EF=BF=BD¥æ=EF=BF=BD¬èª=EF=BF=BDUTF8 > ><=E6=97=A5=E6=9C=AC=E8=AA=9EUTF-8> > > So people migrating from 5.3 to 5.4 should not have problems. > Migration older than 5.3 to 5.4 will be problematic. > > I always set all parameters for htmlentities/htmlspecialchars, therefore > I haven't noticed this was changed from 5.3. They may be migrating from > 5.2 or older. (RHEL5 uses 5.1) > > Since PHP does not have default multibyte module, it may be good for havi= ng > > input_encoding > internal_encoding > output_encoding > > I would then propose to make mbstring compile time mandatory. I'm against yet another global ini setting, I find the actual ini settings confusing enough to add one more that would moreover reflect mbstring one's (and add more and more confusion). Why not turn ext/mbstring mandatory at compile time, for all future PHP versions, like preg or spl are ? We do need multibyte handling either. ZendEngine takes advantage of mbstring for internal encoding as well, so I probably missed something as why it is still possible to --disable-mbstring (or not add --enable-mbstring) when compiling ? Has it a huge performance impact ? Thank you :) Julien.P --0015174bf198b26a2804bb345578--
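
For reference, on the empty SAPI_DEFAULT_CHARSET point quoted above: here is a minimal C sketch of why an empty default charset leaves the header without a "charset=" parameter. It is only an illustration under assumptions. The helper name build_content_type() and its parameters are hypothetical and are not taken from php-src.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not php-src code: format a Content-Type header line.
 * An empty or NULL default charset yields no "charset=" parameter at all.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int build_content_type(char *buf, size_t size,
                              const char *mimetype,
                              const char *default_charset)
{
    int n;

    assert(buf != NULL && size > 0 && mimetype != NULL);

    if (default_charset != NULL && default_charset[0] != '\0') {
        n = snprintf(buf, size, "Content-Type: %s; charset=%s",
                     mimetype, default_charset);
    } else {
        n = snprintf(buf, size, "Content-Type: %s", mimetype);
    }

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Called with "text/html" and "", the sketch produces "Content-Type: text/html". Called with "text/html" and "UTF-8", it produces "Content-Type: text/html; charset=UTF-8", which is the header Yasuo was expecting.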