Newsgroups: php.internals
Xref: news.php.net php.internals:62471
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
Date: Sat, 25 Aug 2012 17:16:19 +0900
Subject: Re: [PHP-DEV] Default input encoding for htmlspecialchars/htmlentities
To: Rasmus Lerdorf
Cc: PHP internals
In-Reply-To: <5036551E.1030804@lerdorf.com>
References: <5036551E.1030804@lerdorf.com>

Hi,

I'm +1 for having internal/input/output/script encoding settings at the
PHP or Zend level.

If the default is the problem, we should make UTF-8 the default value of
default_charset and use it as the default for internal/input/output/script
encoding and for the functions that are affected by encoding.

When the XSS advisory was released in Feb. 2000, it stated that the
encoding MUST be specified in the HTTP response header. Setting
default_charset is the best practice from a security perspective anyway.

If we use default_charset as the default encoding, the transition to 5.4
might be easier.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

2012/8/24 Rasmus Lerdorf:
> htmlspecialchars(), htmlentities(), html_entity_decode() and
> get_html_translation_table() all take an encoding parameter that used to
> default to iso-8859-1. We changed the default in PHP 5.4 to UTF-8. This
> is a much more sensible default and in the case of the encoding
> functions more secure as it prevents invalid UTF-8 from getting through.
> If you use 8859-1 as the default but your app is actually in UTF-8 or
> worse, some encoding that isn't low-ascii compatible then
> htmlspecialchars()/htmlentities() aren't doing what you think they are
> and you have a glaring security hole in your app.
>
> However, people are understandably lazy and don't want to think about
> this stuff. They don't want to explicitly provide their input encoding
> to these calls. We provided a solution to this and a way to write
> portable apps and that was to pass in an empty string "" as the
> encoding. If we saw this we would set the input encoding to match the
> output encoding specified by the "default_charset" ini setting.
> We couldn't just default to this default_charset because input and output
> encodings may very well be different and we would risk making existing
> apps insecure. For example an app using BIG5/CJK for its output encoding
> might very well be pulling data from 8859/UTF-8 data sources and if we
> invisibly switched htmlspecialchars/entities to match their output
> encoding we would have problems. Invisibly switching them from 8859-1 to
> UTF-8 could still be problematic, but at least it fails safe in that
> it doesn't let invalid UTF-8 through and encodes low-ascii the same way
> it did before.
>
> The problem is that there is a lot of legacy code out there that doesn't
> explicitly set the encoding on those calls and it is a lot of work to go
> through and specify it on each call. I still personally prefer to have
> people be explicit here, but I think it is slowing 5.4 adoption (see bug
> 61354).
>
> In PHP 6 we tried to introduce separate input, script and output
> encoding settings. Currently in 5.4 we don't have that, but we have
> those 3 separately for mbstring and for iconv:
>
>   iconv.input_encoding
>   iconv.internal_encoding
>   iconv.output_encoding
>   mbstring.http_input
>   mbstring.internal_encoding
>   mbstring.http_output
>
> Ideally we should be getting rid of the per-feature encoding settings
> and have a single set of them that we refer to when we need them. This
> is one of these places where we really need a default input encoding
> setting. We could have it check mbstring.http_input, but there is a
> wrinkle here that it has a fancy "auto" setting which we don't really
> want in this case. So we could set it to iconv.input_encoding, but that
> seems rather random and unintuitive.
>
> So do we create a new default_input_encoding ini directive mid-stream in
> 5.4 for this? Of course with the longer-term in mind that this will be
> part of a unified set of encoding settings in 5.5 and beyond.
>
> -Rasmus
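
A minimal sketch of the behaviour discussed in this thread, assuming PHP 5.4 or later. escape_html() is a hypothetical helper, not a PHP function; it passes the value of default_charset explicitly instead of relying on the built-in default of htmlspecialchars(). The last two calls illustrate the "fails safe" point: a byte sequence that is invalid in UTF-8 is rejected when UTF-8 is the declared encoding, but passes through unchanged under ISO-8859-1.

```php
<?php
// Hypothetical helper (not part of PHP): always pass an explicit encoding,
// taken from the default_charset ini setting, with UTF-8 as a fallback.
function escape_html($text)
{
    $charset = ini_get('default_charset');
    if ($charset === false || $charset === '') {
        $charset = 'UTF-8';
    }
    return htmlspecialchars($text, ENT_QUOTES, $charset);
}

ini_set('default_charset', 'UTF-8');

// Markup-significant characters are escaped.
echo escape_html('<b>Tom & "Jerry"</b>'), "\n";

// "\xFF" can never appear in valid UTF-8. With UTF-8 as the encoding,
// htmlspecialchars() rejects the whole string and returns "".
var_dump(htmlspecialchars("\xFF<script>", ENT_QUOTES, 'UTF-8'));

// With ISO-8859-1 every byte is a valid character, so the same byte
// passes through untouched and only "<" and ">" are escaped.
var_dump(htmlspecialchars("\xFF<script>", ENT_QUOTES, 'ISO-8859-1') === "\xFF&lt;script&gt;");
```

Reading the charset once in a helper keeps the encoding explicit at every call site without repeating it, which is the trade-off between explicitness and convenience that the quoted message describes.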