Message-ID: <5036588F.2060501@ajf.me>
In-Reply-To: <50365704.6020504@lerdorf.com>
Date: Thu, 23 Aug 2012 17:21:35 +0100
From: ajf@ajf.me (Andrew Faulds)
To: Rasmus Lerdorf
CC: PHP internals
Subject: Re: [PHP-DEV] Default input encoding for htmlspecialchars/htmlentities

On 23/08/12 17:15, Rasmus Lerdorf wrote:
> On 08/23/2012 09:09 AM, Andrew Faulds wrote:
>> Personally, I think you should have just two encodings: page_encoding
>> and internal_encoding. The former is for form input and page output
>> (could be latin-1, for instance), and internal_encoding is the internal
>> representation (default to utf-8 - you can deal with all of, say,
>> latin-1, as well as unicode entities). Input and output, on the web at
>> least, are almost always going to match.
> No, we need 3. The internal/script encoding doesn't have to be the same
> as the input encoding. It isn't common in the Western world, but
> elsewhere people do write their scripts in their local encoding which
> may very well be different from their input and/or output encodings.
>
> -Rasmus

Oh, you mean script encoding, form input/page output encoding and
internal representation? Because I don't see a need for differing
default input (i.e. file/form input) and default output (i.e. page/file
output) encodings.

--
Andrew Faulds
http://ajf.me/
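
For illustration only, here is a rough sketch of the two-encoding model discussed above, written in Python rather than PHP so that it can be run standalone. The names PAGE_ENCODING, INTERNAL_ENCODING, read_input and write_output are hypothetical and do not correspond to any real PHP ini directive or function. Falling back to numeric character references on output is one possible reading of "unicode entities", not something the thread settles.

```python
# Hypothetical settings mirroring the proposed page_encoding and
# internal_encoding; these are NOT real PHP ini directives.
PAGE_ENCODING = "latin-1"      # form input and page output
INTERNAL_ENCODING = "utf-8"    # internal representation


def read_input(raw: bytes) -> bytes:
    """Convert form input from the page encoding to the internal one."""
    return raw.decode(PAGE_ENCODING).encode(INTERNAL_ENCODING)


def write_output(internal: bytes) -> bytes:
    """Convert internal data back to the page encoding for output.

    Assumption: characters the page encoding cannot represent are
    emitted as numeric character references such as &#8364;.
    """
    text = internal.decode(INTERNAL_ENCODING)
    return text.encode(PAGE_ENCODING, errors="xmlcharrefreplace")


form_bytes = "café".encode(PAGE_ENCODING)            # bytes as a latin-1 form would send them
internal = read_input(form_bytes)                    # same text, now UTF-8 bytes
page_bytes = write_output(internal)                  # back to latin-1 for the page
euro = write_output("€".encode(INTERNAL_ENCODING))   # not in latin-1, so becomes a reference
```

Under these assumptions input and output share a single setting, which is the point of the two-encoding proposal. The script encoding Rasmus mentions would be a third, separate setting and is not modelled here.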