Newsgroups: php.internals
From: padraic.brady@gmail.com (Pádraic Brady)
To: Lester Caine
Cc: PHP internals
Date: Tue, 4 Feb 2014 00:20:19 +0000
Subject: Re: [PHP-DEV] [RFC] Improve HTML escape

Hi Lester,

On 3 February 2014 23:22, Lester Caine wrote:
> Yasuo Ohgaki wrote:
>>
>> I'm lost here.
>> OWASP suggests to escape at least
>>
>>  & --> &amp;
>>  < --> &lt;
>>  > --> &gt;
>>  " --> &quot;
>>  ' --> &#x27;     &apos; not recommended because its not in the HTML spec
>> (See: section 24.4.1) &apos; is in the XML and XHTML specs.
>>  / --> &#x2F;     forward slash is included as it helps end an HTML
>> entity
>>
>>
>> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content
>>
>> I'm not sure why you state "already violate this requirement".
>
>
> It may be that what you are asking for is a flag on htmlentities for 'OWASP'
> compliant option. Others would probably view that as not then being html5
> compliant since html5 has it's own list of 'escaped' characters. One of the
> irritating things I find is 'unescaping' a string does not return the
> original string simply because the html5 rule has not been followed! A clean
> html5 result should be the default.

OWASP compliance focuses on the special characters, which are the same
regardless of HTML spec. What is output MAY differ, which is why it
suggests something like hex encoding where differences between specs
exist.

> Looking at the Rule 2 from the OWASP they are actually asking for every
> character below 256 to be escaped when used in an attribute!
> But the important thing here is 'untrusted' data, and sanitising any
> externally supplied data needs a little more care than simply trying to
> wrap it in htmlentities which I think is what Stas is saying? Personally
> I try to avoid any path where input can be processed direct back to
> output, filter the input, don't simply try and patch the output?

It's not a question of validating/filtering input. Handling input gets
it into the application, where Mystery Processes 1 to Infinity are
performed. Who knows what these Mystery Processes do? I don't - I'm not
writing everyone's application for them! They could be grabbing data,
transforming it, reading from the database, using a Composer package
replaced en route by the NSA, etc.

Ergo, we escape on output to HTML/JSON at all times and without
exception. The same way we escape on output and without exception when
the output target is a database. Input and Output are like borders -
nobody gets across them without a customs check. It may seem
unnecessary at times, but that's because most of the point is to be
consistent to a fault: to eliminate the risk of any errors in those
Mystery Processes and to guarantee that the correct escaping is
performed - DB? JSON? HTML? XML? RPC? Command Line?

It also helps not having to dissect every single application route just
to figure out every input's output encoding... That just drives me
nuts, and I have seen it. It's easy to forget that other people have to
maintain and audit your applications at times, so go easy on them!

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Zend Framework PHP-FIG Representative
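
For illustration only, the "customs check at every output border" rule
argued for in this message might be sketched as follows. This is not code
from the thread: it is written in Python rather than PHP, the names
escape_html, to_json and store_comment are hypothetical, the character map
is the six-character list from the OWASP rule quoted in this message, and
for the database border a bound query parameter stands in for manual
escaping.

```python
import json
import sqlite3

# The six characters from the quoted OWASP rule, mapped to entities.
# "&" is replaced first so entities produced later are not escaped twice.
_HTML_ESCAPES = (
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#x27;"),
    ("/", "&#x2F;"),
)


def escape_html(value: str) -> str:
    """Escape a value for insertion into HTML element content."""
    for char, entity in _HTML_ESCAPES:
        value = value.replace(char, entity)
    return value


def to_json(value: str) -> str:
    """Encode a value as a JSON string literal."""
    return json.dumps(value)


def store_comment(conn: sqlite3.Connection, value: str) -> None:
    """Write through a bound parameter, never by string concatenation."""
    conn.execute("INSERT INTO comments (body) VALUES (?)", (value,))


# Whatever happened inside the application, the value is treated as
# untrusted again at each border it crosses.
untrusted = "<script>alert('x')</script> & more"

html_out = escape_html(untrusted)   # border: HTML element content
json_out = to_json(untrusted)       # border: JSON document

conn = sqlite3.connect(":memory:")  # border: database
conn.execute("CREATE TABLE comments (body TEXT)")
store_comment(conn, untrusted)
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
```

Each target gets its own encoding at the point of output, while the raw
value stays unchanged inside the application, so nobody has to trace a
value back to its input to work out which escaping applies.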