Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:63161 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 94854 invoked from network); 19 Sep 2012 17:48:51 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 19 Sep 2012 17:48:50 -0000 Authentication-Results: pb1.pair.com header.from=padraic.brady@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=padraic.brady@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.160.42 as permitted sender) X-PHP-List-Original-Sender: padraic.brady@gmail.com X-Host-Fingerprint: 209.85.160.42 mail-pb0-f42.google.com Received: from [209.85.160.42] ([209.85.160.42:34356] helo=mail-pb0-f42.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 71/6F-15057-1850A505 for ; Wed, 19 Sep 2012 13:48:50 -0400 Received: by pbbrp8 with SMTP id rp8so3006797pbb.29 for ; Wed, 19 Sep 2012 10:48:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=4pv7Ihm9MSWKv1V3wz/crZ7ffFi2N1sdF8NBiNuw0dk=; b=g8Y+kW/xT/PTmjLTjt/ZCquDMzNinCc6VhtdK4h4xknsZMv939FXuFCL//tvMGVU// 3WJi7wMnpKWAIGbrfn6K/ilfpU1ZX+Hem9D8kdS03pvLmlNxhVlTVoOP8ZewxFnF+tOk cCROQPIfpvuGMJJp1+fpWSmlmGjuVei6n8DZShbUW5rlF9MfzbAK2Lp+SRMdf90+aUeR /u1vHPuCjioxBhV8iVSdCJ7LZDHnJ5Tkl2FdOd0LsgydUfbJYisNI4L8qCX86B8/EvZM WilcXjUCDPy0YZ/55g7qEooSUbPPB9QsE1u0oaj39bFWHqx4+B2b2RbGWkQcUiN4Mtqh ZuMQ== MIME-Version: 1.0 Received: by 10.68.116.232 with SMTP id jz8mr115360pbb.77.1348076927025; Wed, 19 Sep 2012 10:48:47 -0700 (PDT) Received: by 10.66.73.42 with HTTP; Wed, 19 Sep 2012 10:48:46 -0700 (PDT) In-Reply-To: <5059E74D.8080002@ajf.me> References: <5058AD25.8010304@mrclay.org> <5058AFD7.7080703@ajf.me> <5059E3A2.3090805@ajf.me> <5059E74D.8080002@ajf.me> Date: Wed, 19 Sep 2012 18:48:46 +0100 Message-ID: To: Andrew Faulds Cc: Steve Clay , PHP Internals Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class From: padraic.brady@gmail.com (=?ISO-8859-1?Q?P=E1draic_Brady?=) Hi Andrew, On Wed, Sep 19, 2012 at 4:39 PM, Andrew Faulds wrote: >> Than I suggest including "For" in all of them. escapeForHtml, >> escapeForUrl, etc. That should clear it up somewhat that we're not >> targeting whole blocks of HTML/JS/CSS. > > That still isn't clear enough, I think. escapeHTMLAttributeValue and > escapeHTMLText. It needs to be clear what HTML context you're dealing wit= h. I think we're running into being overly prescriptive. Escaping can never, by definition, apply to anything that isn't a value string or integer, i.e. anything that is capable of altering the meaning of the HTML/Javascript or CSS into which its inserted. The original function names applied to specific escaping "strategies" rather than the actual locations that that strategy was useful for. There is only one HTML escaping strategy, one Javascript escaping strategy, etc. >>>> For example, I'd prefer escapeForCss vs escapeCSSStringLiteral though >>>> both would be valid English literal alternatives to escapeCss. >>> >>> You can't just have escapeForCSS, you need two functions: one for CSS >>> identifier names (.identifier, #identifier, etc.), and one for CSS >>> strings >>> (background-image: url('string'); or content: 'string') >> >> Not really, the target here is breaking out of a CSS or HTML context. >> If you allow users to alter identifiers or properties than escaping is >> just wrong - you should be sanitising instead to make sure the CSS is >> still well formed and agrees to a whitelist of allowed ids/props. > > If property values or identifiers can't be escaped, what can? What do you > mean? > > Are you meaning in style=3D"" or ? In which case, why have= it? > You can just use a bog-standard HTML escaping function. In the above we escape the value so that an attacker cannot "breakout" into the CSS propery setting context to create new styles or identifier blocks. So it applies only to the values of properties and nothing else. If anything else is allowed to be subject to user input - then escaping becomes moot because you now need to perform CSS sanitisation to a whitelist to prevent the injection of phishing or clickjacking attacks. The documentation (if not the RFC) should be the place to emphasise when and where to use escaping functions. >>> Also, escapeForJS isn't very clear, you should explicitly specify you'r= e >>> escaping a string of text for a JavaScript string literal. I don't thin= k >>> you >>> can escape JS identifier names. >> >> JS is purely for literal values and not any JS variables/statements or >> anything else. Those can never ever be subject to any form of >> untrusted input. > > It needs to be clear it's a string literal though, and a literal at that. > Otherwise it's a little unclear. Still, I'm more worried about the CSS. Again, I think being overly prescriptive is unnecessary. It has semantic value, of course, but the game is already lost if the user isn't aware of where to safely apply escaping (a different problem to applying the correct encoding over the wrong encoding). I think we're bumping into a slightly different area of education here. Once users know where escaping applies, the names even in their shorter forms are fairly obvious as to which context they apply to. I think that specific education is better served with good quality documentation and examples (I'm all for docs with a dose of reality). Paddy --=20 P=E1draic Brady http://blog.astrumfutura.com http://www.survivethedeepend.com Zend Framework Community Review Team