Newsgroups: php.internals
In-Reply-To: <003201cd96d6$8db96e40$a92c4ac0$@org>
References: <011201cd95c7$33d43c30$9b7cb490$@org> <011901cd95ce$bfea0900$3fbe1b00$@org>
 <003201cd96d6$8db96e40$a92c4ac0$@org>
Date: Thu, 20 Sep 2012 00:20:59 -0400
To: "Bryan C. Geraghty"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class
From: ircmaxell@gmail.com (Anthony Ferrara)

Bryan,

> “You hit the nail on the head here. You cannot black-list convert
> ISO-8859-1 to UTF-8. However, when we talk about escaping, we're talking
> about a context where the encoding is already correct, we're just
> preventing special characters from imparting special meaning. In that case,
> escaping is the correct way of handling it.”
>
> We can never safely assume that the encoding is correct. If the encoding
> of the original data is different than the assumed encoding, characters
> with “special meaning” may have different values and will be allowed
> through. For a simple proof-of-concept, see
> http://shiflett.org/blog/2005/dec/google-xss-example. Now, that is a
> specific exploit for an underlying vulnerability. The vulnerability is the
> fact that htmlentities() doesn’t decode the input before trying to escape
> characters.

Actually, in my mind, that's the role of filtering. You should filter for
the proper charset. Everything inside of the application should have a
consistent character set. And if that's the case, these sorts of
vulnerabilities (not to mention a whole host of possible bugs) are no
longer possible...

> What I’m trying to convey is that all context relevant to the operation
> matters. In this case, if characters are compared/replaced at the
> byte-level, we need to decode to the byte-level before performing those
> operations.
> To take that further, it’s important for everyone to realize
> that encoding doesn’t just apply to character sets; data is encoded for a
> specific layer. This is the same problem that the TCP and ISO layers solved
> decades ago; we’re just adding layers above the application layer. You
> wouldn’t expect an HTML parser to be able to parse JavaScript because they
> are different encodings. If you wanted to translate an HTML implementation
> cleanly to a JavaScript implementation, you would have to decode the HTML
> and then build a translator to build the same DOM elements in JavaScript. I
> know that’s sort of a blurry line, but I need to wrap this up. Hopefully,
> I’ve conveyed the idea.
>
> The sooner we all grasp this concept of encoding layers, the sooner this
> problem of injection/scripting at every layer goes away. The solution:
> Decode all inputs, halt execution on decoding errors, and then re-encode
> them. Yes, this is going to add overhead. But where security is concerned,
> we have to be willing to accept some overhead.

Again, that's the role of filtering. Inputs should never get to a
presentation layer unfiltered. That's a bigger problem that needs to be
addressed first. I would concede that it's worth doing again at output to
catch any issues, but whatever it catches there should be treated as an
application bug and not as a caught attack vector...

> Okay, with that out of the way, I’ll reiterate my agreement with your
> statement, “I think it strongly depends upon the exact behavior of the
> library. If we do wind up doing transcoding as well as escaping, then that
> may be valid. If we don't, then it wouldn't.”
>
> If the aim of this API is to really tackle the problem, we need to go
> beyond wrapping htmlentities() and htmlspecialchars() and change the names
> to “encode”.
> If it’s just to maintain the status quo and leave it to
> developers who barely understand encoding or escaping to ensure that their
> entire stack is using the same encoding, then we should leave the name
> as-is.

Just wrapping any library is often not a good idea. We'd need to add
meaningful logic in addition to the namespace name change. So yes, I'm in
favor of doing it right at that point...

> The official PHP documentation discourages the use of
> mysql_real_escape_string:
> http://php.net/manual/en/function.mysql-real-escape-string.php. The
> recommendation is to use a library that is character-set aware, like mysqli
> or PDO. But note that even using mysqli_real_escape_string or PDO::quote
> requires you to manually set the connection-level character-set. I’ve been
> operating on the assumption (there I go assuming) that PDO prepared
> statements were aware of the connection-level character set and mitigated
> this problem; however, I just reviewed PDO’s source code and I’m starting
> to question its implementation. As for your OWASP reference, keep in mind
> that OWASP makes many tiers of recommendations. Notice that manually
> escaping is the last option for mitigating injection problems.

In short, that's wrong (MRES is encouraged). But I've taken the reply
off-list as it's off topic here.

> In any case, I’m not here to carry on an endless flame war. I just want to
> make sure that we’re doing what’s necessary to mitigate the number one
> vulnerability in web applications.

I don't think this discussion is a flame war. I think it's a very good and
constructive point that needs to be made. It's at least a whole lot more
important and relevant than the last 40 posts on OOP vs Procedural names...

Anthony
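
The "decode all inputs, halt execution on decoding errors, and then re-encode" idea quoted in this message, combined with escaping at output, might look roughly like the sketch below. It is written in Python rather than PHP purely to stay self-contained, and it is not part of the proposed API: the helper name `decode_then_escape` and the choice of UTF-8 as the single application charset are assumptions made for illustration.

```python
import html


def decode_then_escape(raw: bytes, charset: str = "utf-8") -> str:
    """Strictly decode raw input, then escape it for an HTML text context.

    Hypothetical helper: the name, signature, and default charset are
    illustrative assumptions, not anything defined in this thread.
    """
    # Strict decoding raises UnicodeDecodeError on any byte sequence that is
    # invalid in the declared charset, so malformed input stops here instead
    # of reaching the escaping step.
    text = raw.decode(charset, errors="strict")
    # Escaping now works on decoded characters rather than raw bytes.
    # quote=True also escapes double and single quotes.
    return html.escape(text, quote=True)


if __name__ == "__main__":
    print(decode_then_escape(b"<b>hi</b>"))
    try:
        # 0xC0 0xBC is an overlong (invalid) UTF-8 encoding of "<".
        decode_then_escape(b"\xc0\xbc")
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```

Whether the strict decode belongs at input (filtering) or is repeated at output is the point under debate above; the sketch only shows that the two steps are separable and that a decoding failure can be surfaced as an error instead of being escaped around.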