Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:63180
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.217.170 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <003201cd96d6$8db96e40$a92c4ac0$@org>
References: <CALwr1Gn4OmHP=Qa7NKzpbkq+w8FEjUMZ3v678bbZVOGM23G35g@mail.gmail.com>
	<alpine.DEB.2.02.1209180809010.28929@whisky.home.derickrethans.nl>
	<CALwr1GnjwyE=yipK6UNkOm9m31LDDbpbcdgmr_nf0cUo+zOedw@mail.gmail.com>
	<CALwr1G=DOKmjudFFEAFY+6hmUh3q+qNe1ntWY1FLXP5GChLZdA@mail.gmail.com>
	<CAMUwpuSfd1U6aA_7o6d2jGm-i0PsLAf1MqKF8zvtcfzGFMGEXQ@mail.gmail.com>
	<CALwr1GkLMrenFjv49kE1a-i4vN3sVr2Rr1wRUgBWGg3=V34How@mail.gmail.com>
	<011201cd95c7$33d43c30$9b7cb490$@org>
	<CAAyV7nEqVq9QUeamrBr0VskscyMdfm8AzRSUPbTE8TV0b_rzYw@mail.gmail.com>
	<011901cd95ce$bfea0900$3fbe1b00$@org>
	<CAAyV7nHWyeUbxTJGgFW8scsO3jqF_seFf7yD8enWQ2_c0RY5yA@mail.gmail.com>
	<003201cd96d6$8db96e40$a92c4ac0$@org>
Date: Thu, 20 Sep 2012 00:20:59 -0400
Message-ID: <CAAyV7nEGmRAZWiv6QbNvxQZObfXZovZsPmRv-RiRA4iJjRGScg@mail.gmail.com>
To: "Bryan C. Geraghty" <bryan@ravensight.org>
Cc: internals@lists.php.net
Content-Type: multipart/alternative; boundary=bcaec554e110538ca004ca1a7356
Subject: Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class
From: ircmaxell@gmail.com (Anthony Ferrara)

--bcaec554e110538ca004ca1a7356
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Bryan,

> “You hit the nail on the head here. You cannot black-list convert
> ISO-8859-1 to UTF-8. However, when we talk about escaping, we're talking
> about a context where the encoding is already correct, we're just
> preventing special characters from imparting special meaning. In that case,
> escaping is the correct way of handling it.”
>
> We can never safely assume that the encoding is correct. If the encoding
> of the original data is different than the assumed encoding, characters
> with “special meaning” may have different values and will be allowed
> through. For a simple proof-of-concept, see
> http://shiflett.org/blog/2005/dec/google-xss-example.  Now, that is a
> specific exploit for an underlying vulnerability. The vulnerability is the
> fact that htmlentities() doesn’t decode the input before trying to escape
> characters.
>

Actually, in my mind, that's the role of filtering. You should filter for the
proper charset. Everything inside of the application should have a
consistent character set. And if that's the case, these sorts of
vulnerabilities (not to mention a whole host of possible bugs) are no
longer possible...
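
A minimal sketch of what filtering for the charset at the input boundary
could look like. It assumes UTF-8 is the application's single internal
charset and that the mbstring extension is loaded; filter_utf8() is a
hypothetical name used only for illustration, not an existing or proposed
API.

```php
<?php
// Hypothetical helper (an assumption for illustration, not part of PHP or
// the RFC): accept input only if it is valid in the one charset the
// application uses internally, and halt on anything else.
function filter_utf8(string $input): string
{
    // mb_check_encoding() reports whether the byte string is valid UTF-8.
    if (!mb_check_encoding($input, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $input;
}
```

Rejecting the input outright, rather than silently repairing it, keeps the
rest of the application free to assume one consistent character set.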


> What I’m trying to convey is that all context relevant to the operation
> matters. In this case, if characters are compared/replaced at the
> byte-level, we need to decode to the byte-level before performing those
> operations. To take that further, it’s important for everyone to realize
> that encoding doesn’t just apply to character sets; data is encoded for a
> specific layer. This is the same problem that the TCP and ISO layers solved
> decades ago; we’re just adding layers above the application layer. You
> wouldn’t expect an HTML parser to be able to parse JavaScript because they
> are different encodings. If you wanted to translate an HTML implementation
> cleanly to a JavaScript implementation, you would have to decode the HTML
> and then build a translator to build the same DOM elements in JavaScript. I
> know that’s sort of a blurry line, but I need to wrap this up. Hopefully,
> I’ve conveyed the idea.
>
> The sooner we all grasp this concept of encoding layers, the sooner this
> problem of injection/scripting at every layer goes away. The solution:
> Decode all inputs, halt execution on decoding errors, and then re-encode
> them. Yes, this is going to add overhead. But where security is concerned,
> we have to be willing to accept some overhead.
>

Again, that's the role of filtering. Inputs should never get to a
presentation layer unfiltered. That's a bigger problem that needs to be
addressed first. But I would concede that it's worth doing again at output
to catch any issues. But those issues it catches should be seen as
application bugs and not a caught attack vector...
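
As a rough sketch of checking again at output, assuming UTF-8 and an HTML
body context: escape_html() is a hypothetical wrapper, not something from
the RFC. It relies on htmlspecialchars() returning an empty string for an
invalid byte sequence when neither ENT_IGNORE nor ENT_SUBSTITUTE is set, and
it reports that case as a programming error rather than as a handled attack.

```php
<?php
// Hypothetical wrapper (an assumption for illustration): escape for an HTML
// context and treat badly encoded data at this layer as an application bug.
function escape_html(string $value): string
{
    // Pass the charset explicitly instead of relying on a default.
    $escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    // An empty result for non-empty input means the bytes were not valid
    // UTF-8, so something upstream failed to filter them.
    if ($escaped === '' && $value !== '') {
        throw new LogicException('Invalid UTF-8 reached the output layer');
    }
    return $escaped;
}
```

Throwing a LogicException here is a deliberate choice in this sketch: it
signals a bug in the filtering code, not a condition to recover from.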


> Okay, with that out of the way, I’ll reiterate my agreement with your
> statement, “I think it strongly depends upon the exact behavior of the
> library. If we do wind up doing transcoding as well as escaping, then that
> may be valid. If we don't, then it wouldn't.”
>
> If the aim of this API is to really tackle the problem, we need to go
> beyond wrapping htmlentities() and htmlspecialchars() and change the names
> to “encode”. If it’s just to maintain the status quo and leave it to
> developers who barely understand encoding or escaping to ensure that their
> entire stack is using the same encoding, then we should leave the name
> as-is.
>

Just wrapping any library is often not a good idea. We'd need to add
meaningful logic in addition to the namespace name change. So yes, I'm in
favor of doing it right at that point...


> The official PHP documentation discourages the use of
> mysql_real_escape_string:
> http://php.net/manual/en/function.mysql-real-escape-string.php. The
> recommendation is to use a library that is character-set aware, like mysqli
> or PDO. But note that even using mysqli_real_escape_string or PDO::quote
> requires you to manually set the connection-level character-set. I’ve been
> operating on the assumption (there I go assuming) that PDO prepared
> statements were aware of the connection-level character set and mitigated
> this problem; however, I just reviewed PDO’s source code and I’m starting
> to question its implementation. As for your OWASP reference, keep in mind
> that OWASP makes many tiers of recommendations. Notice that manually
> escaping is the last option for mitigating injection problems.
>

In short, that's wrong (MRES is encouraged). But I've taken the reply
off-list as it's off topic here.


> In any case, I’m not here to carry on an endless flame war. I just want to
> make sure that we’re doing what’s necessary to mitigate the number one
> vulnerability in web applications.
>

I don't think this discussion is a flame war. I think it's a very good and
constructive point that needs to be made. It's at least a whole lot more
important and relevant than the last 40 posts on OOP vs Procedural names...

Anthony

--bcaec554e110538ca004ca1a7356--