I am beginning to see this as another 'date/time' type of problem. Adopt the
standard that everything internally is UTC and many of the problems go away.
I can remember discussions on unicode and PHP6. PHP5 was just being RC'ed with
tools for handling unicode (mbstring) but there was no coherence on how to
handle things ... as there still isn't. I was hoping that PHP6 would be
internally unicode, and then one only had to ensure that the interfaces
coded correctly to and from unicode. Internally everything is easy because there
are no 'encoding problems'.
'Content' going in and out needs to be correctly processed and that is the base
of this. The bulk of my own 'persistent data' is content such as 'wiki', 'blog',
'forum posts', 'articles' and so on. Others will most likely say that I should
not be using 'html' as the storage medium, but it does provide a flexible
standard format and 'ckeditor' provides a generic editor for all content. The
problem of course is that we are storing html tags within the data, so 'crude'
filtering using htmlspecialchars is not practical. The current process sanitizes
data input for normal users, but still allows 'admin' users direct source access
which is still a security risk, but we have to trust someone.
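To illustrate what I mean by 'not practical' (the snippet below is only a made-up
example of the sort of content involved), blanket escaping destroys the legitimate
markup along with anything malicious:

<?php
// Hypothetical wiki/blog content stored as HTML (e.g. produced by ckeditor).
$stored = '<p>See the <a href="/wiki/Install">install guide</a> for details.</p>';

// Blanket escaping mangles the markup we actually want to keep:
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
// Output: &lt;p&gt;See the &lt;a href=&quot;/wiki/Install&quot;&gt;install guide&lt;/a&gt; ...

// So rich content needs whitelist-based sanitizing of the allowed tags and
// attributes on input, not a blanket escape on output.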
My point here is that much of what is being discussed on 'a core anti-XSS
escaping class' is missing some of the basic problems, and 'filtering' is my
own take on the correct way of managing this! Many of the recent XSS holes have
simply been the likes of the 'highlight' function in Smarty, which had no
filtering at all ... and just needed sanitizing before anything was done with
it. This 'class' is purely targeting a small area of the problem and repackaging
functions which still need the user to understand which 'filter' to apply to
which string. If it is expected that simply applying a process to the output will
'protect users' then it can never succeed. The users need to understand just
where to 'filter' the strings they are using and what filters to use.
Now if what is proposed is a 'class' that will decompose an html page with
embedded css and js and magically remove any XSS injection then it might be
useful, and I think the creator of that would be in line for a Nobel prize?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
My point here is that much of what is being discussed on 'a core anti-XSS
escaping class' is missing some of the basic problems and 'filtering'
is my own take on the correct way of managing this!
and this is where you are wrong.
see
https://www.owasp.org/index.php/Abridged_XSS_Prevention_Cheat_Sheet#A_Positive_XSS_Prevention_Model
and
https://www.owasp.org/index.php/Abridged_XSS_Prevention_Cheat_Sheet#Why_Can.27t_I_Just_HTML_Entity_Encode_Untrusted_Data.3F
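to make that second link concrete, here is a small made-up example: the value
contains none of the characters htmlspecialchars() touches, yet dropped into an
unquoted attribute it still injects an event handler, because the encoding has
to match the output context:

<?php
// attacker-supplied value: no < > " & ' at all, so htmlspecialchars()
// returns it unchanged
$class = 'x onmouseover=alert(document.cookie)';

// unquoted attribute context: still exploitable despite the "escaping"
echo '<div class=' . htmlspecialchars($class, ENT_QUOTES, 'UTF-8') . '>hi</div>';
// renders as: <div class=x onmouseover=alert(document.cookie)>hi</div>

// a context-aware attribute escaper (or at minimum always-quoted attributes)
// is needed here, not just entity encoding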
Many of the recent XSS holes have simply been the likes of the 'highlight'
function in Smarty which had no filtering at all ... and just needed
sanitizing before anything was done with it.
you haven't experienced all of the possible contexts where an XSS
vulnerability can take place. that doesn't mean that those vectors don't
exist.
This 'class' is purely targeting a small area of the problem and
repackaging functions which still need the user to understand which
'filter' to apply to which string.
nope.
this class aims to provide developers with a tool to safely encode content
for each possible output context.
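something roughly like this (just my own toy sketch, not the proposed
implementation: it only covers two contexts with PHP built-ins, while the real
class needs dedicated rules for HTML attributes, JavaScript and CSS as well):

<?php
class ToyEscaper
{
    private $encoding;

    public function __construct($encoding = 'UTF-8')
    {
        $this->encoding = $encoding;
    }

    // HTML body context
    public function escapeHtml($value)
    {
        return htmlspecialchars($value, ENT_QUOTES, $this->encoding);
    }

    // URL parameter context
    public function escapeUrl($value)
    {
        return rawurlencode($value);
    }
}

$escaper = new ToyEscaper('UTF-8');
$q = isset($_GET['q']) ? $_GET['q'] : '';
echo '<p>You searched for ' . $escaper->escapeHtml($q) . '</p>';
echo '<a href="/search?q=' . $escaper->escapeUrl($q) . '">search again</a>';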
If it is expected that simply applying a process to the output will 'protect
users' then it can never succeed.
escaping the output doesn't mean that you can't also filter the input
(usually they walk hand in hand: "filter in, escape out").
you are the only one preaching here that half of that is an ok solution.
if you only filter the input, you cannot use more than one output context
without the risk of compromise, and you also put all your defense in the
belief that the data stored in your relational database (or cache, etc.) is
safely filtered.
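a rough illustration of "filter in, escape out" (the field name is made up);
neither layer replaces the other:

<?php
// filter on the way in: reject values that make no sense at all
$age = filter_input(INPUT_POST, 'age', FILTER_VALIDATE_INT,
    array('options' => array('min_range' => 0, 'max_range' => 150)));
if ($age === false || $age === null) {
    die('invalid age');
}
// ... store $age somewhere ...

// escape on the way out anyway, because the escaping is keyed to the
// output context (HTML here), not to how much we trust the stored data
echo '<td>' . htmlspecialchars((string) $age, ENT_QUOTES, 'UTF-8') . '</td>';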
The users need to understand just where to 'filter' the strings they are
using and what filters to use.
yeah, that's one thing that we can't fix, as for properly encoding the
output you need to know the output context.
Now if what is proposed is a 'class' that will decompose an html page with
embedded css and js and magically remove any XSS injection then it might be
useful, and I think the creator of that would be in line for a Nobel prize?
how does that relate to the current discussion?
--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Hi Lester,
'Content' going in and out needs to be correctly processed and that is the
base of this. The bulk of my own 'persistent data' is content such as
'wiki', 'blog', 'forum posts', 'articles' and so on. Others will most likely
say that I should not be using 'html' as the storage medium, but it does
provide a flexible standard format and 'ckeditor' provides a generic editor
for all content. The problem of course is that we are storing html tags
within the data, so 'crude' filtering using htmlspecialchars is not
practical. The current process sanitizes data input for normal users, but
still allows 'admin' users direct source access which is still a security
risk, but we have to trust someone.
How you store data is somewhat irrelevant. If you store it as plain
text with no markup, that doesn't guarantee that someone will never
sneak in and add markup. This applies whether the context is itself
HTML or anything else. As a result the "crude" filtering is anything
but crude. It's a simple and effective part of a Defense In Depth
strategy. A far better solution, though only starting to gain traction,
is to adopt a Content Security Policy, which informs browsers about
what your markup should enable (i.e. a whitelist). By default, this
disables all inline Javascript, for example, which is the usual target
of XSS attacks.
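Shipping a basic policy is a one-liner in PHP. The directives below are only a
sketch and every site needs its own tuning; the prefixed header names are an
assumption about the experimental support in current browsers:

<?php
// Whitelist our own origin for scripts and styles; among other things this
// disables inline <script> blocks, the usual XSS payload.
header("Content-Security-Policy: default-src 'self'; script-src 'self'; style-src 'self'");

// Prefixed variants for browsers that only understand the experimental headers.
header("X-Content-Security-Policy: default-src 'self'");
header("X-WebKit-CSP: default-src 'self'");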
The other thing about Defense In Depth is that when its advantages
exceed its disadvantages, applying it should be automatic. If folk
extract a value from a database and echo it to an HTML template
unescaped because it "should be" safe, I consider that a security
vulnerability. What if an SQLi attack altered it? What if an admin is
crooked or their password cracked? What if...x100. Security
vulnerabilities are not isolated events - they can be combined and
chained together.
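A short sketch of that point (the table and data are invented, and I'm assuming
the SQLite PDO driver only to keep it self-contained): escape at the point of
output even though the value came from your own database.

<?php
$pdo = new PDO('sqlite::memory:');
$pdo->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT)");
$pdo->exec("INSERT INTO users (display_name) VALUES ('<script>alert(1)</script>')");

$row = $pdo->query("SELECT display_name FROM users WHERE id = 1")
           ->fetch(PDO::FETCH_ASSOC);

// Wrong: trusts that nothing ever tampered with the stored value.
// echo '<p>Hello ' . $row['display_name'] . '</p>';

// Right: the HTML context decides the escaping, not where the data came from.
echo '<p>Hello ' . htmlspecialchars($row['display_name'], ENT_QUOTES, 'UTF-8') . '</p>';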
My point here is that much of what is being discussed on 'a core anti-XSS
escaping class' is missing some of the basic problems and 'filtering' is
my own take on the correct way of managing this! Many of the recent XSS
holes have simply been the likes of the 'highlight' function in Smarty which
had no filtering at all ... and just needed sanitizing before anything was
done with it. This 'class' is purely targeting a small area of the problem
and repackaging functions which still need the user to understand which
'filter' to apply to which string. If it is expected that simply applying a
process to the output will 'protect users' then it can never succeed. The
users need to understand just where to 'filter' the strings they are using
and what filters to use.
Filtering/Input Sanitisation goes hand in hand with Output
Encoding/Escaping. You can't have one without the other and also claim
to have executed a Defense In Depth strategy. You're then stripping
away defenses based on the expectation that whatever remains will
never fail. If it does, and it does all the time in reality, then your
lack of escaping as a backup is a massive problem.
What does this mean? Escaping is not a small area of the problem -
it's one of the biggest areas of the problem - potentially bigger than
input sanitisation, since invalid values are irrelevant to proper
escaping, which operates blindly by design. A lack of escaping impacts
every single point in every shred of application output which contains
data sourced from everything not literally defined in the current
request, and just one failure may be sufficient for an attacker to dump
encoded Javascript into the browser to steal cookies, perform
requests, track key presses, rewrite HTTPS links, attack browser
extensions, and any number of other effects.
Your final point is accurate to a point. Users, by and large, don't
understand XSS. This is not, however, a justification for withholding
tools that are useful to those who do know how to properly use them.
Education is a separate issue which I'm also trying to address:
http://phpsecurity.readthedocs.org
Now if what is proposed is a 'class' that will decompose an html page with
embedded css and js and magically remove any XSS injection then it might be
useful, and I think the creator of that would be in line for a Nobel prize?
HTMLPurifier by Edward Z. Yang. It only works on body content - not
the header section, but knock yourself out ;). There's also the far
less CPU-intensive option of the Content Security Policy, though we're
reliant on the penetration of modern browsers to distribute that
across more users. That said, Defense In Depth - folk should seriously
consider implementing this right now.
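A rough usage sketch of HTMLPurifier, assuming it's installed and its stock
autoloader is available (the allowed tag list is just an example):

<?php
require_once 'HTMLPurifier.auto.php';

// Whitelist the markup we're prepared to keep; everything else is dropped.
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,a[href],strong,em,ul,ol,li');

$purifier = new HTMLPurifier($config);

$untrusted = isset($_POST['body']) ? $_POST['body'] : '';
$clean = $purifier->purify($untrusted); // safe to persist and later echo as body HTML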
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team