Hello,
I'm having a problem with the DOM parser:
If I load the following using DOMDocument->loadHTML():
<script type="text/javascript"><!-- var d="&quot;;alert('This is an XSS test'); //"; </script>
it will be converted to:
<script type="text/javascript"><!-- var d="";alert('This is an XSS test'); //"; </script>
This is because the parser is replacing '&quot;' with '"'.
Is there a way to prevent this from happening?
__
Raymond
The problem is that you're escaping incorrectly for the context. It takes
more than just htmlspecialchars() to escape for a JavaScript data context.
Check this out:
https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)Prevention_Cheat_Sheet#RULE.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_JavaScript_Data_Values
Anthony
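To make the point about context concrete, here is a minimal sketch. It is written in Python rather than PHP purely for illustration, and it is not code from this thread. The helper name js_string_literal is hypothetical. It shows that an HTML-escaped payload turns back into live script once entities are decoded, while a value encoded as a JavaScript string literal stays inert.

```python
import html
import json


def js_string_literal(value: str) -> str:
    """Encode value as a JavaScript string literal for use inside an inline <script>.

    Hypothetical helper for illustration only. json.dumps escapes quotes,
    backslashes, control characters and (by default) all non-ASCII characters;
    the replacements below additionally hide characters that matter to an
    HTML parser.
    """
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )


payload = "\";alert('This is an XSS test'); //"

# HTML escaping alone: once something decodes the entities, the payload is back.
html_escaped = html.escape(payload)
decoded = html.unescape(html_escaped)
print('var d="%s";' % decoded)  # the string literal is closed early and alert() runs

# JavaScript-context escaping: the quote stays escaped inside the literal.
print("var d=%s;" % js_string_literal(payload))
```

The same idea carries over to PHP, where json_encode() with the JSON_HEX_TAG, JSON_HEX_AMP, JSON_HEX_APOS and JSON_HEX_QUOT flags is a common way to produce such a literal.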
Hi Anthony,
Thanks for the feedback. I do get your point about escaping for JavaScript
but the example shown was just to highlight the entity substitution issue
which could lead to unexpected results. In this case a developer might want
to use jQuery to append some HTML-escaped values to an element, which would
result in an error or a possible XSS attack.
IMO there should be a feature to control or prevent this behavior.
__
Raymond
On Mon, Jul 16, 2012 at 6:31 AM, Anthony Ferrara ircmaxell@gmail.com wrote:
> The problem is that you're escaping incorrectly for the context. It takes
> more than just htmlspecialchars to escape for a javascript data context.
> Check this out:
> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)Prevention_Cheat_Sheet#RULE.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_JavaScript_Data_Values
> Anthony
> On Mon, Jul 16, 2012 at 12:54 AM, Raymond Irving xwisdom@gmail.com wrote:
> > Hello,
> > I'm having a problem with the DOM parser:
> > If I load the following using DOMDocument->loadHTML():
> > <script type="text/javascript"><!-- var d="&quot;;alert('This is an XSS test'); //"; </script>
> > it will be converted to:
> > <script type="text/javascript"><!-- var d="";alert('This is an XSS test'); //"; </script>
> > This is because the parser is replacing '&quot;' with '"'.
> > Is there a way to prevent this from happening?
> > __
> > Raymond
This is standard and expected behavior. Since " has no special meaning
within a document (outside of an attribute declaration), there is no
requirement to escape it. And the standard practice when parsing XML/HTML
using a DOM-based parser is to decode the values. So the &quot; gets
turned into ". The short answer is that you're doing it wrong, so there's
nothing that can be done to prevent the behavior. It's the behavior
defined in the standards that describe how to parse HTML and XML.
Escape things properly, and you won't have to work around implementation
details...
Anthony
Well, speaking specifically to the standard practices of DOM-based
parsers and avoiding the potential security issues, I think most of
what you said is correct UNLESS the entities are within CDATA, script, or
style nodes, in which case an HTML-aware parser should leave the entity
as-is:
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/introduction.html#ID-E7C30824
That said, please correct me if I'm wrong on this (as my wife would tell
you, I'm wrong all the time :)
Adam
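The distinction Adam describes can be seen in a parser that treats script content as raw text. The sketch below uses Python's standard html.parser as a stand-in. It says nothing about how libxml2 or PHP's DOMDocument behave, and the TextByTag class and text_by_tag helper are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect text content keyed by the innermost open tag (illustration only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # currently open tags
        self.text = {}   # tag name -> concatenated text seen inside it

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # handle_data may be called several times per element, so append.
        if self.stack:
            key = self.stack[-1]
            self.text[key] = self.text.get(key, "") + data


def text_by_tag(markup: str) -> dict:
    parser = TextByTag()
    parser.feed(markup)
    parser.close()
    return parser.text


result = text_by_tag('<p>d=&quot;x&quot;</p><script>var d="&quot;";</script>')
print(result["p"])       # entity decoded in ordinary text
print(result["script"])  # entity left untouched inside <script>
```

In ordinary element content the entity is decoded, while inside the script element it is passed through unchanged, which is the behavior Adam expects from an HTML-aware parser.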