Hi all,
I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.
Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and wrote the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.
That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).
https://wiki.php.net/rfc/escaper
Best regards,
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Hi Paddy,
Couldn't this just be a new option for the filter_var()
function?
$clean = filter_var($_POST['someVar'], XSS_CLEAN);
- Paul.
I see from your RFC that you have a bunch of functions, I believe all
these could be options to filter_var, ie.: FILTER_ESCAPE_[URL, JS,
CSS, HTMLATTR].
- Paul.
Paul,
Not without losing significant semantic meaning. There's a huge difference
between filtering and escaping. Remember, Filter In, Escape Out.
If you really wanted something like that, then perhaps add an escape_var
extension. But I think the proposed API is better as it's more explicit.
Anthony
I understand that this is really beneficial to have, but I wonder: why
can't this be a Composer-installable class, implemented in PHP? That
solves the problem of needing to find a volunteer, updating it is a lot
easier, and you don't have to rely on shared hosters having it enabled.
I realize that you want to have this generally available, but for that
we have ext/filter, which I think is not really used much. Why would
this be different? IMO, we should make a Composer-installable package
for this, and then litter all our escaping-related documentation pages
with links to this new package.
cheers,
Derick
--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine
Hi!
We already have the filter extension. Is it really necessary to invent yet
another way of filtering data?
Also, a problem with putting code of this complexity in core would be
that if it ever had a defect - e.g. we forgot to account for some weird
browser quirk that does not follow RFCs, or some strange encoding
combination, or just a plain bug - it would be very hard for users
to mitigate without upgrading PHP, which is not always under their
control. When using PHP code, they could just download a new ZF class, but
with a core implementation it'd be much harder.
So far I am not convinced we should really do it. But if somebody
creates a PECL extension and it proves popular, it may be merged into core
once it does.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Stas,
Filtering is very different from escaping. They each handle similar but
unique problems:
Hi!
It is a purely artificial distinction. Filtering is taking one set of
data and returning another set of data; it can be applied on input,
output, or anywhere you want. Just because we have used filtering for
input does not mean we can't use the same for output. There is
absolutely no need to reinvent the wheel just because we're using it in
a different place now. It is a mistake to think that because we started to
use filtering on input data, the word "filtering" now means it should
never be applied to output and we have to invent a whole new API to do the same.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
No it's not. A filter removes, but escaping lets the original content
pass through unchanged, with the necessary in-band signalling to make
sure that its content is not treated as in-band signalling.
--
Andrew Faulds
http://ajf.me/
Hi!
Again, you are confusing a particular implementation of a particular
filter with the idea of filtering. Moreover, even the existing filters do
not match your description:
FILTER_SANITIZE_ENCODED, FILTER_SANITIZE_MAGIC_QUOTES,
FILTER_SANITIZE_SPECIAL_CHARS, FILTER_SANITIZE_FULL_SPECIAL_CHARS,
FILTER_SANITIZE_STRING, FILTER_CALLBACK
But in general, look at the implementation of filters anywhere - like Apache
filters or IIS filters - nowhere is it said that a filter can only remove
data.
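For illustration, here is a minimal sketch of how some of the filters listed above rewrite their input rather than removing anything from it. The sample string and the choice of strtoupper as the callback are assumptions made for the example. FILTER_SANITIZE_MAGIC_QUOTES and FILTER_SANITIZE_STRING are left out because later PHP versions removed or deprecated them.

```php
<?php
// Sample input chosen for the example; not taken from the thread.
$input = '<b>Tom & Jerry</b>';

// Each call returns a transformed copy of the input; nothing is stripped.
$urlEncoded   = filter_var($input, FILTER_SANITIZE_ENCODED);
$specialChars = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS);
$fullSpecial  = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

// FILTER_CALLBACK hands the value to arbitrary code (here: strtoupper).
$viaCallback = filter_var($input, FILTER_CALLBACK, ['options' => 'strtoupper']);

foreach ([$urlEncoded, $specialChars, $fullSpecial, $viaCallback] as $result) {
    echo $result, PHP_EOL;
}
```

Every result is at least as long as the input, which is the behaviour Stas is pointing at: these filters encode or transform, they do not delete.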
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Ah, sorry, I think I'm confusing the standard English language meaning
of filter with regards to the physical device or signal processing, with
the meaning in the field of computer science etc.
--
Andrew Faulds
http://ajf.me/
Stas,
On Tue, Sep 18, 2012 at 1:09 PM, Stas Malyshev <smalyshev@sugarcrm.com> wrote:
> Again, you are confusing particular implementation of a particular
> filter with the idea of filtering. Moreover, even existing filters do
> not match your description:
No, he's not. Filtering and escaping are two very significant concepts in
security. Just because PHP implemented some escaping concepts into the
filter function does not mean that the concerns are co-related.
> FILTER_SANITIZE_ENCODED, FILTER_SANITIZE_MAGIC_QUOTES,
> FILTER_SANITIZE_SPECIAL_CHARS, FILTER_SANITIZE_FULL_SPECIAL_CHARS,
> FILTER_SANITIZE_STRING, FILTER_CALLBACK
> But in general, look at implementation of filters anywhere - like Apache
> filters or IIS filters - nowhere it is said that filter can only remove
> data.
Actually, that's the basic definition of a filter (from a security
context). Just because people implemented other things and called them
filters does not make them filters in the context of this discussion.
The other point that you seem to be missing is that filtering is generic
for an application. You would apply the same filters for content that came
in from an HTTP post as content that came in from a JSON API call. The data
is what's filtered for your application.
Escaping on the other hand is context dependent. You need a different form
of escaping for each output type (HTML, HTML attribute, XML, XML attribute,
XML processing instruction, JSON, database query, etc). So you cannot do a
generic escaping like you can do a generic filtering. Escaping should be
done as close to the edge as possible.
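As a rough sketch of what "context dependent" means in practice, the snippet below escapes one value for three different output contexts using functions PHP already ships. The sample value and the surrounding markup are assumptions for the example, and the sketch does not cover every context listed above.

```php
<?php
// Hypothetical untrusted value, chosen for the example.
$name = 'O"Reilly <admin> & co';

// HTML body and quoted-attribute context.
$forHtml = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// URL query-parameter context.
$forUrl = rawurlencode($name);

// JavaScript string literal context (inside a <script> block).
$forJs = json_encode($name, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo "<p title=\"{$forHtml}\">{$forHtml}</p>", PHP_EOL;
echo "<a href=\"/search?q={$forUrl}\">search</a>", PHP_EOL;
echo "<script>var name = {$forJs};</script>", PHP_EOL;
```

The same input yields three different encodings, and none of them is safe to reuse in either of the other two contexts.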
Check out this post I did a while ago with a pretty drawn out section
talking about the two concepts...
http://blog.ircmaxell.com/2011/03/what-is-security-web-application.html
Anthony
Ah, again you see, I'm confusing things :) In the security context,
English language context, and signal processing context, a filter
removes. In computer science, but not computer security, it processes.
I'm very confused :P
--
Andrew Faulds
http://ajf.me/
A filter simply takes an input and produces an output. There is nothing to say
that the output can't be bigger than the input. I'd happily accept a filter that
takes one language in and outputs a different one. Admittedly, that filter requires
considerably more complex processing than taking a .css file and outputting it
as a colour-coded document, or taking a piece of raw tagged HTML and outputting
it in a format that allows it to be displayed rather than processed in the browser.
Certainly a dictionary definition of 'filter' always implies that a reduced set
of material comes out, so perhaps we need to use a different word for the
process, but the same 'process' applies to all of these 'conversions': an input
data format is converted to an output data format.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi!
> No, he's not. Filtering and escaping are two very significant concepts
> in security. Just because PHP implemented some escaping concepts into
> the filter function does not mean that the concerns are co-related.
Again, you are taking a very narrow definition of filtering, which is not
justified by anything but your very narrow use case, and trying to present
it as if this is the only meaning filtering has (despite numerous
examples of filters being used in a more generic sense), and arguing that
because of this we need to duplicate APIs we already have, just because you
can use them in a different context. To me, it makes no sense - you can apply
data filtering anywhere. If for your specific purpose of explaining how to
make a better security architecture you choose to define "filtering" and
"escaping" as narrow, distinct concepts, this is fine. This does not mean
that we cannot use the existing filter extension - with already implemented
methods doing exactly what needs to be done - because they are to be
used in a context which you call "escaping".
> Actually, that's the basic definition of a filter (from a security
> context). Just because people implemented other things and called them
> filters does not make them filters in the context of this discussion.
It is your definition of a filter, which is in no way "basic" or universal.
> The other point that you seem to be missing is that filtering is generic
> for an application. You would apply the same filters for content that
> came in from an HTTP post as content that came in from a JSON API call.
> The data is what's filtered for your application.
Again, nowhere is it said that you cannot apply different filters to
different data or different contexts. Again, you narrow down the definition
of filtering, for which I see no purpose unless you seek to arrive at the
pre-determined conclusion that we need to duplicate APIs because it's
called "filter".
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
No, Stas, you are not realising that "filter" has a different meaning
depending on which field it is used in. It means something very different
in computer security than it does in computer science or when referring
to the physical apparatus.
Since stopping XSS is a computer security issue, we should discuss it as
such.
--
Andrew Faulds
http://ajf.me/
I agree that filtering can mean general processing of data, but if we embrace this
definition in the filter extension, why not deprecate all string functions and replace
them with FILTER_SANITIZE_* constants? I'd argue that we don't because naming matters,
and option constants should not be used to wildly change behavior.
Filter has already gone down this road -- I doubt the value added by having a second,
much more verbose way to call htmlspecialchars() -- but I don't see why we must
continue down that path.
Steve
Hi!
So, you don't think there should be a second, more verbose way to call
htmlspecialchars - and that's why we should add a third, more verbose way
to call htmlspecialchars? Somehow this does not sound convincing to me.
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
And note that there is actually a good reason. By having
htmlspecialchars as a filter you can use it as a default input filter.
A strict default filter acts as a safety net against typical XSS flaws.
If a developer forgets to apply the correct filter/encoding/escaping
(whatever you want to call it) mechanism, then the default filter makes
sure the most common cases are covered. Some people make the mistake of
associating default filtering like this with always storing escaped data
in the backend which simply isn't the case. It is nothing more than a
safety net and a way to make it easier to audit data filtering by
forcing developers to specify the escaping, if any, they want on every
piece of input data.
If we want to add more filters for more specific purposes, I am not
completely against it, although the more specific they get the more
churn there will be. We are not going to be able to kick out weekly
releases to address every new nuance of these very specific filters. But
they should be implemented as filters compatible with the filter
extension so people can use them within that existing context. That
doesn't preclude a more approachable function alias from also calling
them, of course, much like the htmlspecialchars case.
-Rasmus
Hi Rasmus,
I feel it needs to be reiterated that the escaper rules are very
predictable and very seldom change, as the regular expressions in the
Zend\Escaper class demonstrate. Each is bound to the official standards
for JavaScript, CSS and HTML respectively, and most of the rules,
defined using OWASP's recommendations as implemented in ESAPI, are
really clear-cut - escape everything except alphanumerics and a small
range of "safe" characters (CSS even has NO safe chars outside
alphanumerics). HTML and URL encoding are the only permissive variants
and these are already well known in PHP.
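To make the "escape everything except alphanumerics" rule concrete, here is a simplified sketch. It is not the Zend\Escaper code: every function name below is hypothetical, input is assumed to be UTF-8, and the real class handles further cases this sketch ignores.

```php
<?php
// Simplified sketch; NOT the Zend\Escaper implementation.
// All function names here are hypothetical.

/** Decode one UTF-8 encoded character into its Unicode code point. */
function sketch_code_point(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // Keep the payload bits of the lead byte, then append 6 bits per continuation byte.
    $codePoint = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codePoint;
}

/** Replace every character matched by $unsafePattern using the sprintf $format. */
function sketch_escape(string $value, string $unsafePattern, string $format): string
{
    $escaped = preg_replace_callback(
        $unsafePattern,
        static function (array $match) use ($format): string {
            return sprintf($format, sketch_code_point($match[0]));
        },
        $value
    );
    if ($escaped === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $escaped;
}

// CSS: no safe characters outside alphanumerics; the hex escape ends with a space.
function sketch_escape_css(string $value): string
{
    return sketch_escape($value, '/[^a-z0-9]/iu', '\\%X ');
}

// HTML attribute: alphanumerics plus a small "safe" range (, . - _).
function sketch_escape_html_attr(string $value): string
{
    return sketch_escape($value, '/[^a-z0-9,.\-_]/iu', '&#x%X;');
}

echo sketch_escape_css('a b'), PHP_EOL;
echo sketch_escape_html_attr('x" onload="y'), PHP_EOL;
```

The whole rule set fits in one allow-list pattern and one output format per context, which is why rules of this kind rarely need to change.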
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Stas,
On Tue, Sep 18, 2012 at 2:21 PM, Stas Malyshev <smalyshev@sugarcrm.com> wrote:
> So, you don't think there should be second, more verbose way to call
> htmlspecialchars - that's why we should add third, more verbose way to
> call htmlspecialchars? Somehow this does not sound convincing to me.
No, we shouldn't. We should stick with oddly named functions with dubiously
complicated parameter combinations that nobody gets right, because they are
already there, and that's good enough, right? </sarcasm>
Look. Just because it's possible, doesn't mean that it shouldn't be
improved. The filter API is great. It really is. But for escaping output,
it's basically useless because it's not designed for that. It's designed
for filtering data. You can use it for "sanitizing", but that's more of a
hack. For example, filter will only work on the default character set (
http://www.php.net/manual/en/ini.core.php#ini.default-charset). But I may
have other character sets that I talk to different systems on. And Filter
has no way of handling that.
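A small sketch of the per-call character set control being asked for, using htmlspecialchars(), which accepts an encoding argument. The ISO-8859-1 sample string is an assumption for the example.

```php
<?php
// "café <b>" encoded as ISO-8859-1: the byte 0xE9 is "é" there, but it is
// not a valid UTF-8 sequence on its own.
$latin1 = "caf\xE9 <b>";

// Telling the escaper the real encoding of this particular string works.
$asLatin1 = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');

// Treating the same bytes as UTF-8 fails: without ENT_SUBSTITUTE or
// ENT_IGNORE, htmlspecialchars() returns an empty string for invalid input.
$asUtf8 = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

var_dump($asLatin1 === "caf\xE9 &lt;b&gt;");
var_dump($asUtf8);
```

When an application talks to systems in more than one encoding, the escaper has to be told which one applies to each string, and that is hard to express through a single global setting.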
Filtering is a global concern. Escaping is a context concern. There's a
huge difference.
You can keep cramming the escaping concerns into a global filter handler
all you want. Or we could stop, and think about designing proper APIs for a
change. Where the API imparts both implementation and semantic meaning. An
API where it's easy to see if a user gets it right or not.
Personally, I'd rather have a dedicated API. It's the only way that
semantic meaning of the API will be preserved.
In this case, I would start it as a PECL extension implementing the ESAPI
library. Get the API right, and prove it works, then pull it into core.
Anthony Ferrara wrote:
> Personally, I'd rather have a dedicated API. It's the only way that
> semantic meaning of the API will be preserved.
> In this case, I would start it as a PECL extension implementing the ESAPI
> library. Get the API right, and prove it works, then pull it into core.
Sounds like the right way to placate the people who prefer this approach to
building code. We then have the option to leave it off if we don't use it ;)
I've even got a stock distribution that does not include any MySQL core stuff
now, so I'm happy with this modular approach.
> You can keep cramming the escaping concerns into a global filter handler
> all you want. Or we could stop, and think about designing proper APIs for a
> change. Where the API imparts both implementation and semantic meaning. An
> API where it's easy to see if a user gets it right or not.
Looking at the output 'filtering' applied to my own systems, I am more than
happy that I ACTUALLY eliminate the problems people are quoting here, XSS
prevention, by ensuring that the data being STORED is cleaned up in such a way
that I don't need to 'escape' anything. I do need to be able to 'encode' the
output data to display it correctly, but nothing in that process should be able
to create an XSS problem. I do apply 'filtering' to the HTML tags, which removes
those that I do not want to be stored in page content, and CSS is similarly
processed, so we have a run-time dictionary that controls the processing rather
than that being hard-coded in the filters.
Basically if you are STORING XSS intrusions then you have badly designed code, as
there is no reason that it would be stored. If you want to 'display' suspect
code, then it is 'escaped' before it is stored, so preventing a potential problem
if another viewer accesses the raw data!
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Lester,
I wasn't going to reply to you, but this is just plain wrong.
You should always consider ANYTHING that's outside the runtime of your
application (not in memory of the current instance) to be insecure. Feel
free to store XSS in your database. Because you're not going to trust it
anyway. The second you trust content in your database, you've opened more
potential attack vectors.
The proper solution is to do context based escaping/encoding at the moment
of output. That way you can update your escaping code if you find a bug and
all your content will be protected. If you find a bug in your escaper when
you escape on input, how the heck do you propose to fix it for all the
content already stored without looping over every single item and updating
the DB table (also a bad design)?
No, you should Filter In (make sure that content meets your domain
criteria, eg username alphanumeric, email looks like an email, etc), and
Escape Out (prevent Injection and XSS vulnerabilities when that data leaves
your application). Any other way of doing it is going to lead to attack
vectors. But if you diligently Escape Out, Injection and XSS are
IMPOSSIBLE. And even if you have a bug in your algorithm that allows an
edge case, you can fix it once and have it fixed for good.
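A minimal sketch of "Filter In, Escape Out" as described above. The helper names are hypothetical, and this is not a complete validation layer: it only shows where each step sits.

```php
<?php
// Hypothetical helper names; a sketch, not a complete validation layer.

/** Filter In: accept the value only if it meets the domain rule. */
function filter_in_username(string $raw): ?string
{
    return preg_match('/\A[a-z0-9]+\z/i', $raw) === 1 ? $raw : null;
}

/** Filter In: accept the value only if it looks like an email address. */
function filter_in_email(string $raw): ?string
{
    $email = filter_var($raw, FILTER_VALIDATE_EMAIL);
    return $email === false ? null : $email;
}

/** Escape Out: encode for the HTML context at the moment of output. */
function escape_out_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$username = filter_in_username('paddy42');            // accepted
$rejected = filter_in_username('<script>x</script>'); // null: fails the domain rule
$email    = filter_in_email('paddy@example.com');

// A free-text comment has no character restrictions and is stored raw...
$comment = 'I <3 PHP & "escaping"';

// ...and is escaped only when it is written into HTML.
echo '<p>', escape_out_html($comment), '</p>', PHP_EOL;
```

Because the stored comment stays raw, a fix to escape_out_html() applies to all existing content the next time it is rendered.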
And that doesn't even approach the problem where the same data will be used
in different output contexts (which require separate escaping mechanisms).
Thereby completely destroying any security you thought you had by
pre-escaping.
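The point about one value serving several output contexts can be sketched as follows. The sample title and the array key are assumptions for the example.

```php
<?php
$raw = 'Tom & Jerry';

// Escaped for HTML before storage, as the pre-escaping approach would do.
$storedEscaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Later the same value is needed in a URL and in JSON.
$urlFromEscaped = rawurlencode($storedEscaped);
$urlFromRaw     = rawurlencode($raw);

$jsonFromEscaped = json_encode(['title' => $storedEscaped]);
$jsonFromRaw     = json_encode(['title' => $raw]);

echo $urlFromEscaped, PHP_EOL;   // Tom%20%26amp%3B%20Jerry
echo $urlFromRaw, PHP_EOL;       // Tom%20%26%20Jerry
echo $jsonFromEscaped, PHP_EOL;  // {"title":"Tom &amp; Jerry"}
echo $jsonFromRaw, PHP_EOL;      // {"title":"Tom & Jerry"}
```

The HTML-escaped copy carries a stray "&amp;" into the URL and the JSON payload, so every non-HTML consumer would first have to undo the stored escaping.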
So no, you're wrong. It's badly designed code to escape HTML prior to
storage.
I have to strongly dispute that. I will NEVER store 'dirty' HTML in the
database. In fact, even CKEditor ensures that the stored data is clean and can be
output raw if necessary. IF a forum post, blog or wiki needs to store suspect
code as an example, then it should never be stored in its dirty form. I HAVE to
escape it to get it IN to the database, as otherwise it will not be saved. But
perhaps what I should point out here is that the formatting tags will not be
escaped, only the free-format text. That is the only way to ensure that users
can't 'inject' extra tags into the pages manually. I could not handle escaping
the user input in any other way. You can't simply 'escape' the output at all.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
> Basically if you are STORING XSS intrusions then you have badly designed
> code as there is no reason that it would be stored. If you want to
> 'display' suspect code, then it is 'escaped' before it is stored so
> preventing a potential problem if another viewer accesses the raw data!
While that is true, there is always the risk of code being injected into
your data store through another source, or of your filtering being bypassed
somewhere in your script. So yes, you absolutely want to have the correct
filters and sanitization in place prior to storing, but as an added
precaution I think it is wise to always escape data prior to output as
well.
- Mike
Anthony Ferrara wrote:
Personally, I'd rather have a dedicated API. It's the only way that
semantic meaning of the API will be preserved.
In this case, I would start it as a PECL extension implementing the ESAPI
library. Get the API right, and prove it works, then pull it into core.
Sounds the right way to placate the people who prefer this approach to
building code. We then have the option to leave it off if we don't use it
;) I've even got a stock distribution that does not include any MySQL core
stuff now so I'm happy with this modular approach.
You can keep cramming the escaping concerns into a global filter handler
all you want. Or we could stop, and think about designing proper APIs for
a change. Where the API imparts both implementation and semantic meaning. An
API where it's easy to see if a user gets it right or not.
Looking at the output 'filtering' applied to my own systems, I am more
than happy that ACTUALLY I eliminate the problems people are quoting here,
XSS prevention, by ensuring that the data being STORED is cleaned up in a
way that I don't need to 'escape' anything. I do need to be able to
'encode' the output data to correctly display it but nothing in that
process should be able to create an XSS problem? I do apply 'filtering' to
the html tags which removes those that I do not want to be stored in page
content, and css is similarly processed, so we have a run time dictionary
that controls the processing rather than that being hard coded in the
filters.
Basically if you are STORING XSS intrusions then you have badly designed
code as there is no reason that it would be stored. If you want to
'display' suspect code, then it is 'escaped' before it is stored so
preventing a potential problem if another viewer accesses the raw data!
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
--
"My command is this: Love each other as I
have loved you." John 15:12
Stas,
Encoding or Escaping is universally used in association with output in
the standard security language. I really don't see anything to debate
here. I take your point that the escaper could be implemented as a
series of filters. I think that would be the wrong move because nobody
will actually use the suggested function prototype in a template where
typing is at a premium and where filters are simply not synonymous
with output escaping. Few programmers would use the word "filter" in
this context, few will endure a long-winded filter function prototype
without wrapping it in something shorter, and how would character
encoding be dealt with in this?
Reusing an existing API that obviously doesn't fit the suggested RFC
characteristics is at least as bad as creating a new narrower API that
maintains brevity by being specialised.
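As a rough illustration of the brevity argument (a sketch assuming PHP 7 or later, not code from the RFC), the snippet below puts an existing filter-extension call next to a short wrapper. FILTER_SANITIZE_FULL_SPECIAL_CHARS is a real filter constant; escape_html() is a hypothetical name invented here and is not the API proposed in the RFC.

```php
<?php
$name = '<b>Paddy</b>';

// Existing filter extension call, as it would have to be typed in a template.
$viaFilter = filter_var($name, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

// Hypothetical short wrapper (illustrative name, not from the RFC).
// The character encoding is passed explicitly as a parameter.
function escape_html(string $value, string $charset = 'UTF-8'): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, $charset);
}

echo $viaFilter, "\n", escape_html($name), "\n";
```

Both calls produce the same output for this input; the difference under discussion is how much has to be typed at each output point and where the character encoding would be specified.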
Paddy
Hi!
No, he's not. Filtering and escaping are two very significant concepts
in security. Just because PHP implemented some escaping concepts into
the filter function does not mean that the concerns are co-related.
Again, you are taking a very narrow definition of filtering, which is not
justified by anything but your very narrow use case, and trying to present
it as if this is the only meaning filtering has (despite numerous
examples of filters being used in a more generic sense) and that because of
this we need to duplicate APIs we already have, just because you can use
them in a different context. To me, it makes no sense - you can apply data
filtering anywhere. If for your specific purpose of explaining how to
make a better security architecture you choose to define "filtering" and
"escaping" as narrow distinct concepts, this is fine. This does not mean
that we cannot use the existing filter extension - with already implemented
methods doing exactly what is needed to be done - because they are to be
used in a context which you call "escaping".
Actually, that's the basic definition of a filter (from a security
context). Just because people implemented other things and called them
filters does not make them filters in the context of this discussion.
It is your definition of a filter, which is in no way "basic" or universal.
The other point that you seem to be missing is that filtering is generic
for an application. You would apply the same filters for content that
came in from an HTTP post as content that came in from a JSON API call.
The data is what's filtered for your application.
Again, nowhere is it said that you cannot apply different filters to
different data or different context. Again, you narrow down the definition
of filtering, to which I see no purpose unless you seek to arrive at
the pre-determined conclusion that we need to duplicate APIs because it's
called "filter".
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Hi Stas,
This is not an input filter and PHP already suffers the same
outrageous disadvantages by offering htmlspecialchars(),
rawurlencode(), etc. The rules for escaping are well established and
DO NOT change overnight. Those for Javascript and CSS are in their
respective standards. Those for HTML/XML have been known since the 90s
and still haven't changed. PHP seems quite happy about offering
encoding mechanisms if anything - where did json_encode() spring from?
Browser defects are not PHP's problem.
Folk seem to be missing the point that this is output oriented to a
well understood set of rules.
Paddy
Hi!
I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.
We already have the filter extension. Is it really necessary to invent yet
another way of filtering data?
Also, a problem with putting code of this complexity in core would be
that if it ever had a defect - e.g. we forgot to account for some weird
browser quirk that does not follow RFCs, or some strange encoding
combination, or just a plain bug - it would be very hard for the users
to mitigate without upgrading PHP - which is not always under their
control. When using PHP code, they could just download a new ZF class, but with
a core implementation it'd be much harder.
So far I am not convinced we should really do it. But if somebody
creates a PECL extension and it proves popular, it may be merged into core
once it does.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
hi Pádraic!
I like the idea, though I have to sit on it a bit to see the possible pitfalls :)
However I am really not a fan of using a class as a namespace. All these
methods have nothing in common but what they do; they all treat
different inputs, may have different options, etc. Functions could
work just as well for that, or if necessary (see my ajaxmin ext)
create a class per input and add the necessary properties for the
options. That could be much cleaner and forward compatible.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Some Quick Thoughts
Multiparadigm PHP
I hope any implementation would embrace procedural coding paradigms
AND OOP paradigms. I tend to code using a Functional Programming (FP)
style, and I don't need/want objects to be the only interface.
Extension First
It seems wise to get this working and tested as an extension first,
just as Rasmus and others suggested.
Ability To Pass Some HTML Through Without Escaping (Whitelisting)
Functions should allow whitelisting of elements when desired. For
example, html escaping may be desired for all elements in a paragraph
except for spans, br's, etc.
I've built a quick extension that I use in my web framework that does this:
https://github.com/AdamJonR/nephtali-php-ext
string nephtali_str_escape_html(string str [, array whitelist [, string charset]])
The escaping works as outlined below:
1) Escape all html special characters in str.
2) Loop through whitelist items.
3a) If the item begins and ends with '/', consider it a regex and
replace the matches in the string with the original (htmlspecialchars
decoded) text (this works because <,>,",', and & are not meta
characters in regexes.)
3b) Otherwise, handle as a standard string and replace the matches
with the unescaped whitelist item text.
The idea is that, to be safe, everything should be first escaped.
Then, only unescape the items that match the whitelist (e.g.,
array('<p>','</p>','etc.').) The regex option is handy because you
often have situations where the internal contents of the tag vary
(e.g., id, class, href, etc.) and this allows you to pass these
through unescaped.
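The escape-then-unescape steps described above could look roughly like this in userland PHP. This is a sketch under assumptions, not the nephtali extension's actual code: the function name escape_html_with_whitelist() is hypothetical, PHP 7 or later is assumed, and it takes the reading that each whitelist item is itself escaped before being matched against the escaped string.

```php
<?php
// Hypothetical userland sketch of the described approach (not the extension's code).
function escape_html_with_whitelist(string $str, array $whitelist = [], string $charset = 'UTF-8'): string
{
    $flags = ENT_QUOTES | ENT_SUBSTITUTE;

    // Escape everything first.
    $escaped = htmlspecialchars($str, $flags, $charset);

    foreach ($whitelist as $item) {
        // Escape the whitelist item too, so it can be matched against the escaped string.
        $escapedItem = htmlspecialchars($item, $flags, $charset);

        if (strlen($item) > 1 && $item[0] === '/' && substr($item, -1) === '/') {
            // Regex item: replace each match with its decoded (original) text.
            $result = preg_replace_callback(
                $escapedItem,
                function (array $m): string {
                    return htmlspecialchars_decode($m[0], ENT_QUOTES);
                },
                $escaped
            );
            if ($result !== null) { // null signals a regex error; keep the escaped text
                $escaped = $result;
            }
        } else {
            // Plain string item: swap the escaped form back to the literal item.
            $escaped = str_replace($escapedItem, $item, $escaped);
        }
    }

    return $escaped;
}

echo escape_html_with_whitelist('<p>Hi <b>x</b></p>', ['<p>', '</p>']), "\n";
```

One limitation of escaping the pattern this way: a character class such as [^>] becomes [^&gt;] and no longer means the same thing, so patterns need to be written with that in mind.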
Of note, I've not officially released the extension, as I'm still
testing/developing it, but it serves as an example for ideas.
PHP Escaping-Specific Tags Could Be Considered
I wonder if PHP tags for escaping could be considered, as it seems
that there's still a plurality of developers that use PHP itself as
the templating language. For example:
// automatically echo'd and escaped for special html chars
<?php:html $obj->val ?>
// automatically echo'd and escaped for special html chars whilst letting through p's
<?php:html $obj->val, array('<p>','</p>') ?>
// automatically echo'd and escaped for special html chars whilst letting through p's and using different encoding
<?php:html $obj->val, array('<p>','</p>'), $encoding = 'something' ?>
// automatically echo'd and escaped for special html chars, no whitelisting allowed
<?php:attr $obj->val ?>
// automatically echo'd and escaped for special url chars, no whitelisting allowed
<?php:url $obj->val ?>
Thanks,
Adam
hi Pádraic,
Given the current discussions about the APIs (see my other reply too)
and its usage, and that this proposal is non-invasive/self-contained
in an extension, I would strongly suggest going ahead with it in
PECL already, doing releases (stay alpha until you have a very good feeling about
the API stability), etc. It will also greatly help to get more
feedback.
Then it could be proposed again for being bundled at some point,
before we go into feature freeze for 5.5.
Cheers,
--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Hi Pierre,
I also noticed your tweet ;).
Given the current discussions about the APIs (see my other reply too)
and its usage, and that this proposal is non-invasive/self-contained
in an extension, I would strongly suggest going ahead with it in
PECL already, doing releases (stay alpha until you have a very good feeling about
the API stability), etc. It will also greatly help to get more
feedback.
Then it could be proposed again for being bundled at some point,
before we go into feature freeze for 5.5.
I believe this is the path we'll be taking after some IRC discussions.
Though, I do think that taking the RFC route on this one was the only
realistic option for a PHP programmer with a minimal C skillset. It
ensured that the proposal gained exposure, lots of feedback and an
opportunity to pick up a real C programmer who could take it further.
In any case, hopefully I'll be back with real hardcore C code for PHP
5.5. In the meantime, if anyone has any lingering concerns or
questions about the RFC, let me know!
Paddy