RFC: Implementing a core anti-XSS escaping class

12 years ago by padraic.brady@gmail.com

Hi all,

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and written the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.

That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).

https://wiki.php.net/rfc/escaper

Best regards,
Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Paul Dragoonis

Hi Paddy,

Couldn't this just be a new option for the filter_var() function?

$clean = filter_var($_POST['someVar'], XSS_CLEAN);

I see from your RFC that you have a bunch of functions. I believe all
of these could be options to filter_var, e.g. FILTER_ESCAPE_[URL, JS,
CSS, HTMLATTR].

Paul.

12 years ago by Anthony Ferrara

Paul,

Hi Paddy,

Couldn't this just be a new option for the filter_var() function?

$clean = filter_var($_POST['someVar'], XSS_CLEAN);

Paul.

Not without losing significant semantic meaning. There's a huge difference
between filtering and escaping. Remember, Filter In, Escape Out.

If you really wanted something like that, then perhaps add an escape_var
extension. But I think the proposed API is better as it's more explicit.
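
To make "Filter In, Escape Out" concrete, here is a minimal sketch. The function names (accept_username, accept_email, html_out) are hypothetical and not part of the RFC; only filter_var(), preg_match() and htmlspecialchars() are existing PHP functions. Input is checked against domain rules when it arrives, and escaping happens separately, at the point of output.

```php
<?php
// Filter In: accept a value only if it meets the domain criteria.
// Hypothetical helper; the 1-32 length limit is an assumption for illustration.
function accept_username(string $raw): ?string
{
    return preg_match('/\A[A-Za-z0-9]{1,32}\z/', $raw) === 1 ? $raw : null;
}

// Filter In: validation via the existing filter extension.
function accept_email(string $raw): ?string
{
    $valid = filter_var($raw, FILTER_VALIDATE_EMAIL);
    return $valid === false ? null : $valid;
}

// Escape Out: applied at the moment the value is written into HTML.
function html_out(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$username = accept_username('paddy42');     // accepted unchanged
$rejected = accept_username('<script>');    // null: fails the domain rule
echo '<p>Hello, ' . html_out((string) $username) . "</p>\n";
```

Note that the validated value is stored and passed around unescaped; only the final echo knows it is writing HTML.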

Anthony

12 years ago by Derick Rethans

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and written the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.

That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).

https://wiki.php.net/rfc/escaper

I understand that this is really beneficial to have, but, I wonder, why
can't this be a composer-installable class, implemented in PHP? It
solves the issue that you need to find a volunteer, as well as that
updating it is a lot easier, and, you don't have to rely on shared
hosters having it enabled.

I realize that you want to have this
generally available, but for that we have ext/filter - which is not
really used too much I think. Why would this be different? IMO, we
should make a composer installable package for this, and then litter all
our escaping related document pages with links to this new package.

cheers,
Derick

--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine

12 years ago by Stas Malyshev

Hi!

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

We already have filter extension. Is it really necessary to invent yet
another way of filtering data?

Also, a problem with putting code of this complexity in core would be
that if it ever had a defect - e.g. we forgot to account for some weird
browser quirk that does not follow RFCs, or some strange encoding
combination, or just a plain bug - it would be very hard for the users
to mitigate without upgrading PHP - which is not always under their
control. When using PHP code, they could just d/l new ZF class, but with
core implementation it'd be much harder.

So far I am not convinced we should really do it. But if somebody
creates PECL extension and it proves popular, it may be merged into core
once it does.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Anthony Ferrara

Stas,

On Tue, Sep 18, 2012 at 12:51 PM, Stas Malyshev <smalyshev@sugarcrm.com> wrote:

Hi!

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

We already have filter extension. Is it really necessary to invent yet
another way of filtering data?

Filtering is very different from escaping. They each handle similar but
unique problems:

http://stackoverflow.com/questions/4218136/is-filter-input-escape-output-still-valid-with-pdo/4218219#4218219


12 years ago by Stas Malyshev

Hi!

Filtering is very different from escaping. They each handle similar but
unique problems:

It is a purely artificial distinction. Filtering is taking one set of
data and returning other set of data, it can be applied on input,
output, or anywhere you want to. Just because we used filtering for
input, does not mean we can't use the same for output, there is
absolutely no need to reinvent the wheel just because we're using it in
different place now. It is a mistake to think that because we started to
use filtering on input data, now the word "filtering" means it should
never be applied to output and we have to invent whole new API to do the same.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Andrew Faulds

Hi!

Filtering is very different from escaping. They each handle similar but
unique problems:

It is a purely artificial distinction. Filtering is taking one set of
data and returning other set of data, it can be applied on input,
output, or anywhere you want to. Just because we used filtering for
input, does not mean we can't use the same for output, there is
absolutely no need to reinvent the wheel just because we're using it in
different place now. It is a mistake to think that because we started to
use filtering on input data, now the word "filtering" means it should
never be applied to output and we have to invent whole new API to do the same.

No it's not. A filter removes, but escaping lets the original content
pass through unchanged, with the necessary in-band signalling to make
sure that its content is not treated as in-band signalling.
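
The distinction drawn here can be sketched in a few lines of PHP. This is only an illustration under the "a filter removes" definition used in this message; the character list in the preg_replace() call is an arbitrary assumption. Removing characters is lossy, while escaping can be reversed to recover the original content exactly.

```php
<?php
$original = 'Tom & "Jerry" <3';

// "Filter" in the removing sense: the special characters are gone for good.
$filtered = preg_replace('/[<>&"\']/', '', $original);

// Escape: every character is still represented, just encoded for HTML.
$escaped = htmlspecialchars($original, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Decoding the escaped form yields the original content again.
$roundTrip = htmlspecialchars_decode($escaped, ENT_QUOTES);

var_dump($filtered === $original);   // the filtered copy has lost data
var_dump($roundTrip === $original);  // the escaped copy has not
```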

--
Andrew Faulds
http://ajf.me/

12 years ago by Stas Malyshev

Hi!

No it's not. A filter removes, but escaping lets the original content
pass through unchanged, with the necessary in-band signalling to make
sure that its content is not treated as in-band signalling.

Again, you are confusing particular implementation of a particular
filter with the idea of filtering. Moreover, even existing filters do
not match your description:
FILTER_SANITIZE_ENCODED, FILTER_SANITIZE_MAGIC_QUOTES,
FILTER_SANITIZE_SPECIAL_CHARS, FILTER_SANITIZE_FULL_SPECIAL_CHARS,
FILTER_SANITIZE_STRING, FILTER_CALLBACK

But in general, look at implementation of filters anywhere - like Apache
filters or IIS filters - nowhere it is said that filter can only remove
data.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Andrew Faulds

Hi!

No it's not. A filter removes, but escaping lets the original content
pass through unchanged, with the necessary in-band signalling to make
sure that its content is not treated as in-band signalling.

Again, you are confusing particular implementation of a particular
filter with the idea of filtering. Moreover, even existing filters do
not match your description:
FILTER_SANITIZE_ENCODED, FILTER_SANITIZE_MAGIC_QUOTES,
FILTER_SANITIZE_SPECIAL_CHARS, FILTER_SANITIZE_FULL_SPECIAL_CHARS,
FILTER_SANITIZE_STRING, FILTER_CALLBACK

But in general, look at implementation of filters anywhere - like Apache
filters or IIS filters - nowhere it is said that filter can only remove
data.

Ah, sorry, I think I'm confusing the standard English language meaning
of filter with regards to the physical device or signal processing, with
the meaning in the field of computer science etc.

--
Andrew Faulds
http://ajf.me/

12 years ago by Anthony Ferrara

Stas,

On Tue, Sep 18, 2012 at 1:09 PM, Stas Malyshev <smalyshev@sugarcrm.com> wrote:

Hi!

No it's not. A filter removes, but escaping lets the original content
pass through unchanged, with the necessary in-band signalling to make
sure that its content is not treated as in-band signalling.

Again, you are confusing particular implementation of a particular
filter with the idea of filtering. Moreover, even existing filters do
not match your description:

No, he's not. Filtering and escaping are two very significant concepts in
security. Just because PHP implemented some escaping concepts into the
filter function does not mean that the concerns are co-related.

FILTER_SANITIZE_ENCODED, FILTER_SANITIZE_MAGIC_QUOTES,
FILTER_SANITIZE_SPECIAL_CHARS, FILTER_SANITIZE_FULL_SPECIAL_CHARS,
FILTER_SANITIZE_STRING, FILTER_CALLBACK

But in general, look at implementation of filters anywhere - like Apache
filters or IIS filters - nowhere it is said that filter can only remove
data.

Actually, that's the basic definition of a filter (from a security
context). Just because people implemented other things and called them
filters does not make them filters in the context of this discussion.

The other point that you seem to be missing is that filtering is generic
for an application. You would apply the same filters for content that came
in from an HTTP post as content that came in from a JSON API call. The data
is what's filtered for your application.

Escaping on the other hand is context dependent. You need a different form
of escaping for each output type (HTML, HTML attribute, XML, XML attribute,
XML processing instruction, JSON, database query, etc). So you cannot do a
generic escaping like you can do a generic filtering. Escaping should be
done as close to the edge as possible.
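
As a rough illustration of why escaping is context dependent, the sketch below encodes one untrusted string for three different output contexts using functions that already exist in PHP. It is not the API proposed in the RFC, and the choice of json_encode() with the JSON_HEX_* flags for the JavaScript context is an assumption made for this example.

```php
<?php
$untrusted = '"><script>alert(1)</script> & co';

// HTML body or quoted HTML attribute value.
$forHtml = htmlspecialchars($untrusted, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// A single URL query-string component.
$forUrl = rawurlencode($untrusted);

// A JavaScript string literal embedded in a <script> block.
$forJs = json_encode(
    $untrusted,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

// Same data, three different encodings; none is safe in another's context.
echo $forHtml, "\n", $forUrl, "\n", $forJs, "\n";
```

Because each result is only correct for its own context, the choice has to be made where the output is produced, which is the "close to the edge" point above.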

Check out this post I did a while ago with a pretty drawn out section
talking about the two concepts...
http://blog.ircmaxell.com/2011/03/what-is-security-web-application.html

Anthony

12 years ago by Andrew Faulds

Stas,

On Tue, Sep 18, 2012 at 1:09 PM, Stas Malyshev <smalyshev@sugarcrm.com> wrote:

Hi!

No it's not. A filter removes, but escaping lets the original content
pass through unchanged, with the necessary in-band signalling to make
sure that its content is not treated as in-band signalling.

Again, you are confusing particular implementation of a particular
filter with the idea of filtering. Moreover, even existing filters do
not match your description:

No, he's not. Filtering and escaping are two very significant concepts
in security. Just because PHP implemented some escaping concepts into
the filter function does not mean that the concerns are co-related.

Ah, again you see, I'm confusing things :) In the security context,
English language context, and signal processing context, a filter
removes. In computer science, but not computer security, it processes.

I'm very confused :P

--
Andrew Faulds
http://ajf.me/

12 years ago by Lester Caine

Andrew Faulds wrote:

No, he's not. Filtering and escaping are two very significant concepts in
security. Just because PHP implemented some escaping concepts into the filter
function does not mean that the concerns are co-related.

Ah, again you see, I'm confusing things :) In the security context, English
language context, and signal processing context, a filter removes. In computer
science, but not computer security, it processes.

I'm very confused :P

A filter simply takes an input and produces an output. There is nothing to say
that the output can't be bigger than the input? I'd happily accept a filter that
takes one language in and outputs a different one. Alright that filter requires
a considerably more complex processing than taking a .css file and outputting it
as a colour coded document, or taking a piece of raw tagged html and outputting
in a format that allows it to be displayed rather than processed in the browser.

Certainly a dictionary definition of 'filter' always implies that a reduced set
of material comes out, so perhaps we need to use a different word, for the
process, but the same 'process' applies to all of these 'conversions'. An input
data format is converted to an output data format?

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

12 years ago by Stas Malyshev

Hi!

No, he's not. Filtering and escaping are two very significant concepts
in security. Just because PHP implemented some escaping concepts into
the filter function does not mean that the concerns are co-related.

Again, you are taking very narrow definition of filtering, which is not
justified by anything but your very narrow use case, and try to present
it as if this is the only meaning filtering has (despite numerous
examples of using of filters in more generic sense) and that because of
this we need to duplicate APIs we already have, just because you can use
them in different context. To me, it makes no sense - you can apply data
filtering anywhere. If for your specific purpose of explaining how to
make better security architecture you choose to define "filtering" and
"escaping" as narrow distinct concepts, this is fine. This does not mean
that we can not use existing filter extension - with already implemented
methods doing exactly what is needed to be done - because they are to be
used in context which you call "escaping".

Actually, that's the basic definition of a filter (from a security
context). Just because people implemented other things and called them
filters does not make them filters in the context of this discussion.

It is your definition of a filter, which is in no way "basic" or universal.

The other point that you seem to be missing is that filtering is generic
for an application. You would apply the same filters for content that
came in from an HTTP post as content that came in from a JSON API call.
The data is what's filtered for your application.

Again, nowhere it is said that you can not apply different filters to
different data or different context. Again, you narrow down definition
of filtering, to which I see no purpose unless you seek to arrive at
pre-determined conclusion that we need to duplicate APIs because it's
called "filter".

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Andrew Faulds

Again, you are taking very narrow definition of filtering, which is not
justified by anything but your very narrow use case, and try to present
it as if this is the only meaning filtering has (despite numerous
examples of using of filters in more generic sense) and that because of
this we need to duplicate APIs we already have, just because you can use
them in different context. To me, it makes no sense - you can apply data
filtering anywhere. If for your specific purpose of explaining how to
make better security architecture you choose to define "filtering" and
"escaping" as narrow distinct concepts, this is fine. This does not mean
that we can not use existing filter extension - with already implemented
methods doing exactly what is needed to be done - because they are to be
used in context which you call "escaping".

No, Stas, you are not realising that "filter" has a different meaning
depending which field it is used in. It has very different meanings in
computer science and referring to the physical apparatus, compared to
computer security.

Since stopping XSS is a computer security issue, we should discuss it as
such.

--
Andrew Faulds
http://ajf.me/

12 years ago by Steve Clay

Again, nowhere it is said that you can not apply different filters to
different data or different context. Again, you narrow down definition
of filtering, to which I see no purpose unless you seek to arrive at
pre-determined conclusion that we need to duplicate APIs because it's
called "filter".

I agree that filtering can mean general processing of data, but if we embrace this
definition in the filter extension, why not deprecate all string functions and replace
them with FILTER_SANITIZE_* constants? I'd argue because naming matters, and option
constants should not be used to wildly change behavior.

Filter has already gone down this road--I doubt the value added by having a second, much
more verbose way to call htmlspecialchars()--but I don't see why we must continue down
that path.
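
For readers unfamiliar with the "second, much more verbose way", the two calls look like this. The PHP manual describes FILTER_SANITIZE_FULL_SPECIAL_CHARS as equivalent to calling htmlspecialchars() with ENT_QUOTES set; the input string is just an example, and the sketch makes no claim about edge cases such as non-ASCII input or existing entities.

```php
<?php
$input = '<a href="x">link</a>';

// The dedicated function.
$viaFunction = htmlspecialchars($input, ENT_QUOTES);

// The same kind of encoding, selected through a filter constant.
$viaFilter = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

var_dump($viaFunction, $viaFilter);
```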

Steve

http://www.mrclay.org/

12 years ago by Stas Malyshev

Hi!

Filter has already gone down this road--I doubt the value added by having a second, much
more verbose way to call htmlspecialchars()--but I don't see why we must continue down
that path.

So, you don't think there should be second, more verbose way to call
htmlspecialchars - that's why we should add third, more verbose way to
call htmlspecialchars? Somehow this does not sound convincing to me.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by Rasmus Lerdorf

Hi!

Filter has already gone down this road--I doubt the value added by having a second, much
more verbose way to call htmlspecialchars()--but I don't see why we must continue down
that path.

So, you don't think there should be second, more verbose way to call
htmlspecialchars - that's why we should add third, more verbose way to
call htmlspecialchars? Somehow this does not sound convincing to me.

And note that there is actually a good reason. By having
htmlspecialchars as a filter you can use it as a default input filter.

A strict default filter acts as a safety net against typical XSS flaws.
If a developer forgets to apply the correct filter/encoding/escaping
(whatever you want to call it) mechanism, then the default filter makes
sure the most common cases are covered. Some people make the mistake of
associating default filtering like this with always storing escaped data
in the backend which simply isn't the case. It is nothing more than a
safety net and a way to make it easier to audit data filtering by
forcing developers to specify the escaping, if any, they want on every
piece of input data.

If we want to add more filters for more specific purposes, I am not
completely against it, although the more specific they get the more
churn there will be. We are not going to be able to kick out weekly
releases to address every new nuance of these very specific filters. But
they should be implemented as filters compatible with the filter
extension so people can use them within that existing context. That
doesn't preclude a more approachable function alias from also calling
them, of course, much like the htmlspecialchars case.

-Rasmus

12 years ago by padraic.brady@gmail.com

Hi Rasmus,

If we want to add more filters for more specific purposes, I am not
completely against it, although the more specific they get the more
churn there will be. We are not going to be able to kick out weekly
releases to address every new nuance of these very specific filters. But
they should be implemented as filters compatible with the filter
extension so people can use them within that existing context. That
doesn't preclude a more approachable function alias from also calling
them, of course, much like the htmlspecialchars case.

I feel it needs to be reiterated that the escaper rules are very
predictable and very seldom change as the regular expressions in the
Zend\Escaper class demonstrate. Each is bound to official standards
for Javascript, CSS and HTML respectively and most of the rules,
defined using the OWASP's recommendations as implemented in ESAPI, are
really clearcut - escape everything except alphanumerics and a small
range of "safe" characters (CSS even has NO safe chars outside
alphanumerics). HTML and URL encoding are the only permissive variants
and these are already well known in PHP.
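
The "escape everything except alphanumerics" rule for CSS can be sketched as below. This is an illustrative approximation, not the Zend\Escaper or ESAPI implementation: the function names are hypothetical, and the output form (a backslash, the hexadecimal code point, then a space) is the standard CSS escape syntax chosen for this example.

```php
<?php
// Hypothetical helper: code point of a single UTF-8 encoded character.
function sketch_code_point(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // Strip the length marker bits from the lead byte, then append
    // the low six bits of every continuation byte.
    $codePoint = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codePoint;
}

// Hypothetical escaper: anything outside [a-zA-Z0-9] becomes "\HEX ".
function sketch_escape_css(string $value): string
{
    $result = preg_replace_callback(
        '/[^a-zA-Z0-9]/u',
        static function (array $match): string {
            return sprintf('\\%X ', sketch_code_point($match[0]));
        },
        $value
    );
    if ($result === null) {
        // With the /u modifier, PCRE rejects input that is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $result;
}

echo sketch_escape_css('red;}body{display:none'), "\n";
```

The rule itself is a single character class, which is the sense in which such escapers are predictable and seldom need to change.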

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Anthony Ferrara

Stas,

On Tue, Sep 18, 2012 at 2:21 PM, Stas Malyshev <smalyshev@sugarcrm.com> wrote:

Hi!

Filter has already gone down this road--I doubt the value added by having a
second, much more verbose way to call htmlspecialchars()--but I don't see why
we must continue down that path.

So, you don't think there should be second, more verbose way to call
htmlspecialchars - that's why we should add third, more verbose way to
call htmlspecialchars? Somehow this does not sound convincing to me.

No, we shouldn't. We should stick with oddly named functions with dubiously
complicated parameter combinations that nobody gets right, because they are
already there, and that's good enough, right? </sarcasm>

Look. Just because it's possible, doesn't mean that it shouldn't be
improved. The filter API is great. It really is. But for escaping output,
it's basically useless because it's not designed for that. It's designed
for filtering data. You can use it for "sanitizing", but that's more of a
hack. For example, filter will only work on the default character set
(http://www.php.net/manual/en/ini.core.php#ini.default-charset). But I may
have other character sets that I talk to different systems on. And Filter
has no way of handling that.

Filtering is a global concern. Escaping is a context concern. There's a
huge difference.

You can keep cramming the escaping concerns into a global filter handler
all you want. Or we could stop, and think about designing proper APIs for a
change. Where the API imparts both implementation and semantic meaning. An
API where it's easy to see if a user gets it right or not.

Personally, I'd rather have a dedicated API. It's the only way that
semantic meaning of the API will be preserved.

In this case, I would start it as a PECL extension implementing the ESAPI
library. Get the API right, and prove it works, then pull it into core.

12 years ago by Lester Caine

Anthony Ferrara wrote:

Personally, I'd rather have a dedicated API. It's the only way that
semantic meaning of the API will be preserved.

In this case, I would start it as a PECL extension implementing the ESAPI
library. Get the API right, and prove it works, then pull it into core.

Sounds the right way to placate the people who prefer this approach to building
code. We then have the option to leave it off if we don't use it ;) I've even
got a stock distribution that does not include any MySQL core stuff now so I'm
happy with this modular approach.

You can keep cramming the escaping concerns into a global filter handler
all you want. Or we could stop, and think about designing proper APIs for a
change. Where the API imparts both implementation and semantic meaning. An
API where it's easy to see if a user gets it right or not.

Looking at the output 'filtering' applied to my own systems, I am more than
happy that ACTUALLY I eliminate the problems people are quoting here, XSS
prevention, by ensuring that the data being STORED is cleaned up in a way that I
don't need to 'escape' anything. I do need to be able to 'encode' the output
data to correctly display it but nothing in that process should be able to
create an XSS problem? I do apply 'filtering' to the html tags which removes
those that I do not want to be stored in page content, and css is similarly
processed, so we have a run time dictionary that controls the processing rather
than that being hard coded in the filters.

Basically if you are STORING XSS intrusions then you have badly designed code as
there is no reason that it would be stored. If you want to 'display' suspect
code, then it is 'escaped' before it is stored so preventing a potential problem
if another viewer accesses the raw data!

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

12 years ago by Anthony Ferrara

Lester,

Basically if you are STORING XSS intrusions then you have badly designed
code as there is no reason that it would be stored. If you want to
'display' suspect code, then it is 'escaped' before it is stored so
preventing a potential problem if another viewer accesses the raw data!

I wasn't going to reply to you, but this is just plain wrong.

You should always consider ANYTHING that's outside the runtime of your
application (not in memory of the current instance) to be insecure. Feel
free to store XSS in your database. Because you're not going to trust it
anyway. The second you trust content in your database, you've opened more
potential attack vectors.

The proper solution is to do context based escaping/encoding at the moment
of output. That way you can update your escaping code if you find a bug and
all your content will be protected. If you find a bug in your escaper when
you escape on input, how the heck do you propose to fix it for all the
content already stored without looping over every single item and updating
the DB table (also a bad design)?

No, you should Filter In (make sure that content meets your domain
criteria, eg username alphanumeric, email looks like an email, etc), and
Escape Out (prevent Injection and XSS vulnerabilities when that data leaves
your application). Any other way of doing it is going to lead to attack
vectors. But if you diligently Escape Out, Injection and XSS are
IMPOSSIBLE. And even if you have a bug in your algorithm that allows an
edge case, you can fix it once and have it fixed for good.

And that doesn't even approach the problem where the same data will be used
in different output contexts (which require separate escaping mechanisms).
Thereby completely destroying any security you thought you had by
pre-escaping.

So no, you're wrong. It's badly designed code to escape HTML prior to
storage.
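
A small sketch of the multiple-context problem described above, using only existing PHP functions. The variable names and the sample title are made up for illustration. The same value is stored raw in one case and HTML-escaped before storage in the other, then sent to a JSON context and an HTML context.

```php
<?php
$raw = 'Tom & Jerry <3';

// Escape-on-input approach: the HTML-encoded form is what gets stored.
$storedEscaped = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// JSON context: the pre-escaped copy leaks HTML entities into the API.
$jsonFromRaw     = json_encode(['title' => $raw]);
$jsonFromEscaped = json_encode(['title' => $storedEscaped]);

// HTML context: escaping at output is correct for the raw copy, but
// double-encodes the pre-escaped copy.
$htmlFromRaw     = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$htmlFromEscaped = htmlspecialchars($storedEscaped, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $jsonFromEscaped, "\n", $htmlFromEscaped, "\n";
```

Skipping the output escaping for the pre-escaped copy would avoid the double encoding, but only by trusting whatever is in storage, which is the assumption this message argues against.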

12 years ago by Lester Caine

Anthony Ferrara wrote:

Lester,

Basically if you are STORING XSS intrusions then you have badly designed
code as there is no reason that it would be stored. If you want to 'display'
suspect code, then it is 'escaped' before it is stored so preventing a
potential problem if another viewer accesses the raw data!

I wasn't going to reply to you, but this is just plain wrong.

You should always consider ANYTHING that's outside the runtime of your
application (not in memory of the current instance) to be insecure. Feel free to
store XSS in your database. Because you're not going to trust it anyway. The
second you trust content in your database, you've opened more potential attack
vectors.

The proper solution is to do context based escaping/encoding at the moment of
output. That way you can update your escaping code if you find a bug and all
your content will be protected. If you find a bug in your escaper when you
escape on input, how the heck do you propose to fix it for all the content
already stored without looping over every single item and updating the DB table
(also a bad design)?

No, you should Filter In (make sure that content meets your domain criteria, eg
username alphanumeric, email looks like an email, etc), and Escape Out (prevent
Injection and XSS vulnerabilities when that data leaves your application). Any
other way of doing it is going to lead to attack vectors. But if you diligently
Escape Out, Injection and XSS are IMPOSSIBLE. And even if you have a bug in your
algorithm that allows an edge case, you can fix it once and have it fixed for good.

And that doesn't even approach the problem where the same data will be used in
different output contexts (which require separate escaping mechanisms). Thereby
completely destroying any security you thought you had by pre-escaping.

So no, you're wrong. It's badly designed code to escape HTML prior to storage.

I have to strongly dispute that. I will NEVER store 'dirty' html in the
database. In fact even ckeditor ensures that the stored data is clean and can be
output raw if necessary. IF a forum post, blog or wiki need to store suspect
code as an example then it should never be stored in its dirty form. I HAVE to
escape it to get it IN to the database as otherwise it will not be saved. But
perhaps what I should point out here is that the formatting tags will not be
escaped, only the free format text. That is the only way to ensure that users
can't 'inject' extra tags into the pages manually. I could not handle escaping
the user input in any other way? You can't simply 'escape' the output at all?

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

12 years ago by Michael Stowe — view source

Basically if you are STORING XSS intrusions then you have badly designed
code as there is no reason that it would be stored. If you want to
'display' suspect code, then it is 'escaped' before it is stored so
preventing a potential problem if another viewer accesses the raw data!

While that is true, there is always the risk of code being injected into
your data-store through another source or being initially bypassed
somewhere on your script. So yes, you absolutely want to have the correct
filters and sanitation in place prior to storing, but as an added
precaution I think it is wise to be always escaping data prior to output as
well.

Mike

Anthony Ferrara wrote:

Personally, I'd rather have a dedicated API. It's the only way that
semantic meaning of the API will be preserved.

In this case, I would start it as a PECL extension implementing the ESAPI
library. Get the API right, and prove it works, then pull it into core.

Sounds the right way to placate the people who prefer this approach to
building code. We then have the option to leave it off if we don't use it
;) I've even got a stock distribution that does not include any MySQL core
stuff now so I'm happy with this modular approach.

You can keep cramming the escaping concerns into a global filter handler
all you want. Or we could stop, and think about designing proper APIs for a
change. Where the API imparts both implementation and semantic meaning. An
API where it's easy to see if a user gets it right or not.

Looking at the output 'filtering' applied to my own systems, I am more
than happy that ACTUALLY I eliminate the problems people are quoting here,
XSS prevention, by ensuring that the data being STORED is cleaned up in a
way that I don't need to 'escape' anything. I do need to be able to
'encode' the output data to correctly display it but nothing in that
process should be able to create an XSS problem? I do apply 'filtering' to
the html tags which removes those that I do not want to be stored in page
content, and css is similarly processed, so we have a run time dictionary
that controls the processing rather than that being hard coded in the
filters.

Basically if you are STORING XSS intrusions then you have badly designed
code as there is no reason that it would be stored. If you want to
'display' suspect code, then it is 'escaped' before it is stored so
preventing a potential problem if another viewer accesses the raw data!

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

--

"My command is this: Love each other as I
have loved you." John 15:12

12 years ago by padraic.brady@gmail.com — view source

Stas,

Encoding or Escaping is universally used in association with output in
the standard security language. I really don't see anything to debate
here. I take your point that the escaper could be implemented as a
series of filters. I think that would be the wrong move because nobody
will actually use the suggested function prototype in a template where
typing is at a premium and where filters are simply not synonymous
with output escaping. Few programmers would use the phrase filter in
this context, few will endure a long winded filter function prototype
without rewrapping it back to brevity, and how would character
encoding be dealt with in this?

Reusing an existing API that obviously doesn't fit the suggested RFC
characteristics is at least as bad as creating a new narrower API that
maintains brevity by being specialised.

Paddy

Hi!

No, he's not. Filtering and escaping are two very significant concepts
in security. Just because PHP implemented some escaping concepts into
the filter function does not mean that the concerns are co-related.

Again, you are taking a very narrow definition of filtering, which is not
justified by anything but your very narrow use case, and try to present
it as if this is the only meaning filtering has (despite numerous
examples of using of filters in more generic sense) and that because of
this we need to duplicate APIs we already have, just because you can use
them in different context. To me, it makes no sense - you can apply data
filtering anywhere. If for your specific purpose of explaining how to
make better security architecture you choose to define "filtering" and
"escaping" as narrow distinct concepts, this is fine. This does not mean
that we can not use existing filter extension - with already implemented
methods doing exactly what is needed to be done - because they are to be
used in context which you call "escaping".

Actually, that's the basic definition of a filter (from a security
context). Just because people implemented other things and called them
filters does not make them filters in the context of this discussion.

It is your definition of a filter, which is in no way "basic" or universal.

The other point that you seem to be missing is that filtering is generic
for an application. You would apply the same filters for content that
came in from an HTTP post as content that came in from a JSON API call.
The data is what's filtered for your application.

Again, nowhere it is said that you can not apply different filters to
different data or different context. Again, you narrow down definition
of filtering, to which I see no purpose unless you seek to arrive at
pre-determined conclusion that we need to duplicate APIs because it's
called "filter".

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by padraic.brady@gmail.com — view source

Hi Stas,

This is not an input filter and PHP already suffers the same
outrageous disadvantages by offering htmlspecialchars(),
rawurlencode(), etc. The rules for escaping are well established and
DO NOT change overnight. Those for Javascript and CSS are in their
respective standards. Those for HTML/XML have been known since the 90s
and still haven't changed. PHP seems quite happy about offering
encoding mechanisms if anything - where did json_encode() spring from?

Browser defects are not PHP's problem.

Folk seem to be missing the point that this is output oriented to a
well understood set of rules.

Paddy

Hi!

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

We already have filter extension. Is it really necessary to invent yet
another way of filtering data?

Also, a problem with putting code of this complexity in core would be
that if it ever had a defect - e.g. we forgot to account for some weird
browser quirk that does not follow RFCs, or some strange encoding
combination, or just a plain bug - it would be very hard for the users
to mitigate without upgrading PHP - which is not always under their
control. When using PHP code, they could just d/l new ZF class, but with
core implementation it'd be much harder.

So far I am not convinced we should really do it. But if somebody
creates PECL extension and it proves popular, it may be merged into core
once it does.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Pierre Joye — view source

hi Pádraic!

Hi all,

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and wrote the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.

That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).

https://wiki.php.net/rfc/escaper

Like the idea while I have to sit on it a bit to see the possible pitfalls :)

However I am really not a fan of using a class as namespace. All these
methods have nothing in common but what they do, they all treat
different inputs, may have different options, etc. Functions could
work just as fine for that, or if necessary (see my ajaxmin ext)
create a class per input and add the necessary properties for the
options. That could be much cleaner and forward compatible.

Cheers,

Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

12 years ago by Adam Jon Richardson — view source

Hi all,

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

https://wiki.php.net/rfc/escaper

Some Quick Thoughts

Multiparadigm PHP

I hope any implementation would embrace procedural coding paradigms
AND OOP paradigms. I tend to code using a Functional Programming (FP)
style, and I don't need/want objects to be the only interface.

Extension First

It seems wise to get this working and tested as an extension first,
just as Rasmus and others suggested.

Ability To Pass Some HTML Through Without Escaping (Whitelisting)

Functions should allow whitelisting of elements when desired. For
example, html escaping may be desired for all elements in a paragraph
except for spans, br's, etc.

I've built a quick extension that I use in my web framework that does this:
https://github.com/AdamJonR/nephtali-php-ext

string nephtali_str_escape_html(string str [, array whitelist [,
string charset]])

The escaping works as outlined below:

1) Escape all html special characters in str.
2) Loop through whitelist items.
3a) If the item begins and ends with '/', consider it a regex and
replace the matches in the string with the original (htmlspecialchars
decoded) text (this works because <,>,",', and & are not meta
characters in regexes.)
3b) Otherwise, handle as a standard string and replace the matches
with the unescaped whitelist item text.

The idea is that, to be safe, everything should be first escaped.
Then, only unescape the items that match the whitelist (e.g.,
array('<p>','</p>','etc.').) The regex option is handy because you
often have situations where the internal contents of the tag vary
(e.g., id, class, href, etc.) and this allows you to pass these
through unescaped.

Of note, I've not officially released the extension, as I'm still
testing/developing it, but it serves as an example for ideas.
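
The escape-then-unescape steps described above can be sketched in userland PHP. This is not the nephtali extension's code: the function name `sketch_escape_html()` is hypothetical, and the regex branch assumes the pattern contains no modifiers after the closing `/` and no character classes built from the five special characters, since the pattern itself is escaped before matching.

```php
<?php
// Hypothetical userland sketch of the whitelist idea; not the extension's implementation.
function sketch_escape_html(string $str, array $whitelist = [], string $charset = 'UTF-8'): string
{
    $flags = ENT_QUOTES | ENT_SUBSTITUTE;

    // Step 1: escape everything first.
    $escaped = htmlspecialchars($str, $flags, $charset);

    // Step 2: walk the whitelist and restore only what matches.
    foreach ($whitelist as $item) {
        // Escape the item the same way so it can be found in the escaped text.
        $escapedItem = htmlspecialchars($item, $flags, $charset);

        if (strlen($item) >= 2 && $item[0] === '/' && substr($item, -1) === '/') {
            // Step 3a: treat as a regex and decode each match back to raw markup.
            $result = preg_replace_callback(
                $escapedItem,
                static fn (array $m): string => htmlspecialchars_decode($m[0], ENT_QUOTES),
                $escaped
            );
            // preg_replace_callback() returns null on a bad pattern; keep the safe text then.
            $escaped = $result ?? $escaped;
        } else {
            // Step 3b: plain string, swap the escaped form back for the original item.
            $escaped = str_replace($escapedItem, $item, $escaped);
        }
    }

    return $escaped;
}

echo sketch_escape_html('<p>Hi <script>alert(1)</script></p>', ['<p>', '</p>']), "\n";
```

Because everything starts out escaped, a mistake in the whitelist leaves text over-escaped instead of under-escaped. Note that a regex entry which lets attributes through also lets through whatever those attributes contain.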

PHP Escaping-Specific Tags Could Be Considered

I wonder if PHP tags for escaping could be considered, as it seems
that there's still a plurality of developers that use PHP itself as
the templating language. For example:

// automatically echo'd and escaped for special html chars
<?php:html $obj->val ?>

// automatically echo'd and escaped for special html chars whilst
// letting through p's
<?php:html $obj->val, array('<p>','</p>') ?>

// automatically echo'd and escaped for special html chars whilst
// letting through p's and using different encoding
<?php:html $obj->val, array('<p>','</p>'), $encoding = 'something' ?>

// automatically echo'd and escaped for special html chars, no
// whitelisting allowed
<?php:attr $obj->val ?>

// automatically echo'd and escaped for special url chars, no
// whitelisting allowed
<?php:url $obj->val ?>

Thanks,

Adam

12 years ago by Pierre Joye — view source

hi Pádraic,

Given the current discussions about the APIs (see my other reply too)
and its usage, and that this proposal is non invasive/self contained
in an extension, I would strongly suggest to already go with it in
PECL, do releases (stay alpha until you have a very good feeling about
the API stability), etc. It will also greatly help to get more
feedback.

Then it could be proposed again for being bundled at some point,
before we go features freeze for 5.5.

Cheers,

Hi all,

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and wrote the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.

That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).

https://wiki.php.net/rfc/escaper

Best regards,
Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

--
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

12 years ago by padraic.brady@gmail.com — view source

Hi Pierre,

I also noticed your tweet ;).

Given the current discussions about the APIs (see my other reply too)
and its usage, and that this proposal is non invasive/self contained
in an extension, I would strongly suggest to already go with it in
PECL, do releases (stay alpha until you have a very good feeling about
the API stability), etc. It will also greatly help to get more
feedback.

Then it could be proposed again for being bundled at some point,
before we go features freeze for 5.5.

I believe this is the path we'll be taking after some IRC discussions.

Though, I do think that taking the RFC route on this one was the only
realistic option for a PHP programmer with a minimal C skillset. It
ensured that the proposal gained exposure, lots of feedback and an
opportunity to pick up a real C programmer who could take it further.

In any case, hopefully I'll be back with real hardcore C code for PHP
5.5. In the meantime, if anyone has any lingering concerns or
questions about the RFC, let me know!

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Paul Dragoonis — view source

Yep, I see where my suggestion for filter_var() isn't relevant.

I use symfony2's escaper in the PPI\Templating\ component, and really like it.
Zend2's also seems pretty good.

It'd be nice to have this available as a ./ext/spl/ class or an
independent extension (really needed for 1 class?).

Cheers,
Paul.

Hi Paul,

The thing is that filter_var() is strongly associated with input
sanitisation whereas Escaper addresses the other end of output. Also,
escaping is inextricably linked to character encoding - we can't run
into situations where the functions are specific to something like
UTF-8 when the character encodings used in real life are far more
diverse. Additionally, the RFC was an attempt to make escaping as
explicit and restrictive as possible - give a user too many options,
or too many dispersed units of functionality, and they'll invariably
confuse and misinterpret themselves to Hell ;).

Note: There is a stack of folk, for example, who use the ext/filter
URL validator for HTTP validation - it also passes php:// and
javascript:// URLs. If we're not explicit, they won't ever notice when
they're doing it wrong.
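
A small sketch of what "being explicit" could look like for the URL case. The helper name `is_http_url()` is hypothetical; it layers a scheme check on top of FILTER_VALIDATE_URL so the result does not depend on which schemes the validator happens to accept.

```php
<?php
// Hypothetical helper: accept only absolute http/https URLs.
function is_http_url(string $url): bool
{
    // Syntactic check only; on its own this says nothing about the scheme.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Explicit scheme whitelist.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(is_http_url('https://example.com/page'));
var_dump(is_http_url('javascript://comment%0Aalert(1)'));
```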

Paddy

Hi Paddy,

Couldn't this just be a new option for the filter_var() function?

$clean = filter_var($_POST['someVar'], XSS_CLEAN);

I see from your RFC that you have a bunch of functions, I believe all
these could be options to filter_var, ie.: FILTER_ESCAPE_[URL, JS,
CSS, HTMLATTR].

Paul.

Hi all,

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and wrote the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.

That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).

https://wiki.php.net/rfc/escaper

Best regards,
Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Michael Shadle — view source

Yep, I see where my suggestion for filter_var() isn't relevant.

I use symfony2's escaper in the PPI\Templating\ component, and really like it.
Zend2's also seems pretty good.

It'd be nice to have this available as a ./ext/spl/ class or an
independent extension (really needed for 1 class?).

Cheers,
Paul.

Please provide it as a procedural extension as well like filter is. I don't care if there is an OO counterpart, but I detest OO and don't want to see basic php things being introduced now (if accepted) only as OO. It feels like PHP is losing its roots...

12 years ago by padraic.brady@gmail.com — view source

Hi Derick,

This is already available over composer. The RFC contains links to the
two frameworks which have implemented Escapers in line with the RFC.

The point of the RFC is to ensure a consistent API for escaping is
available to all PHP programmers without resorting to userland
solutions. Existing functions are widely misused, misconfigured or
have builtin security issues yet are popularly advanced as "escaping"
for XSS.

XSS is also...XSS. It's either the first or second most common
vulnerability in web applications (depending on whose data you use). I
think it warrants PHP distributing a proper solution out of the box.

Paddy

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and wrote the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.

That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).

https://wiki.php.net/rfc/escaper

I understand that this is really beneficial to have, but, I wonder, why
can't this be a composer-installable class, implemented in PHP? It
solves the issue that you need to find a volunteer, as well as that
updating it is a lot easier, and, you don't have to rely on shared
hosters having it enabled.

I realize that you want to have this
generally available, but for that we have ext/filter - which is not
really used too much I think. Why would this be different? IMO, we
should make a composer installable package for this, and then litter all
our escaping related document pages with links to this new package.

cheers,
Derick

--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by jpauli — view source

Hi Derick,

This is already available over composer. The RFC contains links to the
two frameworks which have implemented Escapers in line with the RFC.

The point of the RFC is to ensure a consistent API for escaping is
available to all PHP programmers without resorting to userland
solutions. Existing functions are widely misused, misconfigured or
have builtin security issues yet are popularly advanced as "escaping"
for XSS.

XSS is also...XSS. It's either the first or second most common
vulnerability in web applications (depending on whose data you use). I
think it warrants PHP distributing a proper solution out of the box.

Paddy

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and wrote the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.

That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).

https://wiki.php.net/rfc/escaper

I understand that this is really beneficial to have, but, I wonder, why
can't this be a composer-installable class, implemented in PHP? It
solves the issue that you need to find a volunteer, as well as that
updating it is a lot easier, and, you don't have to rely on shared
hosters having it enabled.

I realize that you want to have this
generally available, but for that we have ext/filter - which is not
really used too much I think. Why would this be different? IMO, we
should make a composer installable package for this, and then litter all
our escaping related document pages with links to this new package.

cheers,
Derick

--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

Implementing this to Core may be very nice, but as well very hard to do.
String escaping is a pain to implement in C. One could say: once
it's done, it's OK, but unfortunately, that's not the case, as XSS
rules evolve over time as the attacks evolve.

See the escape modules web servers tried to push (mod_security and its
counterpart in Nginx): it's a PITA to maintain if you want something that
covers a large area.
By the way: why not let the web server do this as nowadays, they seem
to manage that problem?

Julien.P

12 years ago by Michael Shadle — view source

Also as there is also htmlspecialchars() which most people use for escaping this seems like a better, more centralized functionality and better nomenclature for escaping on output in general with options for various types (and should just be utf-8 by default :))

12 years ago by Rasmus Lerdorf — view source

Also as there is also htmlspecialchars() which most people use for escaping this seems like a better, more centralized functionality and better nomenclature for escaping on output in general with options for various types (and should just be utf-8 by default :))

It is utf-8 by default as of PHP 5.4.
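
A short illustration of why the declared encoding matters to htmlspecialchars(). The sample bytes are made up; the sketch passes flags and encoding explicitly so the behaviour does not hinge on version-specific defaults.

```php
<?php
// "caf\xE9" is "café" in ISO-8859-1; the lone \xE9 byte is not valid UTF-8.
$bytes = "caf\xE9 <b>";

// Declared as UTF-8 without ENT_SUBSTITUTE or ENT_IGNORE: invalid input yields an empty string.
$strict = htmlspecialchars($bytes, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE swaps the invalid sequence for U+FFFD and still escapes the rest.
$substituted = htmlspecialchars($bytes, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Declaring the encoding the bytes are really in keeps them intact.
$latin1 = htmlspecialchars($bytes, ENT_QUOTES, 'ISO-8859-1');

var_dump($strict, $substituted, $latin1);
```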

12 years ago by padraic.brady@gmail.com — view source

Hi Michael,

See the link near the bottom of the RFC - even htmlspecialchars() has
unusual behaviour that's potentially insecure. I have no objections to
there being functions, of course, and the RFC makes that clear.
However, many programmers like me are obsessed with objects so having
an SPL class will obviously be near and dear to my design patterned
heart ;).

Paddy

Also as there is also htmlspecialchars() which most people use for escaping this seems like a better, more centralized functionality and better nomenclature for escaping on output in general with options for various types (and should just be utf-8 by default :))

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Michael Shadle — view source

Hi Michael,

See the link near the bottom of the RFC - even htmlspecialchars() has
unusual behaviour that's potentially insecure. I have no objections to
there being functions, of course, and the RFC makes that clear.
However, many programmers like me are obsessed with objects so having
an SPL class will obviously be near and dear to my design patterned
heart ;).

After looking over the RFC finally, would it be that crazy to consider
this an extension of the standard string functions?

str_escape($string, $encoding, $flags) or probably better
str_escape($string, $flags, $encoding) - since encoding could be
defaulted to UTF-8, but flags are really what differentiate the
behavior...

Then there is not a handful of functions but rather one that can be
used as the abstraction point and the flags passed to it will change
its behavior, much like the filter functions.

(I just see this falling under one solid de facto escape function
standard, and it could live by itself as "escape" or something, or as
it operates on strings, prefix it as such)
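
For readers trying to picture the single-function idea, a rough userland sketch follows. Everything named `SKETCH_*` or `sketch_str_escape()` is hypothetical and exists nowhere in PHP; the bodies just delegate to existing functions and are not the RFC's escaping rules.

```php
<?php
// Hypothetical context constants; the real proposal may name or number them differently.
const SKETCH_ESCAPE_HTML = 1;
const SKETCH_ESCAPE_URL  = 2;
const SKETCH_ESCAPE_JS   = 3;

// One entry point; the second argument selects the output context.
function sketch_str_escape(string $string, int $context = SKETCH_ESCAPE_HTML, string $encoding = 'UTF-8'): string
{
    switch ($context) {
        case SKETCH_ESCAPE_HTML:
            return htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE, $encoding);

        case SKETCH_ESCAPE_URL:
            return rawurlencode($string);

        case SKETCH_ESCAPE_JS:
            // json_encode() expects UTF-8 and returns a quoted string literal.
            $json = json_encode($string, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
            if ($json === false) {
                throw new InvalidArgumentException('Value is not valid UTF-8');
            }
            return $json;

        default:
            throw new InvalidArgumentException('Unknown escaping context');
    }
}

echo sketch_str_escape('<b>bold</b>'), "\n";
echo sketch_str_escape('a b&c', SKETCH_ESCAPE_URL), "\n";
```

Putting the context before the encoding, as suggested above, lets the common call omit the encoding while still forcing the caller to think about where the value is going.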

12 years ago by Paul Dragoonis — view source

Hi Michael,

See the link near the bottom of the RFC - even htmlspecialchars() has
unusual behaviour that's potentially insecure. I have no objections to
there being functions, of course, and the RFC makes that clear.
However, many programmers like me are obsessed with objects so having
an SPL class will obviously be near and dear to my design patterned
heart ;).

After looking over the RFC finally, would it be that crazy to consider
this an extension of the standard string functions?

str_escape($string, $encoding, $flags) or probably better
str_escape($string, $flags, $encoding) - since encoding could be

I'm also proposing escape_var(), just like filter_var(). If it's
str_escape() then that's still sane, it's consistent and that's all
that matters. :-)

Paul.

defaulted to UTF-8, but flags are really what differentiate the
behavior...

Then there is not a handful of functions but rather one that can be
used as the abstraction point and the flags passed to it will change
its behavior, much like the filter functions.

(I just see this falling under one solid de facto escape function
standard, and it could live by itself as "escape" or something, or as
it operates on strings, prefix it as such)

12 years ago by David Muir — view source

Hi Michael,

See the link near the bottom of the RFC - even htmlspecialchars() has
unusual behaviour that's potentially insecure. I have no objections to
there being functions, of course, and the RFC makes that clear.
However, many programmers like me are obsessed with objects so having
an SPL class will obviously be near and dear to my design patterned
heart ;).
After looking over the RFC finally, would it be that crazy to consider
this an extension of the standard string functions?

str_escape($string, $encoding, $flags) or probably better
str_escape($string, $flags, $encoding) - since encoding could be
defaulted to UTF-8, but flags are really what differentiate the
behavior...

Then there is not a handful of functions but rather one that can be
used as the abstraction point and the flags passed to it will change
its behavior, much like the filter functions.

(I just see this falling under one solid de facto escape function
standard, and it could live by itself as "escape" or something, or as
it operates on strings, prefix it as such)

I'm on the fence with a default encoding. I tend to always use UTF-8, so
would probably just use the default. But that means I become less aware
of encoding being central to the issue, and therefore much easier to
accidentally leave it as UTF-8 when it should be something else.

-1 on the flag based api, though. Would much rather have dedicated
functions. It's easier to read, and less verbose.

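<?php $enc = getMyEncoding();?>
<table title="<?=str_escape($foo, ESCAPE_HTML_ATTRIBUTE, $enc)?>">
<tr>
<td><?=str_escape($foo, ESCAPE_HTML, $enc)?></td>
<td><?=str_escape($bar, ESCAPE_HTML, $enc)?></td>
</tr>
</table>

<?php $enc = getMyEncoding();?>
<table title="<?=escape_html_attribute($foo, $enc)?>">
<tr>
<td><?=escape_html($foo, $enc)?></td>
<td><?=escape_html($bar, $enc)?></td>
</tr>
</table>

<?php $e = new Escaper(getMyEncoding())?>
<table title="<?=$e->escapeHtmlAttr($someTitle)?>">
<tr>
<td><?=$e->escapeHtml($foo)?></td>
<td><?=$e->escapeHtml($bar)?></td>
</tr>
</table>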

I prefer the OO API since it's succinct, but if we are to have a
procedural API as well, let's make it concise and clear. The last thing
we want for solving the no 1 security threat to web-apps is a confusing
and hard-to-use API like the filter extension.

Cheers,
David
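
To show why the OO form reads shorter in a template, here is a simplified stand-in class. `SketchEscaper` is a hypothetical name; the method names follow the examples in this thread, but the bodies simply wrap htmlspecialchars() and are an assumption, not the rules the RFC or Zend\Escaper actually apply (attribute escaping there is stricter).

```php
<?php
// Hypothetical stand-in for the proposed class; bodies are deliberately simplified.
final class SketchEscaper
{
    private string $encoding;

    // The encoding is fixed once, so templates never repeat it.
    public function __construct(string $encoding = 'UTF-8')
    {
        $this->encoding = $encoding;
    }

    public function escapeHtml(string $value): string
    {
        return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, $this->encoding);
    }

    // Simplified: reuses the HTML rules, which is only adequate for quoted attributes.
    public function escapeHtmlAttr(string $value): string
    {
        return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, $this->encoding);
    }
}

$e = new SketchEscaper('UTF-8');
echo '<td title="', $e->escapeHtmlAttr('a "quoted" title'), '">', $e->escapeHtml('1 < 2'), "</td>\n";
```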

12 years ago by Levi Morrison — view source

On Tue, Sep 18, 2012 at 10:32 AM, Pádraic Brady padraic.brady@gmail.com
wrote:

Hi Michael,

See the link near the bottom of the RFC - even htmlspecialchars() has
unusual behaviour that's potentially insecure. I have no objections to
there being functions, of course, and the RFC makes that clear.
However, many programmers like me are obsessed with objects so having
an SPL class will obviously be near and dear to my design patterned
heart ;).

After looking over the RFC finally, would it be that crazy to consider
this an extension of the standard string functions?

str_escape($string, $encoding, $flags) or probably better
str_escape($string, $flags, $encoding) - since encoding could be
defaulted to UTF-8, but flags are really what differentiate the
behavior...

Then there is not a handful of functions but rather one that can be
used as the abstraction point and the flags passed to it will change
its behavior, much like the filter functions.

(I just see this falling under one solid de facto escape function
standard, and it could live by itself as "escape" or something, or as
it operates on strings, prefix it as such)

I'm on the fence with a default encoding. I tend to always use UTF-8, so
would probably just use the default. But that means I become less aware of
encoding being central to the issue, and therefore much easier to
accidentally leave it as UTF-8 when it should be something else.

-1 on the flag based api, though. Would much rather have dedicated
functions. It's easier to read, and less verbose.

<?php $enc = getMyEncoding();?>
<table title="<?=str_escape($foo, ESCAPE_HTML_ATTRIBUTE, $enc)?>">
  <tr>
    <td><?=str_escape($foo, ESCAPE_HTML, $enc)?></td>
    <td><?=str_escape($bar, ESCAPE_HTML, $enc)?></td>
  </tr>
</table>

<?php $enc = getMyEncoding();?>
<table title="<?=escape_html_attribute($foo, $enc)?>">
  <tr>
    <td><?=escape_html($foo, $enc)?></td>
    <td><?=escape_html($bar, $enc)?></td>
  </tr>
</table>

<?php $e = new Escaper(getMyEncoding())?>
<table title="<?=$e->escapeHtmlAttr($someTitle)?>">
  <tr>
    <td><?=$e->escapeHtml($foo)?></td>
    <td><?=$e->escapeHtml($bar)?></td>
  </tr>
</table>

I prefer the OO API since it's succinct, but if we are to have a procedural
API as well, let's make it concise and clear. The last thing we want for
solving the no 1 security threat to web-apps is a confusing and hard-to-use
API like the filter extension.

Cheers,
David

+1 to most of what you said.

I will disagree on the filter extension. Its main problems are lack of
publicity and functions with similar names that don't do what you
would expect (Consider is_int). I use FILTER_VALIDATE_BOOLEAN and
FILTER_VALIDATE_INT on a regular basis, and have used other filters. I
will agree its API could have been better, but it is not confusing in
my opinion.

I do want to make it clear that I support a simple, clean API if we do
implement more escaping functions.
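
For readers who have not used the filter extension, the following is a minimal sketch of the two validation filters named above, next to is_int(), which checks a variable's type rather than parsing a string. The sample values are made up for illustration.

```php
<?php
// is_int() tests the PHP type, so a numeric string from a form fails.
var_dump(is_int('42'));                                // bool(false)

// FILTER_VALIDATE_INT parses the string and returns an int, or false on failure.
var_dump(filter_var('42', FILTER_VALIDATE_INT));       // int(42)
var_dump(filter_var('4x2', FILTER_VALIDATE_INT));      // bool(false)

// FILTER_VALIDATE_BOOLEAN treats "1", "true", "on" and "yes" as true.
var_dump(filter_var('yes', FILTER_VALIDATE_BOOLEAN));  // bool(true)

// With FILTER_NULL_ON_FAILURE, unrecognised input yields null instead of false.
var_dump(filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE)); // NULL
```

The flag in the last call is what lets a caller tell "false" apart from "not a boolean at all".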

12 years ago by padraic.brady@gmail.com

Hi Michael,

After looking over the RFC finally, would it be that crazy to consider
this an extension of the standard string functions?

str_escape($string, $encoding, $flags) or probably better
str_escape($string, $flags, $encoding) - since encoding could be
defaulted to UTF-8, but flags are really what differentiate the
behavior...

Then there is not a handful of functions but rather one that can be
used as the abstraction point and the flags passed to it will change
its behavior, much like the filter functions.

(I just see this falling under one solid de facto escape function
standard, and it could live by itself as "escape" or something, or as
it operates on strings, prefix it as such)

I think the filter_var() approach to using flags to switch core
behaviour is flawed for any number of reasons but consider being a
programmer writing PHP templates...

htmlspecialchars($value, ENT_QUOTES|ENT_SUBSTITUTE, 'utf-8');
str_escape($string, ESCAPE_HTML_BODY, 'utf-8');

vs

escape_html($value, 'utf-8');
$e->escapeHtml($value);

Brevity and a clear meaning have their advantages.
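
To make the comparison concrete, here is a rough userland sketch of both styles. Note that escape_html(), str_escape() and ESCAPE_HTML_BODY are only names floated in this thread, not PHP built-ins; the bodies below are an assumption that simply delegates to htmlspecialchars() and should not be read as the RFC's implementation.

```php
<?php
// Hypothetical flag value; the thread only proposes the name.
const ESCAPE_HTML_BODY = 1;

// Dedicated-function style: one name per output context.
function escape_html(string $value, string $encoding = 'UTF-8'): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, $encoding);
}

// Flag style: a single entry point that switches behaviour on $flags.
function str_escape(string $value, int $flags, string $encoding = 'UTF-8'): string
{
    if ($flags === ESCAPE_HTML_BODY) {
        return escape_html($value, $encoding);
    }
    throw new InvalidArgumentException('Unsupported escape flag');
}

$value = '<a href="x">Tom & Jerry</a>';
echo escape_html($value), "\n";
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
echo str_escape($value, ESCAPE_HTML_BODY), "\n"; // same output, longer call
```

Both calls produce the same string; the difference is only in how much the template author has to type and remember.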

Paddy

12 years ago by padraic.brady@gmail.com

Hi Julien,

I think you're mixing these up. The RFC addresses escaping or encoding
of data on output to HTML/XML (e.g. PHP templates or Twig). It doesn't
act as an input filter to catch attempted XSS/SQLi where fuzzing can
disguise the attempt and wheedle its way past countless regular
expressions - these are always broken in time. In this specific case,
the rules for escaping are well known and very rarely change unless
HTML sees some very dramatic changes in how tags and attributes are
defined.

Paddy

Hi Derick,

This is already available over composer. The RFC contains links to the
two frameworks which have implemented Escapers in line with the RFC.

The point of the RFC is to ensure a consistent API for escaping is
available to all PHP programmers without resorting to userland
solutions. Existing functions are widely misused, misconfigured or
have builtin security issues yet are popularly advanced as "escaping"
for XSS.

XSS is also...XSS. It's either the first or second most common
vulnerability in web applications (depending on whose data you use). I
think it warrants PHP distributing a proper solution out of the box.

Paddy

I understand that this is really beneficial to have, but, I wonder, why
can't this be a composer-installable class, implemented in PHP? It
solves the issue that you need to find a volunteer, as well as that
updating it is a lot easier, and, you don't have to rely on shared
hosters having it enabled.

I realize that you want to have this generally available, but for that
we have ext/filter - which is not really used too much I think. Why
would this be different? IMO, we should make a composer installable
package for this, and then litter all our escaping related document
pages with links to this new package.

cheers,
Derick

--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine

Implementing this in core may be very nice, but also very hard to do.
String escaping is a pain to implement in C. One might say: once
it's done, it's OK, but unfortunately that's not the case, as XSS
rules evolve over time as the attacks evolve.

See the escape modules web servers tried to push (mod_security and its
counterpart in Nginx): it's a PITA to maintain if you want something that
covers a large area.
By the way: why not let the web server do this, as nowadays they seem
to manage that problem?

Julien.P

12 years ago by Bryan C. Geraghty

Hello everyone,

Paddy is correct here. The purpose of this API is output ENCODING which is a
very good thing. This discussion provides a very good case for a point I
made via Twitter this morning: In this RFC, all uses of the term "escape"
should be replaced by the term "encode".

This is not solely a problem with this RFC. The term "escape" is being used
by developers in the industry when they mean "encoding". This is a bad thing
because, from a security perspective, escaping is exactly the opposite of
encoding.

Escaping is done by setting up a black-list and replacing those elements
with an approved variant.
Encoding is done by converting all of the input data into the target
format. Some bytes may end up being exactly the same but they are all
processed.
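
The distinction drawn above can be sketched roughly as follows. Both function names are hypothetical, the safe list is deliberately tiny, and the code assumes UTF-8 input and the mbstring extension; it illustrates the two approaches and is not the algorithm used by the RFC or by Zend\Escaper.

```php
<?php
// Black-list style: replace a fixed set of known-dangerous characters.
function escape_blacklist(string $value): string
{
    // "&" comes first so the entities produced afterwards are not double-escaped.
    return str_replace(['&', '<', '>', '"'], ['&amp;', '&lt;', '&gt;', '&quot;'], $value);
}

// Encode-everything style: only characters on an explicit safe list pass
// through unchanged; every other code point becomes a hex entity.
function encode_whitelist(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return preg_replace_callback(
        '/[^a-zA-Z0-9,._-]/u',
        function (array $m): string {
            return sprintf('&#x%X;', mb_ord($m[0], 'UTF-8'));
        },
        $value
    );
}

// A value destined for an unquoted HTML attribute.
$attr = 'x onclick=alert(1)';
echo escape_blacklist($attr), "\n"; // x onclick=alert(1)
echo encode_whitelist($attr), "\n"; // x&#x20;onclick&#x3D;alert&#x28;1&#x29;
```

The black-list leaves the space and "=" alone because they were never on its list, which is enough to inject a new attribute when the surrounding markup omits quotes.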

I understand why people on this list are associating the functionality
defined in this RFC with filtering because the name is leading them astray.

Besides the fundamental difference in the definitions of each item, the
security industry is using the term "encoding"; take a look at the OWASP
documentation for a quick example.

If we want developers with little application security background to be able
to understand these things, we need to be consistent.

Bryan

12 years ago by Anthony Ferrara

Bryan et al,

On Tue, Sep 18, 2012 at 1:58 PM, Bryan C. Geraghty bryan@ravensight.org wrote:

Hello everyone,

Paddy is correct here. The purpose of this API is output ENCODING which is
a very good thing. This discussion provides a very good case for a point I
made via Twitter this morning: In this RFC, all uses of the term "escape"
should be replaced by the term "encode".

This is not solely a problem with this RFC. The term "escape" is being used
by developers in the industry when they mean "encoding". This is a bad thing
because, from a security perspective, escaping is exactly the opposite of
encoding.

It's a very common thing: http://cwe.mitre.org/data/definitions/116.html

The usage of the "encoding" and "escaping" terms varies widely. For
example, in some programming languages, the terms are used interchangeably,
while other languages provide APIs that use both terms for different tasks.
This overlapping usage extends to the Web, such as the "escape" JavaScript
function whose purpose is stated to be encoding. Of course, the concepts of
encoding and escaping predate the Web by decades. Given such a context, it
is difficult for CWE to adopt a consistent vocabulary that will not be
misinterpreted by some constituency.

I think that picking one, and sticking with it is fine. No matter which is
chosen...

Escaping is done by setting up a black-list and replacing those elements
with an approved variant.
Encoding is done by converting all of the input data into the target
format. Some bytes may end up being exactly the same but they are all
processed.

With the end result being the exact same...

I understand why people on this list are associating the functionality
defined in this RFC with filtering because the name is leading them astray.

Besides the fundamental difference in the definitions of each item, the
security industry is using the term "encoding"; take a look at the OWASP
documentation for a quick example.

The OWASP documentation uses them interchangeably. However, specifically
for this task, the ESAPI is defined as a:
https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

The OWASP ESAPI https://www.owasp.org/index.php/ESAPI project has created
an escaping library in a variety of languages including Java, PHP, Classic
ASP, Cold Fusion, Python, and Haskell.

If we want developers with little application security background to be able
to understand these things, we need to be consistent.

In this case, I'm not sure consistency with the industry is as important
(mainly because the industry is itself inconsistent). The important thing
is to pick one and stick to it. I would suggest "escape" mainly because
people in PHP are already familiar with it (via mysql_real_escape_string,
etc)...

Anthony

12 years ago by Bryan C. Geraghty

Anthony,

I'll concede that the term "escaping" is improperly used in many places;
even in the OWASP documentation.

But I'll point out that the CWE document is identifying a distinction in the
two terms by saying, "This overlapping usage extends to the Web, such as
the "escape" JavaScript function whose purpose is stated to be encoding".

But when you say, "With the end result being the exact same...", I don't
think you've thought it through. I've read some of your stuff and I'm pretty
confident that you understand the benefits of white-listing over
black-listing. For the uninitiated, yes, a black-list can be configured to
produce the same results at a given point-in-time, but the fundamental
approach is different. A white-list operates on an explicit specification
and lets nothing else through. A black-list assumes that the input data is
mostly correct and it filters out the bad. To add to that, how do you
convert from ISO-8859-1 to UTF-8 with a black-list or by escaping?

Your reference to mysql_real_escape_string is exactly the point I'm trying
to make. The use of that function is "discouraged" because it DID escape; it
looked for specific bad characters. It was fundamentally flawed. And that is
the functionality PHP developers, as you just demonstrated, will refer to.
The current recommendation is to use a library that properly encodes the
entire data stream.

I'll also agree that consistency with the industry is not as important
because there seem to be plenty of misuses. However, I do think that we
should use terminology that sets the functionality apart. So, given the
operating mode difference and the precedent set by mysql_escape_string,
mysql_real_escape_string, etc., I think "encode" is the way to go.

Thanks,

Bryan

12 years ago by Anthony Ferrara

Bryan,

On Tue, Sep 18, 2012 at 2:52 PM, Bryan C. Geraghty bryan@ravensight.org wrote:

Anthony,

I’ll concede that the term “escaping” is improperly used in many places;
even in the OWASP documentation.

But I’ll point out that the CWE document is identifying a distinction in
the two terms by saying, “This overlapping usage extends to the Web,
such as the "escape" JavaScript function whose purpose is stated to be
encoding”.

There is a distinction between them. But in this case it's not particularly
relevant (as both work quite fine). I'll elaborate further in a second.

But when you say, “With the end result being the exact same...”, I don’t
think you’ve thought it through. I’ve read some of your stuff and I’m
pretty confident that you understand the benefits of white-listing over
black-listing. For the uninitiated, yes, a black-list can be configured to
produce the same results at a given point-in-time, but the fundamental
approach is different. A white-list operates on an explicit specification
and lets nothing else through. A black-list assumes that the input data is
mostly correct and it filters out the bad. To add to that, how do you
convert from ISO-8859-1 to UTF-8 with a black-list or by escaping?

You hit the nail on the head here. You cannot black-list convert ISO-8859-1
to UTF-8. However, when we talk about escaping, we're talking about a
context where the encoding is already correct, we're just preventing
special characters from imparting special meaning. In that case, escaping
is the correct way of handling it.

But if you wanted to output arbitrary input into a UTF-8 document, you
would also need to ensure that it's encoded properly into UTF-8. So I can
see your distinction applying to that case. But from a different angle.

Escaping preserves the security context. Encoding preserves the semantic
context. You could escape away all invalid UTF-8 bytes, but you'd lose the
meaning of the original character set. So semantically, encoding is
necessary. But from a security perspective, the encoding doesn't really
matter much. What matters is the security context (not injecting harmful
code, etc).

Now, both can be handled by the same routine. But that's not necessary to
preserve the security aspect. And that's why I objected to using the term
"encoding" here. If we want to go that route, that's fine. But you don't
need to encode for security. Escaping will handle that (possibly at the
expense of invalid semantic meaning).
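
As a small illustration of escaping "at the expense of invalid semantic meaning", the sketch below feeds a malformed UTF-8 string to htmlspecialchars(). The sample input is invented; the behaviour shown is that of the documented ENT_SUBSTITUTE flag.

```php
<?php
// "\xFF" can never appear in valid UTF-8, so this string is malformed.
$input = "caf\xFF <script>";

// Without ENT_SUBSTITUTE or ENT_IGNORE, htmlspecialchars() returns an
// empty string when the input is invalid in the given encoding.
$strict = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
var_dump($strict === '');

// With ENT_SUBSTITUTE the invalid byte is replaced by U+FFFD. The markup is
// neutralised, but whatever character the byte was meant to be is gone.
$escaped = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
var_dump($escaped === "caf\u{FFFD} &lt;script&gt;");
```

Either way the output is safe to place in an HTML body; recovering the intended character would need a separate transcoding step that knows the real source encoding.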

Your reference to mysql_real_escape_string is exactly the point I’m trying
to make. The use of that function is “discouraged” because it DID escape;
it looked for specific bad characters. It was fundamentally flawed. And
that is the functionality PHP developers, as you just demonstrated, will
refer to. The current recommendation is to use a library that properly
encodes the entire data stream.

How is mres fundamentally flawed? And how is it discouraged? It's actually
listed as a valid defense by OWASP:
https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet#Defense_Option_3:_Escaping_All_User_Supplied_Input

The only 2 ways of securely getting data to MySQL are by escaping, or
binding as a parameter on a prepared statement. Neither of which encodes a
data stream (the PS uses a binary format that puts the data in plain binary
form, as is, with a header to identify length).

Black listing works fine for a specified format (like XML, like HTML, like
SQL, like JavaScript). Where you get in trouble with black lists is when
your data format isn't specified (hence edge-cases aren't well known) or
when you're not serializing to a format (generic input black lists). But
for escaping output, black lists are a very well known, well understood,
and easily implemented approach.

I’ll also agree that consistency with the industry is not as important
because there seem to be plenty of misuses. However, I do think that we
should use terminology that sets the functionality apart. So, given the
operating mode difference and the precedent set by mysql_escape_string,
mysql_real_escape_string, etc., I think “encode” is the way to go.

I think it strongly depends upon the exact behavior of the library. If we
do wind up doing transcoding as well as escaping, then that may be valid.
If we don't, then it wouldn't.

But I think we can both agree on the need...

Anthony

12 years ago by Bryan C. Geraghty

Anthony,

I'll run through some of the semantics related to your response in case it
provides any insight that would encourage a direction change for this
functionality. But overall, I agree 100% with your closing line, "I think it
strongly depends upon the exact behavior of the library. If we do wind up
doing transcoding as well as escaping, then that may be valid. If we don't,
then it wouldn't."

Now, onto the nitty-gritty.

"You hit the nail on the head here. You cannot black-list convert ISO-8859-1
to UTF-8. However, when we talk about escaping, we're talking about a
context where the encoding is already correct, we're just preventing special
characters from imparting special meaning. In that case, escaping is the
correct way of handling it."

We can never safely assume that the encoding is correct. If the encoding of
the original data is different than the assumed encoding, characters with
"special meaning" may have different values and will be allowed through. For
a simple proof-of-concept, see
http://shiflett.org/blog/2005/dec/google-xss-example. Now, that is a
specific exploit for an underlying vulnerability. The vulnerability is the
fact that htmlentities() doesn't decode the input before trying to escape
characters.

"But if you wanted to output arbitrary input into a UTF-8 document, you
would also need to ensure that it's encoded properly into UTF-8. So I can
see your distinction applying to that case. But from a different angle."

This is getting closer to the root of the problem.

"Escaping preserves the security context. Encoding preserves the semantic
context. You could escape away all invalid UTF-8 bytes, but you'd lose the
meaning of the original character set. So semantically, encoding is
necessary. But from a security perspective, the encoding doesn't really
matter much. What matters is the security context (not injecting harmful
code, etc)."

What I'm trying to convey is that all context relevant to the operation
matters. In this case, if characters are compared/replaced at the
byte-level, we need to decode to the byte-level before performing those
operations. To take that further, it's important for everyone to realize
that encoding doesn't just apply to character sets; data is encoded for a
specific layer. This is the same problem that the TCP and ISO layers solved
decades ago; we're just adding layers above the application layer. You
wouldn't expect an HTML parser to be able to parse JavaScript because they
are different encodings. If you wanted to translate an HTML implementation
cleanly to a JavaScript implementation, you would have to decode the HTML
and then build a translator to build the same DOM elements in JavaScript. I
know that's sort of a blurry line, but I need to wrap this up. Hopefully,
I've conveyed the idea.

The sooner we all grasp this concept of encoding layers, the sooner this
problem of injection/scripting at every layer goes away. The solution:
Decode all inputs, halt execution on decoding errors, and then re-encode
them. Yes, this is going to add overhead. But where security is concerned,
we have to be willing to accept some overhead.
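
One possible reading of "decode, halt on errors, re-encode" is sketched below. The function name encode_for_html() is hypothetical, the caller is assumed to know the declared source encoding, and the mbstring extension is assumed to be loaded; this is an illustration of the idea, not a vetted implementation.

```php
<?php
function encode_for_html(string $input, string $inputEncoding): string
{
    // Decode step: refuse to continue if the bytes are not valid in the
    // encoding the caller claims they are in.
    if (!mb_check_encoding($input, $inputEncoding)) {
        throw new UnexpectedValueException("Input is not valid $inputEncoding");
    }

    // Normalise to the document's encoding (UTF-8 is assumed here).
    $utf8 = mb_convert_encoding($input, 'UTF-8', $inputEncoding);

    // Re-encode for the HTML body context.
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// 0xE9 is "é" in ISO-8859-1; it becomes the two-byte UTF-8 sequence C3 A9.
$out = encode_for_html("caf\xE9 <b>", 'ISO-8859-1');
var_dump($out === "caf\xC3\xA9 &lt;b&gt;");
```

A truncated or otherwise malformed UTF-8 sequence such as "\xC3\x28" passed with 'UTF-8' as the declared encoding would raise the exception rather than reach the output.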

Okay, with that out of the way, I'll reiterate my agreement with your
statement, "I think it strongly depends upon the exact behavior of the
library. If we do wind up doing transcoding as well as escaping, then that
may be valid. If we don't, then it wouldn't."

If the aim of this API is to really tackle the problem, we need to go beyond
wrapping htmlentities() and htmlspecialchars() and change the names to
"encode". If it's just to maintain the status quo and leave it to developers
who barely understand encoding or escaping to ensure that their entire stack
is using the same encoding, then we should leave the name as-is.

How is mres fundamentally flawed? And how is it discouraged? It's actually
listed as a valid defense by OWASP:
https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet#Defense_Option_3:_Escaping_All_User_Supplied_Input

The official PHP documentation discourages the use of
mysql_real_escape_string:
http://php.net/manual/en/function.mysql-real-escape-string.php. The
recommendation is to use a library that is character-set aware, like mysqli
or PDO. But note that even using mysqli_real_escape_string or PDO::quote
requires you to manually set the connection-level character-set. I've been
operating on the assumption (there I go assuming) that PDO prepared
statements were aware of the connection-level character set and mitigated
this problem; however, I just reviewed PDO's source code and I'm starting to
question its implementation. As for your OWASP reference, keep in mind that
OWASP makes many tiers of recommendations. Notice that manually escaping is
the last option for mitigating injection problems.

In any case, I'm not here to carry on an endless flame war. I just want to
make sure that we're doing what's necessary to mitigate the number one
vulnerability in web applications.

Thanks,

Bryan

From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Tuesday, September 18, 2012 2:12 PM
To: Bryan C. Geraghty
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class

Bryan,

On Tue, Sep 18, 2012 at 2:52 PM, Bryan C. Geraghty bryan@ravensight.org
wrote:

Anthony,

I'll concede that the term "escaping" is improperly used in many places;
even in the OWASP documentation.

But I'll point out that the CWE document is identifying a distinction in the
two terms by saying, "This overlapping usage extends to the Web, such as
the "escape" JavaScript function whose purpose is stated to be encoding".

There is a distinction between them. But in this case it's not particularly
relevant (as both work quite fine). I'll elaborate further in a second.

But when you say, "With the end result being the exact same...", I don't
think you've thought it through. I've read some of your stuff and I'm pretty
confident that you understand the benefits of white-listing over
black-listing. For the uninitiated, yes, a black-list can be configured to
produce the same results at a given point-in-time, but the fundamental
approach is different. A white-list operates on an explicit specification
and lets nothing else through. A black-list assumes that the input data is
mostly correct and it filters out the bad. To add to that, how do you
convert from ISO-8859-1 to UTF-8 with a black-list or by escaping?

You hit the nail on the head here. You cannot black-list convert ISO-8859-1
to UTF-8. However, when we talk about escaping, we're talking about a
context where the encoding is already correct, we're just preventing special
characters from imparting special meaning. In that case, escaping is the
correct way of handling it.

But if you wanted to output arbitrary input into a UTF-8 document, you would
also need to ensure that it's encoded properly into UTF-8. So I can see your
distinction applying to that case. But from a different angle.

Escaping preserves the security context. Encoding preserves the semantic
context. You could escape away all invalid UTF-8 bytes, but you'd lose the
meaning of the original character set. So semantically, encoding is
necessary. But from a security perspective, the encoding doesn't really
matter much. What matters is the security context (not injecting harmful
code, etc).

Now, both can be handled by the same routine. But that's not necessary to
preserve the security aspect. And that's why I objected to using the term
"encoding" here. If we want to go that route, that's fine. But you don't
need to encode for security. Escaping will handle that (possibly at the
expense of invalid semantic meaning).
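
The split described above can be sketched as two separate steps: transcode first (semantic correctness), then escape (security context). This is only an illustration under stated assumptions: `toUtf8Html()` is a hypothetical helper, not part of the RFC, and it assumes the source charset is known and that the iconv extension is available.

```php
<?php
// Hypothetical helper: transcode to UTF-8, then escape for an HTML body.
function toUtf8Html(string $bytes, string $sourceCharset): string
{
    // Step 1: transcoding preserves the meaning of the original characters.
    $utf8 = iconv($sourceCharset, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new InvalidArgumentException('Input could not be transcoded');
    }

    // Step 2: escaping stops special characters from acting as markup.
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```

Skipping step 1 would still leave the output safe from injection as long as the escaper substitutes invalid bytes, but the accented character would be lost, which is the "expense of invalid semantic meaning" mentioned above.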

Your reference to mysql_real_escape_string is exactly the point I'm trying
to make. The use of that function is "discouraged" because it DID escape; it
looked for specific bad characters. It was fundamentally flawed. And that is
the functionality PHP developers, as you just demonstrated, will refer to.
The current recommendation is to use a library that properly encodes the
entire data stream.

How is mres fundamentally flawed? And how is it discouraged? It's actually
listed as a valid defense by OWASP:
https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet#Defense_Option_3:_Escaping_All_User_Supplied_Input

The only 2 ways of securely getting data to MySQL are escaping or
binding as a parameter on a prepared statement. Neither of which encodes a
data stream (the PS uses a binary format that puts the data in plain binary
form, as is, with a header to identify length).

Black listing works fine for a specified format (like XML, like HTML, like
SQL, like JavaScript). Where you get in trouble with black lists is when
your data format isn't specified (hence edge-cases aren't well known) or
when you're not serializing to a format (generic input black lists). But for
escaping output, black lists are a very well known, well understood, and
easily implemented approach.

I'll also agree that consistency with the industry is not as important
because there seem to be plenty of misuses. However, I do think that we
should use terminology that sets the functionality apart. So, given the
operating mode difference and the precedent set by mysql_escape_string,
mysql_real_escape_string, etc., I think "encode" is the way to go.

I think it strongly depends upon the exact behavior of the library. If we do
wind up doing transcoding as well as escaping, then that may be valid. If we
don't, then it wouldn't.

But I think we can both agree on the need...

Anthony

12 years ago by Anthony Ferrara

Bryan,

“You hit the nail on the head here. You cannot black-list convert
ISO-8859-1 to UTF-8. However, when we talk about escaping, we're talking
about a context where the encoding is already correct, we're just
preventing special characters from imparting special meaning. In that case,
escaping is the correct way of handling it.”

We can never safely assume that the encoding is correct. If the encoding
of the original data is different than the assumed encoding, characters
with “special meaning” may have different values and will be allowed
through. For a simple proof-of-concept, see
http://shiflett.org/blog/2005/dec/google-xss-example. Now, that is a
specific exploit for an underlying vulnerability. The vulnerability is the
fact that htmlentities() doesn’t decode the input before trying to escape
characters.

Actually, in my mind, that's the role of filtering. You should filter the
proper charset. Everything inside of the application should have a
consistent character set. And if that's the case, these sorts of
vulnerabilities (not to mention a whole host of possible bugs) are no
longer possible...

What I’m trying to convey is that all context relevant to the operation
matters. In this case, if characters are compared/replaced at the
byte-level, we need to decode to the byte-level before performing those
operations. To take that further, it’s important for everyone to realize
that encoding doesn’t just apply to character sets; data is encoded for a
specific layer. This is the same problem that the TCP and ISO layers solved
decades ago; we’re just adding layers above the application layer. You
wouldn’t expect an HTML parser to be able to parse JavaScript because they
are different encodings. If you wanted to translate an HTML implementation
cleanly to a JavaScript implementation, you would have to decode the HTML
and then build a translator to build the same DOM elements in JavaScript. I
know that’s sort of a blurry line, but I need to wrap this up. Hopefully,
I’ve conveyed the idea.

The sooner we all grasp this concept of encoding layers, the sooner this
problem of injection/scripting at every layer goes away. The solution:
Decode all inputs, halt execution on decoding errors, and then re-encode
them. Yes, this is going to add overhead. But where security is concerned,
we have to be willing to accept some overhead.

Again, that's the role of filtering. Inputs should never get to a
presentation layer unfiltered. That's a bigger problem that needs to be
addressed first. But I would concede that it's worth doing again at output
to catch any issues. But those issues it catches should be seen as
application bugs and not a caught attack vector...
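
A minimal sketch of the "filter at input, halt on decoding errors" idea discussed above, assuming the application standardises on UTF-8 internally. `requireUtf8()` is a hypothetical name; the check relies on PCRE's `u` modifier, which makes `preg_match()` fail on a subject that is not valid UTF-8.

```php
<?php
// Hypothetical input filter: reject anything that is not valid UTF-8.
function requireUtf8(string $input): string
{
    // An empty pattern with the "u" modifier matches any valid UTF-8 string;
    // on malformed input preg_match() returns false instead of 1.
    if (preg_match('//u', $input) !== 1) {
        throw new UnexpectedValueException('Input is not valid UTF-8');
    }

    return $input;
}
```

Run at the boundary, a check like this means a later encoding failure at output time points to an application bug rather than to unfiltered input.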

Okay, with that out of the way, I’ll reiterate my agreement with your
statement, “I think it strongly depends upon the exact behavior of the
library. If we do wind up doing transcoding as well as escaping, then that
may be valid. If we don't, then it wouldn't.”

If the aim of this API is to really tackle the problem, we need to go
beyond wrapping htmlentities() and htmlspecialchars() and change the names
to “encode”. If it’s just to maintain the status quo and leave it to
developers who barely understand encoding or escaping to ensure that their
entire stack is using the same encoding, then we should leave the name
as-is.

Just wrapping any library is often not a good idea. We'd need to add
meaningful logic in addition to the namespace name change. So yes, I'm in
favor of doing it right at that point...

The official PHP documentation discourages the use of
mysql_real_escape_string:
http://php.net/manual/en/function.mysql-real-escape-string.php. The
recommendation is to use a library that is character-set aware, like mysqli
or PDO. But note that even using mysqli_real_escape_string or PDO::quote
requires you to manually set the connection-level character-set. I’ve been
operating on the assumption (there I go assuming) that PDO prepared
statements were aware of the connection-level character set and mitigated
this problem; however, I just reviewed PDO’s source code and I’m starting
to question its implementation. As for your OWASP reference, keep in mind
that OWASP makes many tiers of recommendations. Notice that manually
escaping is the last option for mitigating injection problems.

In short, that's wrong (MRES is encouraged). But I've taken the reply
off-list as it's off topic here.

In any case, I’m not here to carry on an endless flame war. I just want to
make sure that we’re doing what’s necessary to mitigate the number one
vulnerability in web applications.

I don't think this discussion is a flame war. I think it's a very good and
constructive point that needs to be made. It's at least a whole lot more
important and relevant than the last 40 posts on OOP vs Procedural names...

Anthony

12 years ago by padraic.brady@gmail.com

Hi all,

In any case, I’m not here to carry on an endless flame war. I just want to
make sure that we’re doing what’s necessary to mitigate the number one
vulnerability in web applications.

I don't think this discussion is a flame war. I think it's a very good and
constructive point that needs to be made. It's at least a whole lot more
important and relevant than the last 40 posts on OOP vs Procedural names...

Anthony

I wouldn't categorise this as a flamewar either. I'm remaining silent
simply because everyone is making points and I have other emails to
respond to ;), but any debate of this nature around the RFC appears to
have relevance.

Flame away :P

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Anthony Ferrara

Julien,

Implementing this to Core may be very nice, but as well very hard to do.

String escaping is a pain to implement in C. One might say: once
it's done, it's OK, but unfortunately, that's not the case, as XSS
rules evolve over time as the attacks evolve.

See the escape modules web servers tried to push (mod_security and its
counterpart in Nginx): it's a PITA to maintain if you want something that
covers a large area.
By the way: why not let the web server do this as nowadays, they seem
to manage that problem?

As Padraic indicated, this is solving a different problem than the web
server even can. This has to be solved at the application layer (it
physically can't be solved above it)...

As far as implementation pains, if I was to support this, I would want to
see something like the ESAPI (Enterprise Security API - by OWASP) used for
the actual implementation: http://code.google.com/p/owasp-esapi-c/

Perhaps providing a thin wrapper around it, but I wouldn't go much further
than that. And I don't think I'd support our own implementation (not using
an established C library)...

Anthony

12 years ago by Stas Malyshev

Hi!

The point of the RFC is to ensure a consistent API for escaping is
available to all PHP programmers without resorting to userland

I do not see why "without resorting to userland" is a worthy goal in
every case. It's like saying "I want to code in Python without ever
using import" or "I want to code in Perl without ever using CPAN". Makes
no sense, right? Why should we insist on this in PHP?

solutions. Existing functions are widely misused, misconfigured or
have builtin security issues yet are popularly advanced as "escaping"
for XSS.

Do you think your functions won't be misused, misconfigured and never
would have bugs? Exactly the same would happen. Having yet another API
doing the same as old API is not a solution.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by padraic.brady@gmail.com

Hi Stas,

Hi!

The point of the RFC is to ensure a consistent API for escaping is
available to all PHP programmers without resorting to userland

I do not see why "without resorting to userland" is a worthy goal in
every case. It's like saying "I want to code in Python without ever
using import" or "I want to code in Perl without ever using CPAN". Makes
no sense, right? Why should we insist on this in PHP?

Programmers haven't figured out how to use the 1-2 covering functions
that already exist and you expect them to do it in userland code?
Maybe we should ditch json_encode() tomorrow. I can do it in userland
code too. PHP does a LOT of things possible in userland code. The
argument I made in the RFC boils down to simply giving programmers a
helping hand. They are writing insecure code because PHP isn't
fulfilling that need for one of the most serious security risks in PHP
today. Surely that warrants action to serve programmers?

solutions. Existing functions are widely misused, misconfigured or
have builtin security issues yet are popularly advanced as "escaping"
for XSS.

Do you think your functions won't be misused, misconfigured and never
would have bugs? Exactly the same would happen. Having yet another API
doing the same as old API is not a solution.

They have one configuration value. All other behaviour is fixed. How
is this remotely similar to the "old API"? Misuse can be constrained
to calling the wrong function and setting the wrong character
encoding. That's 2 versus the list of flaws in htmlspecialchars() I
blogged about (the link is in the RFC) and whatever might
theoretically exist if PHP actually had Javascript and CSS options.
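
To illustrate what "one configuration value, all other behaviour fixed" could look like, here is a rough userland sketch. It is not the RFC's Escaper class: `SketchEscaper` and its short encoding list are assumptions made for this example, and it covers only the HTML body context by delegating to htmlspecialchars() with hard-coded flags.

```php
<?php
// Hypothetical sketch: not the RFC's Escaper class.
final class SketchEscaper
{
    /** Encodings this sketch accepts; a real list would be longer. */
    private const SUPPORTED = ['UTF-8', 'ISO-8859-1'];

    private string $encoding;

    // The character encoding is the only thing a caller can configure.
    public function __construct(string $encoding = 'UTF-8')
    {
        $encoding = strtoupper($encoding);
        if (!in_array($encoding, self::SUPPORTED, true)) {
            throw new InvalidArgumentException("Unsupported encoding: $encoding");
        }
        $this->encoding = $encoding;
    }

    public function escapeHtml(string $value): string
    {
        // Flags are fixed here, so a caller cannot forget ENT_QUOTES.
        return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, $this->encoding);
    }
}
```

With this shape, the remaining ways to misuse it are the two named above: calling the wrong method for the context, or constructing it with the wrong encoding.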

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Stas Malyshev

Hi!

Programmers haven't figured out how to use the 1-2 covering functions
that already exist and you expect them to do it in userland code?

I expect them to use libraries. I don't think anything that is written
in PHP means it's wrong and has to be rewritten in C.

Maybe we should ditch json_encode() tomorrow. I can do it in userland
code too. PHP does a LOT of things possible in userland code. The
argument I made in the RFC boils down to simply giving programmers a
helping hand. They are writing insecure code because PHP isn't
fulfilling that need for one of the most serious security risks in PHP
today. Surely that warrants action to serve programmers?

We already have basic functions that do that, and we have extension that
does that. If you need more, I'm not sure you should do it in C. If you
do just the same under a different name, I don't think it should be done
at all.

encoding. That's 2 versus the list of flaws in htmlspecialchars() I
blogged about (the link is in the RFC) and whatever might
theoretically exist if PHP actually had Javascript and CSS options.

I think the approach of creating third data filtering API (plain
functions, filter, and now this) in PHP core is wrong. I do not see why
the same functions can not be (in case of CSS) or already are not (in
case of most others) implemented in existing functionality. If the whole
question is that people don't know which one to use in which context,
creating an entirely new core API does not sound like a good solution to
me.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

12 years ago by padraic.brady@gmail.com

Hi!

Programmers haven't figured out how to use the 1-2 covering functions
that already exist and you expect them to do it in userland code?

I expect them to use libraries. I don't think anything that is written
in PHP means it's wrong and has to be rewritten in C.

The libraries and the frameworks are wrong. I'm serious too. Try
finding just one that does Javascript escaping properly (if at all).

Maybe we should ditch json_encode() tomorrow. I can do it in userland
code too. PHP does a LOT of things possible in userland code. The
argument I made in the RFC boils down to simply giving programmers a
helping hand. They are writing insecure code because PHP isn't
fulfilling that need for one of the most serious security risks in PHP
today. Surely that warrants action to serve programmers?

We already have basic functions that do that, and we have extension that
does that. If you need more, I'm not sure you should do it in C. If you
do just the same under a different name, I don't think it should be done
at all.

Links? Last I checked, no, we don't already have functions for all of
this. And those that do exist have insecure behaviour - I did link to
the relevant article on htmlspecialchars() which details that with
examples you can download from Github.

encoding. That's 2 versus the list of flaws in htmlspecialchars() I
blogged about (the link is in the RFC) and whatever might
theoretically exist if PHP actually had Javascript and CSS options.

I think the approach of creating third data filtering API (plain
functions, filter, and now this) in PHP core is wrong. I do not see why
the same functions can not be (in case of CSS) or already are not (in
case of most others) implemented in existing functionality. If the whole
question is that people don't know which one to use in which context,
creating an entirely new core API does not sound like a good solution to
me.

The fact is that people neither know how to implement these safely AND
do not know when and where to use them in their correct combinations.
Security related code needs a lot of peer review. It's beyond the
scope of the average programmer, barely within the scope of large
frameworks but should be perfectly within PHP's scope to manage and
have some level of confidence in its security while doing us all a
favour by addressing a significant security risk in PHP applications.

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by padraic.brady@gmail.com

Hi Paul,

The thing is that filter_var() is strongly associated with input
sanitisation whereas Escaper addresses the other end of output. Also,
escaping is inextricably linked to character encoding - we can't run
into situations where the functions are specific to something like
UTF-8 when the character encodings used in real life are far more
diverse. Additionally, the RFC was an attempt to make escaping as
explicit and restrictive as possible - give a user too many options,
or too many dispersed units of functionality, and they'll invariably
confuse and misinterpret themselves to Hell ;).

Note: There is a stack of folk, for example, who use the ext/filter
URL validator for HTTP validation - it also passes php:// and
javascript:// URLs. If we're not explicit, they won't ever notice when
they're doing it wrong.
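
The point about the URL validator can be made explicit with a small sketch: syntactic validation and a scheme whitelist are two separate checks. `isAllowedHttpUrl()` is a hypothetical helper name, and the whitelist of `http` and `https` is an assumption for this example.

```php
<?php
// Hypothetical helper: accept only syntactically valid http(s) URLs.
function isAllowedHttpUrl(string $url): bool
{
    // FILTER_VALIDATE_URL checks URL syntax only; it does not restrict schemes.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // The explicit whitelist is what rejects javascript:// and php:// URLs.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}
```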

Paddy

Hi Paddy,

Couldn't this just be a new option for the filter_var() function?

$clean = filter_var($_POST['someVar'], XSS_CLEAN);

I see from your RFC that you have a bunch of functions, I believe all
these could be options to filter_var, ie.: FILTER_ESCAPE_[URL, JS,
CSS, HTMLATTR].

Paul.

Hi all,

I've written an RFC for PHP over at: https://wiki.php.net/rfc/escaper.
The RFC is a proposal to implement a standardised means of escaping
data which is being output into XML/HTML.

Cross-Site Scripting remains one of the most common vulnerabilities in
web applications and there is a continued lack of understanding
surrounding how to properly escape data. To try and offset this, I've
written articles, attempted to raise awareness and wrote the
Zend\Escaper class for Zend Framework. Symfony 2's Twig has since
adopted similar measures in line with its own focus on security.

That's all. The RFC should be self-explanatory and feel free to pepper
me with questions. As the RFC notes, I'm obviously not a C programmer
so I'm reliant on finding a volunteer who's willing to take this one
under their wing (or into their basement - whichever works).

https://wiki.php.net/rfc/escaper

Best regards,
Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Andrew Faulds

First issue: I've not studied the referenced PHP implementations, but
in cases where multiple contexts seem to apply it's not clear from the
RFC which function(s) should be used, and if multiple, how their calls
would be composed. Examples:

HTML style attribute: escapeHtmlAttr, escapeCss, or both?
HTML on* attributes: escapeHtmlAttr, escapeJs, or both?
HTML href/src attributes: escapeHtmlAttr, escapeUrl, or both?
HTML script/style elements: Is escapeHtml needed?

I can probably correctly guess some of these, but I think ideally the
method and class names should make this more obvious. If escapeJs is
only for string literals in JS code (again, the name doesn't make that
clear to me), what does escapeCss actually do, since string literals
aren't very common in CSS?

I echo this. I think some clearer names might help, I think something
like these:

escapeHTMLAttribute for attributes, escapeHTMLText for text inside
<element> tags, escapeXMLAttribute and escapeXMLContent,
escapeJSStringLiteral, escapeCSSIdentifier, and another needs adding
(for url('*') things), escapeCSSStringLiteral.

--
Andrew Faulds
http://ajf.me/

12 years ago by padraic.brady@gmail.com

Hi Steve,

Missed this one in the rush of emails...

I echo this. I think some clearer names might help, I think something like
these:

escapeHTMLAttribute for attributes, escapeHTMLText for text inside <element>
tags, escapeXMLAttribute and escapeXMLContent, escapeJSStringLiteral,
escapeCSSIdentifier, and another needs adding (for url('*') things),
escapeCSSStringLiteral.

The ESAPI API uses encodeForHTML, encodeForCss, etc. We can name these
in a few different styles which would all be semantically correct but
my own sentiment is often to keep the naming simple.

For example, I'd prefer escapeForCss vs escapeCSSStringLiteral though
both would be valid English literal alternatives to escapeCss.

It's also worth bearing in mind that these escaping functions are
distinct and separate from the concept of sanitisation or a sanitising
filter. For each escaping option there is a sanitisation alternative
where untrusted input (whether from a user, database or 3rd party
service) contains markup you want to allow through "unescaped". For
example, a feed aggregator would need to output HTML from a 3rd party
feed and it may contain URLs that also need to be validated.

HTML, of course, has HTMLPurifier - easily the best HTML sanitiser.
URLs must always be validated to a known good whitelist (not
filter_var() only).

CSS can also be sanitised if the user has access to properties and not
just the property values.

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by padraic.brady@gmail.com

Hi Steve,

I can add some examples but it's not clear cut all the time. For
example, contexts can nest three levels deep in some cases even ;).
HTML inside Javascript inside HTML. Then there's the bogeyman of
other forms of DOM-based XSS...

So the nesting is quite simple to compose but the rules governing it
are wholly separate from this RFC and depend on the HTML output being
written. The RFC just addresses the building blocks themselves.

// e.g. for including <p> into markup via a Javascript string defined in a
// HTML attribute interpreted as PCDATA.
$e = new Escaper; // default to UTF-8
$e->escapeHtml($e->escapeJs($e->escapeHtml('<p>')));

Oh, and that does happen. It's far from recommended these days - we
should all start applying the new Content Security Policy standard.
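
Since the proposed Escaper class is not available in PHP itself, the nesting idea can be tried out with existing functions. The sketch below is a stand-in, not the RFC's implementation: it uses json_encode() with the JSON_HEX_* flags in place of escapeJs(), and htmlspecialchars() in place of the attribute escaper, for the two-level case of a Javascript string literal placed inside a quoted HTML attribute. `jsStringForHtmlAttr()` is a hypothetical name.

```php
<?php
// Hypothetical helper: inner context first (JS string), outer context last (HTML attribute).
function jsStringForHtmlAttr(string $value): string
{
    // Inner layer: a quoted JS string literal with <, >, &, ' and " hex-escaped.
    $jsLiteral = json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );

    // Outer layer: make the literal safe inside a quoted HTML attribute value.
    return htmlspecialchars($jsLiteral, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Usage inside a template; show() is a made-up client-side function.
echo '<button onclick="show(' . jsStringForHtmlAttr('<p>') . ')">Preview</button>', "\n";
```

The order matters: the browser undoes the HTML attribute layer first and only then parses the Javascript, so escaping is applied from the innermost context outwards.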

Paddy

That's all. The RFC should be self-explanatory and feel free to pepper

...

https://wiki.php.net/rfc/escaper

I like where this is going and agree that PHP officially embracing an API
would be helpful even for users stuck on old PHP versions.

First issue: I've not studied the referenced PHP implementations, but in
cases where multiple contexts seem to apply it's not clear from the RFC
which function(s) should be used, and if multiple, how their calls would be
composed. Examples:

HTML style attribute: escapeHtmlAttr, escapeCss, or both?
HTML on* attributes: escapeHtmlAttr, escapeJs, or both?
HTML href/src attributes: escapeHtmlAttr, escapeUrl, or both?
HTML script/style elements: Is escapeHtml needed?

I can probably correctly guess some of these, but I think ideally the method
and class names should make this more obvious. If escapeJs is only for
string literals in JS code (again, the name doesn't make that clear to me),
what does escapeCss actually do, since string literals aren't very common in
CSS?

Example code would be helpful to clarify both issues, but I still think
naming is very important here, and with all the contexts we have to consider
the names in the RFC don't scream what to use them for.

Steve

http://www.mrclay.org/

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Lester Caine

Leigh wrote:

so perhaps we need to use a different word, for

the process, but the same 'process' applies to all of these 'conversions'.
An input data format is converted to an output data format?
How about an encoder, or an escaper.

Actually - just coder
It's encoding, recoding or decoding ... 'escaper' is just a special case
recoding ...

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

12 years ago by padraic.brady@gmail.com

Hi Ángel,

The methods all refer to literal strings, values or digits. We can't
reasonably escape data while allowing valid markup for the current
context since that's a contradiction by its very nature. If you needed
to let user values drive CSS names, Javascript functions or variable
naming, or HTML markup, you need something completely different. For
example, HTML markup can be sanitised against a whitelist using
HTMLPurifier.

I'm fine with the concept, but I'm not sold on the interface.
It should be really clear when each of them should be used.

escapeHtml()
Ok, this is going to be used to show content inside a html document.

escapeHtmlAttr()
Use when using unquoted html attributes, otherwise use html escaping.
When was the last time I saw an unquoted attribute with user-provided content?

Hopefully never since that's the ideal ;). However, HTML5 allows
unquoted attributes which is perfectly valid. We don't make the user's
choice on this but we could provide the relevant tool for escaping if
they are completely and irredeemably insane :P.

I think it should be replaced by a quoteHtmlAttr() function which properly
escapes the content and adds the quotes for you (or it might skip them
if it determines it's not needed in this case).

The RFC focuses on escaping - not sanitising or reformatting.

escapeJs()
Escape javascript... but inside <script> tags, I guess? So it's not to be
used for dynamically generated javascript. Not so clear.

Javascript literal strings (as defined by the standard).

escapeCss()
I'm not even sure in which cases this would be needed. Standalone CSS,
inside a <style> tag, as style="" attribute?

CSS values like a font size or background color. If user data is
allowed to alter names or any other CSS markup, you would need
sanitisation rather than escaping.

escapeUrl()
"It is included primarily for consistency". When do I need to use
escapeUrl and when escapeHtml? What if it's a URL inside a css tag inside a
html document?

Basically any URL inside any attribute. It encodes part of a URL - the
overall URL would still need to be validated separately.

It makes things more confusing, so I'd remove it.

Needs to be included to maintain consistency in having a full set of
go-to escapers.

It should be clear what you are passing to that function and in which
context it expects you to leave the output.

It might not be obvious but these are very straightforward to link to
specific contexts. Here's the clearest explanation of where all of
this fits into templating:
https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet

I should probably add that as a link to the RFC (Anthony will finally
get an ESAPI reference out of me ;)).

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by padraic.brady@gmail.com

Bear in mind the RFC, in userland (and likely any PECL ext) implements
the ESAPI rules. They've been hacked on a lot over the years which is
why I made sure they were followed exactly. It's very unlikely that a
browser bug could scupper these unless they allowed in more unencoded
characters to be taken advantage of. There are benefits to reusing
pre-peer review rules.

Paddy

Hi Rasmus,

If we want to add more filters for more specific purposes, I am not
completely against it, although the more specific they get the more
churn there will be. We are not going to be able to kick out weekly
releases to address every new nuance of these very specific filters. But
they should be implemented as filters compatible with the filter
extension so people can use them within that existing context. That
doesn't preclude a more approachable function alias from also calling
them, of course, much like the htmlspecialchars case.

I feel it needs to be reiterated that the escaper rules are very
predictable and very seldom change as the regular expressions in the
Zend\Escaper class demonstrate. Each is bound to official standards
for Javascript, CSS and HTML respectively and most of the rules,
defined using the OWASP's recommendations as implemented in ESAPI, are
really clearcut - escape everything except alphanumerics and a small
range of "safe" characters (CSS even has NO safe chars outside
alphanumerics). HTML and URL encoding are the only permissive variants
and these are already well known in PHP.
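
To give a feel for how small such a rule is, here is a rough sketch of the "escape everything except alphanumerics" approach for CSS values. It is written in the spirit of the rule described above and is not copied from Zend\Escaper or ESAPI; `escapeCssValue()` and `utf8CodePoint()` are hypothetical names, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: code point of a single, valid UTF-8 character.
function utf8CodePoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // Strip the length marker bits from the lead byte, then append
    // six payload bits from each continuation byte.
    $codePoint = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
    }

    return $codePoint;
}

// Hypothetical escaper: every character outside [a-zA-Z0-9] becomes "\HEX ".
function escapeCssValue(string $value): string
{
    if (preg_match('//u', $value) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return (string) preg_replace_callback(
        '/[^a-zA-Z0-9]/u',
        static fn (array $m): string => sprintf('\\%X ', utf8CodePoint($m[0])),
        $value
    );
}
```

The trailing space after each hex escape terminates it, so a following character such as `x` or `0` is not read as part of the escape.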

Except the browsers all have different quirks. At the very least during
the first year of its life this code is going to change a lot as the
security community whacks away at it. This should start as a pecl
extension so it can iterate rapidly and be available to PHP 5.3/5.4 users.

-Rasmus

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by Paul Dragoonis

Bear in mind the RFC, in userland (and likely any PECL ext) implements
the ESAPI rules. They've been hacked on a lot over the years which is
why I made sure they were followed exactly. It's very unlikely that a
browser bug could scupper these unless they allowed in more unencoded
characters to be taken advantage of. There are benefits to reusing
pre-peer review rules.

Sure, but you have potential for buffer overflows, regex
backtrack/recursion issues and general programming errors when this
moves to C. I guarantee there will be dozens of bugs in the first
version no matter who writes it.

Hi Rasmus,

The existing implementations at Symfony/Zend are working pretty well.
They're using string manipulation and regex functions.
If we port it to C, can't we still use the exact same functions that
the PHP_FUNCTION() macros are calling to pretty much clone it?

Would this minimise the amount of change (if any)?

Paul

-Rasmus

12 years ago by Andrew Faulds

The existing implementations at Symfony/Zend are working pretty well.
They're using string manipulation and regex functions.
If we port it to C, can't we still use the exact same functions that
the PHP_FUNCTION() macros are calling to pretty much clone it?

Would this minimise the amount of change (if any)?

To some extent, yes, but Rasmus is right in that there would be certain
bugs. In C, we don't have implicit string casting, for example.

--
Andrew Faulds
http://ajf.me/

12 years ago by Paul Dragoonis

@All,

I'd like to provide a real use case since I feel people have gone off
on a tangent of their own, i.e. a list of blog posts.

<?php foreach ($posts as $post): ?>
<p>
  <a href="/blog/view/<?= $post->getID(); ?>"
     title="<?= $escaper->escapeHtmlAttr($post->getTitle()); ?>">
    <?= $escaper->escapeHtml($post->getTitle()); ?>
  </a>
</p>
<?php endforeach; ?>

Please see the different needs for escaping generalised html output,
and the same but within an attribute.
This is an important problem that we need to try and solve, the
htmlspecialchars() stuff isn't good enough, else we wouldn't need a
custom preg_match() solution like in the proposed RFC.

I'm happy for this to be an SPL class or a function such as
escape_var() with options on it (similar to how filter_var() works
right now). Adding additional extensions in today's PHP ecosystem is
actually not going to help us at all since only like 2% of people are
ever going to install it. It has to be in ./ext/standard/ or
./ext/spl/.

@Rasmus/Stas
Are you happy with us adding a new class or function to ./ext/spl/ or
./ext/standard/? This isn't one of these shiny "must have" features,
it's actually addressing a very important problem.

For PHP developers to benefit from the escaping functions provided by
zend/symfony they have to actually be using those frameworks and
that's really a small portion of PHP code out there in the wild. If we
can introduce the new escape_var() function or a new OO class (as per
the RFC) then it's going to be readily available in the future.
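
As a rough illustration of the function-style interface, a filter_var()-like dispatcher might look like the sketch below. Only the name escape_var() comes from this thread; the sketch_ prefix, the constants and the signature are assumptions, and only two contexts are shown.

```php
<?php
// Hypothetical constants and function: the thread only names escape_var();
// the signature and options here are assumptions for illustration.
const SKETCH_ESCAPE_HTML = 1;
const SKETCH_ESCAPE_URL  = 2;

function sketch_escape_var(string $value, int $context): string
{
    switch ($context) {
        case SKETCH_ESCAPE_HTML:
            // HTML element content.
            return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        case SKETCH_ESCAPE_URL:
            // A single URL component, such as one query parameter value.
            return rawurlencode($value);
        default:
            throw new InvalidArgumentException('Unknown escaping context.');
    }
}

echo sketch_escape_var('<b>bold</b>', SKETCH_ESCAPE_HTML), "\n";
echo sketch_escape_var('a b', SKETCH_ESCAPE_URL), "\n";
```

Throwing on an unknown context, rather than returning the input unchanged, keeps a typo in the option from silently disabling escaping.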

Many thanks,
Paul Dragoonis.

Hi Ángel,

The methods all refer to literal strings, values or digits. We can't
reasonably escape data while allowing valid markup for the current
context since that's a contradiction by its very nature. If you needed
to let user values drive CSS names, Javascript functions or variable
naming, or HTML markup, you need something completely different. For
example, HTML markup can be sanitised against a whitelist using
HTMLPurifier.

I'm fine with the concept, but I'm not sold on the interface.
It should be really clear when each of them should be used.

escapeHtml()
Ok, this is going to be used to show content inside a html document.

escapeHtmlAttr()
Use when using unquoted html attributes, otherwise use html escaping.
When was the last time I saw an unquoted attribute with user-provided content?
Hopefully never since that's the ideal ;). However, HTML5 allows
unquoted attributes which is perfectly valid. We don't make the user's
choice on this but we could provide the relevant tool for escaping if
they are completely and irredeemably insane :P.
Someone may be insane enough to try to destroy his planet, but "some insane
soul might want it" is no reason to build such a weapon. :)

As it's a crazy thing to do, we shouldn't provide the means to do it. If
your parameter is not a hardcoded number, just quote it and use the
escapeX function on its content.

I think it should be replaced by a quoteHtmlAttr() function which properly
escapes the content and adds the quotes for you (or it might skip them
if it determines it's not needed in this case).
The RFC focuses on escaping - not sanitising or reformatting.
As an api client I just want to pass a parameter to the attribute.

Doing
echo '<b style="' . $escaper->escapeHtml("font-weight: normal") . '">';
or
echo '<b style=' . $escaper->quoteHtmlAttr("font-weight: normal") . '>';

is equivalent, just a distinction on the function contract. But in the
second case the function avoids the ambiguity on whether the attribute
used double quotes, single ones or no quote at all, since it can choose
the one it "prefers".

The goal is to make it easy to write secure code. I think the second way
does it better. If we need to change the name of the rfc, so be it.
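
A minimal sketch of what such a quoting function could look like follows. The name sketch_quote_html_attr is made up; nothing like it exists in PHP or in the RFC, and always emitting double quotes is just one possible choice.

```php
<?php
// Hypothetical function modelled on the quoteHtmlAttr() idea above.
function sketch_quote_html_attr(string $value): string
{
    // Always emit double quotes, and encode both quote styles inside the
    // value so it cannot close the attribute.
    return '"' . htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '"';
}

echo '<b style=' . sketch_quote_html_attr('font-weight: normal') . '>', "\n";
echo '<b title=' . sketch_quote_html_attr('a" onclick="x') . '>', "\n";
```

Because the function owns the quotes, the caller never has to remember which quote style the template used. It only covers the HTML attribute context; CSS or Javascript placed inside the attribute would still need its own escaping first.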

escapeJs()
Escape javascript... but inside <script> tags, I guess? So it's not to
be used for dynamically generated javascript. Not so clear.
Javascript literal strings (as defined by the standard).

Ok. We have the ' or " problem again, though.

escapeCss()
I'm not even sure in which cases this would be needed. Standalone CSS,
inside a <style> tag, as a style="" attribute?
CSS values like a font size or background color. If user data is
allowed to alter names or any other CSS markup, you would need
sanitisation rather than escaping.
I was thinking of things like dynamic class names (I had no idea why you
could want it, though :). It may be better named escapeCssValue().

escapeUrl()
"It is included primarily for consistency". When do I need to use
escapeUrl and when escapeHtml? What if it's a URL inside a css tag
inside an html document?
Basically any URL inside any attribute. It encodes part of a URL - the
overall URL would still need to be validated separately.

If it encodes part of a url, it's not for any url.

By "any URL inside any attribute", I'd expect a usage like:

echo '<a href="' . $escaper->escapeHtml( $escaper->escapeUrl(
"https://wiki.php.net/rfc/escaper" ) ) . '">See the rfc</a>';

Of course, with the rawurlencode semantics, that
https%3A%2F%2Fwiki.php.net%2Frfc%2Fescaper would be a relative url :)
(passing a full url could be interesting for urlencoding non-ansi
characters on the url, although most modern browsers deal
fine with the raw bytes)
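
The rawurlencode() behaviour described above can be checked directly. The sketch below contrasts encoding a whole URL with encoding a single component; the example.com search URL and its "q" parameter are made up for illustration.

```php
<?php
// Encoding a whole URL also encodes its delimiters (":" and "/").
$whole = rawurlencode('https://wiki.php.net/rfc/escaper');
echo $whole, "\n";

// Encoding only the untrusted component keeps the URL structure intact.
// The example.com search URL and its "q" parameter are hypothetical.
$term = 'anti-XSS & escaping';
$href = 'https://example.com/search?q=' . rawurlencode($term);

// The finished URL is then HTML-escaped for the attribute it is placed in.
echo '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">Search</a>', "\n";
```

The order matters: the component is URL-encoded first, and the assembled URL is HTML-escaped last, when it is written into the attribute.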

It makes things more confusing, so I'd remove it.
Needs to be included to maintain consistency in having a full set of
go-to escapers.
It could need renaming.

It should be clear what you are passing to that function and in which
context it expects you to leave the output.
It might not be obvious but these are very straightforward to link to
specific contexts. Here's the clearest explanation of where all of
this fits into templating:
https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet

I should probably add that as a link to the RFC (Anthony will finally
get an ESAPI reference out of me ;)).

Paddy
That's a document worth reading by everyone, but I still think the
functions of the methods should be clearer from their names.

Regards

12 years ago by padraic.brady@gmail.com — view source

unread

Hi Steve,

Who's he? :)

Everybody knows Steve! :P I know at least 10!

The ESAPI API uses encodeForHTML, encodeForCss, etc. We can name these
in a few different styles which would all be semantically correct but
my own sentiment is often to keep the naming simple.

The naming is unclear just now though.

Then I suggest including "For" in all of them. escapeForHtml,
escapeForUrl, etc. That should clear it up somewhat that we're not
targeting whole blocks of HTML/JS/CSS.

For example, I'd prefer escapeForCss vs escapeCSSStringLiteral though
both would be valid English literal alternatives to escapeCss.

You can't just have escapeForCSS, you need two functions: one for CSS
identifier names (.identifier, #identifier, etc.), and one for CSS strings
(background-image: url('string'); or content: 'string')

Not really, the target here is breaking out of a CSS or HTML context.
If you allow users to alter identifiers or properties then escaping is
just wrong - you should be sanitising instead to make sure the CSS is
still well formed and agrees to a whitelist of allowed ids/props.

Also, escapeForJS isn't very clear, you should explicitly specify you're
escaping a string of text for a JavaScript string literal. I don't think you
can escape JS identifier names.

JS is purely for literal values and not any JS variables/statements or
anything else. Those can never ever be subject to any form of
untrusted input.
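
For the literal-values-only case, one way to get a safe Javascript string literal in today's PHP is json_encode() with the JSON_HEX_* flags. This is a stand-in technique, not the escaping strategy proposed in the RFC, and the helper name sketch_js_string is made up. JSON_THROW_ON_ERROR needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper; uses json_encode() as a stand-in, not the RFC's escapeJs().
function sketch_js_string(string $value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX sequences, so the
    // literal cannot close the surrounding <script> block. The returned string
    // already includes its own double quotes.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$name = "</script><script>alert('xss')";
echo '<script>var name = ' . sketch_js_string($name) . ';</script>', "\n";
```

Because the helper emits the quotes itself, the template author does not choose between ' and ", which sidesteps the quoting ambiguity raised earlier in the thread.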

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by padraic.brady@gmail.com — view source

unread

Hi Andrew,

Then I suggest including "For" in all of them. escapeForHtml,
escapeForUrl, etc. That should clear it up somewhat that we're not
targeting whole blocks of HTML/JS/CSS.

That still isn't clear enough, I think. escapeHTMLAttributeValue and
escapeHTMLText. It needs to be clear what HTML context you're dealing with.

I think we're running into being overly prescriptive. Escaping can
never, by definition, apply to anything that isn't a value string or
integer, i.e. anything that is capable of altering the meaning of the
HTML/Javascript or CSS into which it's inserted. The original function
names applied to specific escaping "strategies" rather than the actual
locations that that strategy was useful for. There is only one HTML
escaping strategy, one Javascript escaping strategy, etc.

For example, I'd prefer escapeForCss vs escapeCSSStringLiteral though
both would be valid English literal alternatives to escapeCss.

You can't just have escapeForCSS, you need two functions: one for CSS
identifier names (.identifier, #identifier, etc.), and one for CSS strings
(background-image: url('string'); or content: 'string')

Not really, the target here is breaking out of a CSS or HTML context.
If you allow users to alter identifiers or properties then escaping is
just wrong - you should be sanitising instead to make sure the CSS is
still well formed and agrees to a whitelist of allowed ids/props.

If property values or identifiers can't be escaped, what can? What do you
mean?

Are you meaning in style="" or <style></style>? In which case, why have it?
You can just use a bog-standard HTML escaping function.

In the above we escape the value so that an attacker cannot "break out"
into the CSS property setting context to create new styles or
identifier blocks. So it applies only to the values of properties and
nothing else. If anything else is allowed to be subject to user input
then escaping becomes moot because you now need to perform CSS
sanitisation to a whitelist to prevent the injection of phishing or
clickjacking attacks.
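
As a small illustration of the whitelist idea, the sketch below only ever outputs a name from a fixed list, so no escaping decision is needed at all. The helper name and the theme class names are hypothetical.

```php
<?php
// Hypothetical whitelist and helper, for illustration only.
function sketch_pick_css_class(string $requested, array $allowed, string $fallback): string
{
    // Strict comparison: the value is used only if it is exactly one of the
    // known-good names, so nothing user-controlled reaches the stylesheet.
    return in_array($requested, $allowed, true) ? $requested : $fallback;
}

$allowed = ['theme-light', 'theme-dark'];
echo sketch_pick_css_class('theme-dark', $allowed, 'theme-light'), "\n";
echo sketch_pick_css_class('x{} body{display:none}', $allowed, 'theme-light'), "\n";
```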

The documentation (if not the RFC) should be the place to emphasise
when and where to use escaping functions.

Also, escapeForJS isn't very clear, you should explicitly specify you're
escaping a string of text for a JavaScript string literal. I don't think
you can escape JS identifier names.

JS is purely for literal values and not any JS variables/statements or
anything else. Those can never ever be subject to any form of
untrusted input.

It needs to be clear it's a string literal though, and a literal at that.
Otherwise it's a little unclear. Still, I'm more worried about the CSS.

Again, I think being overly prescriptive is unnecessary. It has
semantic value, of course, but the game is already lost if the user
isn't aware of where to safely apply escaping (a different problem to
applying the correct encoding over the wrong encoding). I think we're
bumping into a slightly different area of education here. Once users
know where escaping applies, the names even in their shorter forms are
fairly obvious as to which context they apply to. I think that
specific education is better served with good quality documentation
and examples (I'm all for docs with a dose of reality).

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

12 years ago by padraic.brady@gmail.com — view source

unread

Hi Steve,

The CSS escaping strategy would escape all non-alphanumerics to CSS
hex sequences ;). As a result, HTML escaping is not strictly
necessary.
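
A minimal sketch of that strategy, assuming the escape takes the usual CSS form of a backslash, the hex code point and a terminating space. The helper names are made up and this is not the RFC's escapeCss().

```php
<?php
// Hypothetical helper names; a sketch of the strategy, not the RFC's escapeCss().

// Decode one UTF-8 character into its Unicode code point.
function sketch_codepoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // Leading byte: drop the length prefix bits, then append 6 bits per byte.
    $codepoint = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $codepoint = ($codepoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codepoint;
}

function sketch_escape_css(string $value): string
{
    $escaped = preg_replace_callback(
        '/[^a-zA-Z0-9]/u',
        function (array $match): string {
            // CSS escape: backslash, hex code point, then a terminating space.
            return sprintf('\\%X ', sketch_codepoint($match[0]));
        },
        $value
    );
    if ($escaped === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $escaped;
}

echo sketch_escape_css('white'), "\n";
echo sketch_escape_css("</style><script>alert('xss')"), "\n";
```

Since "<", "/" and ">" never appear literally in the output, the escaped value cannot close the surrounding style element, which is why a further HTML escaping pass adds nothing here.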

Paddy

<style> body { background-color: <? echo $e->escapeCss('white'); ?> } </style>

Hmmm, the following is a valid value:

"</style><script>alert('xss')"

...for both the content and font-family CSS properties. Gotta love HTML!

What would escapeCss do with them? Do we need to wrap in escapeHtml?

Steve

http://www.mrclay.org/

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team

So far I am not convinced we should really do it. But if somebody creates PECL extension and it proves popular, it may be merged into core once it does.

But in general, look at implementation of filters anywhere - like Apache filters or IIS filters - nowhere it is said that filter can only remove data.

-- Lester Caine - G8HFL
