Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:63102
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: error (pb1.pair.com: domain ravensight.org from 209.85.219.42 cause and error)
To: "'Anthony Ferrara'" <ircmaxell@gmail.com>
Cc: <internals@lists.php.net>
References: <CALwr1Gn4OmHP=Qa7NKzpbkq+w8FEjUMZ3v678bbZVOGM23G35g@mail.gmail.com> <alpine.DEB.2.02.1209180809010.28929@whisky.home.derickrethans.nl> <CALwr1GnjwyE=yipK6UNkOm9m31LDDbpbcdgmr_nf0cUo+zOedw@mail.gmail.com> <CALwr1G=DOKmjudFFEAFY+6hmUh3q+qNe1ntWY1FLXP5GChLZdA@mail.gmail.com> <CAMUwpuSfd1U6aA_7o6d2jGm-i0PsLAf1MqKF8zvtcfzGFMGEXQ@mail.gmail.com> <CALwr1GkLMrenFjv49kE1a-i4vN3sVr2Rr1wRUgBWGg3=V34How@mail.gmail.com> <011201cd95c7$33d43c30$9b7cb490$@org> <CAAyV7nEqVq9QUeamrBr0VskscyMdfm8AzRSUPbTE8TV0b_rzYw@mail.gmail.com>
In-Reply-To: <CAAyV7nEqVq9QUeamrBr0VskscyMdfm8AzRSUPbTE8TV0b_rzYw@mail.gmail.com>
Date: Tue, 18 Sep 2012 13:52:26 -0500
Message-ID: <011901cd95ce$bfea0900$3fbe1b00$@org>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_011A_01CD95A4.D7140100"
thread-index: Ac2VyLqV3W6P5j88Sleu7rb7WeSTdQAANfHw
Content-Language: en-us
Subject: RE: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class
From: bryan@ravensight.org ("Bryan C. Geraghty")

------=_NextPart_000_011A_01CD95A4.D7140100
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

Anthony,

 

I'll concede that the term "escaping" is used improperly in many places,
even in the OWASP documentation.

 

But I'll point out that the CWE document identifies a distinction between
the two terms when it says, "This overlapping usage extends to the Web, such
as the "escape" JavaScript function whose purpose is stated to be encoding".

 

But when you say, "With the end result being the exact same...", I don't
think you've thought it through. I've read some of your stuff and I'm pretty
confident that you understand the benefits of white-listing over
black-listing. For the uninitiated: yes, a black-list can be configured to
produce the same results at a given point in time, but the fundamental
approach is different. A white-list operates on an explicit specification
and lets nothing else through. A black-list assumes that the input data is
mostly correct and filters out only the known-bad parts. To add to that, how
do you convert from ISO-8859-1 to UTF-8 with a black-list or by escaping?
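
To illustrate the transcoding question, here is a rough sketch in Python
(used only because it is short; the function name latin1_to_utf8 is made up
for this example and is not part of the RFC). Every byte is decoded and
re-encoded; there is no list of "bad" bytes to look for.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Decode every byte as ISO-8859-1, then re-encode the whole string as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


source = b"caf\xe9"                 # "café" in ISO-8859-1 (0xE9 is é)
converted = latin1_to_utf8(source)

print(converted)                    # b'caf\xc3\xa9'
# The ASCII bytes come out identical, but they were still processed.
print(converted[:3] == source[:3])  # True
```

The first three bytes are unchanged only because ASCII maps to itself in
UTF-8, not because the conversion skipped them.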

 

Your reference to mysql_real_escape_string is exactly the point I'm trying
to make. The use of that function is "discouraged" because it DID escape; it
looked for specific bad characters. It was fundamentally flawed. And that is
the functionality PHP developers, as you just demonstrated, will refer to.
The current recommendation is to use a library that properly encodes the
entire data stream.
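
One common form of that recommendation is parameter binding, where the
library carries the whole value as data instead of patching individual
characters in a query string. The sketch below uses Python's built-in
sqlite3 module purely so that it is self-contained; the table and values are
invented for the example, and in PHP the comparable facility would be
prepared statements in PDO or mysqli.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# A classic injection attempt, passed in as an ordinary value.
hostile = "' OR '1'='1"

# The value is bound as data; it is never spliced into the SQL text.
query = "SELECT name FROM users WHERE name = ?"
hostile_rows = conn.execute(query, (hostile,)).fetchall()
normal_rows = conn.execute(query, ("alice",)).fetchall()

print(hostile_rows)  # []
print(normal_rows)   # [('alice',)]
```

No character in the hostile value is singled out for special treatment; the
value as a whole simply never becomes part of the statement.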

 

I'll also agree that consistency with the industry is not as important
because there seem to be plenty of misuses. However, I do think that we
should use terminology that sets the functionality apart. So, given the
operating mode difference and the precedent set by mysql_escape_string,
mysql_real_escape_string, etc., I think "encode" is the way to go.

 

Thanks,

Bryan

 

From: Anthony Ferrara [mailto:ircmaxell@gmail.com] 
Sent: Tuesday, September 18, 2012 1:09 PM
To: Bryan C. Geraghty
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class

 

Bryan et al,

On Tue, Sep 18, 2012 at 1:58 PM, Bryan C. Geraghty <bryan@ravensight.org>
wrote:

Hello everyone,

Paddy is correct here. The purpose of this API is output ENCODING which is a
very good thing. This discussion provides a very good case for a point I
made via Twitter this morning: In this RFC, all uses of the term "escape"
should be replaced by the term "encode".

This is not solely a problem with this RFC. The term "escape" is being used
by developers in the industry when they mean "encoding". This is a bad thing
because, from a security perspective, escaping is exactly the opposite of
encoding.

 

It's a very common thing: http://cwe.mitre.org/data/definitions/116.html

 

> The usage of the "encoding" and "escaping" terms varies widely. For example,
> in some programming languages, the terms are used interchangeably, while
> other languages provide APIs that use both terms for different tasks. This
> overlapping usage extends to the Web, such as the "escape" JavaScript
> function whose purpose is stated to be encoding. Of course, the concepts of
> encoding and escaping predate the Web by decades. Given such a context, it
> is difficult for CWE to adopt a consistent vocabulary that will not be
> misinterpreted by some constituency.

 

I think that picking one, and sticking with it is fine. No matter which is
chosen... 

 

- Escaping is done by setting up a black-list and replacing those elements
with an approved variant.
- Encoding is done by converting all of the input data into the target
format. Some bytes may end up being exactly the same but they are all
processed.
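
A minimal sketch of the two definitions above, in Python for brevity. All
names here (BLACKLIST, SAFE, escape_blacklist, encode_whitelist) are
hypothetical, and the hex-entity output is just one possible target format
for an HTML context; none of this is the RFC's API.

```python
import string

# Escaping: a black-list of known-bad characters and their replacements.
BLACKLIST = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}


def escape_blacklist(text: str) -> str:
    """Replace listed characters; anything not on the list passes untouched."""
    return "".join(BLACKLIST.get(ch, ch) for ch in text)


# Encoding: an explicit set of allowed characters; everything else is converted.
SAFE = set(string.ascii_letters + string.digits)


def encode_whitelist(text: str) -> str:
    """Process every character, emitting a hex entity unless it is allowed."""
    return "".join(ch if ch in SAFE else f"&#x{ord(ch):X};" for ch in text)


# A payload for an unquoted attribute; it has no black-listed characters.
payload = "x onmouseover=alert(1)"

print(escape_blacklist(payload))  # x onmouseover=alert(1)
print(encode_whitelist(payload))  # x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;
```

The black-list version returns the payload unchanged because nobody listed
the space, "=" or parentheses; the white-list version never needed to know
about them.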

 

With the end result being the exact same...

 

I understand why people on this list are associating the functionality
defined in this RFC with filtering because the name is leading them astray.

Besides the fundamental difference in the definitions of each item, the
security industry is using the term "encoding"; take a look at the OWASP
documentation for a quick example.

 

The OWASP documentation uses them interchangeably. However, specifically for
this task, the ESAPI is defined as a:
https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

 

> The OWASP ESAPI project <https://www.owasp.org/index.php/ESAPI> has created
> an escaping library in a variety of languages including Java, PHP, Classic
> ASP, Cold Fusion, Python, and Haskell.

 

If we want developers with little application security background to be able
to understand these things, we need to be consistent.

 

In this case, I'm not sure consistency with the industry is as important
(mainly because the industry is itself inconsistent). The important thing is
to pick one and stick to it. I would suggest "escape" mainly because people
in PHP are already familiar with it (via mysql_real_escape_string, etc)...

 

Anthony


------=_NextPart_000_011A_01CD95A4.D7140100--