Newsgroups: php.internals
Date: Tue, 18 Sep 2012 15:12:21 -0400
To: "Bryan C. Geraghty"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] RFC: Implementing a core anti-XSS escaping class
From: ircmaxell@gmail.com (Anthony Ferrara)

Bryan,

On Tue, Sep 18, 2012 at 2:52 PM, Bryan C. Geraghty wrote:

> Antony,
>
> I’ll concede that the term “escaping” is improperly used in many places;
> even in the OWASP documentation.
>
> But I’ll point out that the CWE document is identifying a distinction in
> the two terms by saying, “This overlapping usage extends to the Web,
> such as the "escape" JavaScript function whose purpose is stated to be
> encoding”.

There is a distinction between them. But in this case it's not particularly
relevant (as both work quite fine). I'll elaborate further in a second.

> But when you say, “With the end result being the exact same...”, I don’t
> think you’ve thought it through. I’ve read some of your stuff and I’m
> pretty confident that you understand the benefits of white-listing over
> black-listing. For the uninitiated, yes, a black-list can be configured to
> produce the same results at a given point-in-time, but the fundamental
> approach is different. A white-list operates on an explicit specification
> and lets nothing else through. A black-list assumes that the input data is
> mostly correct and it filters out the bad. To add to that, how do you
> convert from ISO-8859-1 to UTF-8 with a black-list or by escaping?

You hit the nail on the head here. You cannot black-list convert ISO-8859-1
to UTF-8. However, when we talk about escaping, we're talking about a
context where the encoding is already correct, we're just preventing
special characters from imparting special meaning.
In that case, escaping is the correct way of handling it. But if you wanted
to output arbitrary input into a UTF-8 document, you would also need to
ensure that it's encoded properly into UTF-8. So I can see your distinction
applying to that case.

But look at it from a different angle. Escaping preserves the security
context. Encoding preserves the semantic context. You could escape away all
invalid UTF-8 bytes, but you'd lose the meaning of the original character
set. So semantically, encoding is necessary. But from a security
perspective, the encoding doesn't really matter much. What matters is the
security context (not injecting harmful code, etc).

Now, both can be handled by the same routine. But that's not necessary to
preserve the security aspect. And that's why I objected to using the term
"encoding" here. If we want to go that route, that's fine. But you don't
need to encode for security. Escaping will handle that (possibly at the
expense of invalid semantic meaning).

> Your reference to mysql_real_escape_string is exactly the point I’m trying
> to make. The use of that function is “discouraged” because it DID escape;
> it looked for specific bad characters. It was fundamentally flawed. And
> that is the functionality PHP developers, as you just demonstrated, will
> refer to. The current recommendation is to use a library that properly
> encodes the entire data stream.

How is mres fundamentally flawed? And how is it discouraged? It's actually
listed as a valid defense by OWASP:
https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet#Defense_Option_3:_Escaping_All_User_Supplied_Input

The only 2 ways of securely getting data to MySQL are escaping and binding
as a parameter on a prepared statement. Neither of them encodes a data
stream (the PS uses a binary format that puts the data in plain binary
form, as is, with a header to identify length).
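
As a rough illustration of the escaping/encoding split described above, here
is a minimal sketch in Python (chosen only for brevity; the thread itself
concerns PHP). The sample input and both helper names, transcode_to_utf8 and
escape_for_html, are made up for this example and are not part of any
proposed API. It assumes the input really is ISO-8859-1 and that the output
context is HTML text or a quoted attribute value.

```python
import html

# Hypothetical input: ISO-8859-1 bytes holding an accented character and markup.
raw = "café <script>alert(1)</script>".encode("iso-8859-1")


def transcode_to_utf8(data: bytes, source_charset: str = "iso-8859-1") -> bytes:
    """Encoding step: re-express the same characters in UTF-8 (semantic context)."""
    return data.decode(source_charset).encode("utf-8")


def escape_for_html(text: str) -> str:
    """Escaping step: neutralise the characters that are special in HTML (security context)."""
    return html.escape(text, quote=True)


# Transcode first, then escape: the output is safe and the meaning survives.
safe = escape_for_html(transcode_to_utf8(raw).decode("utf-8"))

# Skip the transcoding and throw away the invalid UTF-8 byte instead: the
# output is still safe to emit, but the original character is lost.
lossy = escape_for_html(raw.decode("utf-8", errors="replace"))

print(safe)
print(lossy)
```

Both results are free of raw markup, so the security context holds either
way; only the transcoded one still contains the original "é", while the other
carries a U+FFFD replacement character in its place.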

Black listing works fine for a specified format (like XML, like HTML, like
SQL, like JavaScript). Where you get in trouble with black lists is when
your data format isn't specified (hence edge-cases aren't well known) or
when you're not serializing to a format (generic input black lists). But
for escaping output, black lists are a very well known, well understood,
and easily implemented approach.

> I’ll also agree that consistency with the industry is not as important
> because there seem to be plenty of misuses. However, I do think that we
> should use terminology that sets the functionality apart. So, given the
> operating mode difference and the precedent set by mysql_escape_string,
> mysql_real_escape_string, etc., I think “encode” is the way to go.

I think it strongly depends upon the exact behavior of the library. If we
do wind up doing transcoding as well as escaping, then that may be valid.
If we don't, then it wouldn't. But I think we can both agree on the need...

Anthony