Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:88637 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 25704 invoked from network); 2 Oct 2015 09:42:09 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 2 Oct 2015 09:42:09 -0000 Authentication-Results: pb1.pair.com smtp.mail=craig@craigfrancis.co.uk; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=craig@craigfrancis.co.uk; sender-id=pass Received-SPF: pass (pb1.pair.com: domain craigfrancis.co.uk designates 209.85.212.177 as permitted sender) X-PHP-List-Original-Sender: craig@craigfrancis.co.uk X-Host-Fingerprint: 209.85.212.177 mail-wi0-f177.google.com Received: from [209.85.212.177] ([209.85.212.177:34920] helo=mail-wi0-f177.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 6A/47-23989-0715E065 for ; Fri, 02 Oct 2015 05:42:08 -0400 Received: by wicge5 with SMTP id ge5so25101304wic.0 for ; Fri, 02 Oct 2015 02:42:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=craigfrancis.co.uk; s=default; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=93jwafYIHDMj9vbNjMTvIaq0gKwXJQYGnltDPn9ktsM=; b=Amy9MD+ARm5e0K9OTAdzE5rYfJ49owNn7aMgH7U2qnR1ufY68014a+R9pqQiTYwb4l ROf4m/X2dtL3hD9jZ5kFx6lmfAYMnIUP0jJSDvpplZ+b8+VaQaIJhingM/cn3dKcTUv1 nCPAKZeJut4kmZuaOneL/1uq4WrFM/b1vyu8c= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:cc:content-transfer-encoding:message-id:references :to; bh=93jwafYIHDMj9vbNjMTvIaq0gKwXJQYGnltDPn9ktsM=; b=GtZbtGGe0M9yXiiAUFipln1x0/q730dvVj5fOlcHRsGljk2axtxFehpXqiLkpnPwSv e90u0vtCG8hKqUHGBzMFpc475B0fO/PtIr64+HH6o0rP0K33qw3AyD6Mf7kQYdxuRhX8 Ahtzch/3ao3IfBKBzPnJ9nB1ob3JG+gbgEfC7RFKZUkr8o7bKm2TDv5wDHHxYFaFisQR WOmM6hR9kFT1lim6VX8aBzcruSGhGm0xiLS95YhyLqmjs/1WdS9V0GLX9H31Ma9jgL/K yh131Ga6DuA+A5eZoFFrLCCEQI6F74j7q5IeK2kWVU3jb6khSltxxx/FjSXC+/QFrYid OZrw== X-Gm-Message-State: ALoCoQk174DS0aoKyRcdBpE1nynOWLiXvwUPonvGR4md4jdUB6KwPpo2e57agr/gmnIxLWg3S2RO X-Received: by 10.194.61.78 with SMTP id n14mr15063357wjr.75.1443778925761; Fri, 02 Oct 2015 02:42:05 -0700 (PDT) Received: from [192.168.1.12] (cpc79329-chap9-2-0-cust385.18-1.cable.virginm.net. [82.44.123.130]) by smtp.gmail.com with ESMTPSA id pk7sm10457207wjb.2.2015.10.02.02.42.04 (version=TLS1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 02 Oct 2015 02:42:04 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) In-Reply-To: Date: Fri, 2 Oct 2015 10:42:02 +0100 Cc: Arvids Godjuks , Christopher Owen , Kalle Sommer Nielsen , Internals , wietse@porcupine.org, Xinchen Hui Content-Transfer-Encoding: quoted-printable Message-ID: References: To: Anthony Ferrara X-Mailer: Apple Mail (2.1878.6) Subject: Re: [PHP-DEV] taint From: craig@craigfrancis.co.uk (Craig Francis) While skim reading emails (just got back from holiday), I wanted to = add... On 15 Sep 2015, at 17:23, Anthony Ferrara wrote: > All, >=20 > On Tue, Sep 15, 2015 at 11:15 AM, Arvids Godjuks > wrote: >> I fully support your effort to get this into the PHP to be part of = core >> extensions, or at least one of those that keep up with the language >> releases. >> This is a very good tool to have, and you can actually run it in = production >> to catch things that may slipped the stating (things happen). And = it's >> invaluable during during testing. >=20 > I'm against any form of tainting for a few reasons. >=20 > First, it's only really useful as a first-order heuristic. A failure > does not indicate a security vulnerability, and a non-failure does not > indicate the absence of one. An example of a false-positive would be > if (preg_match('(^[a-z0-9]+$)', $input)) echo $input;. Taint-based > systems would flag that as a possible XSS, where it's clearly not in > any context. Personally I would argue that if the `echo $input` was for printing a = variable in a HTML construct, so it MUST go though one of the HTML = encoding functions, because that regexp could be changed later. e.g. an inexperienced developer being told that usernames on a website = can now support angle brackets, so "just edit the regexp for the = username validation". This is keeping in mind that the regexp is now usually kept separate = from the view code (e.g. MVC style) > An example of a false-negative would be the following: >=20 > >Bar. If the > input is "foo onclick=3Ddosomethingbad();", an XSS is still executed. True, but without htmlspecialchars, it will ALWAYS be bad... so while a = tainting system won't be able to pick up all problems, it can at least = identify more mistakes than it not existing (percentage wise, well that = will depend on the programmer). Maybe you could think of it like having a second pair of eyes looking = over your shoulder, politely asking you if you forgot something (a bit = like how pair programming helps improve code quality). >=20 > Another example of a false negative would be: >=20 > $query =3D "SELECT * FROM foo WHERE bar =3D " . > mysqli_real_escape_string($c, $input) . ";"; >=20 > A false positive is potentially OK. A false negative is not. >=20 All of the examples have been about using pg_escape_identifier(), = because that adds the quote marks... mysqli_real_escape_string by itself = would not be enough to say that it safe for an SQL string :-) > Second, it will encourage the improper pattern of "sanitize" functions > to de-taint input. These are functions which call > `mysqli_real_escape_string(htmlspecialchars(whatever(etc($input))))`. > No matter what the context the variables are used in. This doesn't > improve security (and in many cases can demonstrably harm it). It just > encourages people to work around it. And that ignores that there's an > `untaint()` function that they can call on any variable to "silence > the errors". I don't think a sanitize() function would be implemented though, as it = would make a mess of variables being passed though more than once... = resulting in things like double html encoding (e.g. &), and = these stand out much better than the current sanitize() functions which = do things like silently stripping "bad" characters like double quotes = and less than signs. If anything, a proper tainting system would help us move away from these = horrible functions (says he having had to remove a couple before now). > Third, it ignores context. This is related to the first two, but I > think is a separate concern. An example from the taint RFC > (https://wiki.php.net/rfc/taint) is the shell-execution. If the > variable is used in the context of command, one escape function is > needed. If it's used as an argument, another is needed. Detecting > which is not something that's trivial for a language-level taint > function. And calling the wrong function defeats the purpose. So if we > can't detect the context, we can't reliably say whether it's safe or > not. Not sure I understand, if anything this gives the strings a context? Rather than every variable being a simple string (number, etc), they = start off being plain text (or a string defined as a constant in PHP), = and their escaping is checked in the context of the function that was = called (e.g. exec needing escapeshellarg). > In the hands of an experienced engineer/security researcher, taint > systems can be a really handy tool to help find possible issues. In > the hands of a junior engineer, it can actually make things worse > because it encourages them to just "silence the errors". While I originally hoped that it would be enabled by default, I think = for now it would have to be off by default, and probably only used by = "experienced engineer/security researchers". At the moment, static analysis seems to be the closest alternative I can = find (and they are pretty awful). > Considering that it's not a reliable tool, and is only a heuristic, I > would suggest that this be implemented as an extension, and not in the > core of a programming language. Maybe, but I think with some discussions, having it part of the core = language would help get all the other extensions to use it (if needed), = and help improve viability. See it like having Notices present, but off by default (which they are). The other positive is that distros such as Debian / RedHat will then = ship with it by default (disabled), rather than having to ask/hope that = they will provide a version with it (keeping in mind that they provide = security updates without needing to re-compile). > With that said, if there are engine hooks necessary to implement this > feature better as an extension, I'm all for adding those hooks. I'm > just against adding something to the language that won't actually be > able to accomplish what it promises. >=20 > Escaping is always context-sensitive, in every case. Tainting as has > been proposed in the past doesn't (and literally can't in the general > case) know the context. Therefore it can't reliably know the correct > way to escape. I think it can, but I agree that there are some edge cases that it won't = be able to catch, such as: $url =3D $_GET['u']; // ?u=3Djavascript:alert(document.cookies); echo 'Cookies'; > Consider the case of addslashes() and magic_quotes_gpc. The reason > that they are insecure is that they don't know the context that the > variable is used in. This leads to character set issues, as well as > cases where backslashes enter contexts they don't escape (like HTML > output). We learned years ago that context-insensitive escaping > doesn't work and is literally bad. Let's not make the same mistakes > again (even if it's for the correct reasons this time). I am repeating myself now, but yes, magic_quotes_gpc did not know about = the context those variables were going into... tainting on the other = hand is checking the variables, which are going into a known context, = have been escaped (with a couple of possible issues, but they can be = addressed though other methods). Craig