Newsgroups: php.internals
From: ilia@prohost.org (Ilia Alshanetsky)
To: Wietse Venema
Cc: PHP internals
Subject: Re: [PHP-DEV] Run-time taint support proposal
Date: Fri, 15 Dec 2006 15:43:23 -0500
Message-ID: <7AE00699-23C2-4759-A50C-3D94199DA85A@prohost.org>
In-Reply-To: <20061215201448.B16D8BC1AB@spike.porcupine.org>
References: <20061215201448.B16D8BC1AB@spike.porcupine.org>

On 15-Dec-06, at 3:14 PM, Wietse Venema wrote:

> This is a proposal to add basic Perl/Ruby like tainting support to
> PHP: an option that is turned off by default, and that programmers
> may turn on at runtime to alert them when they make the common
> mistake of using uncleansed input with include, echo, system, open,
> etc. This would work with unmodified third-party extensions.

I doubt it is plausible to make this work entirely without touching
external extensions, since those extensions may change data from
tainted to untainted and vice versa.

> Taint support is not a sandbox; a malicious PHP script can still
> open a pipe to a shell process and feed uncleansed commands to it.
> Taint support can be an ingredient to build a sandbox, but that
> involves lots more. See for example the Ruby reference at the end.

Sounds awfully like yet another safe_mode: something that proclaims
security, yet is unable to provide it.

> Of course when overhead is low enough, people might want to turn
> on taint checks in production, to implement a multi-layer defense.
> Wise people know that no single layer provides perfect protection.
> People already do this with other scripting languages.

That is unlikely to ever be the case; the overhead of taint modes is
generally quite significant.

> - Education: automatic cleansing systems don't make programmers
> aware that network data is inherently untrustworthy. Instead,
> they teach the exact opposite: don't worry about data hygiene.
> This of course means they will get bitten elsewhere anyway.

Most people program not to learn how, but to solve problems, which is
why automatic filtering has been the holy grail of security: it allows
developers to avoid thinking about input validation beyond the initial
setup and move on with their lives.

> - Expectation: automatic cleansing systems have to be perfect. If
> the safety net catches some but not all cross-site scripting or
> SQL injection attacks, then the system has a security hole and
> people lose confidence. This gives security a bad reputation.

The same argument can be made about taint mode. Judging by Perl and
Ruby, where there are tricks to bypass it, it applies equally.

> - Overhead: as strings are sliced, diced, and tossed around, the
> automatic cleansing safety net has to keep track of exactly which
> characters in a substring are derived from untrusted input, and
> which characters are not, so that the safety net can later recognize
> malicious content in the middle of html/shell/sql/etc. commands.

If you look at filter, there is no tracking of malicious chars; the
data is simply cleansed of them or rejected altogether, and this is a
one-time event.

> - More overhead: special-purpose code is needed in all functions
> and all primitives that execute html/shell/sql/etc. commands.
> This code is needed because each context has a different definition
> of what is "malicious" content in the middle of a request.

That's why you can use RAW mode and filter the data when necessary.

> Compared to this, the run-time overhead of maintaining and testing
> taint bits in PHP is miniscule, if my experiences with the prototype
> are meaningful.

I am highly skeptical regarding this claim.

> - Each ZVAL is marked tainted or not tainted (i.e. we don't taint
> individual characters within substrings). Black and white is all.
> In some future, someone may want to explore the possibility of
> more than two shades. But not now.

That means an additional element in a struct that has thousands of
instances in most scripts; the resulting increase in memory footprint
will be the first source of overhead.

> - Primitives and functions such as echo, eval, or mysql_query are
> not allowed to receive tainted input. When this happens the script
> terminates with a run-time error. It is a bad idea for software
> to continue after a security violation.

You would need to go through some 5,000+ functions that PHP offers and
determine which ones can and cannot receive tainted data, something
that virtually guarantees things will be missed, bringing us back to
the safe_mode/open_basedir problem.

> - PHP propagates taintedness across expressions. If an input to
> an expression is tainted, then the result of that expression is
> tainted too. There are exceptions to this rule: these are called
> sanitisers, as discussed next.

That runs counter to your original point that extensions do not need
to be taint-aware; what you propose would require adjusting nearly
every single extension. The additional tainted/not-tainted checks will
add further overhead.

> - The PHP application programmer untaints data by explicit assignment
> with an untainted value. For example, the result from htmlentities()
> or mysql_real_escape_string() is not tainted. People could apply
> the wrong sanitizer if they really want to. Remember, the purpose
> is to help programmers by telling what data needs cleansing. It
> is up to them to make the right decision. If we wanted to force
> the use of the "right" sanitizer then we would need multiple
> shades of untaintedness. This would not be practical.

Again, many functions have different behaviors etc...
Take an example: htmlspecialchars() is great against XSS but does
nothing for exec(). If you run a string through htmlspecialchars() and
then pass it to exec(), the taint check considers the data untainted
and executes it, resulting in command injection.

Overall, I do not believe this is a good idea as it stands, and my
vote would be -0.5 on its inclusion into PHP.

Ilia Alshanetsky
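
The htmlspecialchars()/exec() objection above can be sketched with a toy model. This is only an illustration in Python, not the proposed PHP implementation: the names Value, html_escape, concat and run_shell are hypothetical, html.escape() stands in for htmlspecialchars(), and the "shell" sink never executes anything; it only returns the command line it would have run. The one assumption taken from the proposal is that each value carries a single taint bit and that any sanitizer clears it.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in for a value carrying one taint bit."""
    text: str
    tainted: bool = True  # external input starts out tainted


def html_escape(v: Value) -> Value:
    # Sanitizer for the HTML context (stand-in for htmlspecialchars()).
    # With a single taint bit, it clears taint for every context.
    return Value(html.escape(v.text, quote=True), tainted=False)


def concat(a: Value, b: Value) -> Value:
    # Propagation rule: the result is tainted if any input is tainted.
    return Value(a.text + b.text, a.tainted or b.tainted)


def run_shell(cmd: Value) -> str:
    # Sink: rejects tainted input. Nothing is executed in this sketch;
    # the command line that would have been run is returned instead.
    if cmd.tainted:
        raise RuntimeError("tainted data passed to shell sink")
    return cmd.text


user_input = Value("report.txt; rm -rf ~")   # attacker-controlled
literal = Value("cat ", tainted=False)       # programmer-supplied

# The raw input is caught by the taint check.
try:
    run_shell(concat(literal, user_input))
except RuntimeError as err:
    print("blocked:", err)

# The HTML sanitizer clears the bit but leaves ';' untouched, so the
# shell sink accepts a command line that still contains the injection.
print("accepted:", run_shell(concat(literal, html_escape(user_input))))
```

In this model the second call passes the check even though the shell metacharacters survive unchanged, which is the "wrong sanitizer" case: a single bit cannot record which context the data was cleansed for.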