Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:26980
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 66.249.82.234 as permitted sender)
DomainKey-Status: good
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
        s=beta; d=gmail.com;
        h=received:in-reply-to:references:mime-version:content-type:message-id:cc:content-transfer-encoding:from:subject:date:to:x-mailer:sender;
        b=ngtq2UBC9jzqg5r0tJkLY/y6K8DxCrcfgs+FtLeBTCXpnluKZB98sOnmXtyGC9xd/vPAil6jKuHmpLRa1A0wJpglzSOgHE0qfbDUTufisuQlPZe9oKGFWy1TiD/jDJoYClBtdieqIB9ygUMZ8tgGyZkz9dFOFgPPrXUi++sf+tw=
In-Reply-To: <20061215201448.B16D8BC1AB@spike.porcupine.org>
References: <20061215201448.B16D8BC1AB@spike.porcupine.org>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-ID: <7AE00699-23C2-4759-A50C-3D94199DA85A@prohost.org>
Cc: PHP internals <internals@lists.php.net>
Content-Transfer-Encoding: 7bit
Date: Fri, 15 Dec 2006 15:43:23 -0500
To: Wietse Venema <wietse@porcupine.org>
Sender: Ilia Alshanetsky <iliaal@gmail.com>
Subject: Re: [PHP-DEV] Run-time taint support proposal
From: ilia@prohost.org (Ilia Alshanetsky)

On 15-Dec-06, at 3:14 PM, Wietse Venema wrote:

> This is a proposal to add basic Perl/Ruby like tainting support to
> PHP: an option that is turned off by default, and that programmers
> may turn on at runtime to alert them when they make the common
> mistake of using uncleansed input with include, echo, system, open,
> etc.  This would work with unmodified third-party extensions.

I doubt it is plausible to make this work without touching external
extensions at all, since those extensions may change the state of
data from tainted to untainted and vice versa.

> Taint support is not a sandbox; a malicious PHP script can still
> open a pipe to a shell process and feed uncleansed commands to it.
> Taint support can be an ingredient to build a sandbox, but that
> involves lots more. See for example the Ruby reference at the end.

Sounds awfully like yet another safe_mode: something that proclaims
security, yet is unable to provide it.

> Of course when overhead is low enough, people might want to turn
> on taint checks in production, to implement a multi-layer defense.
> Wise people know that no single layer provides perfect protection.
> People already do this with other scripting languages.

That is unlikely to ever be the case; the overhead of taint modes is
generally quite significant.

> - Education: automatic cleansing systems don't make programmers
>   aware that network data is inherently untrustworthy. Instead,
>   they teach the exact opposite: don't worry about data hygiene.
>   This of course means they will get bitten elsewhere anyway.

Most people program not to learn how, but to solve problems. That is
why automatic filtering has been the holy grail of security: it
allows developers to avoid thinking about input validation beyond the
initial setup and move on with their lives.

> - Expectation: automatic cleansing systems have to be perfect. If
>   the safety net catches some but not all cross-site scripting or
>   SQL injection attacks, then the system has a security hole and
>   people lose confidence. This gives security a bad reputation.

The same argument applies to taint mode: judging by Perl and Ruby,
there are tricks to bypass it.

> - Overhead: as strings are sliced, diced, and tossed around, the
>   automatic cleansing safety net has to keep track of exactly which
>   characters in a substring are derived from untrusted input, and
>   which characters are not, so that the safety net can later recognize
>   malicious content in the middle of html/shell/sql/etc.  commands.

If you look at the filter extension, there is no tracking of malicious
characters; the data is simply cleansed of them or rejected
altogether. This is a one-time event.
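
To illustrate the "one-time event" point, here is a minimal sketch in
Python rather than PHP. The function names filter_int() and
filter_strip_tags() are hypothetical stand-ins for the two kinds of
behaviour described above (reject outright, or cleanse); they are not
the filter extension's API. The input is handled once at the boundary,
and nothing tracks individual characters afterwards.

```python
import re

def filter_int(raw):
    # "Reject" style: accept only an optionally signed decimal integer,
    # otherwise return None so the caller can refuse the request.
    if re.fullmatch(r"[+-]?[0-9]+", raw):
        return int(raw)
    return None

def filter_strip_tags(raw):
    # "Cleanse" style: remove anything that looks like a markup tag.
    # This is a deliberately naive pattern, for illustration only.
    return re.sub(r"<[^>]*>", "", raw)

print(filter_int("42"))                  # 42
print(filter_int("42; DROP TABLE x"))    # None
print(filter_strip_tags("<b>hi</b>"))    # hi
```

After these calls the program holds ordinary values; there is no
per-value or per-character state left to maintain.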

> - More overhead: special-purpose code is needed in all functions
>   and all primitives that execute html/shell/sql/etc.  commands.
>   This code is needed because each context has a different definition
>   of what is "malicious" content in the middle of a request.

That's why you can use RAW mode and filter the data when necessary.

> Compared to this, the run-time overhead of maintaining and testing
> taint bits in PHP is miniscule, if my experiences with the prototype
> are meaningful.

I am highly skeptical regarding this claim.


> - Each ZVAL is marked tainted or not tainted (i.e.  we don't taint
>   individual characters within substrings). Black and white is all.
>   In some future, someone may want to explore the possibility of
>   more than two shades. But not now.

That means an additional element in a struct that has thousands of
instances in most scripts. The resulting increase in memory footprint
will be the first source of overhead.

> - Primitives and functions such as echo, eval, or mysql_query are
>   not allowed to receive tainted input. When this happens the script
>   terminates with a run-time error.  It is a bad idea for software
>   to continue after a security violation.

You would need to go through some 5,000+ functions that PHP offers
and determine which ones can and cannot receive tainted data. That
virtually guarantees things will be missed, bringing us back to the
safe_mode/open_basedir problem.

> - PHP propagates taintedness across expressions.  If an input to
>   an expression is tainted, then the result of that expression is
>   tainted too. There are exceptions to this rule: these are called
>   sanitisers, as discussed next.

That runs counter to your original point that extensions do not need
to be taint aware; what you propose would require adjusting nearly
every single extension. The additional tainted/not-tainted checks
will add further overhead.
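
A rough model of the concern, written in Python and not taken from the
prototype: TaintedStr and legacy_extension_upper() are hypothetical
names. The "core" operator (+) has been taught to propagate the taint
mark, while a function that was never adjusted builds its result
without knowing about the mark and silently drops it.

```python
class TaintedStr(str):
    """Hypothetical taint-carrying string; only + has been made taint aware."""

    def __add__(self, other):
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other):
        return TaintedStr(str.__add__(other, self))

def is_tainted(value):
    return isinstance(value, TaintedStr)

def legacy_extension_upper(s):
    # Stand-in for an unmodified extension function: str.upper()
    # returns a plain str, so the taint mark does not survive.
    return s.upper()

name = TaintedStr("alice")
greeting = "hello " + name              # taint-aware operator
shouted = legacy_extension_upper(name)  # taint-unaware function

print(is_tainted(greeting))  # True
print(is_tainted(shouted))   # False
```

Whether a real implementation would lose the mark or keep it too
eagerly depends on its defaults; either way, each function that
produces new data has to be looked at.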

> - The PHP application programmer untaints data by explicit assignment
>   with an untainted value.  For example, the result from htmlentities()
>   or mysql_real_escape_string() is not tainted. People could apply
>   the wrong sanitizer if they really want to. Remember, the purpose
>   is to help programmers by telling what data needs cleansing.  It
>   is up to them to make the right decision.  If we wanted to force
>   the use of the "right" sanitizer then we would need multiple
>   shades of untaintedness. This would not be practical.

Again, many functions have different behaviors. Take
htmlspecialchars() as an example: it is great against XSS but does
nothing for exec(). If you run htmlspecialchars() on a string and then
pass it to exec(), PHP considers the data untainted and executes it,
resulting in command injection.
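
The scenario can be sketched as follows. This is a Python model under
stated assumptions, not PHP and not the proposed implementation: Value,
html_sanitize() and exec_sink_accepts() are hypothetical names, and
Python's html.escape() stands in for htmlspecialchars(). Nothing is
actually executed; the sink only reports whether it would accept the
value.

```python
import html

class Value:
    """Hypothetical stand-in for a zval carrying a single taint bit."""

    def __init__(self, data, tainted):
        self.data = data
        self.tainted = tainted

def html_sanitize(value):
    # Models htmlspecialchars(): escapes HTML metacharacters and, under
    # the proposal, yields an untainted result.
    return Value(html.escape(value.data), tainted=False)

def exec_sink_accepts(value):
    # Models the check in front of exec(): only the taint bit is
    # inspected, not which sanitizer produced the value.
    return not value.tainted

user_input = Value("report.txt; rm -rf ~", tainted=True)
cleaned = html_sanitize(user_input)

print(exec_sink_accepts(user_input))  # False
print(exec_sink_accepts(cleaned))     # True
print(";" in cleaned.data)            # True
```

The HTML escaping leaves the shell metacharacter ";" untouched, yet the
single taint bit says the value is safe for any sink.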

Overall, I do not believe this is a good idea as it stands, and my
vote would be -0.5 on its inclusion into PHP.

Ilia Alshanetsky