Literal / Taint checking

6 years ago by Craig Francis — view source

unread

Hi,

How likely would it be for PHP to do Literal tracking of variables?

This is something that's being discussed JavaScript TC39 at the moment [1],
and I think it would be even more useful in PHP.

We already know we should use parameterized/prepared SQL, but there is no
way to prove the SQL string hasn't been tainted by external data in large
projects, or even in an ORM.

This could also work for templating systems (blocking HTML injection) and
commands.

Internally it would need to introduce a flag on every variable, and a
single function to check if a given variable has only been created by
Literal(s).

Unlike the taint extension, there should be no way to override this (e.g.
no taint/untaint functions); and if it was part of the core language, it
will continue to work after every update.

One day certain functions (e.g. mysqli_query) might use this information to
generate a error/warning/notice; but for now, having it available for
checking would be more than enough.

Craig

public function exec($sql, $parameters = []) {
    if (!*is_literal*($sql)) {
        throw new Exception('SQL must be a literal.');
    }
    $statement = $this->pdo->prepare($sql);
    $statement->execute($parameters);
    return $statement->fetchAll();
}

...

$sql = 'SELECT * FROM table WHERE id = ?';

$result = $db->exec($sql, [$id]);

[1] https://github.com/tc39/proposal-array-is-template-object
https://github.com/mikewest/tc39-proposal-literals

6 years ago by Benjamin Eberlei — view source

unread

On Thu, Aug 15, 2019 at 8:03 PM Craig Francis craig@craigfrancis.co.uk
wrote:

Hi,

How likely would it be for PHP to do Literal tracking of variables?

This is something that's being discussed JavaScript TC39 at the moment [1],
and I think it would be even more useful in PHP.

We already know we should use parameterized/prepared SQL, but there is no
way to prove the SQL string hasn't been tainted by external data in large
projects, or even in an ORM.

This could also work for templating systems (blocking HTML injection) and
commands.

Internally it would need to introduce a flag on every variable, and a
single function to check if a given variable has only been created by
Literal(s).

Unlike the taint extension, there should be no way to override this (e.g.
no taint/untaint functions); and if it was part of the core language, it
will continue to work after every update.

One day certain functions (e.g. mysqli_query) might use this information to
generate a error/warning/notice; but for now, having it available for
checking would be more than enough.

Craig

It is an interesting topic indeed! I remember that laruence wrote an
extension for this a while ago, I have never tried it myself though. You
can find it here: https://github.com/laruence/taint

public function exec($sql, $parameters = []) {
    if (!*is_literal*($sql)) {
        throw new Exception('SQL must be a literal.');
    }
    $statement = $this->pdo->prepare($sql);
    $statement->execute($parameters);
    return $statement->fetchAll();
}

...

$sql = 'SELECT * FROM table WHERE id = ?';

$result = $db->exec($sql, [$id]);

[1] https://github.com/tc39/proposal-array-is-template-object
https://github.com/mikewest/tc39-proposal-literals

6 years ago by Craig Francis — view source

unread

On Thu, Aug 15, 2019 at 8:03 PM Craig Francis craig@craigfrancis.co.uk
wrote:

Hi,

How likely would it be for PHP to do Literal tracking of variables?

This is something that's being discussed JavaScript TC39 at the moment
[1],
and I think it would be even more useful in PHP.

We already know we should use parameterized/prepared SQL, but there is no
way to prove the SQL string hasn't been tainted by external data in large
projects, or even in an ORM.

This could also work for templating systems (blocking HTML injection) and
commands.

Internally it would need to introduce a flag on every variable, and a
single function to check if a given variable has only been created by
Literal(s).

Unlike the taint extension, there should be no way to override this (e.g.
no taint/untaint functions); and if it was part of the core language, it
will continue to work after every update.

One day certain functions (e.g. mysqli_query) might use this information
to
generate a error/warning/notice; but for now, having it available for
checking would be more than enough.

Craig

It is an interesting topic indeed! I remember that laruence wrote an
extension for this a while ago, I have never tried it myself though. You
can find it here: https://github.com/laruence/taint

Thanks,

I've been using that extension for a few years - laruence has done a
fantastic job with it.

But it can be a bit buggy; and due to it being a taint based system, with
the ability to taint/untaint, it introduces some problems.

https://github.com/laruence/taint/issues/54

6 years ago by Matthew Brown — view source

unread

There are already some userland taint-checking solutions for PHP e.g. the
Phan taint-check plugin from MediaWiki:
https://www.mediawiki.org/wiki/Phan-taint-check-plugin

I'm working on my own userland solution, too (based on Facebook's
approach). Demo is here: https://psalm.dev/r/ebb9522fea

6 years ago by Craig Francis — view source

unread

On Thu, 15 Aug 2019 at 7:43 pm, Matthew Brown matthewmatthew@gmail.com
wrote:

There are already some userland taint-checking solutions for PHP e.g. the
Phan taint-check plugin from MediaWiki:
https://www.mediawiki.org/wiki/Phan-taint-check-plugin

I'm working on my own userland solution, too (based on Facebook's
approach). Demo is here: https://psalm.dev/r/ebb9522fea

Hi Matthew,

If anything, this proposal would help user-land solutions (it gives them
more information while the code is in running).

At the moment, they all need to make their own parsers, or extensions, and
they all have blind spots.

I’d also like us to move slowly away from taint checkers that allow for
tainted strings to be marked as un-tainted, as these allow mistakes to be
made.

Please excuse any typos, on my phone, but how about:

$sql = ‘... WHERE id = ’ . mysqli_real_escape_string($db, $_GET[‘id’]);

It’s been escaped, so surely it’s not tainted any more?

Unfortunately, because it’s not surrounded with quote marks, it’s not safe.

It also relies on there not being any parsing issues within the database
engine itself (parameterised queries help here, as those values aren’t part
of the SQL parsing process).

Craig

6 years ago by Matthew Brown — view source

unread

If anything, this proposal would help user-land solutions (it gives them
more information while the code is in running).

Well, it might help runtime-based user-land solutions, but not static
analysis-based solutions.

In our bug disclosure program at Vimeo we've had no SQL injection issues
reported, but a number of XSS issues (echoing attacker-controlled data),
and those issues cannot so easily be prevented by this technique as there's
generally little reason to echo literal values.

I can also think of a number of user-constructed SQL queries (e.g. WHERE
... IN) that require non-literal values to work (if this were to come to
pass there might be a set of special unsafe methods).

6 years ago by Craig Francis — view source

unread

On Thu, 15 Aug 2019 at 21:37, Matthew Brown matthewmatthew@gmail.com
wrote:

If anything, this proposal would help user-land solutions (it gives them

more information while the code is in running).

Well, it might help runtime-based user-land solutions, but not static
analysis-based solutions.

I mostly see us needing to use both solutions - static analysis does a deep
dive (ideally helped with any information the PHP engine can provide, even
if it's just parsing), and this runtime check running constantly - only
because static analysis by itself can skip bits, e.g.

https://gerrit.wikimedia.org/g/mediawiki/tools/phan/SecurityCheckPlugin/#general-limitations

In our bug disclosure program at Vimeo we've had no SQL injection issues
reported, but a number of XSS issues (echoing attacker-controlled data),
and those issues cannot so easily be prevented by this technique as there's
generally little reason to echo literal values.

This proposal can ensure SQL injection is impossible, rather than our
current processes, which has some gaps (I'm glad to see you haven't had any
reported issues, but I believe you're in the minority).

This can also be expanded for command line injection issues (kind of moving
away from escapeshellarg).

I've not spent enough time on the templating side of things yet (I've been
working more on the browser side for this, e.g. CSP and
application/xhtml+xml).

But I'm hopeful this proposal can still be useful, in a similar way to how
the JavaScript changes will help templating (ref Trusted Types).

Even if we only use this for guarding some inputs - e.g. a templating
system being sure which bits are safe HTML literals, loaded into a
DomDocument, and unsafe user data being applied with setAttribute() after
some sanity checks.

But there are some annoying edge cases, which means that I don't think this
can be perfect:

$user_homepage = 'javascript:alert(document.cookie)';

<a href="<?= htmlentities($user_homepage) ?>">Example</a>

I can also think of a number of user-constructed SQL queries (e.g. WHERE
... IN) that require non-literal values to work (if this were to come to
pass there might be a set of special unsafe methods).

This is what I've been using for WHERE ... IN to create a literal for the
SQL string:

$parameters = [];

$in_sql = $db->parameter_in($parameters, 'i', $ids); // I'm using
maxdb_stmt::bind_param which needs a type.

$sql = 'SELECT * FROM table WHERE id IN (' . $in_sql . ')';

$db->fetch_all($sql, $parameters)

...

public function parameter_in(&$parameters, $type, $values) {
$count = count($values);
if ($count == 0) {
throw new Exception('At least 1 value is required for an IN list');
}
if ($type == 'i') {
$values = array_map('intval', $values);
} else if ($type == 's') {
$values = array_map('strval', $values);
} else {
throw new Exception('Unknown parameter type for parameter_in(),
should be "i" or "s"');
}
foreach ($values as $value) {
$parameters[] = [$type, $value];
}
return substr(str_repeat('?,', $count), 0, -1); // Returns a literal
string.
}

5 years ago by Craig Francis — view source

unread

Hi,

While there was a brief discussion about an is_literal() method in
August, I'm wondering where I can go next?

Just as a reminder, the main objection seemed to be that Taint checking is
the current solution. For example, those created by Laruence[1],
MediaWiki[2], and Matthew[3]. But this can never be as good at the PHP
engine explicitly stating a variable only contains literal values, where
it can be checked at runtime, and be a key part of the development process.

And while I'm using SQL injection in my examples (because it's easy to show
how it can enforce the use of parameterised queries); it would also be
useful to protect against command line injection, and HTML/XSS as well
(e.g. a templating system can only accept HTML as literal strings, and
the user supplied values be provided separately).

I'm assuming this would change the zval structure (to include an
"is_literal" flag?), and it would be more of a PHP 8.0 change, rather than
8.1.

Craig

Broken taint check, due to missing quote marks:

$sql = ‘... WHERE id = ’ . mysqli_real_escape_string($db, $_GET[‘id’]);

Support for "WHERE ... IN", ideally done via an abstraction, so you don't
need to write this every time:

$sql = '... WHERE id IN (' . substr(str_repeat('?,', count($ids)), 0, -1) .
')';

[1] https://github.com/laruence/taint
[2] https://www.mediawiki.org/wiki/Phan-taint-check-plugin
[3] https://psalm.dev/r/ebb9522fea

On Thu, 15 Aug 2019 at 19:02, Craig Francis craig@craigfrancis.co.uk
wrote:

Hi,

How likely would it be for PHP to do Literal tracking of variables?

This is something that's being discussed JavaScript TC39 at the moment
[1], and I think it would be even more useful in PHP.

We already know we should use parameterized/prepared SQL, but there is no
way to prove the SQL string hasn't been tainted by external data in large
projects, or even in an ORM.

This could also work for templating systems (blocking HTML injection) and
commands.

Internally it would need to introduce a flag on every variable, and a
single function to check if a given variable has only been created by
Literal(s).

Unlike the taint extension, there should be no way to override this (e.g.
no taint/untaint functions); and if it was part of the core language, it
will continue to work after every update.

One day certain functions (e.g. mysqli_query) might use this information
to generate a error/warning/notice; but for now, having it available for
checking would be more than enough.

Craig
public function exec($sql, $parameters = []) {
    if (!*is_literal*($sql)) {
        throw new Exception('SQL must be a literal.');
    }
    $statement = $this->pdo->prepare($sql);
    $statement->execute($parameters);
    return $statement->fetchAll();
}
...
$sql = 'SELECT * FROM table WHERE id = ?';

$result = $db->exec($sql, [$id]);
[1] https://github.com/tc39/proposal-array-is-template-object
https://github.com/mikewest/tc39-proposal-literals

5 years ago by Craig Francis — view source

unread

Hi,

As I'm not sure how to make any more process on this, I've added added a
Feature Request:

https://bugs.php.net/bug.php?id=79359

It shows how this change in PHP could stop SQL injection, and proposes a
way it could be used against HTML injection as well.

Craig

On Thu, 13 Feb 2020 at 12:31, Craig Francis craig@craigfrancis.co.uk
wrote:

Hi,

While there was a brief discussion about an is_literal() method in
August, I'm wondering where I can go next?

Just as a reminder, the main objection seemed to be that Taint checking is
the current solution. For example, those created by Laruence[1],
MediaWiki[2], and Matthew[3]. But this can never be as good at the PHP
engine explicitly stating a variable only contains literal values, where
it can be checked at runtime, and be a key part of the development process.

And while I'm using SQL injection in my examples (because it's easy to
show how it can enforce the use of parameterised queries); it would also be
useful to protect against command line injection, and HTML/XSS as well
(e.g. a templating system can only accept HTML as literal strings, and
the user supplied values be provided separately).

I'm assuming this would change the zval structure (to include an
"is_literal" flag?), and it would be more of a PHP 8.0 change, rather than
8.1.

Craig

Broken taint check, due to missing quote marks:

$sql = ‘... WHERE id = ’ . mysqli_real_escape_string($db, $_GET[‘id’]);

Support for "WHERE ... IN", ideally done via an abstraction, so you don't
need to write this every time:

$sql = '... WHERE id IN (' . substr(str_repeat('?,', count($ids)), 0, -1)
. ')';

[1] https://github.com/laruence/taint
[2] https://www.mediawiki.org/wiki/Phan-taint-check-plugin
[3] https://psalm.dev/r/ebb9522fea

On Thu, 15 Aug 2019 at 19:02, Craig Francis craig@craigfrancis.co.uk
wrote:
Hi,

How likely would it be for PHP to do Literal tracking of variables?

This is something that's being discussed JavaScript TC39 at the moment
[1], and I think it would be even more useful in PHP.

We already know we should use parameterized/prepared SQL, but there is no
way to prove the SQL string hasn't been tainted by external data in large
projects, or even in an ORM.

This could also work for templating systems (blocking HTML injection) and
commands.

Internally it would need to introduce a flag on every variable, and a
single function to check if a given variable has only been created by
Literal(s).

Unlike the taint extension, there should be no way to override this (e.g.
no taint/untaint functions); and if it was part of the core language, it
will continue to work after every update.

One day certain functions (e.g. mysqli_query) might use this information
to generate a error/warning/notice; but for now, having it available for
checking would be more than enough.

Craig
public function exec($sql, $parameters = []) {
    if (!*is_literal*($sql)) {
        throw new Exception('SQL must be a literal.');
    }
    $statement = $this->pdo->prepare($sql);
    $statement->execute($parameters);
    return $statement->fetchAll();
}
...
$sql = 'SELECT * FROM table WHERE id = ?';

$result = $db->exec($sql, [$id]);
[1] https://github.com/tc39/proposal-array-is-template-object
https://github.com/mikewest/tc39-proposal-literals

5 years ago by Rowan Tommins — view source

unread

Hi,

As I'm not sure how to make any more process on this, I've added added a
Feature Request:

https://bugs.php.net/bug.php?id=79359

It shows how this change in PHP could stop SQL injection, and proposes a
way it could be used against HTML injection as well.

Hi Craig,

In my experience, the bug tracker is likely to get you less attention than
this list, rather than more. For this kind of significant change, the way
to get a more in-depth discussion going is to draft an RFC; there are some
instructions and tips on how to go about that at
https://wiki.php.net/rfc/howto and
https://blogs.oracle.com/opal/the-mysterious-php-rfc-process-and-how-you-can-change-the-web

The idea of an RFC is to sit down and design exactly how the proposed
feature would work; that helps move the discussion forward, because people
can see exactly how it might look, and means you're offering something to
the community rather than asking it of them. The RFC doesn't have to
include a full implementation, but if you don't know much about the
technical details, you might need help from someone who does to make sure
the proposal is realistic.

I see you've linked an older RFC in the feature request; it would be worth
digging out the archived discussion from when that was proposed, to see why
it stalled. It may just be that people were distracted by other things, or
there may be issues raised which you can consider in a new proposal. If you
haven't already, you could try contacting the author as well.

In general, I think it's an interesting idea, but as the saying goes "the
devil is in the detail", so I don't have much to say without a concrete
proposal for what it would look like.

Regards,

Rowan Tommins
[IMSoP]

5 years ago by Craig Francis — view source

unread

[...] the way to get a more in-depth discussion going is to draft an RFC

Thanks Rowan,

I've created a Wiki account (craigfrancis), and I believe the next step is
to ask for RFC karma?

And is there is anyone who can help with the technical details? I'd really
appreciate it. If it helps, I've got a development budget of £1,000 I could
put towards this (I will need an invoice, a simple PDF would do).

As to the discussion around the older RFC[1]...

The last message I can find was on the 1st September 2015[2].

The RFC focused on SQL injection, where it was noted that "unfiltered input
can affect way more than only SQL"[3], and it isn't ideal for "just for one
use case"[4] - my proposed is_literal() can be used for other issues,
such as Cross-Site Scripting[5], Command Line Injection, etc.

There was a belief that education was the answer[6] - but having this check
would allow developers to identify (and block) mistakes at runtime.

Xinchen mentioned how it was complex in PHP5 to implement the Taint
extension - but "with PHP7's new zend_string, and string flags, the
implementation will become easier"[7]. And while the Taint checking is
useful, it does not address the mistakes that can happen with escaping.

As to why I'm deviating away from the original RFC...

By providing a is_literal() function, it allows the developer to
determine how they want to use it - where they can skip it for certain
tasks[8], and database drivers (or other extensions) could use it in the
future to raise a notice/warning/error[9].

It gives a mechanism to ensure inputs are split between the command (a
literal), and user supplied values - which is what Yasuo was asking for[10].

Also, by focusing on just literals (as in, only values defined within PHP
scripts), we avoid any concerns about escaping (which can go wrong), and we
won't need to identify which sources are trusted[11].

For the last 5 years I've been writing my SQL with literals only, and it's
worked very well... with just one oddity (which I still consider a literal):

<?php
$in_sql = implode(',', array_fill(0, $count, '?'));
// or
$in_sql = substr(str_repeat('?,', $count), 0, -1);
?>

On Mon, 9 Mar 2020 at 13:47, Craig Francis craig@craigfrancis.co.uk
wrote:

Hi,

As I'm not sure how to make any more process on this, I've added added a
Feature Request:

https://bugs.php.net/bug.php?id=79359

It shows how this change in PHP could stop SQL injection, and proposes a
way it could be used against HTML injection as well.

Hi Craig,

In my experience, the bug tracker is likely to get you less attention than
this list, rather than more. For this kind of significant change, the way
to get a more in-depth discussion going is to draft an RFC; there are some
instructions and tips on how to go about that at
https://wiki.php.net/rfc/howto and

https://blogs.oracle.com/opal/the-mysterious-php-rfc-process-and-how-you-can-change-the-web

The idea of an RFC is to sit down and design exactly how the proposed
feature would work; that helps move the discussion forward, because people
can see exactly how it might look, and means you're offering something to
the community rather than asking it of them. The RFC doesn't have to
include a full implementation, but if you don't know much about the
technical details, you might need help from someone who does to make sure
the proposal is realistic.

I see you've linked an older RFC in the feature request; it would be worth
digging out the archived discussion from when that was proposed, to see why
it stalled. It may just be that people were distracted by other things, or
there may be issues raised which you can consider in a new proposal. If you
haven't already, you could try contacting the author as well.

In general, I think it's an interesting idea, but as the saying goes "the
devil is in the detail", so I don't have much to say without a concrete
proposal for what it would look like.

Regards,

Rowan Tommins
[IMSoP]