[RFC] is_trusted - was is_literal

4 years ago by Yasuo Ohgaki — view source — reply

unread

Hi,

The name "is_trusted" is misleading.
Literal is nothing but literal.

<html> <?php eval('$var= '. $_GET['a'] );

if (is_trusted($var)) echo $var;
?>

</html>

Literals cannot always be trusted.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

On Tue, Jun 22, 2021 at 5:25 AM Craig Francis craig@craigfrancis.co.uk
wrote:

On Sat, 12 Jun 2021 at 18:00, Craig Francis craig@craigfrancis.co.uk
wrote:

I'd like to start the discussion on the is_literal() RFC:
https://wiki.php.net/rfc/is_literal

To recap,

We have chosen the name is_trusted(), based 18 votes for, vs 3 against.

Integers are now included, which will help adoption:

https://wiki.php.net/rfc/is_literal

(Joe’s currently updating the implementation to have the new name, but all
the functionality is there).

I’m glad this RFC has been well received; and thank you for all the
feedback, I really think it‘s benefitting the implementation.

Craig

4 years ago by Craig Francis — view source — reply

unread

Hi,

The name "is_trusted" is misleading.
Literal is nothing but literal.
<html> <?php eval('$var= '. $_GET['a'] );
if (is_trusted($var)) echo $var;
?>
</html>
Literals cannot always be trusted.

That’s explained in the RFC, under “Limitations” and “Faking it”…

“That said, we do not pretend there aren't ways around this (e.g. using
var_export), but doing so is clearly the developer doing something wrong.
We want to provide safety rails, but there is nothing stopping the
developer from jumping over them if that's their choice.”

4 years ago by Derick Rethans — view source — reply

unread

The name "is_trusted" is misleading.
Literal is nothing but literal.

I agree with this. The name is_trusted is going to be the same naming mistake as "safe mode" was. Developers will put their trust in it that it is 100% guaranteed safe.

cheers,
Derick

4 years ago by Benjamin Morel — view source — reply

unread

The name "is_trusted" is misleading.
Literal is nothing but literal.

I agree with this. The name is_trusted is going to be the same naming
mistake as "safe mode" was. Developers will put their trust in it that it
is 100% guaranteed safe.

FWIW, agreed, too. Trusted is vague and may imply some false sense of
security. Literal is literally what it says on the tin.

— Benjamin

4 years ago by Craig Francis — view source — reply

unread

On Tue, 22 Jun 2021 at 12:18 am, Benjamin Morel benjamin.morel@gmail.com
wrote:

The name "is_trusted" is misleading.
Literal is nothing but literal.

I agree with this. The name is_trusted is going to be the same naming
mistake as "safe mode" was. Developers will put their trust in it that it
is 100% guaranteed safe.

FWIW, agreed, too. Trusted is vague and may imply some false sense of
security. Literal is literally what it says on the tin.

I can follow up properly tomorrow, but by popular request we do support
integers as well (could be seen as stretching the definition of “literal” a
bit).

And we did ask for suggestions last week, which ended up with a vote (as I
couldn’t decide).

That said, I’m really glad that the only issue we seem to have is the name.

Craig

4 years ago by Stephen Reay — view source — reply

unread

On Tue, 22 Jun 2021 at 12:18 am, Benjamin Morel <benjamin.morel@gmail.com mailto:benjamin.morel@gmail.com>
wrote:

The name "is_trusted" is misleading.
Literal is nothing but literal.

I agree with this. The name is_trusted is going to be the same naming
mistake as "safe mode" was. Developers will put their trust in it that it
is 100% guaranteed safe.

FWIW, agreed, too. Trusted is vague and may imply some false sense of
security. Literal is literally what it says on the tin.

I can follow up properly tomorrow, but by popular request we do support
integers as well (could be seen as stretching the definition of “literal” a
bit).

And we did ask for suggestions last week, which ended up with a vote (as I
couldn’t decide).

That said, I’m really glad that the only issue we seem to have is the name.

Craig

So I just want to make sure I understand the progression on this so far.

It started out with people wanting a way to check that a string was a literal string, in code somewhere, and does not come from user input. Ok makes sense. The name makes sense too.

Then someone said they wanted to check if an integer was a literal too - but because of technical limitations, it now allows any integer, regardless of where it came from, to be treated as a literal.

Then because it’s not actually checking for literals, people thought the name “trusted” made more sense?

That nobody thinks “any user supplied integer must be surely safe” is kind of hilarious, and sad at the same time.

Knowing that a string is literal would be very helpful. Knowing that the string potentially still contains user input, in spite of the one thing it claims to do, is not just unhelpful, it makes the entire thing useless.

I can’t vote, but this whole thing would be a No from me unless it was the original scope - a variable is a literal defined in code somewhere. If there are technical limitations with some types, then leave them off the list of what it will check.

4 years ago by Stephen Reay — view source — reply

unread

On Tue, 22 Jun 2021 at 12:18 am, Benjamin Morel <benjamin.morel@gmail.com mailto:benjamin.morel@gmail.com <mailto:benjamin.morel@gmail.com mailto:benjamin.morel@gmail.com>>
wrote:

The name "is_trusted" is misleading.
Literal is nothing but literal.

I agree with this. The name is_trusted is going to be the same naming
mistake as "safe mode" was. Developers will put their trust in it that it
is 100% guaranteed safe.

FWIW, agreed, too. Trusted is vague and may imply some false sense of
security. Literal is literally what it says on the tin.

I can follow up properly tomorrow, but by popular request we do support
integers as well (could be seen as stretching the definition of “literal” a
bit).

And we did ask for suggestions last week, which ended up with a vote (as I
couldn’t decide).

That said, I’m really glad that the only issue we seem to have is the name.

Craig

So I just want to make sure I understand the progression on this so far.

It started out with people wanting a way to check that a string was a literal string, in code somewhere, and does not come from user input. Ok makes sense. The name makes sense too.

Then someone said they wanted to check if an integer was a literal too - but because of technical limitations, it now allows any integer, regardless of where it came from, to be treated as a literal.

Then because it’s not actually checking for literals, people thought the name “trusted” made more sense?

That nobody thinks “any user supplied integer must be surely safe” is kind of hilarious, and sad at the same time.

Knowing that a string is literal would be very helpful. Knowing that the string potentially still contains user input, in spite of the one thing it claims to do, is not just unhelpful, it makes the entire thing useless.

I can’t vote, but this whole thing would be a No from me unless it was the original scope - a variable is a literal defined in code somewhere. If there are technical limitations with some types, then leave them off the list of what it will check.

s/nobody/anybody/

I blame a lack of caffeine.

4 years ago by Craig Francis — view source — reply

unread

So I just want to make sure I understand the progression on this so far.
It started out with people wanting a way to check that a string was a
literal string, in code somewhere, and does not come from user input. Ok
makes sense. The name makes sense too.

The primary reason was never just to define literal strings, the intention
has always been to create a practical, implementable solution to address
the issue of Injection Vulnerabilities (SQl/HTML/CLI/etc).

The name is_literal() has always just been a placeholder, it came up when
I first started looking at this problem because that was the most obvious
thing I knew we could anchor around. (Unfortunately I think it was easy to
make assumptions based solely on that name, rather than focussing on the
issue it is meant to address).

So, we cannot look for literals only - while it was part of the solution,
it was very much incomplete. Bearing in mind, there is considerable amount
of existing code and tutorials out there which include integers in their
SQL/HTML/CLI/etc, and they are perfectly safe in doing so. Making a
solution which does not support integers is not going to be adopted/used
because the task of rewriting and changing everything, for no benefit, will
not be considered by developers.

Likewise, a lot of code already builds SQL/HTML/CLI/etc strings via
concatenation and sprintf(), and forcing everyone to use a query builder is
likely to cause most people to not even consider using this.

It's all well thinking of one thing that might theoretically/idealistically
solve the issue, but it also needs to have a plan on how this will be
practically implemented and used by developers (which this has done).

Craig

4 years ago by Stephen Reay — view source — reply

unread

So I just want to make sure I understand the progression on this so far. It started out with people wanting a way to check that a string was a literal string, in code somewhere, and does not come from user input. Ok makes sense. The name makes sense too.

The primary reason was never just to define literal strings, the intention has always been to create a practical, implementable solution to address the issue of Injection Vulnerabilities (SQl/HTML/CLI/etc).

Preventing injection vulnerabilities may be your goal but I’m talking about the intended behaviour of this one function. Your original email says this:

Distinguishing strings from a trusted developer from strings that may be attacker controlled

If you feel that somehow doesn’t mean the same as "check that a string was a literal string, in code somewhere, and does not come from user input”, then we need to crack open a dictionary and work out which words one of us doesn’t know the meaning of.

The name is_literal() has always just been a placeholder, it came up when I first started looking at this problem because that was the most obvious thing I knew we could anchor around. (Unfortunately I think it was easy to make assumptions based solely on that name, rather than focussing on the issue it is meant to address).

So, we cannot look for literals only - while it was part of the solution, it was very much incomplete. Bearing in mind, there is considerable amount of existing code and tutorials out there which include integers in their SQL/HTML/CLI/etc, and they are perfectly safe in doing so. Making a solution which does not support integers is not going to be adopted/used because the task of rewriting and changing everything, for no benefit, will not be considered by developers.

There is a considerable amount of existing code that includes strings in SQL, HTML without danger too. Plenty of string values are fine, and plenty of integer values are fine. That doesn’t mean we should just blindly trust a value that came from the user, just because it’s a number.
The saying goes “never trust user input” not “never trust user input unless it’s a number”.

Likewise, a lot of code already builds SQL/HTML/CLI/etc strings via concatenation and sprintf(), and forcing everyone to use a query builder is likely to cause most people to not even consider using this.

If they won’t adopt an existing solution to the problem why would they adopt this?
You’ve said very recently that this is not intended to be used directly by most developers, and instead used within libraries and frameworks. It seems a little weird to then make concessions that will defeat the stated goal, in the name of adoption.

It's all well thinking of one thing that might theoretically/idealistically solve the issue, but it also needs to have a plan on how this will be practically implemented and used by developers (which this has done).

Having a plan for how to implement something doesn’t help much when the thing you’re implementing deliberately ignores a specific type of ‘untrusted’ input..

4 years ago by Craig Francis — view source — reply

unread

On Tue, 22 Jun 2021 at 3:05 pm, Stephen Reay php-lists@koalephant.com
wrote:

On Tue, 22 Jun 2021 at 09:59, Stephen Reay php-lists@koalephant.com
wrote:

So I just want to make sure I understand the progression on this so far.
It started out with people wanting a way to check that a string was a
literal string, in code somewhere, and does not come from user input. Ok
makes sense. The name makes sense too.

The primary reason was never just to define literal strings, the intention
has always been to create a practical, implementable solution to address
the issue of Injection Vulnerabilities (SQl/HTML/CLI/etc).

Preventing injection vulnerabilities may be your goal but I’m talking
about the intended behaviour of this one function.

I understand this is not the one true perfect solution, but that solution
does not exist without tanking performance and never being accepted solely
on that basis alone.

If you can point me to an example where including integers in this has
introduced a security vulnerability then please do, and I mean it, that’s
what this process is for, I genuinely want people to come forward with them
so we can refine this.

The only thing that you could suggest as an alternative is to not include
integers at all. That is not realistic for the reasons I’ve stated not just
here, but in replies to others, and the RFC itself. So if we can’t do that,
and we can’t mark developer-integers for reasons mentioned before, what
else is there? That we just don’t do it??

Torpedoing the idea because it is not 100% idealistically perfect, and
ignoring the fact that it, (as it stands right now with integers and all),
Prevents Injection Vulnerabilities - the most common security
vulnerability that dominates the top of the leaderboards again and again -
and does that without adding in any new security vulnerabilities, is just
nonsensical.

Craig

4 years ago by Stephen Reay — view source — reply

unread

If you can point me to an example where including integers in this has
introduced a security vulnerability then please do, and I mean it, that’s
what this process is for, I genuinely want people to come forward with them
so we can refine this.

It took me about a minute to think of this:

"select * from customer_purchases where {$column} = :value”.

The developer inadvertently passes the same “trusted value” in as the $column substitute and the value parameter. It must be safe because we ran it through is_trusted!

The query now executes as:

"select * from customer_purchases where 12345 = 12345”

You cannot magically make all dynamically generated queries safe - they tried that about a quarter of a century ago. Hint: it did not end well - and explicitly allowing some user input is just mind boggling given the stated goals.

4 years ago by lauri.kentta@gmail.com — view source — reply

unread

On 22 Jun 2021, at 21:38, Craig Francis craig@craigfrancis.co.uk
wrote:

If you can point me to an example where including integers in this has
introduced a security vulnerability then please do, and I mean it,
that’s
what this process is for, I genuinely want people to come forward with
them
so we can refine this.

It took me about a minute to think of this:

"select * from customer_purchases where {$column} = :value”.

The developer inadvertently passes the same “trusted value” in as the
$column substitute and the value parameter. It must be safe because
we ran it through is_trusted!

The query now executes as:

"select * from customer_purchases where 12345 = 12345”

You cannot magically make all dynamically generated queries safe -
they tried that about a quarter of a century ago. Hint: it did not end
well - and explicitly allowing some user input is just mind boggling
given the stated goals.

Your example is interesting and kind of valid. Looks like there is a bug
in the code: $column is sometimes an user-defined integer and sometimes
a valid literal column name, clearly this should not happen. (If it was
supposed to be integer, you would pass it as a parameter just like
:value, right?)

Is this RFC about preventing bugs (accidentally used wrong variable) or
preventing bad code (user input intentional but used the wrong way)? I
thought it was more the about bad code. Bugs can live even in literal
queries as well as outside queries, where is_literal/is_trusted can't
reach them.

Another line of thought: One possible approach would be to accept only
explicit integer casts (sprintf %d and %u, and a new function like
implode_ints() and/or strval_int()) and otherwise only accept strings.
This would avoid the accidental case where $x is supposed to be a
trusted string but is an untrusted integer instead, like the given
example.

--
Lauri Kenttä

4 years ago by Mike Schinkel — view source — reply

unread

The name "is_trusted" is misleading.
Literal is nothing but literal.

I agree with this. The name is_trusted is going to be the same naming
mistake as "safe mode" was. Developers will put their trust in it that it
is 100% guaranteed safe.

FWIW, agreed, too. Trusted is vague and may imply some false sense of
security. Literal is literally what it says on the tin.

I proposed the name change originally.

For my inspiration take a look at Trusted Types API in Javascript:

https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API

And also Trusted Types API from the W3C

https://w3c.github.io/webappsec-trusted-types/dist/spec/ https://w3c.github.io/webappsec-trusted-types/dist/spec/

The reason is_trusted() would be better than is_literal() is so that the function could in the future handle other trusted types that are definitely not literal and allow most code that using is_trusted() to continue to work.

So, for example, in the future we could add a new TrustedInterface and if an object implemented the trusted interface and a __ToString() method it could be considered trusted by code that uses the is_trusted() to guard against untrusted code.

Alternately if in the future PHP added a TrustedAttribute we could use on the __ToString() method to declare an object that could be used when a trusted string is required.

Conversely, I am strongly opposed to is_literal() as proposed. The current RFC proposes half a solution without first envisioning what the other half of the solution will look like.

If an is_literal() RFC were passed I would envision the following things happening:

Bloggers would start blogging about how all PHP users should use is_literal() when accepting SQL and HTML code, but won't provide any of the nuances about special-cases because many won't understand them, they will just see a new PHP feature to hype.
PHP listing tools will jump on is_literal() and champion its use. Just look at Psalm.
Coding standards teams in large organizations will see that the listing tools are flagging non-literals and wil start requiring all SQL and HTML etc. code to be literals. And since they already bar the use of eval() there will be no way to work around the requirement when you actually know what you are doing.

And because they are in large organizations, they will be divorced from the developers in the trenches who 95% of the time will have no problem with the requirement, but 5% of the time will simply not be able to achieve their development needs. Unfortunately these trench developers will not have enough power in the organization to get the coding standards teams to even list to their issues, especially if they are an agency or contractor.

Library developers will similarly adopt is_literal() as a requirement to use their library. While the most popular ones will likely be enlightened enough they will address edge cases. many in the long tail will not. So lots of the long-tail PHP libraries out there — and especially the ones only needed by a small audience so there won't be alternatives — will not address exceptions. And for any of many reasons, many of those library developers will not be responsive to those who submit PRs to provide workarounds.

THIS is the future I do not want to see, and if is_trusted() is renamed back to is_literal() this is the future I predict. Maybe it won't happen, but I personally do not want to risk it.

The biggest problem with the RFC is it does not address how to bypass someone else's requirement for use of is_literal() when you know what you are doing and you absolutely need to. Craig and I have had several long discussions about this, and he said he did not want to allow that because someone might misuse it (Craig, if I am paraphrasing incorrectly, please do correct me here.)

And I completely understand Craig's misgivings, but then he would not address mine. His only solution to get around the special-case needs that will exist is to use eval(). which is a lot like saying: "So you are in pain? Why not try Meth?"

Craig shared this talk from Google which inspired him:

https://www.youtube.com/watch?v=ccfEu-Jj0as https://www.youtube.com/watch?v=ccfEu-Jj0as

I would highly recommend watching this before voting on Craig's RFC.

The speaker talks about how they used "CompileTimeConstants" to reduce security bugs, BUT:

They used a @CompileTimeConstant attribute and a static checker to guard against unsafe strings instead of a function baked into Java. So convention and process over language enforcement.

https://errorprone.info/bugpattern/CompileTimeConstant https://errorprone.info/bugpattern/CompileTimeConstant
https://github.com/google/error-prone/blob/master/annotations/src/main/java/com/google/errorprone/annotations/CompileTimeConstant.java https://github.com/google/error-prone/blob/master/annotations/src/main/java/com/google/errorprone/annotations/CompileTimeConstant.java

They developed APIs that allow packaging data to be used in place of literals. See qb.setParameter() at 9:45.
They did this for code in Google's mono-repo, so they have control over how people will be using it, and they have the ability to address special cases. If PHP adds is_literal() the cat will be out the bag and we won't be able to address special cases easily.
Even Craig mentioned to me that a literal cannot be trusted in all contexts. An HTML literal passed to SQL is not what SQL wants, and might in some rare case being able to bring down a DB. If so, how then to do we quickly fix PHP so that can't happen? Adding a false sense of security does not seem like a good idea.
Craig complained about escaping because it can't be used universally without understanding use-case. So maybe what we need instead are use-case specific tools that can sanitize use-case specific string content? SqlSanitizer, HtmlSanitizer, etc? And/or templating systems built into PHP that can handle escaping (I'm not necessarily suggesting these, but do think they should be considered.)
At ~35:45 the speaker states it is their responsibility as API vendors to sanitize and escape strings. He also said most data is user-supplied data and not literals. So he is saying you MUST allow for user-supplied data, but is_literal() does not allow for that.
They implement a whitelist of exceptions, because they were Google they based the whitelist on their internal projects. But PHP cannot do that with is_literal() because it does not address exceptions.
He said you need a "lightweight approval process for exceptions" at around ~40:00. Again, including is_literal() results in an extremely heavyweight process for user's exceptions.
He mentions "Design for Reviewability" around 44:00; if there were a make_literal() it would be very easy to find places that need to be reviewed for literal exceptions.

So, IF we add is_trusted() we can plan to add other trusted data types later.

If we add is_literal(), we should map out all the issues in advance and make sure we know that workable solutions at least exist. In addition we will need some way to create data that can pass as literal (but then it's not a literal, hence why "is_trusted" makes more sense.) Google did it with a method attribute. Maybe PHP should too?

In summary, is_literal() is a hammer and people will us it to treat everything like a nail. At least is_trusted() has the future potential to be treated like a screwdriver.

-Mike
P.S. If I had my druthers is_trusted() would not pass either, and we'd hash out the full vision for security before we go implementing something that could paint us into a corner forever, or worse one that would create a security hole that will be hard to fix.

4 years ago by Rowan Tommins — view source — reply

unread

For my inspiration take a look at Trusted Types API in Javascript:

https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API
https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API

There is an extremely important difference here: there is no single type in that system called "TrustedString", there are separate types for each context ("injection sink"). The W3C article calls this out explicitly:

Note: This allows the authors to specify the intention when creating a given value, and the user agents to introduce checks based on the type of such value to preserve the authors' intent. For example, if authors intend a value to be used as an HTML snippet, an attempt to load a script from that value would fail.

This is not just an implementation detail, it's an absolutely essential part of the concept.

So let me add my name to the chorus saying that is_trusted() is a bad name, and the added features that led to its selection make this feature worse not better.

Most of a "trusted types" implementation can be written in pure PHP, because all you need is an object with a private string property and some appropriate constructors.

The one part you can't do is trust strings provided in source differently from strings provided by the user, and the original proposal provided a straightforward mechanism for that purpose.

There is no reason why is_literal itself needs to know about "trusted value objects", or have a long list of special cases to construct a string that is "not actually a literal but we can't think of any way to exploit it". That can all be handled by the userland code:

Create a class called TrustedSql
Accept a parameter of type string|TrustedSql. If the parameter is a string, reject it if is_literal returns false
Or, if a pseudo-type is provided as well, just accept literal_string|TrustedSql
If the user wants to provide dynamic SQL, they need to construct a TrustedSql object using whatever mechanism the library wants to provide. That can include an audited sprintf pattern, imploding arrays of integers, whatever turns out to be useful.

I think the engine should leave the complexities of defining "trusted" to APIs specialising in a particular "injection sink", and provide the low-level building block they need, which is the original simple is_literal function.

Regards,

--
Rowan Tommins
[IMSoP]

4 years ago by Levi Morrison via internals — view source — reply

unread

On Sat, 12 Jun 2021 at 18:00, Craig Francis craig@craigfrancis.co.uk
wrote:

I'd like to start the discussion on the is_literal() RFC:
https://wiki.php.net/rfc/is_literal

To recap,

We have chosen the name is_trusted(), based 18 votes for, vs 3 against.

I guess I missed the vote on this one; I'm definitely against the name change.

4 years ago by lauri.kentta@gmail.com — view source — reply

unread

Integers are now included, which will help adoption:

https://wiki.php.net/rfc/is_literal

Thanks for the great improvements!

sprintf seems to have some issues, though.

Currently you're checking the parameter types, not the formats.
The parameter type matters only when coercing to a string (%s).
Otherwise sprintf should consider the format, not the parameter.

Example:
<?php
function test($s) { var_dump($s, is_trusted($s)); }
setlocale(LC_ALL, "de_DE.UTF-8");
test(sprintf("SET c = %c, f = %f, e = %e", 0x27, 1234, 1234));
test(sprintf("SET d = %d, x = %x, b = %b", 1e2, 1e2, 1e2));
test(sprintf("SET weird_d = %''*d", 4, 1));
test(sprintf("SET s = '%s', int to string should be ok", 123));
?>

Currently:
string(43) "SET c = ', f = 1234,000000, e = 1.234000e+3"
bool(true)
string(32) "SET d = 100, x = 64, b = 1100100"
bool(false)
string(18) "SET weird_d = '''1"
bool(true)
string(41) "SET s = '123', int to string should be ok"
bool(true)

Obviously the results with ints and floats should be the opposite.

If you really want to allow %c, so be it, but I'd disallow it on the
grounds that (1) it's probably not used in secure strings (usage data,
anyone?), and thus (2) it could easily be a misspelled %d (for example,
'%c' instead of '%d' could silently produce an empty result in a query),
and (3) you're allowing a simple workaround with %s and chr() which
makes the intent more obvious.

In general, as this is supposed to be a security feature, allowing
multiple ways for uninformed people to produce "trusted" but actually
very unsafe values doesn't look like a good idea. I'm not sure if
allowing trusted single characters to be created through chr or %c
serves any useful purpose, but I can imagine people using either one
without realizing that they can create any character, including \0 or '
or " or non-UTF-8. Better to leave only chr(), one less thing to worry
about.

Custom padding is a weird edge case, maybe just disallow that too?

As you said yourself, it's not easy to prove anything safe. ;)

--
Lauri Kenttä

4 years ago by Hans Henrik Bergan — view source — reply

unread

How is one supposed to use this? like

if(!is_trusted($val)){
$val = htmlentities($str, ENT_QUOTES | ENT_HTML401 | ENT_SUBSTITUTE |
ENT_DISALLOWED, 'UTF-8', true);
}
echo "<div>$val</div>";
(...)
if(!is_trusted($val)){
$val = $mysqli->real_escape_string($val);
}
$mysqli->query("INSERT INTO tbl VALUES('$val');");

like that? (my first impression is that this sounds stupid, but i haven't
given it enough thought to be sure)

Integers are now included, which will help adoption:

https://wiki.php.net/rfc/is_literal

Thanks for the great improvements!

sprintf seems to have some issues, though.

Currently you're checking the parameter types, not the formats.
The parameter type matters only when coercing to a string (%s).
Otherwise sprintf should consider the format, not the parameter.

Example:
<?php
function test($s) { var_dump($s, is_trusted($s)); }
setlocale(LC_ALL, "de_DE.UTF-8");
test(sprintf("SET c = %c, f = %f, e = %e", 0x27, 1234, 1234));
test(sprintf("SET d = %d, x = %x, b = %b", 1e2, 1e2, 1e2));
test(sprintf("SET weird_d = %''*d", 4, 1));
test(sprintf("SET s = '%s', int to string should be ok", 123));
?>

Currently:
string(43) "SET c = ', f = 1234,000000, e = 1.234000e+3"
bool(true)
string(32) "SET d = 100, x = 64, b = 1100100"
bool(false)
string(18) "SET weird_d = '''1"
bool(true)
string(41) "SET s = '123', int to string should be ok"
bool(true)

Obviously the results with ints and floats should be the opposite.

If you really want to allow %c, so be it, but I'd disallow it on the
grounds that (1) it's probably not used in secure strings (usage data,
anyone?), and thus (2) it could easily be a misspelled %d (for example,
'%c' instead of '%d' could silently produce an empty result in a query),
and (3) you're allowing a simple workaround with %s and chr() which
makes the intent more obvious.

In general, as this is supposed to be a security feature, allowing
multiple ways for uninformed people to produce "trusted" but actually
very unsafe values doesn't look like a good idea. I'm not sure if
allowing trusted single characters to be created through chr or %c
serves any useful purpose, but I can imagine people using either one
without realizing that they can create any character, including \0 or '
or " or non-UTF-8. Better to leave only chr(), one less thing to worry
about.

Custom padding is a weird edge case, maybe just disallow that too?

As you said yourself, it's not easy to prove anything safe. ;)

--
Lauri Kenttä

--

To unsubscribe, visit: https://www.php.net/unsub.php

4 years ago by Craig Francis — view source — reply

unread

On Tue, 22 Jun 2021 at 11:31 am, Hans Henrik Bergan divinity76@gmail.com
wrote:

How is one supposed to use this? like

if(!is_trusted($val)){
$val = htmlentities($str, ENT_QUOTES | ENT_HTML401 | ENT_SUBSTITUTE |
ENT_DISALLOWED, 'UTF-8', true);
}
echo "<div>$val</div>";

No, if anything that’s the opposite, and almost Taint Checking.

While this is covered in the RFC (https://wiki.php.net/rfc/is_literal) and
will be best read in context, in summary:

The developer does not use this function, instead you rely on libraries to
do that work for you. In this case you would use a HTML Templating Library
(which knows about all the complexities of HTML encoding), and you simply
provide the trusted string ‘<div>?</div>‘ and the values separately.

The Libraries will then use is_trusted(), with something like this:

https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4

Craig

4 years ago by Pierre Joye — view source — reply

unread

Hello Craig,

Very well written RFC, good job!

Others have said it already, but here are my thoughts. Many moons ago,
I was on this way as well and the filter extension came out. As it
fits for some projects, the actual gains were very far, to say the
least, from what I would have expected.

Since quite some time, and that thinking is intensified with the
support of native annotation, is that such a thing does not fit into
the core language(s). It will never be "trustable" 100%, which defeats
its main purpose or existance.

The large majority of apps or frameworks out there provide their own
interface to deal with external data input. Advanced ones like Symfony
can provide you an Entity where a parameter is the ID of that entity,
handling all safety checks. Others will provide a Request
implementation with getters as specific types, etc. This allows it to
be tightly linked to the actual usage or logic. It is impossible to
even agree on such an interface in the core.

As well intended as it looks, I think input data filtering is better
implemented in userland. And we may keep ourselves from reintroducing
trusted or safe mode, no matter where. I hope I don't sound too
negative, I am really convinced this is a bad idea and introducing it
again in 8.x will hunt the core for a decade to come. :)

Best,

On Sat, 12 Jun 2021 at 18:00, Craig Francis craig@craigfrancis.co.uk
wrote:

I'd like to start the discussion on the is_literal() RFC:
https://wiki.php.net/rfc/is_literal

To recap,

We have chosen the name is_trusted(), based 18 votes for, vs 3 against.

Integers are now included, which will help adoption:

https://wiki.php.net/rfc/is_literal

(Joe’s currently updating the implementation to have the new name, but all
the functionality is there).

I’m glad this RFC has been well received; and thank you for all the
feedback, I really think it‘s benefitting the implementation.

Craig

--
Pierre

@pierrejoye | http://www.libgd.org

[RFC] is_trusted - was is_literal

-- Aleksander Machniak Kolab Groupware Developer [https://kolab.org] Roundcube Webmail Developer [https://roundcube.net]

--
Aleksander Machniak
Kolab Groupware Developer [https://kolab.org]
Roundcube Webmail Developer [https://roundcube.net]