[RFC] Name issue - is_literal/is_trusted

4 years ago by Mel Dafert — view source

unread

is_trusted() - suggested by Moritz and separately by Mike, was the
second option in the original vote, and was based on the idea that what is
returned can be 'trusted' to be free from external code.

Another idea - is_internal(), since it is not external code, and internal would be the
opposite of external.
Not quite sure how helpful/realistic this one is, but I wanted to bring it up either
way.

4 years ago by Craig Francis — view source

unread

Another idea - is_internal(), since it is not external code, and
internal would be the
opposite of external.

Unfortunately, because we cannot record internal vs external integers (too
big of a change to how integers are stored), we are currently allowing all
integers, as that helps adoption, without affecting security.

Craig

4 years ago by Derick Rethans — view source

unread

Hi Internals,

As the name is_trusted() seems to be causing contention, I think more
than the alternative option would. Since we want to get this right, and
we
still have time before the feature freeze, this might be worth
re-looking
at. Particularly if you are one of those unsure about it, read on.

Just changing the name to something not misleading isn't going to change my opinion about voting for this. It's grown into a concoction of different things than the original is_literal, that just checks a single string for its literalness.

cheers,
Derick

4 years ago by Kamil Tekiela — view source

unread

FWIW, I would prefer is_literal, but without integers in scope. Modern code
is often type hinted. I would expect that a lot of libraries would accept
only strings and then check whether it's a literal string. I don't think
accepting integers in scope increases security or improves ease of use.
However, I am more concerned about other things that have been mentioned in
this thread and the overall disagreement. I'd rather not rush this change.
Also, I am concerned about the performance of applications that are heavy
on string processing. Although I haven't tested this yet myself.

--Kamil

4 years ago by Guilliam Xavier — view source

unread

On Tue, Jun 22, 2021 at 8:11 PM Craig Francis craig@craigfrancis.co.uk
wrote:

The Function:

Is a security-based function that prevents Injection Vulnerabilities in
PHP.

Flags strings written by the developer, including when concatenated.

Also accepts integer values, which as purely numerical cannot contain
code/dangerous characters. (Due to technical limitations within PHP, it's
not possible for these to be flagged as user or developer in the codebase
itself without performance issues).

is_safe_from_injections()?
is_secure_against_injections()?
can_be_trusted_to_not_contain_injection_vulnerabilities()? (okay not
this one)

Alternatively, if integers are too controversial, how about reverting the
implementation to is_literal() but provide a function like
to_literal(int $int): string (or just a "polyfill" for userland, could be
a one-liner implode(array_map(fn ($c) => ['0','1','2','3','4','5','6','7','8','9','-'=>'-'][$c], str_split((string)$int)))), so that those implode(',', [1,2,3]) could
use implode(',', array_map('to_literal', [1,2,3]))?

Regards,

--
Guilliam Xavier

4 years ago by Craig Francis — view source

unread

On Wed, 23 Jun 2021 at 11:27 am, Guilliam Xavier guilliam.xavier@gmail.com
wrote:

Alternatively, if integers are too controversial, how about reverting the
implementation to is_literal()

Starting to look like that, yeah.

4 years ago by Hossein Baghayi — view source

unread

Hello,
What about is_vulnerable? Its behaviour would be the inverse of
is_literal.
I mean we don't have to avoid the other side of the coin.

On Tue, 22 Jun 2021 at 22:41, Craig Francis craig@craigfrancis.co.uk
wrote:

Hi Internals,

As the name is_trusted() seems to be causing contention, I think more
than the alternative option would. Since we want to get this right, and we
still have time before the feature freeze, this might be worth re-looking
at. Particularly if you are one of those unsure about it, read on.

The name is_trusted() was chosen by a community vote over several days.
While I’m of a similar opinion that "trusted" might be misleading for some
in the strength of its word, I do not want to simply override 18 of the 21
people, who I assume read the RFC, asked questions to clarify on the
mailing list, understood how it works, and have chosen that name.

However, clearly some people missed the vote and its discussion time, and
some voted but then perhaps wanted clarifying on what the RFC was fully
about later. If we say that's about five people, then assuming there is a
larger audience who reads but does not post (as the voting numbers
indicated) then I'm inclined to guesstimate that maybe that means 3x the
number of people share those feelings. And with that number it starts to
feel like maybe we should double-check here.

While a one-word name is always going to be misunderstood by some people,
we want to be as clear as possible.

The Function:

Is a security-based function that prevents Injection Vulnerabilities in
PHP.

Flags strings written by the developer, including when concatenated.

Also accepts integer values, which as purely numerical cannot contain
code/dangerous characters. (Due to technical limitations within PHP, it's
not possible for these to be flagged as user or developer in the codebase
itself without performance issues).

(RFC for full breakdown: https://wiki.php.net/rfc/is_literal)

Ideally we want a one-word name that suggests this as best we can - one
word to be consistent with other is_*() functions.

is_literal() was my original placeholder name. However, given that it's
not just literal strings, but also supports concatenation and integers I
felt it may be misleading with the definition of 'literal' stretched so far
it might get confusing, and is why I didn't include my original name for it
in the poll. However, if you feel it would be more accurate my mind isn't
fixed on it.

is_known() - suggested by Joe, who created the implementation, was one
of two options in the original vote, and was based on the principle that
the value be 'known' to the developer to be free from external code and be
within a 'known' understanding of the values that should be going in it.

is_trusted() - suggested by Moritz and separately by Mike, was the
second option in the original vote, and was based on the idea that what is
returned can be 'trusted' to be free from external code.

I suggest that people who are serious in their feelings about this, offer
the name that they would prefer (including potentially making one
themselves that fits the RFC content and style mentioned above) so we can
assess whether the current name needs a second look.

Thanks,
Craig

4 years ago by Larry Garfield — view source

unread

Hello,
What about is_vulnerable? Its behaviour would be the inverse of
is_literal.
I mean we don't have to avoid the other side of the coin.

That has the same core problem as is_trusted. It's making a claim about the probable security status of a value, which I promise you, you will not get right 100% of the time. is_literal, is asserting only that the value came from the source code originally, not from user input. That is something you can assert one way or another and be guaranteed correct. What the implications are for what you can then do with it are an entirely separate, and highly squishy and use-case-specific, question.

I'm still very torn on is_literal; I fear that the people who would benefit from it are the very people that don't use the tools that would leverage it (DBALs et al), and so the net benefit will be small, but misuse of it will make DBALs weaker and less able to handle the highly-dynamic cases that I am used to working with. I may be convinced of is_literal if the major DBAL authors back it, but I'm still not sure.

I am definitely -1 on is_trusted or any other claim of fit-for-purpose, rather than a claim of origin. That's guaranteed to be incorrect often enough that it makes things worse rather than better.

--Larry Garfield

4 years ago by Craig Francis — view source

unread

I'm still very torn on is_literal; I fear that the people who would
benefit from it are the very people that don't use the tools that would
leverage it (DBALs et al), and so the net benefit will be small.

This RFC will not help those who aren’t using libraries (DBALs), that’s
something we could look at in the future (I have a suggestion in the Future
Scope section, but whatever that involves, it will need to use this flag,
so it would need to be in place first).

But - and why I’m here - my experience is that it’s still a big issue for
those who do use libraries. It frequently comes up at software
agencies/companies that maintain their code (built with free
libraries/frameworks), and employ junior developers (i.e. the cheapest),
who make many "quick edits" (time is money), and in doing so introduce the
issues the RFC covers (and not to say we more experienced coders don’t
occasionally make mistakes too!). While non-library users are the main
cause, the library users are still a big part of why Injection
Vulnerabilities remain at the top of the OWASP Top 10.

Craig

4 years ago by Scott Arciszewski — view source

unread

I'm still very torn on is_literal; I fear that the people who would
benefit from it are the very people that don't use the tools that would
leverage it (DBALs et al), and so the net benefit will be small.

This RFC will not help those who aren’t using libraries (DBALs), that’s
something we could look at in the future (I have a suggestion in the Future
Scope section, but whatever that involves, it will need to use this flag,
so it would need to be in place first).

But - and why I’m here - my experience is that it’s still a big issue for
those who do use libraries. It frequently comes up at software
agencies/companies that maintain their code (built with free
libraries/frameworks), and employ junior developers (i.e. the cheapest),
who make many "quick edits" (time is money), and in doing so introduce the
issues the RFC covers (and not to say we more experienced coders don’t
occasionally make mistakes too!). While non-library users are the main
cause, the library users are still a big part of why Injection
Vulnerabilities remain at the top of the OWASP Top 10.

Craig

Hi,

I was asked for my thoughts on this RFC (and its naming) from a
security engineering perspective.

The old is_literal() isn't correct if it includes non-string values
(i.e. integers).
The new is_trusted() is potentially misleading, especially to people
who don't read the docs.

My knee-jerk reaction was simply, "Why not is_untainted()?" but that
invokes the imagery of taint-checking, which the RFC explicitly
doesn't implement. A better name might be is_noble(), where we get to
define the concept of noble inputs (name inspired by the Noble gases
from Chemistry).

The main reason I don't like is_trusted() is that everyone's threat
model and risk tolerance is different, and trust is too nebulous a
concept for a built-in function. But also, it's really easy to jump
the guard-rail: https://3v4l.org/4GM8Q#focus=rfc.literals

One concern that Joe Watkins asked about is: Is it reasonable to cover
both integers and strings that are not influenceable by potential
external attackers (n.b. ones that can't already overwrite your source
code)? Outside the chr()/pack()/sprintf()/etc. technique demonstrated
above, there's no risk of injection inherent to concatenating a
trusted string with an untrusted integer.

Injection attacks (SQL injection, LDAP injection, XSS, etc.) are, at
their core, an instance of type confusion between data and code. In
order for the injection to do anything, it needs to be in the same
input domain as the language the code is written in. Try as you might,
there is no integer that will, upon concatenation with a string,
produce a control character for HTML (i.e. >) or SQL (i.e. ').

Therefore, if the concern is Injection attacks, integer inputs do not
need to be tracked to provide a security gain.

This is only true for integers, not all numeric types. I haven't
investigated the safety of floats in every possible context, and the
e, +, and . characters could be problematic in corner cases.

TL;DR

Why not is_noble()?
String + int concatenation isn't an injection risk.

Cheers,
Scott

4 years ago by Bruce Weirdan — view source

unread

String + int concatenation isn't an injection risk.

I think this demonstrates it very well could be:
https://externals.io/message/114988#115038

--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com

4 years ago by Craig Francis — view source

unread

String + int concatenation isn't an injection risk.

I think this demonstrates it very well could be:
https://externals.io/message/114988#115038

That’s the developer choosing to use a variable, and it’s no different than
the developer using a library to add the value via proper quoting/escaping.

Craig

4 years ago by Scott Arciszewski — view source

unread

String + int concatenation isn't an injection risk.

I think this demonstrates it very well could be:
https://externals.io/message/114988#115038

--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com

Respectfully, the example you linked is not an example of an
Injection vulnerability. The failure condition of this query is
"return all rows from the table already being queried", not "return
arbitrary data the attacker selects from any table that the
application can read".

Being able to arbitrarily select a column is a bad design (and you
should feel bad, as per the meme, if you let this happen in
production), but it differs from Injection vulnerabilities in one
fundamental way: The attacker cannot change the structure of the SQL
query being executed.

Here's an example of an injection vulnerability:

$pdo->prepare("SELECT b, c, d, e FROM table WHERE a = '$foo'");

If you set $foo to ' UNION SELECT NULL, NULL, NULL, pwhash FROM accounts WHERE username = 'Admin, you'll leak contents from another
table in the SQL result. This is the danger posted by
string-to-string concatenation, and what we mean by SQL Injection.

This doesn't have to stop all dumb things that PHP developers can do.
It's enough to only stop the catastrophically dumb things (especially
if we don't call the function is_trusted()).

You can still invent scenarios where int-to-string concatenation
results in buggy behavior, but it isn't the game-over security
vulnerability that string-to-string concatenation is. And that's the
entire point.

4 years ago by Bruce Weirdan — view source

unread

The failure condition of this query is
"return all rows from the table already being queried", not "return
arbitrary data the attacker selects from any table that the
application can read".

Imagine that was a DELETE rather than SELECT and the failure condition
becomes 'the table is emptied'.
It may have less disastrous consequences (depending on how good your
backup / restore procedures are) compared to arbitrary reads you
demonstrated, but it is still, quite clearly, a glaring security hole
caused by user input in SQL query - AKA SQL injection in layman's
terms.

it differs from Injection vulnerabilities in one
fundamental way: The attacker cannot change the structure of the SQL
query being executed.

I would say replacing a column name with value is changing the
structure of SQL query, and, basically, in exactly the way you
describe SQL injection: confusing the code (column name) with data.

I wholeheartedly welcome this RFC as it was originally proposed:
is_literal() doing exactly what it says on the tin, without any
security claims. But it has gone far from there real quick and now
people can't even name the thing.

--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com

4 years ago by Scott Arciszewski — view source

unread

On Thu, Jun 24, 2021 at 3:41 AM Scott Arciszewski scott@paragonie.com
wrote:

The failure condition of this query is
"return all rows from the table already being queried", not "return
arbitrary data the attacker selects from any table that the
application can read".

Imagine that was a DELETE rather than SELECT and the failure condition
becomes 'the table is emptied'.
It may have less disastrous consequences (depending on how good your
backup / restore procedures are) compared to arbitrary reads you
demonstrated, but it is still, quite clearly, a glaring security hole
caused by user input in SQL query - AKA SQL injection in layman's
terms.

it differs from Injection vulnerabilities in one
fundamental way: The attacker cannot change the structure of the SQL
query being executed.

I would say replacing a column name with value is changing the
structure of SQL query, and, basically, in exactly the way you
describe SQL injection: confusing the code (column name) with data.

I wholeheartedly welcome this RFC as it was originally proposed:
is_literal() doing exactly what it says on the tin, without any
security claims. But it has gone far from there real quick and now
people can't even name the thing.

--
Best regards,
Bruce Weirdan mailto:
weirdan@gmail.com

We can agree that it is a bug. We don't agree on the definition of SQL
injection.

Changing a column name to a number (which prepared statements shouldn't
allow in the first place) is a bug. This changes the effect of the command,
but the structure of the query remains unchanged.

4 years ago by Stephen Reay — view source

unread

On Thu, Jun 24, 2021 at 3:41 AM Scott Arciszewski scott@paragonie.com
wrote:

The failure condition of this query is
"return all rows from the table already being queried", not "return
arbitrary data the attacker selects from any table that the
application can read".

Imagine that was a DELETE rather than SELECT and the failure condition
becomes 'the table is emptied'.
It may have less disastrous consequences (depending on how good your
backup / restore procedures are) compared to arbitrary reads you
demonstrated, but it is still, quite clearly, a glaring security hole
caused by user input in SQL query - AKA SQL injection in layman's
terms.

it differs from Injection vulnerabilities in one
fundamental way: The attacker cannot change the structure of the SQL
query being executed.

I would say replacing a column name with value is changing the
structure of SQL query, and, basically, in exactly the way you
describe SQL injection: confusing the code (column name) with data.

I wholeheartedly welcome this RFC as it was originally proposed:
is_literal() doing exactly what it says on the tin, without any
security claims. But it has gone far from there real quick and now
people can't even name the thing.

--
Best regards,
Bruce Weirdan mailto:
weirdan@gmail.com

We can agree that it is a bug. We don't agree on the definition of SQL
injection.

Changing a column name to a number (which prepared statements shouldn't
allow in the first place) is a bug. This changes the effect of the command,
but the structure of the query remains unchanged.

Hi Scott,

I wrote that example where an integer could be dangerous.

So firstly - just to clarify, because some replies seemed to be confused on the topic, as was literally mentioned in the original comment, it is definitely not correct behaviour - it’s a developer mistake that might work some of the time, and thus go unnoticed in testing. If you can show me a developer who’s never inadvertently passed the wrong parameter in some condition, I’ll show you an imaginary developer.

Additionally - pointing out that this is a "developer error” doesn’t help your case. Using non-parameterised queries should already be a “developer error” for anyone who can walk and breathe at the same time - and yet that usage is being actively encouraged if this function supports integers. I’ve still seen zero responses about legitimate reasons this needs to support integers - giving people a shitty way to build an IN() clause is not legitimate. Parameterisation exists, and works.

I don’t even understand why you mentioned prepared statements (I guess you meant using parameterised queries?) - the column name inherently can’t be parameterised - hence having to use a string substitution in the query.

That part was weird and confusing, but not as odd as your claim that altering the query, so that it causes the where clause to become moot, is not an SQL Injection? REALLY? That’s your claim?

I did a little research, and it turns out Wikipedia (https://en.wikipedia.org/wiki/SQL_injection#Technical_implementations https://en.wikipedia.org/wiki/SQL_injection), Cloudflare (https://www.cloudflare.com/en-au/learning/security/threats/sql-injection/ https://www.cloudflare.com/en-au/learning/security/threats/sql-injection/), and OWASP (https://owasp.org/www-community/attacks/SQL_Injection#example-2 https://owasp.org/www-community/attacks/SQL_Injection) all have examples with a 1=1 type query manipulation. Do you want to write and tell them that they’re all wrong, or should I ask them to call you?

Also, while researching the specifics of what is considered an “SQL Injection” I came across an article, that talks specifically about the dangers of allowing user input (i.e. the thing is_trusted is meant to prevent) as a column or table identifier. It’s from this little organisation, you may have heard of them: “Paragon Initiative” (https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide).

I would absolutely make use of a function that tells me if the string given is in fact from something controlled by the developer. But once that same string can also include input from the request or the environment or whatever by nature of integers, the function becomes useless for the stated purpose.

Cheers

Stephen

4 years ago by Scott Arciszewski — view source

unread

Hi Scott,

I wrote that example where an integer could be dangerous.

I don't disagree that it's an example where an integer could be dangerous.

Danger is too broad to have a meaningful discussion about. You can, of
course, always do dangerous things if you're determined or creative
enough.

So firstly - just to clarify, because some replies seemed to be confused on the topic, as was literally mentioned in the original comment, it is definitely not correct behaviour - it’s a developer mistake that might work some of the time, and thus go unnoticed in testing. If you can show me a developer who’s never inadvertently passed the wrong parameter in some condition, I’ll show you an imaginary developer.

Agreed.

Additionally - pointing out that this is a "developer error” doesn’t help your case. Using non-parameterised queries should already be a “developer error” for anyone who can walk and breathe at the same time - and yet that usage is being actively encouraged if this function supports integers. I’ve still seen zero responses about legitimate reasons this needs to support integers - giving people a shitty way to build an IN() clause is not legitimate. Parameterisation exists, and works.

I don't have a strong opinion on this.

I don’t even understand why you mentioned prepared statements (I guess you meant using parameterised queries?) - the column name inherently can’t be parameterised - hence having to use a string substitution in the query.

They're effectively synonyms.

That part was weird and confusing, but not as odd as your claim that altering the query, so that it causes the where clause to become moot, is not an SQL Injection? REALLY? That’s your claim?

Yes, let me try to explain this more clearly.

If you can inject code, it's a code injection vulnerability. SQL
injection is a specific type of code injection vulnerability. If you
can trick the SQL into doing something invalid without injecting
code, and you call it "SQL injection", that's a category error.

Supplying an integer in the place of a column name is a bug. The bug
can even be dangerous. The bug can EVEN be a security problem for the
system.

But calling it SQL injection is categorically incorrect. Changing the
behavior of a SQL command without injecting additional code is not
SQL injection.

If we're trying to stop all forms of misuse and insecurity that can
result from string concatenation, cool. But be very clear about that.

I did a little research, and it turns out Wikipedia (https://en.wikipedia.org/wiki/SQL_injection#Technical_implementations), Cloudflare (https://www.cloudflare.com/en-au/learning/security/threats/sql-injection/), and OWASP (https://owasp.org/www-community/attacks/SQL_Injection#example-2) all have examples with a 1=1 type query manipulation. Do you want to write and tell them that they’re all wrong, or should I ask them to call you?

If you inject a 1=1 clause where one didn't exist before, that's
injection. Notice the introduction of an OR operator in the examples
you cited.

My classification of the original example as "Not Injection" has
nothing to do with the fact that numbers are being compared with
numbers. Rather, there was no code injection.

Also, while researching the specifics of what is considered an “SQL Injection” I came across an article, that talks specifically about the dangers of allowing user input (i.e. the thing is_trusted is meant to prevent) as a column or table identifier. It’s from this little organisation, you may have heard of them: “Paragon Initiative” (https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide).

Snark is unnecessary.

I would absolutely make use of a function that tells me if the string given is in fact from something controlled by the developer. But once that same string can also include input from the request or the environment or whatever by nature of integers, the function becomes useless for the stated purpose.

Why not two functions then?

is_noble_string() -- more restrictive
is_noble() -- YOLO

Cheers

Stephen

4 years ago by Guilliam Xavier — view source

unread

On Thu, Jun 24, 2021 at 9:14 AM Scott Arciszewski scott@paragonie.com
wrote:

On Thu, Jun 24, 2021 at 2:10 AM Stephen Reay php-lists@koalephant.com
wrote:

I would absolutely make use of a function that tells me if the string
given is in fact from something controlled by the developer. But once that
same string can also include input from the request or the environment or
whatever by nature of integers, the function becomes useless for the stated
purpose.

Why not two functions then?

is_noble_string() -- more restrictive

is_noble() -- YOLO

I was going to ask basically the same [with different names] a few days ago
("why can't we have both?"), but then remembered
https://externals.io/message/114835#114951 , esp. the end:

"""
And to support having 2 functions, we would need 2 flags on strings. These
flags are limited, and managing 2 flags would affect performance.
"""

Regards,

--
Guilliam Xavier

4 years ago by Scott Arciszewski — view source

unread

On Thu, Jun 24, 2021 at 4:34 AM Guilliam Xavier
guilliam.xavier@gmail.com wrote:

I would absolutely make use of a function that tells me if the string given is in fact from something controlled by the developer. But once that same string can also include input from the request or the environment or whatever by nature of integers, the function becomes useless for the stated purpose.

Why not two functions then?

is_noble_string() -- more restrictive

is_noble() -- YOLO

I was going to ask basically the same [with different names] a few days ago ("why can't we have both?"), but then remembered https://externals.io/message/114835#114951 , esp. the end:

"""
And to support having 2 functions, we would need 2 flags on strings. These
flags are limited, and managing 2 flags would affect performance.
"""

Regards,

--
Guilliam Xavier

Thanks for the reference to that part of the discussion that I missed.

Aside:

I encourage everyone to look at EasyDB (especially EasyStatement) for
handling WHERE x IN (a, b, c, ...) statements in SQL.
https://github.com/paragonie/easydb

Additionally, Ionizer is useful for input filtering and asserting type
safety: https://github.com/paragonie/ionizer

If you're doing dynamic, on-the-fly SQL query generation (based, in
part, on user input), these are two framework-agnostic tools that can
help make your code safer against code injection and other attacks.

4 years ago by Stephen Reay — view source

unread

Hi Scott,

I wrote that example where an integer could be dangerous.

I don't disagree that it's an example where an integer could be dangerous.

Danger is too broad to have a meaningful discussion about. You can, of
course, always do dangerous things if you're determined or creative
enough.

So firstly - just to clarify, because some replies seemed to be confused on the topic, as was literally mentioned in the original comment, it is definitely not correct behaviour - it’s a developer mistake that might work some of the time, and thus go unnoticed in testing. If you can show me a developer who’s never inadvertently passed the wrong parameter in some condition, I’ll show you an imaginary developer.

Agreed.

Additionally - pointing out that this is a "developer error” doesn’t help your case. Using non-parameterised queries should already be a “developer error” for anyone who can walk and breathe at the same time - and yet that usage is being actively encouraged if this function supports integers. I’ve still seen zero responses about legitimate reasons this needs to support integers - giving people a shitty way to build an IN() clause is not legitimate. Parameterisation exists, and works.

I don't have a strong opinion on this.

I don’t even understand why you mentioned prepared statements (I guess you meant using parameterised queries?) - the column name inherently can’t be parameterised - hence having to use a string substitution in the query.

They're effectively synonyms.

I mean, they’re not - you can prepare a statement with zero parameters in it.
But my point was why you even mentioned parameterised queries OR prepared statements at all. You cannot parameterise the column name, hence the entire reason this RFC exists. The entire point is to make sure that the bits that you need to insert variables, are something the developer wrote, not some input from the user.

If that isn’t the purpose of the RFC, then it describes its actual goal really poorly.

That part was weird and confusing, but not as odd as your claim that altering the query, so that it causes the where clause to become moot, is not an SQL Injection? REALLY? That’s your claim?

Yes, let me try to explain this more clearly.

If you can inject code, it's a code injection vulnerability. SQL
injection is a specific type of code injection vulnerability. If you
can trick the SQL into doing something invalid without injecting
code, and you call it "SQL injection", that's a category error.

Supplying an integer in the place of a column name is a bug. The bug
can even be dangerous. The bug can EVEN be a security problem for the
system.

But calling it SQL injection is categorically incorrect. Changing the
behavior of a SQL command without injecting additional code is not
SQL injection.

If we're trying to stop all forms of misuse and insecurity that can
result from string concatenation, cool. But be very clear about that.

I did a little research, and it turns out Wikipedia (https://en.wikipedia.org/wiki/SQL_injection#Technical_implementations), Cloudflare (https://www.cloudflare.com/en-au/learning/security/threats/sql-injection/), and OWASP (https://owasp.org/www-community/attacks/SQL_Injection#example-2) all have examples with a 1=1 type query manipulation. Do you want to write and tell them that they’re all wrong, or should I ask them to call you?

If you inject a 1=1 clause where one didn't exist before, that's
injection. Notice the introduction of an OR operator in the examples
you cited.

Please, explain to us all, how where foo=‘bar’ OR 1=1 is functionally different than where 123=123 because the developer screwed the pooch in some specific condition and passed the ID twice, rather than the column name and the id, respectively.

My classification of the original example as "Not Injection" has
nothing to do with the fact that numbers are being compared with
numbers. Rather, there was no code injection.

Also, while researching the specifics of what is considered an “SQL Injection” I came across an article, that talks specifically about the dangers of allowing user input (i.e. the thing is_trusted is meant to prevent) as a column or table identifier. It’s from this little organisation, you may have heard of them: “Paragon Initiative” (https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide).
Snark is unnecessary.

You call it snark, I call it ironic hypocrisy.

I would absolutely make use of a function that tells me if the string given is in fact from something controlled by the developer. But once that same string can also include input from the request or the environment or whatever by nature of integers, the function becomes useless for the stated purpose.

Why not two functions then?

is_noble_string() -- more restrictive

is_noble() -- YOLO

I mean sure, if someone wants to vote on the two choices or whatever, I’ll go get the popcorn, but I still haven’t seen any answers that make sense to this question: Why integers at all, though?

Cheers

Stephen

--

To unsubscribe, visit: https://www.php.net/unsub.php

4 years ago by Scott Arciszewski — view source

unread

If you inject a 1=1 clause where one didn't exist before, that's
injection. Notice the introduction of an OR operator in the examples
you cited.

Please, explain to us all, how where foo=‘bar’ OR 1=1 is functionally different than where 123=123 because the developer screwed the pooch in some specific condition and passed the ID twice, rather than the column name and the id, respectively.

SQL injection occurs when you send specially crafted user input and it
injects SQL code.

If you aren't injecting SQL, it's not SQL injection. I don't
understand why this isn't clicking for you.

The scenario you've concocted is a Confused Deputy. Logic bugs can be
severe. Logic bugs can be sev:crit security issues! But if you aren't
injecting SQL, calling it SQL Injection is wrong.

Arguing from consequences ("SQL Injection can drop a table! This logic
bug can drop a table! Therefore they're the same thing") rather than
the specific properties of the system design failure in question ("one
injects code, the other doesn't") isn't a useful way to classify
vulnerabilities.

To reiterate:

I never claimed that it wasn't a bug.
I never claimed it wasn't impactful.
I never claimed it wasn't security-affecting.

I've simply said that this isn't an example of SQL Injection, because
nothing is being injected.

My classification of the original example as "Not Injection" has
nothing to do with the fact that numbers are being compared with
numbers. Rather, there was no code injection.

Also, while researching the specifics of what is considered an “SQL Injection” I came across an article, that talks specifically about the dangers of allowing user input (i.e. the thing is_trusted is meant to prevent) as a column or table identifier. It’s from this little organisation, you may have heard of them: “Paragon Initiative” (https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide).
Snark is unnecessary.

You call it snark, I call it ironic hypocrisy.

In my original email, I said. "Outside the chr()/pack()/sprintf()/etc.
technique demonstrated above, there's no risk of injection inherent to
concatenating a trusted string with an untrusted integer." You brought
up a high-severity logic bug that could happen if someone concatenates
variables with wild abandon, but it's still not an example of
injection.

If you're going to call someone a hypocrite (which is a needless
personal attack), take care that you're actually correct when you do
so.

4 years ago by Larry Garfield — view source

unread

I never claimed that it wasn't a bug.

I never claimed it wasn't impactful.

I never claimed it wasn't security-affecting.

I've simply said that this isn't an example of SQL Injection, because
nothing is being injected.

My classification of the original example as "Not Injection" has
nothing to do with the fact that numbers are being compared with
numbers. Rather, there was no code injection.

Also, while researching the specifics of what is considered an “SQL Injection” I came across an article, that talks specifically about the dangers of allowing user input (i.e. the thing is_trusted is meant to prevent) as a column or table identifier. It’s from this little organisation, you may have heard of them: “Paragon Initiative” (https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide).
Snark is unnecessary.

You call it snark, I call it ironic hypocrisy.

In my original email, I said. "Outside the chr()/pack()/sprintf()/etc.
technique demonstrated above, there's no risk of injection inherent to
concatenating a trusted string with an untrusted integer." You brought
up a high-severity logic bug that could happen if someone concatenates
variables with wild abandon, but it's still not an example of
injection.

If you're going to call someone a hypocrite (which is a needless
personal attack), take care that you're actually correct when you do
so.

First, I think it's best if we all stop the definitional meta-debate, it's not really helpful.

Second, FFS people, use the Reply to List feature. I've gotten double copies of the last 30 messages in this thread.

Third, the core point here is:

A literal string comes from the developer, in source.
Thus a literal string is safe against SQL security errors.
An integer cannot contain SQL syntax.
Thus a literal string concat with an integer cannot contain SQL syntax errors.
Thus a literal string concat with an integer is as safe as a literal string.
Thus we can still call a literal string contact with an integer a literal string.

That's the logic that leads to "a string literal contact an integer should still return true from is_literal()." It's the logic that most users would go through upon seeing that, and act on that conclusion.

However, it's been demonstrated that step 4 is NOT true, and a literal string contact with an int can lead to a security vulnerability (whether it's technically an SQL injection per se is not relevant).

Thus, I would argue that "literal string concat int --> literal string" is incorrect and should be removed from the RFC, as it is misleading. That it may hurt adoption is irrelevant, as creating a false sense of security is vastly worse than slower adoption.

--Larry Garfield

4 years ago by Scott Arciszewski — view source

unread

On Thu, Jun 24, 2021 at 3:41 AM Scott Arciszewski scott@paragonie.com
wrote:

The failure condition of this query is
"return all rows from the table already being queried", not "return
arbitrary data the attacker selects from any table that the
application can read".

Imagine that was a DELETE rather than SELECT and the failure condition
becomes 'the table is emptied'.
It may have less disastrous consequences (depending on how good your
backup / restore procedures are) compared to arbitrary reads you
demonstrated, but it is still, quite clearly, a glaring security hole
caused by user input in SQL query - AKA SQL injection in layman's
terms.

it differs from Injection vulnerabilities in one
fundamental way: The attacker cannot change the structure of the SQL
query being executed.

I would say replacing a column name with value is changing the
structure of SQL query, and, basically, in exactly the way you
describe SQL injection: confusing the code (column name) with data.

I wholeheartedly welcome this RFC as it was originally proposed:
is_literal() doing exactly what it says on the tin, without any
security claims. But it has gone far from there real quick and now
people can't even name the thing.

--
Best regards,
Bruce Weirdan mailto:
weirdan@gmail.com

We can agree that it is a bug. We don't agree on the definition of SQL
injection.

Changing a column name to a number (which prepared statements shouldn't
allow in the first place) is a bug. This changes the effect of the command,
but the structure of the query remains unchanged.

Hi Scott,

I wrote that example where an integer could be dangerous.

So firstly - just to clarify, because some replies seemed to be confused on the topic, as was literally mentioned in the original comment, it is definitely not correct behaviour - it’s a developer mistake that might work some of the time, and thus go unnoticed in testing. If you can show me a developer who’s never inadvertently passed the wrong parameter in some condition, I’ll show you an imaginary developer.

Additionally - pointing out that this is a "developer error” doesn’t help your case. Using non-parameterised queries should already be a “developer error” for anyone who can walk and breathe at the same time - and yet that usage is being actively encouraged if this function supports integers. I’ve still seen zero responses about legitimate reasons this needs to support integers - giving people a shitty way to build an IN() clause is not legitimate. Parameterisation exists, and works.

I don’t even understand why you mentioned prepared statements (I guess you meant using parameterised queries?) - the column name inherently can’t be parameterised - hence having to use a string substitution in the query.

That part was weird and confusing, but not as odd as your claim that altering the query, so that it causes the where clause to become moot, is not an SQL Injection? REALLY? That’s your claim?

I did a little research, and it turns out Wikipedia (https://en.wikipedia.org/wiki/SQL_injection#Technical_implementations), Cloudflare (https://www.cloudflare.com/en-au/learning/security/threats/sql-injection/), and OWASP (https://owasp.org/www-community/attacks/SQL_Injection#example-2) all have examples with a 1=1 type query manipulation. Do you want to write and tell them that they’re all wrong, or should I ask them to call you?

Also, while researching the specifics of what is considered an “SQL Injection” I came across an article, that talks specifically about the dangers of allowing user input (i.e. the thing is_trusted is meant to prevent) as a column or table identifier. It’s from this little organisation, you may have heard of them: “Paragon Initiative” (https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide).

I would absolutely make use of a function that tells me if the string given is in fact from something controlled by the developer. But once that same string can also include input from the request or the environment or whatever by nature of integers, the function becomes useless for the stated purpose.

Cheers

Stephen

One other thing. What do you think the result of this code will be?
How many rows do you think it will spit out?

&lt;?php
$db = new PDO('sqlite::memory:');

// These are the only reasonable settings to ever use:
$db->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Let's create a table:
$db->exec("CREATE TABLE foo (username TEXT, pwhash TEXT, email TEXT)");

// Dummy rows:
$db->exec("INSERT INTO foo(username, pwhash, email) VALUES ('scott',
'lolno', 'scott@paragonie.com')");
$db->exec("INSERT INTO foo(username, pwhash, email) VALUES ('robyn',
'fake!', 'robyn@paragonie.com')");

// Okay, now let's do a prepared statement, with an integer field
$int = 12345;

$stmt = $db->prepare("SELECT * FROM foo WHERE {$int} = ?");
$stmt->execute([12345]);
var_dump($stmt->fetchAll(PDO::FETCH_ASSOC));

4 years ago by Stephen Reay — view source

unread

On Thu, Jun 24, 2021 at 3:41 AM Scott Arciszewski scott@paragonie.com
wrote:

The failure condition of this query is
"return all rows from the table already being queried", not "return
arbitrary data the attacker selects from any table that the
application can read".

Imagine that was a DELETE rather than SELECT and the failure condition
becomes 'the table is emptied'.
It may have less disastrous consequences (depending on how good your
backup / restore procedures are) compared to arbitrary reads you
demonstrated, but it is still, quite clearly, a glaring security hole
caused by user input in SQL query - AKA SQL injection in layman's
terms.

it differs from Injection vulnerabilities in one
fundamental way: The attacker cannot change the structure of the SQL
query being executed.

I would say replacing a column name with value is changing the
structure of SQL query, and, basically, in exactly the way you
describe SQL injection: confusing the code (column name) with data.

I wholeheartedly welcome this RFC as it was originally proposed:
is_literal() doing exactly what it says on the tin, without any
security claims. But it has gone far from there real quick and now
people can't even name the thing.

--
Best regards,
Bruce Weirdan mailto:
weirdan@gmail.com

We can agree that it is a bug. We don't agree on the definition of SQL
injection.

Changing a column name to a number (which prepared statements shouldn't
allow in the first place) is a bug. This changes the effect of the command,
but the structure of the query remains unchanged.

Hi Scott,

I wrote that example where an integer could be dangerous.

So firstly - just to clarify, because some replies seemed to be confused on the topic, as was literally mentioned in the original comment, it is definitely not correct behaviour - it’s a developer mistake that might work some of the time, and thus go unnoticed in testing. If you can show me a developer who’s never inadvertently passed the wrong parameter in some condition, I’ll show you an imaginary developer.

Additionally - pointing out that this is a "developer error” doesn’t help your case. Using non-parameterised queries should already be a “developer error” for anyone who can walk and breathe at the same time - and yet that usage is being actively encouraged if this function supports integers. I’ve still seen zero responses about legitimate reasons this needs to support integers - giving people a shitty way to build an IN() clause is not legitimate. Parameterisation exists, and works.

I don’t even understand why you mentioned prepared statements (I guess you meant using parameterised queries?) - the column name inherently can’t be parameterised - hence having to use a string substitution in the query.

That part was weird and confusing, but not as odd as your claim that altering the query, so that it causes the where clause to become moot, is not an SQL Injection? REALLY? That’s your claim?

I did a little research, and it turns out Wikipedia (https://en.wikipedia.org/wiki/SQL_injection#Technical_implementations), Cloudflare (https://www.cloudflare.com/en-au/learning/security/threats/sql-injection/), and OWASP (https://owasp.org/www-community/attacks/SQL_Injection#example-2) all have examples with a 1=1 type query manipulation. Do you want to write and tell them that they’re all wrong, or should I ask them to call you?

Also, while researching the specifics of what is considered an “SQL Injection” I came across an article, that talks specifically about the dangers of allowing user input (i.e. the thing is_trusted is meant to prevent) as a column or table identifier. It’s from this little organisation, you may have heard of them: “Paragon Initiative” (https://paragonie.com/blog/2015/05/preventing-sql-injection-in-php-applications-easy-and-definitive-guide).

I would absolutely make use of a function that tells me if the string given is in fact from something controlled by the developer. But once that same string can also include input from the request or the environment or whatever by nature of integers, the function becomes useless for the stated purpose.

Cheers

Stephen

One other thing. What do you think the result of this code will be?
How many rows do you think it will spit out?
&lt;?php
$db = new PDO('sqlite::memory:');

// These are the only reasonable settings to ever use:
$db->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Let's create a table:
$db->exec("CREATE TABLE foo (username TEXT, pwhash TEXT, email TEXT)");

// Dummy rows:
$db->exec("INSERT INTO foo(username, pwhash, email) VALUES ('scott',
'lolno', 'scott@paragonie.com')");
$db->exec("INSERT INTO foo(username, pwhash, email) VALUES ('robyn',
'fake!', 'robyn@paragonie.com')");

// Okay, now let's do a prepared statement, with an integer field
$int = 12345;

$stmt = $db->prepare("SELECT * FROM foo WHERE {$int} = ?");
$stmt->execute([12345]);
var_dump($stmt->fetchAll(PDO::FETCH_ASSOC));

Two things:
(a) try that same query, unaltered on mysql. It won’t be 0 rows. I’m pretty sure its also not 0 on Postgres.

(b) try binding the parameter as an integer and running it on SQLite. It won’t be zero rows. Actually I can even show you that one: https://3v4l.org/VL7UC#focus=8.0.7 https://3v4l.org/VL7UC#focus=8.0.7

So what do we call the function now: is_trusted_string_or_integer_but_dont_dare_use_the_int_as_an_int_or_with_mysql_at_all() ?

And, this horse is practically glue based on how much I’m beating it, but still I have to keep asking: Why integers at all? Who is using “2” or some other digit-only identifier as a table or column name, but also doesn’t define that name as a string?

4 years ago by Kamil Tekiela — view source

unread

Hi Stephen,

I believe the idea was for dynamically generate table names, or numbered
tables/columns. E.g.

function getTable(string $table){
// is_literal check here
}

$number = (int) $_GET['tableno'];
if($number < 0 || $number > 10) {
throw new Exception("Invalid number");
}

$tablename = 'table_'.$number;
getTable($tablename);

The number is concatenated to the table name.

--Kamil

4 years ago by Stephen Reay — view source

unread

Hi Stephen,

I believe the idea was for dynamically generate table names, or numbered tables/columns. E.g.

function getTable(string $table){
// is_literal check here
}

$number = (int) $_GET['tableno'];
if($number < 0 || $number > 10) {
throw new Exception("Invalid number");
}

$tablename = 'table_'.$number;
getTable($tablename);

The number is concatenated to the table name.

—Kamil

Hi Kamil,

Thanks for at least trying to answer this question.

I’m sure someone somewhere does that and thinks its a good idea. I respectfully (to you; probably less respectfully to someone if they tell me they do this) disagree. I don’t think PHP should necessarily shy away from features because they’re potentially dangerous, but I also don’t think it should be adding new features/functions that are more dangerous, just to make some weird (IMO bad-practice) edge cases easier.

I’d suggest if they insist on that bizarre naming pattern, and want to use a literal string check, they could define an array of string numbers that represent their table names.

$tbls = [‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ...];

getTable(’table_’ . $tbls[$number]);

Cheers

Stephen

4 years ago by Mike Schinkel — view source

unread

Hi Stephen,

I believe the idea was for dynamically generate table names, or numbered tables/columns. E.g.

function getTable(string $table){
// is_literal check here
}

$number = (int) $_GET['tableno'];
if($number < 0 || $number > 10) {
throw new Exception("Invalid number");
}

$tablename = 'table_'.$number;
getTable($tablename);

The number is concatenated to the table name.

—Kamil

Hi Kamil,

Thanks for at least trying to answer this question.

I’m sure someone somewhere does that and thinks its a good idea. I respectfully (to you; probably less respectfully to someone if they tell me they do this) disagree. I don’t think PHP should necessarily shy away from features because they’re potentially dangerous, but I also don’t think it should be adding new features/functions that are more dangerous, just to make some weird (IMO bad-practice) edge cases easier.

WordPress Multisite does exactly that.

Whether or not them doing so is a "good idea" is irrelevant as there are a large number of website that use that mode of WordPress currently active on the web.

I’d suggest if they insist on that bizarre naming pattern, and want to use a literal string check, they could define an array of string numbers that represent their table names.

$tbls = [‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ...];

getTable(’table_’ . $tbls[$number]);

Some WP MS installations support millions of thousands sites. See WordPress.com http://wordpress.com/.

But yes, I guess it could be possible for them to hack hack together 'table_983761' out of literals via a Rube Goldbergian-function, if forced to.

but still I have to keep asking: Why integers at all?

While I'm not a fan of this approach, there is a lot of existing code and
tutorials that use:

$sql = 'WHERE id IN (' . implode(',', array_map('intval', $ids)) . ')';

$sql = sprintf('SELECT * FROM table WHERE id = %d;', intval($id));

And WordPress (and I am sure a lot of other legacy code) does not support parameterized queries in the DB object, at least not without jumping through tons of hoops. Not to mention the 60k existing open-source plugins and the likely million custom plugins in the wild.

-Mike
P.S. Of course we could ignore the entirety of WordPress, but that just does not strike me as a prudent course of action.

4 years ago by Joe Watkins — view source

unread

Nobody has demonstrated that "string" . int can lead to anything but
mistakes.

It CANNOT lead to injection, and that's what we're talking about, we're not
talking about a function that protects you from all possible security
concerns or bugs.

The actual definition of injection matters, when we are talking about
injection, obviously.

You can't have your own definition.

Cheers
Joe

On Jun 24, 2021, at 6:33 AM, Stephen Reay php-lists@koalephant.com
wrote:

Hi Stephen,

I believe the idea was for dynamically generate table names, or
numbered tables/columns. E.g.

function getTable(string $table){
// is_literal check here
}

$number = (int) $_GET['tableno'];
if($number < 0 || $number > 10) {
throw new Exception("Invalid number");
}

$tablename = 'table_'.$number;
getTable($tablename);

The number is concatenated to the table name.

—Kamil

Hi Kamil,

Thanks for at least trying to answer this question.

I’m sure someone somewhere does that and thinks its a good idea. I
respectfully (to you; probably less respectfully to someone if they tell me
they do this) disagree. I don’t think PHP should necessarily shy away from
features because they’re potentially dangerous, but I also don’t think it
should be adding new features/functions that are more dangerous, just to
make some weird (IMO bad-practice) edge cases easier.

WordPress Multisite does exactly that.

Whether or not them doing so is a "good idea" is irrelevant as there are a
large number of website that use that mode of WordPress currently active on
the web.

I’d suggest if they insist on that bizarre naming pattern, and want to
use a literal string check, they could define an array of string numbers
that represent their table names.

$tbls = [‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ...];

getTable(’table_’ . $tbls[$number]);

Some WP MS installations support millions of thousands sites. See
WordPress.com http://wordpress.com/.

But yes, I guess it could be possible for them to hack hack together
'table_983761' out of literals via a Rube Goldbergian-function, if forced
to.

On Jun 24, 2021, at 6:35 AM, Stephen Reay php-lists@koalephant.com
wrote:

On 24 Jun 2021, at 17:16, Craig Francis craig@craigfrancis.co.uk
wrote:

On Thu, 24 Jun 2021 at 10:55, Stephen Reay php-lists@koalephant.com
wrote:

but still I have to keep asking: Why integers at all?

While I'm not a fan of this approach, there is a lot of existing code
and
tutorials that use:

$sql = 'WHERE id IN (' . implode(',', array_map('intval', $ids)) . ')';

$sql = sprintf('SELECT * FROM table WHERE id = %d;', intval($id));

And WordPress (and I am sure a lot of other legacy code) does not support
parameterized queries in the DB object, at least not without jumping
through tons of hoops. Not to mention the 60k existing open-source plugins
and the likely million custom plugins in the wild.

-Mike
P.S. Of course we could ignore the entirety of WordPress, but that just
does not strike me as a prudent course of action.

4 years ago by Dan Ackroyd — view source

unread

we're not
talking about a function that protects you from all possible security
concerns or bugs.

I know. We're talking about something that is meant to protect
programmers against silly mistakes.

And this RFC doesn't do that.

It allows silly mistakes to slip through and make it to production. As
per https://news-web.php.net/php.internals/114858:

Assume that both UserPreferences and getInfoPanel code is covered by
unit tests. The developer who made that change would run the tests,
see the tests pass and then push their code live. At which point, it
would cause an error in production, as the string returned would not
pass the is_literal check.

I'd really like anyone who is promoting this RFC to answer the
question from https://news-web.php.net/php.internals/115010 :

Please can you go into some detail about what you think people are
meant to do when they detect a non-literal used where a literal is
expected?

There is a whole load of hand waving going on of "you can protect
yourself!" but then no detail of what this RFC proposes people
actually do, which means people can't evaluate whether it's worth
adding or not.

I know not carrying the literal flag through string concatenation
would make this slightly harder to use. But that would come with lower
long term maintenance, and less mistakes getting through to
production. For security concerns, that seems a good tradeoff to make.

I'd also quite like an answer to the following:

Why are you prioritising speed of adoption, over reducing the cost of
using this feature for projects over say the next 5 or 10 years?

I think this feature should be of most value to teams with large
code-bases, where maintaining the code (and figuring out security
problems) has a much higher cost than for small code-bases, where this
feature might not deliver much value. But by carrying the literal flag
through on string concat, that results in it being annoying to use for
teams with large code-bases....which seems self-defeating.

cheers
Dan
Ack

4 years ago by Craig Francis — view source

unread

Please can you go into some detail about what you think people are
meant to do when they detect a non-literal used where a literal is
expected?

There is a whole load of hand waving going on of "you can protect
yourself!" but then no detail of what this RFC proposes people
actually do, which means people can't evaluate whether it's worth
adding or not.

As noted by Lauri Kenttä, "I'd imagine people use custom error handlers
which report errors (warnings, notices) to ticket system or email or some
other convenient place, so they don't need to read through logs."

By using a simple function, it allows the library to handle the situation
however it likes. In the RFC there is a "How it can be used by libraries"
link, which shows how a library can easily provide options for a basic
warning, exception, or just ignoring. But a library could also only run
these checks in development mode (off in production), or do additional
checks to see if the value is likely to be an issue (value matches a field
name), etc.

https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php

Why are you prioritising speed of adoption, over reducing the cost of
using this feature for projects over say the next 5 or 10 years?

I think this feature should be of most value to teams with large
code-bases, where maintaining the code (and figuring out security
problems) has a much higher cost than for small code-bases, where this
feature might not deliver much value. But by carrying the literal flag
through on string concat, that results in it being annoying to use for
teams with large code-bases....which seems self-defeating.

As noted by Rowan Tommins, "It would also not solve the problem of the
assertion only triggering in production without a careful integration test.
That can only be solved with a static analyser, which would also have
enough information to isolate the cause."

Static Analysis via literal-string is now supported in Psalm (thanks
Matthew Brown):
https://psalm.dev/r/9440908f39

And with a dedicated type (a future RFC, as it needs to cover different
questions that build on this implementation), that would make Static
Analysis even better for debugging, rather than a solution that triggers
exceptions for everyone.

Craig

4 years ago by Dan Ackroyd — view source

unread

Dan Ackroyd wrote:
Please can you go into some detail about what you think people are
meant to do when they detect a non-literal used where a literal is
expected?

By using a simple function, it allows the library to handle
the situation however it likes.

That's not an answer to the question.

Just saying "you can handle it however you like" skips over the core
problem with RFC, which is that by supporting carrying the literal
flag through string concatenations,

It allows silly mistakes to slip through and make it to production. As
per https://news-web.php.net/php.internals/114858:

Assume that both UserPreferences and getInfoPanel code is covered by
unit tests. The developer who made that change would run the tests,
see the tests pass and then push their code live. At which point, it
would cause an error in production, as the string returned would not
pass the is_literal check.

For is_literal checks, what's going to happen is:

programmer makes a mistake, that isn't caught by a test.
programmer deploys site, and entries about non-literal strings start
being written to log files.
someone then needs to remember to check the log files, and pull out
the appropriate entries, and make a ticket for them to be
investigated.
someone then needs to pick up that ticket, investigate the cause of
the non-literal string being used, and trace it back to which
module/library is causing it, then update the ticket for someone on
that team to fix it.
someone on that team needs to have that ticket prioritised before
they start work on it.

I'd really like anyone who is promoting this RFC to answer the
question from https://news-web.php.net/php.internals/115010 :

Why are you prioritising speed of adoption, over reducing the cost of
using this feature for projects over say the next 5 or 10 years?

It's going to take more time to track down an error than it will be to
start using the feature.

I had planned to use this RFC to prevent XSS attacks, and in the
current implementation I'm going to get a notice of "There's some
userdata in an inappropriate place. Have fun finding it!" because the
information about where that user data is in the HTML will be lost.

It's still bizarre to me that the RFC is proposing something that
allows mistakes to slip through to production.

cheers
Dan
Ack

btw, email or ticketing systems are not an appropriate place to log
this type of problem. They could happen on every page load, and so
generate hundreds of errors per second. As amusing as it is to look
back at the times that I have generated 10,000 error emails per second
in the past, I don't want to repeat that experience.

Also, I'm going to cross-post this, as it's unreasonable to expect
people to read all of the different threads.

4 years ago by Joe Watkins — view source

unread

Please, stop spamming us with the same comments.

Cheers
Joe

On Fri, 25 Jun 2021 at 00:57, Craig Francis craig@craigfrancis.co.uk
wrote:

Dan Ackroyd wrote:
Please can you go into some detail about what you think people are
meant to do when they detect a non-literal used where a literal is
expected?

By using a simple function, it allows the library to handle
the situation however it likes.

That's not an answer to the question.

Just saying "you can handle it however you like" skips over the core
problem with RFC, which is that by supporting carrying the literal
flag through string concatenations,

It allows silly mistakes to slip through and make it to production. As
per https://news-web.php.net/php.internals/114858:

Assume that both UserPreferences and getInfoPanel code is covered by
unit tests. The developer who made that change would run the tests,
see the tests pass and then push their code live. At which point, it
would cause an error in production, as the string returned would not
pass the is_literal check.

For is_literal checks, what's going to happen is:

programmer makes a mistake, that isn't caught by a test.

programmer deploys site, and entries about non-literal strings start
being written to log files.

someone then needs to remember to check the log files, and pull out
the appropriate entries, and make a ticket for them to be
investigated.

someone then needs to pick up that ticket, investigate the cause of
the non-literal string being used, and trace it back to which
module/library is causing it, then update the ticket for someone on
that team to fix it.

someone on that team needs to have that ticket prioritised before
they start work on it.

I'd really like anyone who is promoting this RFC to answer the
question from https://news-web.php.net/php.internals/115010 :

Why are you prioritising speed of adoption, over reducing the cost of
using this feature for projects over say the next 5 or 10 years?

It's going to take more time to track down an error than it will be to
start using the feature.

I had planned to use this RFC to prevent XSS attacks, and in the
current implementation I'm going to get a notice of "There's some
userdata in an inappropriate place. Have fun finding it!" because the
information about where that user data is in the HTML will be lost.

It's still bizarre to me that the RFC is proposing something that
allows mistakes to slip through to production.

cheers
Dan
Ack

btw, email or ticketing systems are not an appropriate place to log
this type of problem. They could happen on every page load, and so
generate hundreds of errors per second. As amusing as it is to look
back at the times that I have generated 10,000 error emails per second
in the past, I don't want to repeat that experience.

Also, I'm going to cross-post this, as it's unreasonable to expect
people to read all of the different threads.

4 years ago by Rowan Tommins — view source

unread

It allows silly mistakes to slip through and make it to production. As
per https://news-web.php.net/php.internals/114858:

Perhaps you missed my reply earlier where I pointed out that the traceability problem you've identified is valid, but not solved by removing the flag on concatenation: https://news-web.php.net/php.internals/114868

Regardless of whether you get an error message from assert(is_literal($foo) && is_literal($bar)) or literal_concat($foo, $bar) the actual big will almost certainly have happened somewhere else in the program, and you'll need to trace the data flow of $foo and $bar to find out where.

To avoid that, you'd have to avoid propagating the flag on any operation, including assignment and return statements, which would clearly be pretty pointless.

Also, I'm going to cross-post this, as it's unreasonable to expect
people to read all of the different threads.

Please don't. All it will achieve is more list traffic, more fragmentation, and more messages that people who aren't interested have to ignore.

Regards,

--
Rowan Tommins
[IMSoP]

4 years ago by Dan Ackroyd — view source

unread

the actual bug will almost certainly have happened somewhere else in the
program, and you'll need to trace the data flow of $foo and $bar to find out where.

That depends on what you mean by bug. In particular I don't agree that
"The actual root cause is in getColor(),". Whether user-controlled
data is dangerous or not depends on where it's used, not where it's
coming from.

For me, the bug when $up->getColor() changes from returning a literal
string to a user supplied value is in this line of code:

"<div style='color: " . $up->getColor() . "'>";

Where a programmer has assumed that the value returned by
$up->getColor() is a literal string. It's not a bug for that function
to return a user controlled value.

so tracking the root cause could be just as difficult.

Not as difficult. It would give the exact line of code where the bad
assumption was made.

literal_concat("<div style='color: ", $up->getColor(), "'>");

When $up->getColor() is changed from returning a literal string to a
user controlled one, it would throw an exception. You could look at
the exception, see that the 2nd parameter isn't literal and is coming
from the getColor method, and know this code isn't safe and needs to
be changed to something that can do the appropriate escaping.

Even if the origin of the value was more obscure as you suggest, so
maybe something like:

function getDiv($color) {
$div = literal_concat("<div style='color: ", $color, "'>");
// ...
return $div;
}

$color = $up->getColor();
$html .= getDiv($color);

Although the origin of where the variable comes from wouldn't be in
the stacktrace, the error still would be.

There is a line of code that has a bad assumption of whether $color is
a literal string or not, and it's that line of code that needs to be
changed, to use something that understands HTML escaping, in
particular how to escape user input for html attribute context.

So even when the origin of a value might be hard to understand, the
precise line of code that has the bad assumption would be part of the
stacktrace, which makes figuring out the problem be a lot easier.

That's why it worked for Google. By not having to understand where
variables come from, but only having the error be where there are used
inappropriately, you completely get rid of the "having to understand a
large chunk of code" problem, and instead only have to consider where
the variables are being used, and in what context they are being used.

cheers
Dan
Ack

4 years ago by Rowan Tommins — view source

unread

There is a line of code that has a bad assumption of whether $color is
a literal string or not, and it's that line of code that needs to be
changed, to use something that understands HTML escaping, in
particular how to escape user input for html attribute context.

All we can say for sure is that the code is making an assertion about
its input, and the assertion is failing. It may be that the assertion
needs to be revised, and assume that the input is dangerous; or it may
be that the assertion is correct and the calling code is violating the
contract.

Consider this example:

function addBlockToSidebar(string $blockHtml): void {
$this->sidebarHtml .= '<div class="block">' . $blockHtml . '</div>';
}

The contract of this function is that the input is a piece of HTML which
isn't under an attacker's control. We can assert that the caller meets
that contract either using assert(is_literal($blockHtml)) or by using
literal_concat() instead of the . operator.

Either way, if that assertion fails, it indicates a bug somewhere else
in the program, not in this function, and not necessarily in the call
stack of the error:

public function getFooBlock(): string {
// literal string, but no assertion is present
$block = '<div>blah blah</div>';

     if ( isset($this->customBanner) ) {
         // $this->customBanner is user-controlled
         $block .= $this->customBanner;
     }
     return $block;
}

$block = getFooBlock();
addBlockToSidebar($block);

Here, the new code in getFooBlock() is at fault, but is not in the stack
trace when the assertion fails. Clearing the flag on all concatenations
makes no difference to this example: the concatenation is with a
non-literal string, so clears it anyway.

Regards,

--
Rowan Tommins
[IMSoP]

4 years ago by Craig Francis — view source

unread

but still I have to keep asking: Why integers at all?

While I'm not a fan of this approach, there is a lot of existing code and
tutorials that use:

$sql = 'WHERE id IN (' . implode(',', array_map('intval', $ids)) . ')';

$sql = sprintf('SELECT * FROM table WHERE id = %d;', intval($id));

Craig

4 years ago by Stephen Reay — view source

unread

but still I have to keep asking: Why integers at all?

While I'm not a fan of this approach, there is a lot of existing code and
tutorials that use:

$sql = 'WHERE id IN (' . implode(',', array_map('intval', $ids)) . ')';

$sql = sprintf('SELECT * FROM table WHERE id = %d;', intval($id));

Craig

Yeah you’ve said this about a dozen times now. Parameterisation exists. Query builders that do this already using parameterisation, exist.

4 years ago by Rowan Tommins — view source

unread

but still I have to keep asking: Why integers at all?

While I'm not a fan of this approach, there is a lot of existing code and
tutorials that use:

$sql = 'WHERE id IN (' . implode(',', array_map('intval', $ids)) . ')';

$sql = sprintf('SELECT * FROM table WHERE id = %d;', intval($id));
Yeah you’ve said this about a dozen times now. Parameterisation exists. Query builders that do this already using parameterisation, exist.

I think it's clear that a conversation is going round in circles if you
are complaining both that you haven't had an answer to your question,
and that you've had the same answer too many times.

If you understand the answer but disagree with it, there is nothing more
to be said.

Unless you have a new point to make, I think it's probably best to
"agree to disagree" at this point, particularly as the discussion seems
to be getting heated and personal, which is not fun for anyone.

Regards,

--
Rowan Tommins
[IMSoP]