4 years ago by Joe Watkins — view source

unread

Afternoon all,

While this is not at all my idea, I wrote the patch, so my words may seem
bias/hollow. Still, here are some words ...

In the past this kind of feature would have been extremely invasive, it
would have had so many edges because of the way we handled strings that it
was never really feasible.

Today the implementation of this is not particularly invasive or
complicated, and it lends a very useful tool to the developer.

This starting place, of second class support seems like a reasonable
starting place, but there is a possible future where we have first class
support.

This may or may not be appealing and we'll be in a better position to judge
when we see how this may be used in the wild.

Cheers
Joe

On Sat, 12 Jun 2021 at 19:00, Craig Francis craig@craigfrancis.co.uk
wrote:

Hi Internals,

I'd like to start the discussion on the is_literal() RFC:

https://wiki.php.net/rfc/is_literal

is_literal() brings a proven way to identify Injection Vulnerabilities to
PHP, already used by Google in their Java and Go projects, and is currently
being added to JavaScript. It's a lightweight and simple approach:
"Distinguishing strings from a trusted developer from strings that may be
attacker controlled", allowing Libraries to identify the mistakes made by
the thousands of developers using them incorrectly.

When Libraries use is_literal() to protect against these mistakes, we can
trust their output, and (as covered in the Future scope section) PHP can
then raise warnings with certain native functions like PDO::query,
mysqli_query, exec, preg_match, etc. (we would only consider warnings,
anything stricter like exceptions would be in many years time, if at all -
the intention is to alert and inform people, not break things).

The length is due to the FAQ Section, on why it's needed, how it can be
used by Libraries, and the important differences of using this flag versus
the flawed Taint Checking approach with its false sense of security
(error-prone escaping).

Thanks,
Craig Francis

PS: If anyone wants to discuss face-to-face on Zoom, I'll be available (UK
Time/BST/UTC+1) at:

https://chat.craigfrancis.co.uk/

Saturday 12th June, 6pm to 8pm;
Sunday 13th June, 10am to Midday;
Monday 14th June, 5pm to 7pm;
Tuesday 15th June, 9pm to 11pm;
Thursday 16th June, 10am to Midday;
(other times on request)

4 years ago by Dan Ackroyd — view source

unread

Hi Craig, Joe,

While I think the idea behind this RFC is great, the current
implementation is terrible, as it will cause some fatal errors in
production.

The problem

Imagine we have this code:

function getInfoPanel(UserPreferences $up): string
{
$html = "<div style='color: " . $up->getColor() . "'>";
$html .= // more html here
$html .= "</div>"

return $html;

}

class UserPreferences {
private DB $db;

function getColor(): string {
    $preferredColor = $this->db->getColor();

    if ($preferredColor === 'light') {
        return '#fff';
    }

    if ($preferredColor === 'dark') {
        return '#000';
    }
    return '#fff'; // default is light
}

}

And then the string returned by getInfoPanel() is used elsewhere that
does a check for is_literal. That check will pass.

Now imagine a feature request comes in, to allow people to customise
the color of their UI, so the UserPreferences code is changed to:

class UserPreferences {
private DB $db;

function getColor(): string {
    $preferredColor = $this->db->getColor();

    if ($preferredColor === 'light') {
        return '#fff';
    }

    if ($preferredColor === 'dark') {
        return '#000';
    }
    return $preferredColor; // user has set custom color.
}

}

Assume that both UserPreferences and getInfoPanel code is covered by
unit tests. The developer who made that change would run the tests,
see the tests pass and then push their code live. At which point, it
would cause an error in production, as the string returned would not
pass the is_literal check.

This would be a complete pain in the butt to fix.

The only information would be that the html string is not
literal....but there wouldn't be any info about which bit was the
problem. Someone would have to manually check which code had been
called to produce the non-literal string.

That could mean the site would be unavailable for users that had set
a custom color. Or more likely, the developers would just turn off the
is_literal check or at least downgrade it to a mere logging error aka
something to be ignored.

How to fix this in the RFC

By not carrying the literal flag on string concatenation, but
requiring developers to use functions like literal_concat() and
literal_implode(), the 'false affordance'* goes away:

function getInfoPanel(UserPreferences $up): string
{
$html[] = "<div style='color: " . $up->getColor() . "'>";
$html[] = // more html here
$html[] = "</div>"

return literal_concat(...$html);

}

Now, when the feature request comes in to allow users to set custom
colors, the code will fail on the literal_concat. When that function
checks internally that all the strings passed to it are literal, it
will throw an exception from the function that has made the bad
assumption about whether a string is user input or not.

Yes, it's slightly ironic that making errors be louder and happen
earlier in the program is better than making them happen later.

Wouldn't forcing people to use literal_concat() and

literal_implode() be annoying

I don't think so.

The code change required is pretty small, and can be done either by a
junior developer, or could be automated by things like
https://github.com/rectorphp/rector. Even if there are 100 places
where an existing series of string concatenations needs to be
converted to chucking in an array, and then using literal_concat() and
each of those takes 10 minutes... that's 2 days to do the conversion.
In return for which the application becomes a lot more secure than it
was previously.

But additionally, I don't think many people are actually going to be
using literal_concat() in many places. My guess is that the vast
majority of places where you are combining bits of strings together,
you would realise that you can switch to using type-specific escaping,
as you're going to need to do the escaping anyway.

Is this a problem in other languages?

The similar implementation used in Google is used for Java. As Java is
statically compiled, any attempt to pass a non-literal string to a
function that expects a literal string fails to compile. And if it
won't compile, it won't run, and so won't have errors in production.

The only thing that needs to be changed in my opinion is not carrying
the literal flag through string concatenation. Although that would be
slightly more work to use the feature, avoiding having inevitable and
hard to debug errors in production is the trade-off.

cheers
Dan
Ack

false affordance - something that works, except when it doesn't.

4 years ago by Matthew Brown — view source

unread

Hi Craig, Joe,

While I think the idea behind this RFC is great, the current
implementation is terrible, as it will cause some fatal errors in
production.

I'm not sure it's productive to call the implementation terrible. With
Someniatko's suggestion of a literal-string type, that issue (like so many
others) could be caught by a type-checker:

/** @return literal-string */
function getColor(): string {
    return $_GET["color"]; // this would trigger a typechecker error
}

In fact I think there's a solid case for adding support for literal-string
to type checkers like mine today — mainly to prevent patterns that might
later lead to SQL injection.

4 years ago by Thomas Nunninger — view source

unread

Hi!

class UserPreferences {
private DB $db;
 function getColor(): string {
     $preferredColor = $this->db->getColor();

     if ($preferredColor === 'light') {
         return '#fff';
     }

     if ($preferredColor === 'dark') {
         return '#000';
     }
     return $preferredColor; // user has set custom color.
 }
}

Assume that both UserPreferences and getInfoPanel code is covered by
unit tests. The developer who made that change would run the tests,
see the tests pass and then push their code live. At which point, it
would cause an error in production, as the string returned would not
pass the is_literal check.

This would be a complete pain in the butt to fix.

When you write about testing in this database scenario, I find some
additional issue: When you try to create pure unit tests you would
somehow mock/stubb or replace the database with some test double. In
that situation your tests will still pass even if testing the variant
that the value comes from the database. Only some (infrastructure or
end-to-end) test that covers the business logic plus the corresponding
infrastructure by accident would uncover an error.

I wonder if we would need some method to mark a value/variable as
non-literal. Or perhaps mark some return value (or input parameters?) in
an interface/class as non-literal using some annotation?

That still puts the burden on the developer to think about that issue.
But it's probably better than nothing.

Perhaps someone has a better idea?

Cheers
Thomas

4 years ago by Marco Pivetta — view source

unread

Hey Thomas,

Hi!
class UserPreferences {
private DB $db;
 function getColor(): string {
     $preferredColor = $this->db->getColor();

     if ($preferredColor === 'light') {
         return '#fff';
     }

     if ($preferredColor === 'dark') {
         return '#000';
     }
     return $preferredColor; // user has set custom color.
 }
}

Assume that both UserPreferences and getInfoPanel code is covered by
unit tests. The developer who made that change would run the tests,
see the tests pass and then push their code live. At which point, it
would cause an error in production, as the string returned would not
pass the is_literal check.

This would be a complete pain in the butt to fix.
When you write about testing in this database scenario, I find some
additional issue: When you try to create pure unit tests you would
somehow mock/stubb or replace the database with some test double. In
that situation your tests will still pass even if testing the variant
that the value comes from the database. Only some (infrastructure or
end-to-end) test that covers the business logic plus the corresponding
infrastructure by accident would uncover an error.

I wonder if we would need some method to mark a value/variable as
non-literal. Or perhaps mark some return value (or input parameters?) in
an interface/class as non-literal using some annotation?

That still puts the burden on the developer to think about that issue.
But it's probably better than nothing.

Perhaps someone has a better idea?

A small userland utility that writes the data to an in-memory stream, then
reads from it, is sufficient.

4 years ago by Dan Ackroyd — view source

unread

Thomas Nunninger wrote:

Only some (infrastructure or
end-to-end) test that covers the business logic plus the corresponding
infrastructure by accident would uncover an error.

I think testing will be much less of an issue than you might expect,
but partly only if string concatenation doesn't carry the flag around.

The whole point of the idea of literal strings is to make it be easier
to write some code that:

i) is safe.
ii) can be reasoned about at scale.

Passing bare strings around is not so great for either of those
things. You have to manually remember "this is a string that is
intended to be used in a particular way. And doing that in a large
code base, when your engineering department is large enough to work in
separate teams is particularly difficult.

Instead of using bare strings, using a more specific type e.g.
HtmlAttribute::fromString('#fff'); and then passing that around would
be much easier to reason about.

Similarly for the thoughts about concatenating numbers into strings.
Yeah, people might think they want that, but in practice using an SQL
builder that allows you to put the number in the right place like:

$sqlBuilder->add('SELECT * FROM foo LIMIT %d', FIXED_LIMIT);

And then predictable when the request comes in to allow users to set
their own limit, changing it to:

$sqlBuilder->add('SELECT * FROM foo LIMIT %d', $_GET['limit']);

Doesn't take any time to refactor the code, because it's already using
an appropriate library that handles variables correctly.

cheers
Dan
Ack

4 years ago by Mike Schinkel — view source

unread

The whole point of the idea of literal strings is to make it be easier
to write some code that:

i) is safe.
ii) can be reasoned about at scale.

Passing bare strings around is not so great for either of those
things. You have to manually remember "this is a string that is
intended to be used in a particular way. And doing that in a large
code base, when your engineering department is large enough to work in
separate teams is particularly difficult.

Instead of using bare strings, using a more specific type e.g.
HtmlAttribute::fromString('#fff'); and then passing that around would
be much easier to reason about.

THIS.

Similarly for the thoughts about concatenating numbers into strings.
Yeah, people might think they want that, but in practice using an SQL
builder that allows you to put the number in the right place like:

$sqlBuilder->add('SELECT * FROM foo LIMIT %d', FIXED_LIMIT);

And then predictable when the request comes in to allow users to set
their own limit, changing it to:

$sqlBuilder->add('SELECT * FROM foo LIMIT %d', $_GET['limit']);

Doesn't take any time to refactor the code, because it's already using
an appropriate library that handles variables correctly.

So, IMO this begs the question:

Should(n't?) PHP add a basic SQL builder class that can be extended for special cases, e.g. different flavors of SQL?

The problem with depending on userland to do this is that there will never be a single standard adopted.

If built into PHP it could:

Legitimize it across all PHP developers,
To make sure its always available in the most basic PHP install,
To allow implementing a performant SQL parser (which is not reasonably done in PHP),
To make creating a target for polyfills for prior versions possible,
And to create a standard that all PHP projects that use SQL can adopt?

-Mike
P.S. PHP could also implement a sanitizing templating language for SQL as possibly a different or even additional approach.

4 years ago by Dan Ackroyd — view source

unread

Should(n't?) PHP add a basic SQL builder class that can be extended for special cases, e.g. different flavors of SQL?

No. Or at least not yet.

This type of thing is much better done in userland, where the api can
evolve at a fast rate, rather than being limited by the fixed release
schedule of PHP.

If, after people had settled on a good api, someone could make a
strong argument for adding it to core, it could be evaluated then.

cheers
Dan
Ack

4 years ago by Mike Schinkel — view source

unread

Should(n't?) PHP add a basic SQL builder class that can be extended for special cases, e.g. different flavors of SQL?

No. Or at least not yet.

This type of thing is much better done in userland, where the api can
evolve at a fast rate, rather than being limited by the fixed release
schedule of PHP.

If, after people had settled on a good api, someone could make a
strong argument for adding it to core, it could be evaluated then.

Sure. But that is not inconsistent with what I was asking/proposing.

There would be a benefit to collectively working towards to defect-standard solution rather than leave it to the chaos of userland to come up many different approache, even if initially in userland.

Even a PSR might be appropriate, but I tend to think the interface-centric approach only works well things there the benefit it interoperability and not when the benefit is implementation.

-Mike

4 years ago by Pierre — view source

unread

Le 22/06/2021 à 11:28, Dan Ackroyd a écrit :

Should(n't?) PHP add a basic SQL builder class that can be extended for special cases, e.g. different flavors of SQL?

No. Or at least not yet.

This type of thing is much better done in userland, where the api can
evolve at a fast rate, rather than being limited by the fixed release
schedule of PHP.

Agreed, PHP is probably not the right place for an SQL builder, there's
too many dialects, too many standard or non-standard features, in the
end SQL builders almost always end-up being opinionated by its
designer's original need, philosophy, or SQL usage habits, and tailored
for users which use certain paradigms.

An SQL query builder is already business domain tied, in some way. Of
course most are very generic, but often don't handle properly SQL
advanced or modern features, whereas some other where built for ORMs and
such and may a reduced advanced SQL features that fits the original need.

I don't wish to see an SQL query builder in PHP core, instead of
favoring usage of modern SQL, it'll in my opinion (of course, that's
subjective) lower the expectation of people and probably normalize
SQL-92 being the only SQL spoken by people writing PHP (just
exaggerating a bit).

Regards

--

Pierre

4 years ago by Craig Francis — view source

unread

[...] it will cause some fatal errors in production.

No, the only way fatal errors can occur is if you choose to make them fatal
(in userland code).

I have included examples of the literal_concat() and literal_implode()
functions in the RFC, showing how they can be implemented in userland
instead, as those functions can be useful for some projects.

However requiring developers to rewrite all of their code to use
literal_concat() and literal_implode(), with those functions triggering an
exception whenever anything goes wrong, that's going to make adoption very
difficult.

Craig

4 years ago by Dan Ackroyd — view source

unread

Dan Ackroyd wrote:

The developer who made that change would run the tests,
see the tests pass and then push their code live. At which point, it
would cause an error in production, as the string returned would not
pass the is_literal check.

Craig Francis wrote:

No, the only way fatal errors can occur is if you choose to make them fatal (in userland code).

Please can you go into some detail about what you think people are
meant to do when they detect a non-literal used where a literal is
expected?

This seems like a core flaw in the proposed RFC and I think you aren't
addressing it.

that's going to make adoption very difficult.

Why are you prioritising speed of adoption, over reducing the cost of
using this feature for projects over say the next 5 or 10 years?

It kind of sounds like the idea behind the RFC has been a compromise
between the idea in Mr Kern's talk, and people who are not sure this
feature is a good idea or not.

This feature should be of most use to larger teams, where reasoning
about the application is difficult to begin with. For them an extra
few hours migrating a large code base to use the appropriate
techniques is a fine trade-off over increased difficulty maintaining
their code.

No, the only way fatal errors can occur is if you choose to make them fatal (in userland code).

If people aren't going to make using a non-literal where a literal is
expected, be an error, the only alternative I can see is logging it.
Please correct me if you think people should be doing something other
than those two things.

From my experience, any process that involves looking in log files is
a pretty bad process.

For is_literal checks, what's going to happen is:

programmer makes a mistake, that isn't caught by a test.
programmer deploys site, and entries about non-literal strings start
being written to log files.
someone then needs to remember to check the log files, and pull out
the appropriate entries, and make a ticket for them to be
investigated.
someone then needs to pick up that ticket, investigate the cause of
the non-literal string being used, and trace it back to which
module/library is causing it, then update the ticket for someone on
that team to fix it.
someone on that team needs to have that ticket prioritised before
they start work on it.

Apologising for expressing myself poorly before, but this is a really,
really, really, really, really, really result to have. It makes using
this feature either have a high maintenance burden, and allows
security flaws to exist until someone gets around to fix them, or it
results in things failing in production.

I know it's slightly annoying to require any combining of strings,
where you want to preserve 'literal-ness', to have to jump through the
hoop of using a specific function, but it's just a few seconds work,
that would save hours of debugging.

The whole reason why the solution worked for Google was that it made
it cost less for programmers to do the right thing, rather than having
to remember fix problems after they are found.*

I don't think there would be an easy way to fix this if the RFC was
passed in its current form. Changing from string concatenation
carrying the literal flag around to not, would be a huge BC break,
that would require multiple versions to fix. If it was the other way
round so that we don't support carrying the literal flag around, and
(much to my embarrassment) it does mean that people can't actually use
the feature, because it's too difficult, it would be much easier to
move to carrying the flag around.

However requiring developers to rewrite all of their code to use literal_concat() and literal_implode()

No-one is forcing developers to rewrite their code. And I don't think
many people would actually use those functions.

The vast majority of people are going to continue to use the functions
provided by Wordpress, Doctrine etc, and any checks on literal, or
combining of literals would be internal to those functions.

As I wrote to Thomas, I think the majority of people are more likely
to shift to using type specific wrapping types, rather than copying
bare strings around.

cheers
Dan
Ack

*For anyone wondering, unfortunately we can't use the same
implementation as Google, due to PHP not being statically compiled,
and also not being able to set 'type flags' on strings.

4 years ago by lauri.kentta@gmail.com — view source

unread

Dan Ackroyd wrote:
If people aren't going to make using a non-literal where a literal is
expected, be an error, the only alternative I can see is logging it.
Please correct me if you think people should be doing something other
than those two things.

Maybe I'm mistaken, but I'd imagine people use custom error handlers
which report errors (warnings, notices) to ticket system or email or
some other convenient place, so they don't need to read through logs.

It should be well known that unexpected things happen in production.
You can already have e.g. undefined variables or bad array offsets
in a non-tested code path. Non-literal string is just one more
kind of bug, it can be reported just like all the rest.

--
Lauri Kenttä

4 years ago by Rowan Tommins — view source

unread

Hi Dan,

While I think the idea behind this RFC is great, the current
implementation is terrible, as it will cause some fatal errors in
production.

I think you have raised a legitimate concern here, but agree with
Matthew Brown that calling the implementation "terrible" is neither
warranted nor helpful.

It's also a bit bizarre to criticise the RFC for causing fatal errors
even though it doesn't introduce any, then propose a different version
which does introduce fatal errors.

The only information would be that the html string is not
literal....but there wouldn't be any info about which bit was the
problem. Someone would have to manually check which code had been
called to produce the non-literal string.

The difficulty of tracing why something is not in the expected state is
indeed a tricky one, but I don't think mandating literal_implode solves
it well enough to be worth the pain. Essentially, changing getInfoPanel
to use literal_implode is equivalent to this:

function getInfoPanel(UserPreferences $up): string
{
$html = "<div style='color: " . $up->getColor() . "'>";
$html .= // more html here
$html .= "</div>"
 assert( is_literal($html) );
 return $html;
}

However, there could still be an arbitrarily long code path between the
user input and that assertion (or call to literal_implode), so tracking
the root cause could be just as difficult.

The actual root cause is in getColor(), so the actual assertion needed
is there:

class UserPreferences {
private DB $db;

 function getColor(): string {
     $preferredColor = $this->db->getColor();

     if ($preferredColor === 'light') {
         return '#fff';
     }

     if ($preferredColor === 'dark') {
         return '#000';
     }
     assert( is_literal($preferredColor) );
     return $preferredColor; // user has set custom color.
 }

}

Adding that assertion would be easier and more elegant if it was
possible to set a "literal_string" return type, but it would still be up
to the user to add it. That leads to the original problem, but in
reverse: when writing the getColor() function, the author needs to know
where it will eventually be used to know that an assertion is relevant.

It would also not solve the problem of the assertion only triggering in
production without a careful integration test. That can only be solved
with a static analyser, which would also have enough information to
isolate the cause.

Regards,

--
Rowan Tommins
[IMSoP]

4 years ago by lauri.kentta@gmail.com — view source

unread

Hi,

I wrote the untaint() / make_literal() function, just in case.

implode("", array_map(fn($c) => $chars[ord($c)], str_split($s, 1)))

https://3v4l.org/EaN9Z#focus=rfc.literals

Sorry and bye.

--
Lauri Kenttä

4 years ago by Craig Francis — view source

unread

Hi,

I wrote the untaint() / make_literal() function, just in case.

implode("", array_map(fn($c) => $chars[ord($c)], str_split($s, 1)))

https://3v4l.org/EaN9Z#focus=rfc.literals

Sorry and bye.

Yes, I have a similar example in the RFC (eval).

The is_literal() function provides a guard rail (for everyone, but
especially for less knowledge developers), but as with most security
things, there’s nothing stopping you from deliberately climbing up and
jumping off, but at that point it’s very clearly your choice to do so.

Thanks for looking it over, and I hope it will be useful.

--
Lauri Kenttä

--

To unsubscribe, visit: https://www.php.net/unsub.php

4 years ago by lauri.kentta@gmail.com — view source

unread

On Sat, 12 Jun 2021 at 19:59, Lauri Kenttä lauri.kentta@gmail.com
wrote:

Hi,

I wrote the untaint() / make_literal() function, just in case.

implode("", array_map(fn($c) => $chars[ord($c)], str_split($s, 1)))

https://3v4l.org/EaN9Z#focus=rfc.literals

Sorry and bye.

Yes, I have a similar example in the RFC (eval).

Oh, the irresponsible use of eval was so overwhelming that I missed the
new string literal there. You could add var_export to make it more
realistic.

Anyway, the RFC is well motivated and thoroughly thought out. The
approach is very simple but fits many use cases, compared to previous
alternatives which were complex but more limited in the end. When
libraries and PDO start to use this, we can finally get rid of SQL
injections and a number of self-made input handling tricks, if only
people have learned to read warnings...

So thanks for writing this RFC. I hope it passes.

--
Lauri Kenttä

4 years ago by Mike Schinkel — view source

unread

Hi Craig,

Hi Internals,

I'd like to start the discussion on the is_literal() RFC:

https://wiki.php.net/rfc/is_literal

Nice! There is an awful lot to like here.

And few bits of concern.

What's to like?

Adding a proactive method for guarding against injection vulnerabilities would be a huge step forward for PHP. This should not be underemphasized.
That you added the section "Previous Work" and covered what other languages are doing in this regard.
The "Thanks" section and all the external references and prior comments that provides insight into how this RFC was developed.
And I love that you publicly stated these points in an RFC:

a. Why not use static analysis? It will never be used by most developers.
b. Why not educate everyone? You can't - developer training simply does not scale, and mistakes still happen.

What's of concern?

(Note: the last point could address concerns #2 and #4.)

Bike-shedding a bit, but I find a bit of cognitive dissonance over the semantic incorrectness of is_literal(), similar to Jakob Givoni's concern.

I also fear it will be confusing for developers who know what a literal is, and thus I agree with Jakob.

Minimally I'd ask for a separate vote on is_literal() vs. is_from_literal().

Regarding the sections "Non-Parameterised Values" and "Future Scope," if I understand correctly, your vision is to ultimately require all SQL and other vulnerable literals to be written as literals in the source code for them to be used or otherwise disallowed by future library code and/or PHP internal functionality?

If I did not misunderstand I believe you would disallow an important class of functionality from being implemented in PHP, including several existing and widely used applications such as but not limited to PhpMyAdmin and Adminer that open and interact with arbitrary SQL databases. It would also disallow using any data returned by APIs where the PHP application cannot know in advance what data will be acquired, even if that data has been properly escaped and/or sanitized.

I notice your RFC does not grant sprintf() the ability to return a string with the "literal" flag even though sprintf() can frequently return strings composed from known literals and safe numeric value. And I have frequently see it used for this purpose, e.g.:

$fields = ['id','name'];
$table = 'my_table';

$sql = sprintf( 'SELECT %s FROM %s WHERE ID=%d',
implode( ',', $fields ),
$my_table,
intval($_GET['id']));

Of course your section "WHERE IN" "addresses" this, but please see #4 below.

I argue sprintf() should be able to pass thru literal flags and validate integer values to determine that the result is literal. Am I missing something here?

Regarding the section "WHERE IN" you argue that we should push everyone to use parameters properly — which is a great idea in theory — but that would have a significant breakage in existing sites online. While many here in Internals seem to look down on WordPress if does power over 40%[1] of the web at this point and this would break most if not all those sites if the site owners or their hosts were to upgrade to a version of PHP that was militant about requiring is_literal() for mysqli_query()[2].

And yes, WordPress could upgrade their core code[3] to address this, but that would not also fix the 60k plugins and 10k themes on WordPress.org http://wordpress.org/, or likely the million+ combined custom plugins and themes in use on the web.

Is it really reasonable to change PHP to just break of all those sites so that we can force all the site owners to beg and plead with their plugin and theme developers to better protect them from injection errors? Seems that medicine might be worse than the cure, at least for all those site owners.

Given #2 and #4, I think your section "Support Functions" is missing what would allow WordPress, PhpMyAdmin, Adminer and all other PHP applications to address the concerns in #2 and #4 and it is neither literal_concat() nor literal_implode().

The function we would need is the one Lauri Kenttä wrote (in jest?), but we'd need one that is performant, not a userland version. Name it as_literal() or make_literal() or even as_unsafe_literal() — whichever chosen is unimportant IMO — but applications like the ones I mentioned need to be able to say "I absolutely know what I am explicitly doing so do not throw a warning or an error telling me my value is not a literal, dammit!"

Certainly most people would not use such a method by accident, especially if called something like as_unsafe_literal(). There might even be ways to validate that this was not used by a copy-and-paste artist, but I will leave that validation to other people who are smarter than me.

Regarding the section "Reflection API," shouldn't their be an is_literal()/is_from_literal() method added to the ReflectionParameter() and ReflectionProperty() classes?

So assuming you address at least #5 — and thus #2 and #4 indirectly — I would really like to see this RFC pass.

-Mike

[1] https://w3techs.com/blog/entry/40_percent_of_the_web_uses_wordpress https://w3techs.com/blog/entry/40_percent_of_the_web_uses_wordpress
[2] https://www.php.net/manual/en/mysqli.query.php#refsect1-mysqli.query-parameters https://www.php.net/manual/en/mysqli.query.php#refsect1-mysqli.query-parameters
[3] https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2056 <https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2056

4 years ago by Craig Francis — view source

unread

Nice! There is an awful lot to like here.

Thank you.

[...] Minimally I'd ask for a separate vote on is_literal() vs.
is_from_literal().

It would definitely be possible to have a vote on the name. I only went
with is_literal() because it's shorter (I'm a slow typer), and to match the
current convention of existing "is_*" functions (which all use a single
word suffix).

Regarding the sections "Non-Parameterised Values" and "Future Scope," if

I understand correctly, your vision is to ultimately require all SQL and
other vulnerable literals to be written as literals in the source code for
them to be used or otherwise disallowed by future library code and/or PHP
internal functionality?

No, fortunately that's not the case, as you're right, we do need to support
PhpMyAdmin, Adminer, systems like Drupal (which take certain values, like
the table name, from user input), and (as noted in your 4th point), there
is a lot of existing code that would break if we were militant about
requiring is_literal() for mysqli_query() and other functions.

Native functions like mysqli_query() should be covered in a future RFC, but
still only use warnings. The exact details would need to be confirmed, but
should provide a simple way to mark certain non-literal strings as trusted,
and still have a way to switch off those warnings entirely (e.g. legacy
systems).

Ideally as future scope, we would also provide a way for certain projects
to opt-in to having exceptions raised (the militant option), which would
not be appropriate for the vast majority of developers, but it's the level
I'd find useful for some of my projects (dealing with medical records and
other sensitive information).

As to how we could (in a future RFC) mark certain non-literal strings as
trusted, I've briefly touched on how this can work in the "Future Scope"
section (it's an idea I copied from Trusted Types in JavaScript, which
follows the same principle). I only did a summary because the exact
implementation would need to be discussed later (there are a few options,
but they all rely on is_literal() being implemented first).

First, we need libraries to use is_literal() to check inputs from
developers (i.e. the code they didn't write), which is their main source of
injection vulnerabilities. What goes on inside the library is entirely up
to them (they know that they are doing). For their output, they create a
stringable value-object, which (as future scope) could be marked as trusted
for certain functions (like mysqli_query).

These value-objects could be as simple as a single private property
(value), a __construct() method that takes the libraries output, and a
__toString() method to return it; or the library might want to be extra
sure of its output, and use something like the Query Builder example linked
in the "Non-Literal Values" section.

As future scope, this allows functions like mysqli_query() to simply check
for a literal, or one of these trusted value-objects (or have warnings
switched off entirely).

I notice your RFC does not grant sprintf() the ability to return a

string with the "literal" flag even though sprintf() can frequently return
strings composed from known literals and safe numeric value.

Joe has created a literal_sprintf() function for testing, I think it does
exactly what you're suggesting, and I believe that logic can be applied
directly to sprintf().

The only reason I've been hesitant is to keep this implementation of
is_literal() limited to identifying literals and supporting concatenation
(to keep it simple, and easy to use).

It's like the "String Splitting" part in the RFC, it's probably fine, and
we can do it if there's interest; but we also need to consider what issues
it might cause (if any). From a security point of view, it's always best to
keep something as simple as possible (and that may well include supporting
sprintf), because "You cannot prove security. You can only prove
insecurity".

Regarding the section "WHERE IN" you argue that we should push everyone
to use parameters properly — which is a great idea in theory — but that
would have a significant breakage in existing sites online.

Hopefully my answer to #2 addresses this, as you're right, that would
definitely cause more problems than it solves. Certainly for this
implementation I support libraries providing warnings and think those would
be the most appropriate, rather than exceptions. (Though obviously
libraries know their user bases best of course).

In summary of my future scope answer above, libraries could be able to mark
their output as trusted for functions like mysqli_query(). For SQL it would
ideally still be a literal string, but that's not always possible
(especially with HTML), so that's where the trusted stringable
value-objects would be used. For legacy systems, developers could simply
switch off the warnings entirely (but at least that's a choice they are
actively choosing to make).

Given #2 and #4, I think your section "Support Functions" is missing
[...] as_literal() or make_literal() or even as_unsafe_literal()

I'm not against it, but I don't think it's needed (and with it raising
warnings not exceptions it won’t be necessary to prevent things breaking).

In the Example Library I made under the "Try It" section in the RFC:

https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4

It checks for literals but also accepts a value-object allowing you to
bypass that check (which I think is probably the best way to implement this
at the moment). It shows how an unsafe_value value-object could be used
and accepted by the example library, in the same way DB::raw() works in
Laravel. In the future a similar setup in PHP’s native functions could
accept literals or value-objects which can be trusted, so we shouldn’t need
a way to lie to PHP and tell it to treat one as the other.

Regarding the section "Reflection API," shouldn't their be an

is_literal()/is_from_literal() method added to the ReflectionParameter()
and ReflectionProperty() classes?

If I can just check we are talking about the same thing, so where we have:

class my_class {
private string|int $my_property = 0;
private function my_method(string|int $my_parameter) {
}
}

$reflector = new ReflectionClass('my_class');
$property = $reflector->getProperty('my_property');
$method = $reflector->getMethod('my_method');
$parameter = $method->getParameters()[0];

var_dump(
$property->getType()->__toString(),
$parameter->getType()->__toString(),
);

The ReflectionParameter/ReflectionProperty allows us to return their
respective types (ReflectionType). In this case, it's saying these two are
"string|int", which isn't the current value, just what they can accept.

I think this is touching on an area that Joe mentioned yesterday, "a
possible future where we have first class support" in a more intricate way.
I think there is value in this, as it would allow PHP itself to reject
non-literal values, but that should be a separate RFC, as that's a bit more
of a language change, and will need its own discussion.

So assuming you address at least #5 — and thus #2 and #4 indirectly — I

would really like to see this RFC pass.

-Mike

Thank you Mike

Craig

4 years ago by someniatko — view source

unread

Hi!

From what I understood reading the RFC, this looks like a new type,
"literal string", specifically a subtype of "string". Have you
considered instead of forcing library authors to use is_literal()
check everywhere on their input boundary and making sure no place
where input is passed, is forgotten, to let them type their whole
codebase, including internal, against the new literal-string type,
which would be enforced by PHP all the time? From my perspective as a
PHP developer, it would serve clearer intent than only checking in a
few places which presumably may take user input, i.e. you could
explicitly state "this function does only work with literal strings".

Regards,
someniatko

4 years ago by Craig Francis — view source

unread

[...] Have you considered [a] new literal-string type, which would be
enforced by PHP all the time?

Thanks someniatko, a "clearer intent" is a really nice way of putting it.

It is something that Joe is considering, but for a future version,
simply because it will require more discussion (which I'm happy to help
with).

And I agree, there would be value in having a literal-string type,
especially when working with IDE's and Static Analysers; but it brings a
couple of issues that the is_literal() function solves:

Libraries need to support older versions of PHP, so they wouldn't be
able to use the literal-string for many years; whereas they can use
function_exists('is_literal') today, probably to provide their own
function that simply returns true on pre PHP 8.1.
The is_literal() function gives the libraries a choice on how to handle
the situation. At the moment they just receive a string of unknown origin,
so they can trigger a warning, while also supporting legacy projects who
want to switch off the warnings. And some developers might want to use a
library that has an option that can raise exceptions (so they can be
certain there are no issues).

Craig

4 years ago by someniatko — view source

unread

And I agree, there would be value in having a literal-string type, especially when working with IDE's and Static Analysers; but it brings a couple of issues that the is_literal() function solves:

Yeah, those are valid arguments why we should have a dedicated
is_literal() function, and in fact we could have both the type and
the function at the same time, the same as we have string type and
is_string() function. My worry is, if type is not added in the
present RFC, adding it later may look as "too low value" and probably
will have a lower chance of being accepted.

Libraries need to support older versions of PHP, so they wouldn't be able to use the literal-string for many years; whereas they can use function_exists('is_literal') today, probably to provide their own function that simply returns true on pre PHP 8.1.

If the type is added in say 8.2, they wouldn't be able to use it even
longer. Of course this can be re-modelled using a value object which
performs the is_literal() check in the constructor, however I think a
native PHP type would be better, for example in the case of variance
checks (being a subtype of "string").

The is_literal() function gives the libraries a choice on how to handle the situation. At the moment they just receive a string of unknown origin, so they can trigger a warning, while also supporting legacy projects who want to switch off the warnings. And some developers might want to use a library that has an option that can raise exceptions (so they can be certain there are no issues).

If we have both the fn and the type, libraries would still have that
choice, either enforce it solidly and throw TypeError (PHP 8.1), or
polyfill the is_literal() function to always return true and be
able to both take advantage of PHP 8.1 and work on the earlier
versions.

I hope these points will be really considered, and PHP's type system
will evolve into more robust and strong one. (offtopic: hoping we will
have typed vector and dict types someday too)

Craig

Regards,
someniatko

4 years ago by someniatko — view source

unread

Nevertheless, I am sure even without a dedicated type it is still a
valuable addition to the engine. Just to clarify, I understand that if
this would require too much additional work to implement the type, IMO
it's okay to ship the feature without it for now. I was just replying
to other arguments, which are not related to the work amount :P

Thank you for your work!

Regards,
someniatko

4 years ago by Craig Francis — view source

unread

I am sure even without a dedicated type it is still a valuable addition to
the engine [...] IMO it's okay to ship the feature without it for now.

Thanks someniatko, it's certainly something I'm interested in (8.2?), as I
think a dedicated type would be useful, I'm just cautious of trying to
cover too many things before the 8.1 deadline - especially because
types will be part of the native functions discussion.

4 years ago by Dik Takken — view source

unread

Hi!

From what I understood reading the RFC, this looks like a new type,
"literal string", specifically a subtype of "string". Have you
considered instead of forcing library authors to use is_literal()
check everywhere on their input boundary and making sure no place
where input is passed, is forgotten, to let them type their whole
codebase, including internal, against the new literal-string type,
which would be enforced by PHP all the time? From my perspective as a
PHP developer, it would serve clearer intent than only checking in a
few places which presumably may take user input, i.e. you could
explicitly state "this function does only work with literal strings".

That sounds like a great future extension. Then it would be nice to have
the name of the new type match the name of the function that checks for
it. In case the existing 'string' type would be paired with a new
'lstring' type the existing is_string() would pair nicely with a new
is_lstring() function.

Regards,
Dik Takken

4 years ago by Craig Francis — view source

unread

That sounds like a great future extension. Then it would be nice to have
the name of the new type match the name of the function that checks for
it. In case the existing 'string' type would be paired with a new
'lstring' type the existing is_string() would pair nicely with a new
is_lstring() function.

That's a good suggestion for a name as well, especially when thinking about
its use as a type.

I've put up a basic MarkDown file on GitHub (the repo has been my general
dumping ground for files related to this).

https://github.com/craigfrancis/php-is-literal-rfc/blob/main/name.md

I'm happy to make additions and/or edits (can also accept pull requests if
you prefer).

Then maybe next week we can have a simple poll somewhere? Not sure if
anyone has any suggestions, but maybe Google Forms, where we can simply
enter you name and choose your preference (just going on a basic honour
system).

Craig

4 years ago by Matthew Brown — view source

unread

On Sat, 12 Jun 2021 at 13:00, Craig Francis craig@craigfrancis.co.uk
wrote:

Hi Internals,

I'd like to start the discussion on the is_literal() RFC:

https://wiki.php.net/rfc/is_literal

is_literal() brings a proven way to identify Injection Vulnerabilities to
PHP, already used by Google in their Java and Go projects, and is currently
being added to JavaScript. It's a lightweight and simple approach:
"Distinguishing strings from a trusted developer from strings that may be
attacker controlled", allowing Libraries to identify the mistakes made by
the thousands of developers using them incorrectly.

When Libraries use is_literal() to protect against these mistakes, we can
trust their output, and (as covered in the Future scope section) PHP can
then raise warnings with certain native functions like PDO::query,
mysqli_query, exec, preg_match, etc. (we would only consider warnings,
anything stricter like exceptions would be in many years time, if at all -
the intention is to alert and inform people, not break things).

The length is due to the FAQ Section, on why it's needed, how it can be
used by Libraries, and the important differences of using this flag versus
the flawed Taint Checking approach with its false sense of security
(error-prone escaping).

Thanks,
Craig Francis

PS: If anyone wants to discuss face-to-face on Zoom, I'll be available (UK
Time/BST/UTC+1) at:

https://chat.craigfrancis.co.uk/

Saturday 12th June, 6pm to 8pm;
Sunday 13th June, 10am to Midday;
Monday 14th June, 5pm to 7pm;
Tuesday 15th June, 9pm to 11pm;
Thursday 16th June, 10am to Midday;
(other times on request)

Hi Craig and Dan,

I was skeptical about the first draft of this RFC when I saw it last
month, but now I see the light (especially with the concat changes). This
looks like a very solid solution for any library authors wanting to add a
layer of protection against SQL injection attacks. Sorry for any
unnecessary grief I caused.

The only remaining issue I have is performance — Psalm and other static
analysis tools perform quite a lot of concatenation (and never have to
worry about user input). What sorts of slowdowns do you see when running
those tools?

Best wishes,

Matt

4 years ago by Craig Francis — view source

unread

On Sun, 13 Jun 2021 at 22:30, Matthew Brown matthewmatthew@gmail.com
wrote:

I was skeptical about the first draft of this RFC when I saw it last month
[...] Sorry for any unnecessary grief I caused.

Hi Matthew, actually I'd like to thank you for that early feedback, the
first draft did need work doing to it, and your feedback, along
with others, helped me focus on the main concerns.

The only remaining issue I have is performance — Psalm and other static
analysis tools perform quite a lot of concatenation (and never have to
worry about user input). What sorts of slowdowns do you see when running
those tools?

For Psalm in particular, I’ve just done a quick run of 10 alternating tests
on a website with 90,000 lines of code, at error level 2 (the default
level), using --no-cache, using PHP 8.1.0-dev from a couple of weeks ago.
Without is_literal it averaged 8.355s, and with is_literal, 8.373s - so
that’s a +0.018s difference, or +0.22% (full numbers below).

It’s not full lab-condition testing, but hopefully more than representative
enough.

(And repeating the RFC, just for a quick reference: Máté, who does
stress-tests on servers, found that the Laravel Demo was +0.124%,
and Symfony Demo was +0.161%. There was also a stress test, which simply
concatenated 4 strings, 5 million times, with no other actions, so not
exactly representative of a typical PHP script, which was +3.719%. None of
these tests involved connecting to the database, as the variability
introduced made it impossible to measure that level of difference).

Full Psalm-check Stats:

A = Without is_literal
B = With is_literal

A = Checks took 8.36 seconds and used 229.894MB of memory
B = Checks took 8.46 seconds and used 229.894MB of memory
A = Checks took 8.43 seconds and used 229.894MB of memory
B = Checks took 8.57 seconds and used 229.894MB of memory
A = Checks took 8.50 seconds and used 229.894MB of memory
B = Checks took 8.40 seconds and used 229.894MB of memory
A = Checks took 8.36 seconds and used 229.894MB of memory
B = Checks took 8.32 seconds and used 229.894MB of memory
A = Checks took 8.31 seconds and used 229.894MB of memory
B = Checks took 8.38 seconds and used 229.894MB of memory
A = Checks took 8.30 seconds and used 229.894MB of memory
B = Checks took 8.29 seconds and used 229.894MB of memory
A = Checks took 8.31 seconds and used 229.894MB of memory
B = Checks took 8.37 seconds and used 229.894MB of memory
A = Checks took 8.37 seconds and used 229.894MB of memory
B = Checks took 8.32 seconds and used 229.894MB of memory
A = Checks took 8.31 seconds and used 229.894MB of memory
B = Checks took 8.29 seconds and used 229.894MB of memory
A = Checks took 8.30 seconds and used 229.894MB of memory
B = Checks took 8.33 seconds and used 229.894MB of memory

4 years ago by Matthew Brown — view source

unread

I've just added support for a literal-string type to Psalm:
https://psalm.dev/r/9440908f39

This will remain whether or not the RFC passes, but if it passes the
is_literal check will be used to inform Psalm in a similar fashion to how
array_is_list (coming in 8.1) will inform Psalm's list array type.

Adding the literal-string type guard to our main SQL creation endpoint
identifies roughly 20% of all DB queries that would not pass the
is_literal check, and would need to use some sort of explicitly unsafe
endpoint. From my perspective biggest issue with the current approach is
that

implode(', ', [1, 2, 3])

is treated as non-literal. We have a lot of queries that look like

WHERE foo IN (' . implode(', ', self::ALLOWED_IDS) . ')'

and these would all have to take the "unsafe" path, which is a bit
annoying. Could you add the same flag to ints, or is that too hard?

On Sat, 12 Jun 2021 at 13:00, Craig Francis craig@craigfrancis.co.uk
wrote:

Hi Internals,

I'd like to start the discussion on the is_literal() RFC:

https://wiki.php.net/rfc/is_literal

is_literal() brings a proven way to identify Injection Vulnerabilities to
PHP, already used by Google in their Java and Go projects, and is currently
being added to JavaScript. It's a lightweight and simple approach:
"Distinguishing strings from a trusted developer from strings that may be
attacker controlled", allowing Libraries to identify the mistakes made by
the thousands of developers using them incorrectly.

When Libraries use is_literal() to protect against these mistakes, we can
trust their output, and (as covered in the Future scope section) PHP can
then raise warnings with certain native functions like PDO::query,
mysqli_query, exec, preg_match, etc. (we would only consider warnings,
anything stricter like exceptions would be in many years time, if at all -
the intention is to alert and inform people, not break things).

The length is due to the FAQ Section, on why it's needed, how it can be
used by Libraries, and the important differences of using this flag versus
the flawed Taint Checking approach with its false sense of security
(error-prone escaping).

Thanks,
Craig Francis

PS: If anyone wants to discuss face-to-face on Zoom, I'll be available (UK
Time/BST/UTC+1) at:

https://chat.craigfrancis.co.uk/

Saturday 12th June, 6pm to 8pm;
Sunday 13th June, 10am to Midday;
Monday 14th June, 5pm to 7pm;
Tuesday 15th June, 9pm to 11pm;
Thursday 16th June, 10am to Midday;
(other times on request)

4 years ago by Claude Pache — view source

unread

Le 12 juin 2021 à 19:00, Craig Francis craig@craigfrancis.co.uk a écrit :
Hi Internals,

I'd like to start the discussion on the is_literal() RFC:

https://wiki.php.net/rfc/is_literal

Hi,

I’ve found an interesting bug in the implementation:

https://3v4l.org/JfSHX/rfc#rfc.literals

—Claude

4 years ago by Joe Watkins — view source

unread

Morning,

https://3v4l.org/nJhc1/rfc#focus=rfc.literals

It's not so much a bug as a side effect or quirk.

Note that, the result is correct, in the sense that you do have a literal
string - it is not marking an unsafe string as safe.

It's just that existing implementation details - RETURN_CHAR using interned
strings, and literal constant strings being interned by the compiler - lead
to a literal string being "re-used".

This is just a natural consequence of literal strings retaining their
literalness in the way we want them too, nothing dangerous is going on.

Cheers
Joe

Le 12 juin 2021 à 19:00, Craig Francis craig@craigfrancis.co.uk a
écrit :

Hi Internals,

I'd like to start the discussion on the is_literal() RFC:

https://wiki.php.net/rfc/is_literal

Hi,

I’ve found an interesting bug in the implementation:

https://3v4l.org/JfSHX/rfc#rfc.literals

—Claude

4 years ago by Claude Pache — view source

unread

Le 15 juin 2021 à 09:19, Joe Watkins krakjoe@gmail.com a écrit :

Morning,

https://3v4l.org/nJhc1/rfc#focus=rfc.literals https://3v4l.org/nJhc1/rfc#focus=rfc.literals

It's not so much a bug as a side effect or quirk.

Note that, the result is correct, in the sense that you do have a literal string - it is not marking an unsafe string as safe.

It's just that existing implementation details - RETURN_CHAR using interned strings, and literal constant strings being interned by the compiler - lead to a literal string being "re-used".

This is just a natural consequence of literal strings retaining their literalness in the way we want them too, nothing dangerous is going on.

Hi,

I disagree that it is not dangerous. At the very least, it makes is_literal() unpredictable, therefore unreliable, which is bad. The following git is more illustrative of the issue:

https://3v4l.org/SNJDJ/rfc#focus=rfc.literals https://3v4l.org/SNJDJ/rfc#focus=rfc.literals

The fact that a given string happens to be written literally somewhere else in the program doesn’t mean that it is written literally in that context. Taking the example from the RFC:

$qb->select('u')
->from('User', 'u')
->where('u.id = ' . $_GET['id']); // INSECURE
I want that the $qb->where() method is able to reliably detect that the string provided is not literal, even when it happens by chance that the exact same string appears literally somewhere else in my code.

—Claude

4 years ago by Joe Watkins — view source

unread

Hi,

It's not unpredictable.

krakjoe@Fiji:/opt/src/php-src$ cat nothing.php
<?php
$var = 'not-literal';

var_dump(
$_ENV['var'],
is_literal($_ENV['var']));
?>
krakjoe@Fiji:/opt/src/php-src$ var=not-literal sapi/cli/php nothing.php
string(11) "not-literal"
bool(false)

The engine doesn't intern "randomly", and doesn't intern input.

Cheers
Joe

Le 15 juin 2021 à 09:19, Joe Watkins krakjoe@gmail.com a écrit :

Morning,

https://3v4l.org/nJhc1/rfc#focus=rfc.literals

It's not so much a bug as a side effect or quirk.

Note that, the result is correct, in the sense that you do have a literal
string - it is not marking an unsafe string as safe.

It's just that existing implementation details - RETURN_CHAR using
interned strings, and literal constant strings being interned by the
compiler - lead to a literal string being "re-used".

This is just a natural consequence of literal strings retaining their
literalness in the way we want them too, nothing dangerous is going on.

Hi,

I disagree that it is not dangerous. At the very least, it makes
is_literal() unpredictable, therefore unreliable, which is bad. The
following git is more illustrative of the issue:

https://3v4l.org/SNJDJ/rfc#focus=rfc.literals

The fact that a given string happens to be written literally somewhere
else in the program doesn’t mean that it is written literally in that
context. Taking the example from the RFC:

$qb->select('u')
->from('User', 'u')
->where('u.id = ' . $_GET['id']); // INSECURE

I want that the $qb->where() method is able to reliably detect that the
string provided is not literal, even when it happens by chance that the
exact same string appears literally somewhere else in my code.

—Claude

4 years ago by Rowan Tommins — view source

unread

https://3v4l.org/nJhc1/rfc#focus=rfc.literals

It's not so much a bug as a side effect or quirk.

Note that, the result is correct, in the sense that you do have a literal
string - it is not marking an unsafe string as safe.

It's possible to create more complex cases of this, e.g.
https://3v4l.org/GFCQC/rfc#focus=rfc.literals

$literal = 'p';
$ord = ord('o');
$chr = chr($ord+1); // the whole of chr(ord('o')+1) is optimized to a
literal 'p'
var_dump($chr, is_literal($chr)); // 'p', true

There's a lot of potential for optimizations to leak there, but it's
probably safe, as long as the optimizations all rely on compile-time
information, and therefore can't be controlled by the user. Are there
any run-time optimizations that could also be leaked, e.g. JIT?

Regards,

--
Rowan Tommins
[IMSoP]

4 years ago by lauri.kentta@gmail.com — view source

unread

It's just that existing implementation details - RETURN_CHAR using
interned
strings, and literal constant strings being interned by the compiler -
lead
to a literal string being "re-used".

This is just a natural consequence of literal strings retaining their
literalness in the way we want them too, nothing dangerous is going on.

This happens with substr as well, even with user input.
With just one character, the possibilities are rather limited, but still
I'm sure someone finds a way to turn this into a vulnerability.

Such as:
$c = substr($input, 0, 1); # $c is now literal.
$sql = "SELECT * FROM t WHERE c LIKE '$c%'";

Hopefully this "retaining literalness" does not extend to longer strings
after any operations.

What about Opcache? A few years back I debugged some interning-related
bugs which only manifested with Opcache, so I'm just thinking maybe
there's some way to exploit Opcache to make some string interned under
special circumstances, and just maybe that string can then be
accidentally reused in a SQL query. I'm not in the mood to dive into
Opcache internals to check this, so I hope someone with prior knowledge
can tell what kind of strings get interned by Opcache. Array keys?
Object properties?

--
Lauri Kenttä

4 years ago by Craig Francis — view source

unread

On Sat, 12 Jun 2021 at 18:00, Craig Francis craig@craigfrancis.co.uk
wrote:

I'd like to start the discussion on the is_literal() RFC:
https://wiki.php.net/rfc/is_literal

Hi Internals,

Following up on the is_literal() RFC, thanks for the feedback. It looks
like there are only 2 minor open issues - updating the implementation to
allow integers, and what to call the flag/function.

Matthew Brown wants to support integer values, simply because so much code
already includes them, and I cannot find a single way that integers alone
can cause issues from an Injection Vulnerability point of view (but if
anyone can, I absolutely welcome their input). Other variable types like
floats and booleans will still be excluded (because the value you put in
often is not what you get out, e.g. true/TRUE being converted to "1").

Which leads us to the name, because "is_literal" may be, uh, too literal.
So can we come up with something better?

It needs to be a name that suggests: This variable contains programmer
defined strings, integers, and interned values (as noticed by Claude Pache
and Rowan Tommins). Ideally staying in the standard convention of
‘is_singleword’.

Joe has suggested is_known(), where "we have the concept of known strings
internally", which I like (though might clash with more userland functions
than ‘is_literal’). Last night I also wondered about is_constrained(), as
the value has limited sources. But I'd also welcome more suggestions, and
am happy to set up a poll (thanks Dharman for the strawpoll.com suggestion).

I've also updated the RFC Future Scope to note how this could be a
dedicated type (thanks someniatko, Matthew Brown, and Dik Takken); and I'm
really impressed to see the addition to Psalm so quickly.

Craig

4 years ago by Dik Takken — view source

unread

On Sat, 12 Jun 2021 at 18:00, Craig Francis craig@craigfrancis.co.uk
wrote:

I'd like to start the discussion on the is_literal() RFC:
https://wiki.php.net/rfc/is_literal

Hi Internals,

Following up on the is_literal() RFC, thanks for the feedback. It looks
like there are only 2 minor open issues - updating the implementation to
allow integers, and what to call the flag/function.

Matthew Brown wants to support integer values, simply because so much code
already includes them, and I cannot find a single way that integers alone
can cause issues from an Injection Vulnerability point of view (but if
anyone can, I absolutely welcome their input). Other variable types like
floats and booleans will still be excluded (because the value you put in
often is not what you get out, e.g. true/TRUE being converted to "1").

The security concerns that this RFC is addressing arise from data that
an attacker might control, like user input. In that respect I guess
there need not be any difference between types of literals.

Expanding the 'literal' concept to other types than just strings sounds
like a good idea. However, limiting it to just integers sounds a little
arbitrary. If people want to insert a float or boolean into their query
and they are happy with the way those are coerced into strings, why not
allow it?

Which leads us to the name, because "is_literal" may be, uh, too literal.
So can we come up with something better?

It needs to be a name that suggests: This variable contains programmer
defined strings, integers, and interned values (as noticed by Claude Pache
and Rowan Tommins). Ideally staying in the standard convention of
‘is_singleword’.

Joe has suggested is_known(), where "we have the concept of known strings
internally", which I like (though might clash with more userland functions
than ‘is_literal’). Last night I also wondered about is_constrained(), as
the value has limited sources. But I'd also welcome more suggestions, and
am happy to set up a poll (thanks Dharman for the strawpoll.com suggestion).

Throwing in another idea: is_hard_coded().

I've also updated the RFC Future Scope to note how this could be a
dedicated type (thanks someniatko, Matthew Brown, and Dik Takken); and I'm
really impressed to see the addition to Psalm so quickly.

Craig

4 years ago by Craig Francis — view source

unread

Matthew Brown wants to support integer values, simply because so much
code
already includes them, and I cannot find a single way that integers alone
can cause issues from an Injection Vulnerability point of view.

[…]

Expanding the 'literal' concept to other types than just strings
sounds like a good idea. However, limiting it to just integers sounds a
little arbitrary. If people want to insert a float or boolean into their
query and they are happy with the way those are coerced into strings, why
not allow it?

Applying this to floats and booleans might be dangerous, as what’s put in
might not be what comes out (I want to keep the scope as limited as
possible). is_literal can be used for strings because we can flag what’s
user and what’s developer defined, and with Matthew’s request, it could do
integers (because an integer value alone is not inherently risky, and it’s
already used a lot). However because of things like true/TRUE getting
converted to "1", we can’t apply the same certainty to floats and booleans

it wouldn’t be a consistent secure approach.

Which leads us to the name, because "is_literal" may be, uh, too literal.

So can we come up with something better?

Throwing in another idea: is_hard_coded().

I’d be a little hesitant on the name ‘is_hard_coded’, if we allow integers,
that means that it’s no longer strictly hard coded, and might get confusing.

4 years ago by Bruce Weirdan — view source

unread

is_literal can be used for strings because we can flag what’s
user and what’s developer defined, and with Matthew’s request, it could do
integers (because an integer value alone is not inherently risky, and it’s
already used a lot).

To clarify, do you imply that all integers are safe? Or would they
also be differentiated into literal and non-literal varieties?

--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com

4 years ago by Craig Francis — view source

unread

On Thu, Jun 17, 2021 at 12:01 AM Craig Francis craig@craigfrancis.co.uk
wrote:

is_literal can be used for strings because we can flag what’s
user and what’s developer defined, and with Matthew’s request, it could
do
integers (because an integer value alone is not inherently risky, and
it’s
already used a lot).

To clarify, do you imply that all integers are safe? Or would they
also be differentiated into literal and non-literal varieties?

This would be all integers, to support the request from Matthew.

I also believe, after a quick check with Joe, that it’s not possible to add
a flag to an integer (“non reference counted types are stack allocated, ie
the zval on either side of a call boundary are unique”). So it couldn’t
work the same way.

Craig

4 years ago by Dik Takken — view source

unread

Which leads us to the name, because "is_literal" may be, uh, too literal.

So can we come up with something better?

Throwing in another idea: is_hard_coded().

I’d be a little hesitant on the name ‘is_hard_coded’, if we allow integers,
that means that it’s no longer strictly hard coded, and might get confusing.

Ah, I was assuming that you were still strictly referring to integer
literals, not variables containing integers. Then indeed the name would
be confusing.

Regards,
Dik Takken

4 years ago by Mike Schinkel — view source

unread

On Sat, 12 Jun 2021 at 18:00, Craig Francis craig@craigfrancis.co.uk
wrote:

I'd like to start the discussion on the is_literal() RFC:
https://wiki.php.net/rfc/is_literal

Hi Internals,

Following up on the is_literal() RFC, thanks for the feedback. It looks
like there are only 2 minor open issues - updating the implementation to
allow integers, and what to call the flag/function.

Matthew Brown wants to support integer values, simply because so much code
already includes them, and I cannot find a single way that integers alone
can cause issues from an Injection Vulnerability point of view (but if
anyone can, I absolutely welcome their input). Other variable types like
floats and booleans will still be excluded (because the value you put in
often is not what you get out, e.g. true/TRUE being converted to "1").

Which leads us to the name, because "is_literal" may be, uh, too literal.
So can we come up with something better?

It needs to be a name that suggests: This variable contains programmer
defined strings, integers, and interned values (as noticed by Claude Pache
and Rowan Tommins). Ideally staying in the standard convention of
‘is_singleword’.

Joe has suggested is_known(), where "we have the concept of known strings
internally", which I like (though might clash with more userland functions
than ‘is_literal’). Last night I also wondered about is_constrained(), as
the value has limited sources. But I'd also welcome more suggestions, and
am happy to set up a poll (thanks Dharman for the strawpoll.com suggestion).

"is_trusted()" comes to mind, inspired by Javascript's Trusted Types API[1].

That way in the future it could validate other trusted types besides just literals and integers when and if an agreement on what should be trusted becomes clear

Another plus with the semantics is in many cases the library developer who used is_trusted() would not need to update their code to support these newly trusted types, e.g. literal-string types, "trusted" classes with a __ToString() method, etc.

-Mike

[1] https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API

4 years ago by Craig Francis — view source

unread

On Wed, 16 Jun 2021 at 18:24, Craig Francis craig@craigfrancis.co.uk
wrote:

On Sat, 12 Jun 2021 at 18:00, Craig Francis craig@craigfrancis.co.uk
wrote:

I'd like to start the discussion on the is_literal() RFC:
https://wiki.php.net/rfc/is_literal

Following up on the is_literal() RFC, thanks for the feedback. It looks
like there are only 2 minor open issues - updating the implementation to
allow integers, and what to call the flag/function.

As there’s been no issues raised with supporting integers, and doing so
will help adoption, the implementation will be updated to allow them.

Now to choose the name, with the options is_known() from Joe and
is_trusted() from Moritz:

https://strawpoll.com/bd2qed2xs

Keep in mind it might also become a dedicated type in the future.

Craig

4 years ago by Paul Crovella — view source

unread

As there’s been no issues raised with supporting integers, and doing so
will help adoption, the implementation will be updated to allow them.

Now to choose the name, with the options is_known() from Joe and
is_trusted() from Moritz:

https://strawpoll.com/bd2qed2xs

Keep in mind it might also become a dedicated type in the future.

Craig

Someone somewhere will end up writing a function that drops their
database when given a 3 that PHP told them to trust.

4 years ago by Pierre — view source

unread

Le 18/06/2021 à 08:00, Craig Francis a écrit :

Keep in mind it might also become a dedicated type in the future.

Hello,

If so, why the question should not be about the type name instead ? It
might raises different concerns and new arguments to this discussion ?

What is this type ? What does it covers ? Can float, int, etc... be
literals as well ? Are those subtypes of their actual existing related
type ? What should be the behavior regarding user method typings, etc etc ?

Instead of introducing is_literal() may be directly introducing those
types could be a good idea as well ?

Regards,

--

Pierre

4 years ago by Craig Francis — view source

unread

Le 18/06/2021 à 08:00, Craig Francis a écrit :

Keep in mind it might also become a dedicated type in the future.

Hello,

If so, why the question should not be about the type name instead ? It
might raises different concerns and new arguments to this discussion ?

What is this type ? What does it covers ? Can float, int, etc... be
literals as well ? Are those subtypes of their actual existing related
type ? What should be the behavior regarding user method typings, etc etc ?

Instead of introducing is_literal() may be directly introducing those
types could be a good idea as well ?

Hi Pierre,

On Monday we had the discussion about types:

https://externals.io/message/114835#114846

The RFCs Future Scope was updated to note the suggestion from someniatko
and Matthew about how this could be a type in the future (Joe has also
shown an interest); where it was agreed that it should be added later.

The discussions I’ve had about this flag have been to ensure it covers all
future use cases, including a dedicated type, where the function is an easy
way to expose this flag to libraries today, so they can handle developer
mistakes gracefully (rather than causing type errors), with types needing
to have their own future/separate discussion.

The type will still use this flag, so it will match the same definition in
the RFC - strings defined by the programmer, and we will include integers
after the feedback from Matthew (integers have always been considered, the
only reason they were originally excluded was because I tried to keep the
definition as small as possible, Matthew noted how they would help
adoption, and no one can find a single issue with allowing them).

The RFC also explains why floats and booleans are not included, so wouldn’t
be used by the type for the same reason.

While adding a flag is an ideal to work towards, it is more important to
have this simple change first, so it can start improving security
immediately (so libraries don’t need to wait years for all of their
supported versions of PHP to include the type), and it gives libraries full
freedom in how they handle values which don’t have this flag.

Hopefully this and the Future Scope section can also answer your questions.

Craig

4 years ago by Pierre — view source

unread

Le 18/06/2021 à 11:41, Craig Francis a écrit :

Hi Pierre,

On Monday we had the discussion about types:

https://externals.io/message/114835#114846

The RFCs Future Scope was updated to note the suggestion from someniatko
and Matthew about how this could be a type in the future (Joe has also
shown an interest); where it was agreed that it should be added later.
Thanks for pointing out this discussion, I must have missed it.
The discussions I’ve had about this flag have been to ensure it covers all
future use cases, including a dedicated type, where the function is an easy
way to expose this flag to libraries today, so they can handle developer
mistakes gracefully (rather than causing type errors), with types needing
to have their own future/separate discussion.

The type will still use this flag, so it will match the same definition in
the RFC - strings defined by the programmer, and we will include integers
after the feedback from Matthew (integers have always been considered, the
only reason they were originally excluded was because I tried to keep the
definition as small as possible, Matthew noted how they would help
adoption, and no one can find a single issue with allowing them).

Makes sense.

In many cases, I didn't want to bypass/override the RFC to something
else, but just point out that the discussion about future types names
could give hint about the function name itself.

Regards,

--

Pierre

4 years ago by Marc Bennewitz — view source

unread

On Wed, 16 Jun 2021 at 18:24, Craig Francis craig@craigfrancis.co.uk
wrote:

On Sat, 12 Jun 2021 at 18:00, Craig Francis craig@craigfrancis.co.uk
wrote:

I'd like to start the discussion on the is_literal() RFC:
https://wiki.php.net/rfc/is_literal

Following up on the is_literal() RFC, thanks for the feedback. It looks
like there are only 2 minor open issues - updating the implementation to
allow integers, and what to call the flag/function.

As there’s been no issues raised with supporting integers, and doing so
will help adoption, the implementation will be updated to allow them.

Not sure but what happens if you have like a DB connection in big5,
sjis, ... and add an integer as ASCII char into it? But that's the only
edge case I can think of.

Now to choose the name, with the options is_known() from Joe and
is_trusted() from Moritz:

https://strawpoll.com/bd2qed2xs

Keep in mind it might also become a dedicated type in the future.

Craig

4 years ago by Craig Francis — view source

unread

Not sure but what happens if you have like a DB connection in big5,
sjis, ... and add an integer as ASCII char into it? But that's the only
edge case I can think of.

The integer character code points are the same in all three. PHP represents
integers with the western arabic numerals 0-9, so as long as their code
points are the same, nothing changes between any of the charsets.

While I’ve not checked all other character encodings (far too many), the
ones I did check were the most-used character encodings and they all used
extended ASCII (that includes big5 and Shift JIS).

The only exception is that there is something called EBCDIC (an old IBM
character encoding descended from punch cards), but it’s very deprecated
and PHP doesn’t work with it anyway - PHP explicitly has not supported it
since 8.0

“EBCDIC targets are no longer supported, though it's unlikely that they
were still working in the first place.”

https://www.php.net/manual/en/migration80.other-changes.php

4 years ago by Pierre — view source

unread

Le 18/06/2021 à 08:00, Craig Francis a écrit :

As there’s been no issues raised with supporting integers, and doing so
will help adoption, the implementation will be updated to allow them.

Now to choose the name, with the options is_known() from Joe and
is_trusted() from Moritz:

https://strawpoll.com/bd2qed2xs

Keep in mind it might also become a dedicated type in the future.

Craig

If I'm understanding this correctly, what you call a literal value is a
statically written and compiled string, which doesn't originate from a
user given value, but was hardcoded at some point. If so, even if the
primary use is of course for security concerns, the realty of what it
does is not, it's mostly about how a certain variable was defined /
assigned. How about is_static() or is_value_static() ? Or even
is_statically_defined() ? There's many options in this path.

Regards,

--

Pierre

4 years ago by Guilliam Xavier — view source

unread

Le 18/06/2021 à 08:00, Craig Francis a écrit :

As there’s been no issues raised with supporting integers, and doing so
will help adoption, the implementation will be updated to allow them.

Now to choose the name, with the options is_known() from Joe and
is_trusted() from Moritz:

https://strawpoll.com/bd2qed2xs

Keep in mind it might also become a dedicated type in the future.

Craig

If I'm understanding this correctly, what you call a literal value is a
statically written and compiled string, which doesn't originate from a
user given value, but was hardcoded at some point. If so, even if the
primary use is of course for security concerns, the realty of what it
does is not, it's mostly about how a certain variable was defined /
assigned. How about is_static() or is_value_static() ? Or even
is_statically_defined() ? There's many options in this path.

IIUC, with the addition of integers, the function will return true for e.g.
'SELECT * FROM foo LIMIT ' . (int)$limit even if $limit doesn't come from
a "static" value (e.g. random_int() or even $_GET['limit'])

--
Guilliam Xavier

4 years ago by Craig Francis — view source

unread

On Fri, 18 Jun 2021 at 11:45 am, Guilliam Xavier guilliam.xavier@gmail.com
wrote:

IIUC, with the addition of integers, the function will return true for e.g.
'SELECT * FROM foo LIMIT ' . (int)$limit even if $limit doesn't come from
a "static" value (e.g. random_int() or even $_GET['limit'])

Yes, that’s correct.

Supporting integers from any source helps with adoption, and we cannot find
any security issues (it’s a fairly small change to the RFC, and that
prompted the new name, especially as the original is_literal wasn’t
perfect).

4 years ago by Mike Schinkel — view source

unread

On Fri, 18 Jun 2021 at 11:45 am, Guilliam Xavier guilliam.xavier@gmail.com
wrote:

IIUC, with the addition of integers, the function will return true for e.g.
'SELECT * FROM foo LIMIT ' . (int)$limit even if $limit doesn't come from
a "static" value (e.g. random_int() or even $_GET['limit'])

Yes, that’s correct.

Supporting integers from any source helps with adoption, and we cannot find
any security issues (it’s a fairly small change to the RFC, and that
prompted the new name, especially as the original is_literal wasn’t
perfect).

For the avoidance of doubt can you confirm that this $sql would indeed be trusted?

$ids = array_map( 'intval', $_GET['ids'] ?? [] );
$where = implode( ',', $ids );
$sql = 'SELECT * FROM foo WHERE id IN (' . $where . ')';

Also, as it is painful to have to use string concatenation, can we please consider supporting only the '%s' and '%d' format specifiers when used with trusted strings and integers for sprintf(), respectfully:

$sql = sprintf( 'SELECT * FROM foo WHERE id IN (%s)', $where );

And

$sql = sprintf( 'SELECT * FROM foo LIMIT %d', (int)$limit );

-Mike

4 years ago by Mike Schinkel — view source

unread

On Fri, 18 Jun 2021 at 11:45 am, Guilliam Xavier <guilliam.xavier@gmail.com mailto:guilliam.xavier@gmail.com>
wrote:

IIUC, with the addition of integers, the function will return true for e.g.
'SELECT * FROM foo LIMIT ' . (int)$limit even if $limit doesn't come from
a "static" value (e.g. random_int() or even $_GET['limit'])

Yes, that’s correct.

Supporting integers from any source helps with adoption, and we cannot find
any security issues (it’s a fairly small change to the RFC, and that
prompted the new name, especially as the original is_literal wasn’t
perfect).

For the avoidance of doubt can you confirm that this $sql would indeed be trusted?

$ids = array_map( 'intval', $_GET['ids'] ?? [] );

Sorry, that should have been:

$ids = array_map( 'intval', $_GET['ids'] ?? [0] );

-Mike

4 years ago by Craig Francis — view source

unread

For the avoidance of doubt can you confirm that this $sql would indeed be
trusted?

$ids = array_map( 'intval', $_GET['ids'] ?? [] );

$where = implode( ',', $ids );
$sql = 'SELECT * FROM foo WHERE id IN (' . $where . ')';

Also, as it is painful to have to use string concatenation, can we please
consider supporting only the '%s' and '%d' format specifiers when used with
trusted strings and integers for sprintf(), respectfully:

$sql = sprintf( 'SELECT * FROM foo WHERE id IN (%s)', $where );

And

$sql = sprintf( 'SELECT * FROM foo LIMIT %d', (int)$limit );

Sorry, that should have been:

$ids = array_map( 'intval', $_GET['ids'] ?? [0] );

Yes, I can confirm that that $sql would be trusted.

Yes, sprintf() does support ‘%s’ and ‘%d’; the function has been updated so
it will return a trusted string if all the inputs are trusted (including
other specifiers).

Both your examples work, as we support integers now (so developers do not
need to change their existing code that use these valid approaches), and if
you want to try it I’ve put your examples on 3v4l.org so you can see it
working here:

https://3v4l.org/FvtpW/rfc#focus=rfc.literals

While using parameterised queries is preferred for integers too, it is a
common pattern, and doesn't introduce any Injection Vulnerabilities.

Craig

4 years ago by Pierre — view source

unread

Le 18/06/2021 à 12:45, Guilliam Xavier a écrit :

IIUC, with the addition of integers, the function will return true for e.g.
'SELECT * FROM foo LIMIT ' . (int)$limit even if $limit doesn't come from
a "static" value (e.g. random_int() or even $_GET['limit'])

OK I get it.

I followed the initial discussions but I didn't read everything for a while.

Doesn't it mean that is_literal() which doesn't test anymore if
something is literal does a bit more than that ?

The original intent of being able to tell if a string is literal or not
seems to be a very good idea, but now it forked to something that is
more SQL-OtherDatabase business related: this means that PHP own std
will, in my opinion, take a role it isn't supposed to have, by
arbitrarily (don't take wrongly, all discussions I had read until now
are smart and make sense) telling people what is safe, and what is not ?

The original is_literal() function idea is nice, why not having both,
is_literal() that exposes a purely technical information in one side,
and is_secure() that is more specialized and supposes what is safe or
not considering more than simply string static state, for std provided
SQL database layers ?

In such scenario, userland libraries could use, depending on their own
need, one or both of theses functions.

I'd much like to have the is_literal() function for my own needs, I do
maintain an SQL query builder, but I don't expect it to handle int and
other types as well, I'll probably continue to handle those by myself
since I have a custom escaping API. In my use case, I'd probably not use
the is_secure() function but I would use the is_literal() one, mostly
for performance reasons (avoid to escape something that doesn't need to be).

I think that both being introduced at the same time is a good idea.

Regards,

--

pierre

4 years ago by Craig Francis — view source

unread

Le 18/06/2021 à 12:45, Guilliam Xavier a écrit :

IIUC, with the addition of integers, the function will return true for
e.g.
'SELECT * FROM foo LIMIT ' . (int)$limit even if $limit doesn't come
from
a "static" value (e.g. random_int() or even $_GET['limit'])

OK I get it.

I followed the initial discussions but I didn't read everything for a
while.

Doesn't it mean that is_literal() which doesn't test anymore if something
is literal does a bit more than that ?

I think there may be a misunderstanding here.

We are only going to ‘trust’ variables created from:

Literal Strings (i.e. written by the developer),
Integers.

It’s a very basic approach. The only change to the implementation has been
the inclusion of integers, nothing else. (Any Future Scope is not relevant
yet in regards to a new type etc. and it currently works on its own.)

The original is_literal() function idea is nice, why not having both,

is_literal() that exposes a purely technical information in one side, and
[one] that is more specialized […] than simply string static state, for std
provided SQL database layers ?

We have not found any issues with allowing integers in SQL, HTML, CLI; and
other contexts as well (like preg, mail additional_params, XPath query, and
even gasp eval).

While philosophically more pure, there is actually no extra security
benefit for excluding integers.

And to support having 2 functions, we would need 2 flags on strings. These
flags are limited, and managing 2 flags would affect performance.

Craig

4 years ago by Bruce Weirdan — view source

unread

While philosophically more pure, there is actually no extra security
benefit for excluding integers.

One would be potential denial of service prevention (e.g. with enormous LIMIT
value where only a limited set of ints was intended, like
"Items per page: 10, 20, 50, 100"). Another would be preventing abuse if you
used some integers like role IDs for access control. Using slightly
modified Matt's example:

    function f(array $allowed_ids) {
        //....
        $query .= 'WHERE `foo` IN (' . implode(', ', $allowed_ids) . ')';
        //....
    }

Here you really don't want $allowed_ids to include user input.

Overall I think allowing ints in literal concatenation without
tainting the result as non-literal
is a mistake. It would either prevent implementing proper literal int
type in future, or will make
it inconsistent (where non-literal int would be considered literal by
is_literal() for BC reasons).

Personally I would prefer limited applicability today that would not
prevent future consistent
implementation.

BTW, Psalm already distinguishes literal-int from int and
considers the result of
literal-string + int concatenation a non-literal string:
https://psalm.dev/r/59ad602688
This may mean that Matthew's point has been misinterpreted.

--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com

4 years ago by Craig Francis — view source

unread

One would be potential denial of service prevention (e.g. with enormous
LIMIT value where only a limited set of ints was intended.
[...]

Here you really don't want $allowed_ids to include user input.

The developer is writing this query to find those IDs and what the
developer asks for is what they get, the values inside that list came from
somewhere else and that part earlier is the bit with the unknown
vulnerability that allowed the value in there. I can’t address logical
flaws from the Developer; what you’re describing is a different issue. This
RFC is to help prevent Injection Vulnerabilities, and it can’t be an
Injection Vulnerability because the variable doesn’t contain commands, it’s
only containing values. We cannot create anything entirely ‘safe’ there’s
no such thing as ‘safe code’, and neither I or anyone else can hope to
create a single RFC that can do that.

If we just use the original is_literal (without integers), the only
difference in your example will be that, when using integers for something
like this, the developer will need to use a Parameterised Query instead
(because we’d only be accepting literal strings) to do the exact same
thing, and get that same outcome.

Overall I think allowing ints in literal concatenation without tainting the

result as non-literal is a mistake.

To counter this, supporting integers makes adoption easier, and it’s not
causing any security issues.

I prefer something that more developers/systems will be able to easily use,
especially when updating existing code.

BTW, Psalm already distinguishes literal-int from int

Psalm is its own engine - it’s looking at the code, not running it, so it’s
not negatively impacting performance. However PHP itself cannot do this,
integers cannot include a flag to state if they came from source code,
because adding would have too much of a performance impact.

Or to quote Joe directly "We can't really do that, non reference counted
types are stack allocated, ie the zval on either side of a call boundary
are unique."

We can’t split integers into two groups. It’s accept integers or no
integers at all. And no integers really isn’t feasible for most developers.

Craig

4 years ago by Joe Watkins — view source

unread

Just to explain this tracking of longs.

We can track strings because there's space to do that in GC info which
recounted types all have.

Where things aren't recounted there is no usable space on the value to
store the literal flag without disabling some internal assumptions about
type flags.

We have explored this, it wouldn't be an acceptable implementation detail.

Cheers
Joe

One would be potential denial of service prevention (e.g. with enormous
LIMIT value where only a limited set of ints was intended.
[...]

Here you really don't want $allowed_ids to include user input.

The developer is writing this query to find those IDs and what the
developer asks for is what they get, the values inside that list came from
somewhere else and that part earlier is the bit with the unknown
vulnerability that allowed the value in there. I can’t address logical
flaws from the Developer; what you’re describing is a different issue. This
RFC is to help prevent Injection Vulnerabilities, and it can’t be an
Injection Vulnerability because the variable doesn’t contain commands, it’s
only containing values. We cannot create anything entirely ‘safe’ there’s
no such thing as ‘safe code’, and neither I or anyone else can hope to
create a single RFC that can do that.

If we just use the original is_literal (without integers), the only
difference in your example will be that, when using integers for something
like this, the developer will need to use a Parameterised Query instead
(because we’d only be accepting literal strings) to do the exact same
thing, and get that same outcome.

Overall I think allowing ints in literal concatenation without tainting the

result as non-literal is a mistake.

To counter this, supporting integers makes adoption easier, and it’s not
causing any security issues.

I prefer something that more developers/systems will be able to easily use,
especially when updating existing code.

BTW, Psalm already distinguishes literal-int from int

Psalm is its own engine - it’s looking at the code, not running it, so it’s
not negatively impacting performance. However PHP itself cannot do this,
integers cannot include a flag to state if they came from source code,
because adding would have too much of a performance impact.

Or to quote Joe directly "We can't really do that, non reference counted
types are stack allocated, ie the zval on either side of a call boundary
are unique."

We can’t split integers into two groups. It’s accept integers or no
integers at all. And no integers really isn’t feasible for most developers.

Craig

4 years ago by Joe Watkins — view source

unread

Refcounted* not recounted ... Predictive text on phone ...

Cheers
Joe

Just to explain this tracking of longs.

We can track strings because there's space to do that in GC info which
recounted types all have.

Where things aren't recounted there is no usable space on the value to
store the literal flag without disabling some internal assumptions about
type flags.

We have explored this, it wouldn't be an acceptable implementation detail.

Cheers
Joe

One would be potential denial of service prevention (e.g. with enormous
LIMIT value where only a limited set of ints was intended.
[...]

Here you really don't want $allowed_ids to include user input.

The developer is writing this query to find those IDs and what the
developer asks for is what they get, the values inside that list came from
somewhere else and that part earlier is the bit with the unknown
vulnerability that allowed the value in there. I can’t address logical
flaws from the Developer; what you’re describing is a different issue.
This
RFC is to help prevent Injection Vulnerabilities, and it can’t be an
Injection Vulnerability because the variable doesn’t contain commands,
it’s
only containing values. We cannot create anything entirely ‘safe’ there’s
no such thing as ‘safe code’, and neither I or anyone else can hope to
create a single RFC that can do that.

If we just use the original is_literal (without integers), the only
difference in your example will be that, when using integers for something
like this, the developer will need to use a Parameterised Query instead
(because we’d only be accepting literal strings) to do the exact same
thing, and get that same outcome.

Overall I think allowing ints in literal concatenation without tainting
the

result as non-literal is a mistake.

To counter this, supporting integers makes adoption easier, and it’s not
causing any security issues.

I prefer something that more developers/systems will be able to easily
use,
especially when updating existing code.

BTW, Psalm already distinguishes literal-int from int

Psalm is its own engine - it’s looking at the code, not running it, so
it’s
not negatively impacting performance. However PHP itself cannot do this,
integers cannot include a flag to state if they came from source code,
because adding would have too much of a performance impact.

Or to quote Joe directly "We can't really do that, non reference counted
types are stack allocated, ie the zval on either side of a call boundary
are unique."

We can’t split integers into two groups. It’s accept integers or no
integers at all. And no integers really isn’t feasible for most
developers.

Craig

4 years ago by Dik Takken — view source

unread

Le 18/06/2021 à 12:45, Guilliam Xavier a écrit :

IIUC, with the addition of integers, the function will return true for
e.g.
'SELECT * FROM foo LIMIT ' . (int)$limit even if $limit doesn't come
from
a "static" value (e.g. random_int() or even $_GET['limit'])

OK I get it.

I followed the initial discussions but I didn't read everything for a
while.

Doesn't it mean that is_literal() which doesn't test anymore if
something is literal does a bit more than that ?

The original intent of being able to tell if a string is literal or not
seems to be a very good idea, but now it forked to something that is
more SQL-OtherDatabase business related: this means that PHP own std
will, in my opinion, take a role it isn't supposed to have, by
arbitrarily (don't take wrongly, all discussions I had read until now
are smart and make sense) telling people what is safe, and what is not ?

This is my feeling as well. The original proposal was pure with solid
security guarantees, independent of the context (SQL, HTML, ...) in
which it is used. Elevating some user input to the same level of
security as literals is not ideal.

On the other hand, as Craig pointed out to me (thanks!), a feature that
is too much of a hassle to use may not be widely adopted and the goal of
the proposal (improving security) may not be met.

Regards,

Dik Takken

4 years ago by Andreas Leathley — view source

unread

As there’s been no issues raised with supporting integers, and doing so
will help adoption, the implementation will be updated to allow them.

Now to choose the name, with the options is_known() from Joe and
is_trusted() from Moritz:

https://strawpoll.com/bd2qed2xs

Keep in mind it might also become a dedicated type in the future.

I think "trusted" is a good option, also if you think about adding it as
a type in the future and how it reads for somebody looking through code.
Something like

public function process(string&trusted $input): string&trusted;

Would be good in terms of readability and captures the meaning quite
well (although having a specific trusted string type might be more
sensible, to avoid these intersection types everywhere). "trusted" could
be used in a similar way like the "pure" annotations in Psalm/PHPStan
(just for a different meaning of course): to signify where non-trusted
variables should be reported and could even be used for whole objects to
signify they should only ever get and return trusted variables.

[RFC] is_literal

The problem

How to fix this in the RFC

Wouldn't forcing people to use literal_concat() and

Is this a problem in other languages?

What's to like?

What's of concern?