Hi Internals,
I have opened voting on https://wiki.php.net/rfc/is_literal for the
is-literal function.
The vote closes 2021-07-19
The proposal is to add the function is_literal(), a simple way to identify
if a string was written by a developer, removing the risk of a variable
containing an Injection Vulnerability.
This implementation is for literal strings ONLY (after discussion over
allowing integers) and, thanks to the amazing work of Joe Watkins, now
works fully with compiler optimisations, interned strings etc.
Craig
Hi Internals,
I have opened voting on https://wiki.php.net/rfc/is_literal for the
is-literal function.The vote closes 2021-07-19
The proposal is to add the function is_literal(), a simple way to identify
if a string was written by a developer, removing the risk of a variable
containing an Injection Vulnerability.This implementation is for literal strings ONLY (after discussion over
allowing integers) and, thanks to the amazing work of Joe Watkins, now
works fully with compiler optimisations, interned strings etc.Craig
Hi Craig,
Although I think the idea of the feature is useful,
I'm not so sure about the implementation.
I watched the talk you referenced a couple of times at different points in
time (the first being a couple of years back), and I fail to see how this
RFC is a similar implementation to it. As how they do it at Google is to
have it part of the type systems (arguably in a weird way but nonaless),
and due to the language being compiled the compiler will just flat out
refuse to produce an executable if the types mismatch.
From my understanding, the RFC's implementation is similar to what Google
does, which is to "annotate" the string, but without having the guarantees
of a compiler to back it up. This approach is totally reasonable for static
analysis, as running it is akin to the compilation step in checking the
validity.
However, having this approach built into the language itself seems rather
problematic to me.
Ideally we would want to assign a variable to be of 'literal' type to
ensure none of the actions applied to it demote it from being a literal,
and when such a demotion would occur, for it to TypeError.
Due to PHP's nature we cannot do this (yet?), therefore overloading the
concatenation operation seems rather unwise.
The case where concatenation between a literal and a non-literal happens,
without error, is very similar to passing around a nullable type until one
function/method/property doesn't accept null where it blows up into your
face, and you need to track down where on earth did the null value came
from, which might be multiple calls prior.
And there has been a hell of a lot of talks/articles/etc. about not using
nullable types due to this issue.
This is I think the main issue with the current shape of the proposal.
This implementation will detect certain security issues, but finding the
root cause for them is going to be rather complicated, as the concatenation
operation is basically kicking the can down the road about the
responsibility of checking whether or not the result is a literal.
Whereas using a function like concat_literal() which checks that the inputs
are indeed literals provides immediate feedback that the type constraint is
not being violated.
Due to this reason, I'm voting against this proposal.
Best regards,
George P. Banyard
This is I think the main issue with the current shape of the proposal.
This implementation will detect certain security issues, but finding the
root cause for them is going to be rather complicated, as the concatenation
operation is basically kicking the can down the road about the
responsibility of checking whether or not the result is a literal.
Whereas using a function like concat_literal() which checks that the inputs
are indeed literals provides immediate feedback that the type constraint is
not being violated.
I still don't follow this reasoning, for the reasons I outlined here:
https://externals.io/message/114835#114868 and again later:
https://externals.io/message/115037#115156
You can write your own concat_literal() function with the current
implementation:
function concat_literal(string $a, string $b): string {
if( ! is_literal($a) || ! is_literal($b) ) {
throw new TypeError;
}
return $a . $b;
}
This does everything a native version would, and can be used in all the
same places.
What it won't do, is tell you when you've forgotten to use it, and used
the normal string concatenation operator - but nor would a built-in
implementation.
Whether "$foo . $bar" is always non-literal, or non-literal only if one
of its operands is, you're going to get an error about a non-literal
string somewhere else in the program, and have to trace back to find
where the "bad" concatenation happened.
Regards,
--
Rowan Tommins
[IMSoP]
Rowan Tommins rowan.collins@gmail.com wrote:
I still don't follow this reasoning, for the reasons I outlined here:
...
Whether "$foo . $bar" is always non-literal, or non-literal only if one
of its operands is, you're going to get an error about a non-literal
string somewhere else in the program, and have to trace back to find
where the "bad" concatenation happened.
There are two differences. The first is "even if there is a problem,
you at least can find the relevant code".
With string concatenation preserving literal flags, when a problem is
detected, the situation is: "There is a non-literal somewhere in this
chunk of HTML. Good luck finding it in the string, and then figuring
out where it comes from in the code."
With having to jump through the hoop of using a specific function to
preserve the literal flag, the situation is: "This line of code is
wrong. It needs to be changed to use escaping."
To be precise, you wouldn't have to trace back to find where the
non-literal value comes from. At the point where the error is you
would probably just change it to use the appropriate escaping. e.g. if
this line of fails, because the color suddenly becomes a non-literal:
literal_concat("<div style='color: ", $up->getColor(), "'>");
you don't need to figure out where the color is coming from, you can
just change it to use a function that does the appropriate escaping:
html("<div style='color: %s'>", $up->getColor());
In general, any use of literal_concat() where the correctness of it
isn't obvious from reading the code, should probably start a
conversation of "have you considered defaulting to using
escaping/prepared query here"?
It prevents the problem reaching an end-user.
The solution at Google prevents any security problem from reaching the
end-user by refusing to compile code.
One of the problems with PHP is that it is 'normal' for logs to be
filled with warnings...which means that most people just ignore them.
By allowing security problems to get through, rather than defaulting
to being an error means that they will reach end-users, and the
warning messages will end up in log files, and the process for finding
and fixing them will take more time than people will like.
So instead of defaulting to safe, it will default to being ignorable.
And the whole point (in my understanding) of Chris Kern's talk, is
that you need to make software safe by default.
What it won't do, is tell you when you've forgotten to use it,
Which is a huge difference.
If you're aware that you're touching a security sensitive bit of code,
you're likely to do the right thing anyway, and so won't need the
crutch of is_literal.
If you're unaware that you're touching a security sensitive bit of
code, you're more likely to accidentally include user input where it
isn't meant to be used.
That means the feature would be annoying for the people who need it most.
Allowing errors to get through to users, for them to be allegedly
fixed at a later date is referred to as "Normalization of deviance"
and isn't a good choice for this RFC, imo.
cheers
Dan
Ack
- problems with static analysis as a solution
** Not everyone runs it, and it would be nicer to allow wordpress et
al to provide this security threat detection rather than hoping users
decided to use static analysis.
** It might require libraries also using the same static analysis annotations.
** It might get quite tricky to implement correctly, in particular for
functions where the literal-ness of the output depends on the
literal-ness of the input.
What it won't do, is tell you when you've forgotten to use it,
Which is a huge difference.
You've edited out the crucial last part of that sentence:
but nor would a built-in implementation
I don't know how to make this point more emphatically: Neither a
built-in literal_concat function, nor setting the internal flag to false
when you use the "." operator, can prevent you using plain concatenation
in the wrong place.
Regardless of what the . operator does to the internal literal flag, the
following code will give an error in exactly the same place:
// What this line does to the literal flag is irrelevant:
$blah = 'a' . 'b';
// This line will set the literal flag to false, but will not raise any
errors:
$foo = $blah . $_GET['foo'];
// $foo is not literal, so $bar is not literal; but there are still no
errors:
$bar = $foo . '123';
// Finally, an error will be raised when trying to use $bar in
literal_concat, regardless of whether it's built-in or user-implemented:
$baz = literal_concat($bar, '456');
The only way to put that error in a different place would be if the .
operator itself was able to raise errors, and I can't think of any way
that would be possible.
Regards,
--
Rowan Tommins
[IMSoP]
Although I think the idea of the feature is useful,
I'm not so sure about the implementation.
[...]
Whereas using a function like concat_literal() which checks that the
inputs are indeed literals provides immediate feedback that the type
constraint is not being violated.
Hi George,
Thank you for your message.
We have provided a userland literal_concat()
function in the RFC to do
exactly what you’re suggesting, while allowing developers to choose to do
things like raise exceptions during development/testing, and ignore/log
issues when running in production. So you absolutely can use it like that
if you want.
https://wiki.php.net/rfc/is_literal#support_functions
We also agree that a dedicated type would be useful, but as noted by Joe
and someniatko, that should come in 8.2 once the function is established
(allowing us to potentially build on Intersection Types, and will involve a
separate discussion). This is noted under "Future Scope".
https://externals.io/message/114835#114847
The only difference is that we decided to allow string concatenation of
literals, as we want to provide something that’s usable for everyone
immediately, provides the same level of security, and doesn’t require a
mass rewriting of existing code. (Which is basically a death sentence for
most security-based improvements, as a lot of people won’t have the
time/energy to do that. The more automatic security can be, the better,
which is why something libraries can implement is ideal).
https://wiki.php.net/rfc/is_literal#string_concatenation
Excluding concatenation would almost certainly prevent libraries from using
this check, simply because developers do use concatenation, which will
result in too many invalid errors, requiring them to make
substantial/unnecessary changes (i.e. replacing every string concat with
literal_concat()
, or using a special Query Builder for their
SQL/HTML/CLI/etc).
It’s also worth noting that developers who want to use strict_types
are
probably using static analysis already, where Psalm has just added support
for this (thank you Matthew):
https://github.com/vimeo/psalm/releases/tag/4.8.0
Thanks,
Craig
which will result in too many invalid errors, requiring them to make
substantial/unnecessary changes
My understanding is that the majority of PHP developers use one of the
many frameworks available. All of those provide libraries that people
use rather than accessing the DB directly.
At the other end of the scale, you have companies with large-ish
engineering teams (e.g. 30+) where access to the DB is done through an
in-house library.
The only people who don't already go through an access layer of code
to reach the DB are people who don't depend on frameworks, and
maintain their own applications with a small (or singular) engineering
team.
This type of person may be over-represented in internals, but are
probably relatively few in number compared to the majority of PHP
users.
requiring them to make substantial/unnecessary changes
My very strong suspicion is that if/when people start using this, the
majority of changes that would be required would be actual security
problems that ought to be fixed.
(i.e. replacing every string concat with
literal_concat()
, or using a special Query Builder for their
SQL/HTML/CLI/etc).
Most people should be using a library for building HTML and CLI
escaping, because remembering how to do the escaping properly is
really difficult. I certainly never remember the difference between
the correct escaping for CSS and JS.
But the changes required for supporting DB queries just don't seem
that big to me. I've put a code example of an implementation below.
I'm pretty sure that the changes needed to fix the actual security
problems that exist in their code would be bigger, and also that
tracking down a single leaked non-literal would take longer than
migrating code to use the new feature.
As I said in my reply to Rowan, making it easy to track down issues
where they occur, minimises the cost of using this feature over the
years it would be used.
cheers
Dan
Ack
Before
$query = "select * from preferences user_id = " . $_GET['user_id'];
db_exec($query);
After
$query[] = "select * from preferences user_id = ";
$query[] = $_GET['user_id'];
db_exec($query);
And the function db_exec would be changed to something like:
function db_exec(string|array $query_parts)
{
if (is_string($query_parts) && !is_literal($query_parts)) {
throw new \Exception("Cannot use non literal string as query.
Please pass the parts in as an array");
} else {
foreach ($query_parts as $query_part) {
if (is_string($query_part) && !is_literal($query_part) {
throw new \Exception("non-literal string found [$query_part]");
}
else if (is_int($query_part)) {
// todo - decide if you want to allow this or not.
// I personally wouldn't.
}
else {
// todo - support other types
}
}
$query = implode("", $query_parts);
}
// rest of db_exec here...
}
As I said in my reply to Rowan, making it easy to track down issues where
they occur, minimises the cost of using this feature over the years it
would be used.
This implementation allows you to do all of that.
If you find debugging is a problem, you can choose not to use string
concatenation (I haven't needed to).
The implementation allows you to use a $query array, exactly as you
describe:
https://3v4l.org/aCFIT/rfc#vrfc.literals
And the RFC itself provides you with a userland version of the
literal_concat()
function as you proposed, allowing anyone to customise
it to their needs (e.g. only throwing an exception during development).
But forcing every single project in existence to replace every instance of
concatenation, especially when it is not necessary, and doesn’t improve
security in any way, I would argue makes it too strict, and would harm
adoption.
Also, thanks for mentioning in your email to Rowan the problems with static
analysis.
Craig
you can choose not to use string concatenation ... allowing anyone to customise it to their needs
Please can you explain how:
i) An individual programmer can enforce that they don't accidentally
use string concatenation for stuff that is used in a security
sensitive context.
ii) A team of 50 programmers can enforce that they don't accidentally
use string concatenation for stuff that is used in a security
sensitive context.
iii) A library can enforce that string concatenation isn't used in the
params passed to it.
and doesn’t improve security in any way,
It prevents issues from being able to make it through to production.
But the main reason would be to reduce the cost of using this feature
long-term.
you can choose not to use string concatenation (I haven't needed to).
Wait....what?
Is your position both that preserving literal-ness across string
concatenation is required, otherwise this feature is too hard to use,
and at the same time, you've not needed that in your own applications.
Is that right? Because if preserving literal-ness across strings
wasn't required for you...why would it be required for every other
project?
And to be clear, I don't think it's required.
I think by listening to feedback from people who aren't sure it's a
good idea, who said "this is a good idea but only if it's really easy
to start using it" that this RFC has been watered down from the most
useful proposal. At the core, there is a good idea behind this RFC,
but the set of trade-offs chosen just aren't the right ones, and
aren't the "proven" trade-offs made at Google.
cheers
Dan
Ack
On Mon, 12 Jul 2021 at 02:09, Craig Francis craig@craigfrancis.co.uk
wrote:you can choose not to use string concatenation ... allowing anyone to
customise it to their needsPlease can you explain how:
i) An individual programmer can enforce that they don't accidentally
use string concatenation for stuff that is used in a security
sensitive context.
i) Usually if developers want to add specific restrictions for themselves
or their team that the vast majority of a userbase would not need or use,
they would use a separate tool that fills that niche to enforce their
chosen coding style (like how they might want to enforce "tabs vs spaces"
in their own codebase).
ii) A team of 50 programmers can enforce that they don't accidentally
use string concatenation for stuff that is used in a security
sensitive context.
ii) Same answer as i) as it scales.
iii) A library can enforce that string concatenation isn't used in the
params passed to it.
iii) A library shouldn't care if the developer used concatenation, it
should only care if user data has been included incorrectly (i.e. checking
for an Injection Vulnerability). A library’s purpose is to ensure whether
code is safe or not, not about enforcing personal coding styles.
and doesn’t improve security in any way,
It [disallowing concatenation] prevents issues from being able to make it
through to production.
But the main reason would be to reduce the cost of using this feature
long-term.
Disallowing concatenation doesn’t guarantee that issues can’t get through
to production though; for example, something that can sometimes be a
non-literal, but during development defaults to a literal.
From a security point of view, it doesn’t matter whether the error is
caught at the point of concatenation, or when it’s checked as the library
receives it, because the injection vulnerability gets caught either way.
I take it the "cost" that you’re referring to is just the debugging time?
Ultimately any extra debug time that might occur, is magnitudes less than
the time it would take almost every developer to check and rewrite most of
the projects that exist today, which is what the idea of not supporting
concatenation requires.
For a developer who really finds debugging too onerous, there are other
ways of debugging - using tools better suited for it such as XDebug or
Static Analysis tools like Psalm (and as above, if you were wanting to
force yourself/your team to not concatenate, you would be using a Static
Analysis tool already (i/ii)).
you can choose not to use string concatenation (I haven't needed to).
Wait....what?
Is your position both that preserving literal-ness across string
concatenation is required, otherwise this feature is too hard to use,
and at the same time, you've not needed that in your own applications.Is that right? Because if preserving literal-ness across strings
wasn't required for you...why would it be required for every other
project?And to be clear, I don't think it's required.
We are trying to improve the language for the majority of developers.
I’m an experienced PHP developer who is genuinely passionate about
security. I find it fun, curating my code to be as secure as possible is
practically a hobby. That makes me an Outlier. Like a lot of us here. My
focus is on writing an RFC that works for as many developers as possible,
so whether it’s ‘necessary’ in my own personal projects is irrelevant to
whether a simple safety feature should exist for the community.
We need to make things easy and safe for the people who are /not/ just
high-level programmers, but for people who don’t know everything we do and
need things to be simple and functional as possible.
My projects do use a lot of string concatenation. Not an erroneously high
amount, probably about normal. So let’s say we do break string
concatenation: For some numbers, I've found roughly 1,300 instances of SQL
and HTML concatenation in my projects. And even if I would be willing to go
through such a big task to replace them, the real problem is actually
finding them, because if you’re trying to use a search, well, who would
have thought looking for "." would return so many results?
I think by listening to feedback from people who aren't sure it's a
good idea, who said "this is a good idea but only if it's really easy
to start using it" that this RFC has been watered down from the most
useful proposal. At the core, there is a good idea behind this RFC,
but the set of trade-offs chosen just aren't the right ones, and
aren't the "proven" trade-offs made at Google.
The proposal hasn’t changed. This is keeping to the original concept, and
while you wanted to remove string concatenation support, that does not mean
that anything so strict was our intention. The proposal was always meant
for the greatest and easiest adoption possible, but that was your creative
difference with us, which is fine, but doesn’t mean that this isn’t exactly
as originally intended.
The only real change to is_literal()
that has been made since the start
of this RFC is improvements to the compilation process. Otherwise it is the
same as day one.
If it will put your mind at ease, Krzysztof Kotowicz is probably the best
placed person to provide feedback, as he is the implementor of this same
principle in JavaScript. And he provided feedback to this project, saying
that they trust concatenated constants, and that yes, while a programmer
could go out of their way to do something intentionally dangerous (like
building up an array of single character literals, and joining them
together based on user input), the “go-safe-html” library authors decided
that "the ergonomics of trusting concatenated constants far outweighs the
security concern".
Craig
the “go-safe-html” library authors decided that
"the ergonomics of trusting concatenated constants far outweighs the security concern".
Go is a quite different programming language to PHP.
When they say 'constants', they appear to be able to enforce that by
using a clever feature of that language
https://github.com/google/safehtml/blob/2057dd9c30f9e264f4d01c29d886d51f1b519302/template/template.go#L440-L452
:
// stringConstant is an unexported string type. Users of this package cannot
// create values of this type except by passing an untyped string constant to
// functions which expect a stringConstant. This type must be used only in
// function and method parameters.
type stringConstant string
func stringConstantsToStrings(strs []stringConstant) []string {
ret := make([]string, 0, len(strs))
for _, s := range strs {
ret = append(ret, string(s))
}
return ret
}
i.e. this code works, as the url template is an actual string constant
(not a variable):
safehtml.TrustedResourceURLFormatFromConstant(
`//www.youtube.com/v/%{id}?hl=%{lang}`,
map[string]string{
"id": "abc0def1",
"lang": "en",
})
Attempting to use a string variable, fails to even compile:
format := `//www.youtube.com/v/%{id}?hl=%{lang}`
safehtml.TrustedResourceURLFormatFromConstant(
format,
map[string]string{
"id": "abc0def1",
"lang": "en",
})
cannot use format (type string) as type safehtml.stringConstant in
argument to safehtml.TrustedResourceURLFormatFromConstant
The current JavaScript equivalent ideas for string literals appear to
be inactive or archived:
-
https://github.com/mikewest/tc39-proposal-literals - This repository
has been archived by the owner. It is now read-only. -
https://github.com/tc39/proposal-array-is-template-object - not
particularly active.
But my understanding is that like the Go implementation, they are
proposing to enforce literal-ness of function parameters through
special compilation rules for parameters to function calls, rather
than passing literal strings around to arbitrary places - below* or
https://tc39.es/proposal-array-is-template-object/#sec-gettemplateobject
The other JavaScript approach for dealing with trusted types
(https://auth0.com/blog/securing-spa-with-trusted-types/) is even more
different than this proposal.
It seems pretty inaccurate to claim that either the safehtml library
or the proposal for JavaScript support the choice made for the PHP
RFC. They don't appear to actually allow carrying literal-ness through
variables, only through compile-time constants that are placed inside
the parentheses or a function call. They also work in a different way,
and use features of those languages not available in PHP.
cheers
Dan
Ack
- this is the definition for the proposed JavaScript literal
implementation, I think.
Runtime Semantics: GetTemplateObject ( templateLiteral )
The abstract operation GetTemplateObject is called with a Parse Node,
templateLiteral, as an argument. It performs the following steps:
1. Let realm be the current Realm Record.
2. Let templateRegistry be realm.[[TemplateMap]].
3. For each element e of templateRegistry, do
a. If e.[[Site]] is the same Parse Node as templateLiteral, then
i. Return e.[[Array]].
4. Let rawStrings be TemplateStrings of templateLiteral with argument true.
5. Let cookedStrings be TemplateStrings of templateLiteral with argument false.
6. Let count be the number of elements in the List cookedStrings.
7. Assert: count ≤ 232 - 1.
8. Let template be ! ArrayCreate(count).
9. Set template.[[TemplateObject]] to true.
10. Let rawObj be ! ArrayCreate(count).
11. Let index be 0.
12. ...
On Mon, 12 Jul 2021 at 19:57, Craig Francis craig@craigfrancis.co.uk
wrote:the “go-safe-html” library authors decided that
"the ergonomics of trusting concatenated constants far outweighs the
security concern".Go is a quite different programming language to PHP.
Go is different, it's limited to running the check at compile time. That's
why I was referencing the “go-safe-html” library.
PHP is more dynamic, so we don't need to have the same restrictions (we can
allow the developer to concatenate the string, which allows us to support
existing code, rather than relying on the library).
The current JavaScript equivalent ideas for string literals appear to be
inactive or archived.
As noted in the RFC, it's this one:
https://github.com/tc39/proposal-array-is-template-object
Krzysztof has just confirmed that he’s working on it, and is currently
getting it through tc39 (specifically updates related to Realms, a way of
executing JavaScript within the context of a new global object, something
PHP does not need to worry about).
The other JavaScript approach for dealing with trusted types
(https://auth0.com/blog/securing-spa-with-trusted-types/) is even more
different than this proposal.
While the article shows some React/Angular code, the focus is on Trusted
Types, which works with this concept. It protects unsafe APIs (like
innerHTML), where you can create a policy with methods to check/filter
values (e.g. forcing the use of DOMPurify). The isTemplateObject method
(which checks that a template string was created by the developer) will
work with Trusted Types, so you don't need to rely on filtering
(unreliable/limited).
Craig
Hi Internals,
I have opened voting on https://wiki.php.net/rfc/is_literal for the
is-literal function.
I am glad to see that the RFC eventually turned out as originally
proposed. It is simple and provides clear and strong guarantees about
the origins of string data.
While it has its limitations I do see the potential it has as a building
block for building strong security guarantees in applications.
Success of this feature probably depends on it being integrated in
frameworks. If I had a vote I would vote along with framework authors.
Regards,
Dik Takken
On Mon, Jul 5, 2021 at 8:15 PM Craig Francis craig@craigfrancis.co.uk
wrote:
Hi Internals,
I have opened voting on https://wiki.php.net/rfc/is_literal for the
is-literal function.The vote closes 2021-07-19
The proposal is to add the function is_literal(), a simple way to identify
if a string was written by a developer, removing the risk of a variable
containing an Injection Vulnerability.This implementation is for literal strings ONLY (after discussion over
allowing integers) and, thanks to the amazing work of Joe Watkins, now
works fully with compiler optimisations, interned strings etc.
The new implementation, while being very predictable, also comes at a cost.
The RFC doesn't really explain how this works, so let me explain it here
(based on a cursory look through the patch):
Strings have a new flag that indicates whether they are "literal". The
problem is that PHP performs string interning, which means that certain
strings, mostly those seen during compilation of PHP code, are
deduplicated. If you write "foo" two times in a piece of PHP code, there
will only be one interned string "foo". Now, what happens if there is both
a literal string "foo" and a non-literal string "foo" and we want to intern
both?
I believe what the original implementation did is to just assume that all
interned strings are literal strings (this is an educated guess, I don't
know where to find the old implementation). This makes things technically
very simple, and is a pretty sensible assumption in practice. The reason is
that interned strings are almost always derived from literal strings in the
program. The main exception to that are a) single character strings, where
a lot of code will prefer fetching an interned string rather than
allocating a new one and b) the result of optimizations in conjunction with
opcache, which will render strings like $a=1; $b="Foo".$a; interned as a
result of constant propagation.
What the new implementation does is to actually allow two separate interned
strings for "foo"(literal) and "foo"(not literal). Part of the hashtable
implementation needs to be duplicated, because it now needs to distinguish
strings that are nominally equal.
The complexity involved here doesn't look terrible, but I'm unsure about
the performance and memory usage impact. Where per-request strings are
concerned, I expect duplication to be low, because most per-request
interned strings are going to be literal. However, I don't think this holds
for permanent interned strings, which will be predominantly (or entirely?)
non-literal. At least it doesn't look to me like names of internal
functions/classes/etc are considered literal. I think this may be
problematic, in that now the permanent non-literal interned string in the
function/class tables will be different from the per-request literal
interned string used by user code to reference it. This means that all of
these are going to be duplicated, and will no longer benefit from efficient
identity comparison, and will have to be compared based on string contents
instead.
I'm not sure what the actual impact on performance would be. The RFC
indicates that the impact is minor, but I believe those measurements were
made with the original version of the RFC, which did not try to distinguish
literal and non-literal interned strings.
Overall, this is a no vote from my side. While I was not entirely convinced
by the RFC, I was willing to accept the original approach based on sheer
simplicity -- tracking the "probably literal" flag didn't really cost us
anything in terms of either technical complexity or performance. With the
new implementation this isn't the case anymore. I could be convinced that
the technical implications here are unproblematic based on further
analysis, but the current RFC doesn't really discuss this point at all.
Regards,
Nikita
The RFC indicates that the impact is minor, but I believe those
measurements were made with the original version of the RFC, which did not
try to distinguish literal and non-literal interned strings
Hi Nikita,
The performance test results in the RFC are based on the latest version,
updated on Monday (before the vote started), Máté Kocsis did these tests
independently to ensure there was no bias from me (my own tests were
roughly the same).
Joe Watkins would be better placed to explain the details of the
implementation.
We have always tried to keep the implementation as simple as possible, but
there were concerns that certain optimisations by the compiler would make
certain strings appear as literals, not in an unsafe way, but in a way that
might be confusing to developers, with the examples being noted in the RFC:
https://wiki.php.net/rfc/is_literal#compiler_optimisations
Thanks,
Craig
The perf numbers are for current implementation.
Predictability seems more important than simplicity, considering what the
feature is for.
The only way to make the simpler implementation predictable in the same
kind of way was found to be unacceptable during discussion.
Cheers
Joe
On Mon, Jul 5, 2021 at 8:15 PM Craig Francis craig@craigfrancis.co.uk
wrote:Hi Internals,
I have opened voting on https://wiki.php.net/rfc/is_literal for the
is-literal function.The vote closes 2021-07-19
The proposal is to add the function is_literal(), a simple way to
identify
if a string was written by a developer, removing the risk of a variable
containing an Injection Vulnerability.This implementation is for literal strings ONLY (after discussion over
allowing integers) and, thanks to the amazing work of Joe Watkins, now
works fully with compiler optimisations, interned strings etc.The new implementation, while being very predictable, also comes at a cost.
The RFC doesn't really explain how this works, so let me explain it here
(based on a cursory look through the patch):Strings have a new flag that indicates whether they are "literal". The
problem is that PHP performs string interning, which means that certain
strings, mostly those seen during compilation of PHP code, are
deduplicated. If you write "foo" two times in a piece of PHP code, there
will only be one interned string "foo". Now, what happens if there is both
a literal string "foo" and a non-literal string "foo" and we want to intern
both?I believe what the original implementation did is to just assume that all
interned strings are literal strings (this is an educated guess, I don't
know where to find the old implementation). This makes things technically
very simple, and is a pretty sensible assumption in practice. The reason is
that interned strings are almost always derived from literal strings in the
program. The main exception to that are a) single character strings, where
a lot of code will prefer fetching an interned string rather than
allocating a new one and b) the result of optimizations in conjunction with
opcache, which will render strings like $a=1; $b="Foo".$a; interned as a
result of constant propagation.What the new implementation does is to actually allow two separate interned
strings for "foo"(literal) and "foo"(not literal). Part of the hashtable
implementation needs to be duplicated, because it now needs to distinguish
strings that are nominally equal.The complexity involved here doesn't look terrible, but I'm unsure about
the performance and memory usage impact. Where per-request strings are
concerned, I expect duplication to be low, because most per-request
interned strings are going to be literal. However, I don't think this holds
for permanent interned strings, which will be predominantly (or entirely?)
non-literal. At least it doesn't look to me like names of internal
functions/classes/etc are considered literal. I think this may be
problematic, in that now the permanent non-literal interned string in the
function/class tables will be different from the per-request literal
interned string used by user code to reference it. This means that all of
these are going to be duplicated, and will no longer benefit from efficient
identity comparison, and will have to be compared based on string contents
instead.I'm not sure what the actual impact on performance would be. The RFC
indicates that the impact is minor, but I believe those measurements were
made with the original version of the RFC, which did not try to distinguish
literal and non-literal interned strings.Overall, this is a no vote from my side. While I was not entirely convinced
by the RFC, I was willing to accept the original approach based on sheer
simplicity -- tracking the "probably literal" flag didn't really cost us
anything in terms of either technical complexity or performance. With the
new implementation this isn't the case anymore. I could be convinced that
the technical implications here are unproblematic based on further
analysis, but the current RFC doesn't really discuss this point at all.Regards,
Nikita
Hi Nikita,
I performed a few other benchmarks in order to provide a little bit more
insights into the performance aspect of the RFC. My latest measurement is
using the same setup as the previous
one, although I made a few smaller changes in the process (e.g. changing
the order of the tests). So at the end, I got the following results:
- Laravel demo app: +0.1%
- Symfony demo app: +0.43%
- bench.php: +0.4%
- custom concat bench: +3.89%
The results are comparing the "literals" branch with the commit in master
on which Joe's work is based. As far as I remember, the simpler variant of
is_literals had a 0.1% slowdown
in case of Symfony, and 2.09% in case of the custom concat benchmark.
Personally, I think even the current numbers are acceptable, especially
because PHP 8.1 is going to be
roughly 30% faster than PHP 8.0 according to my benchmark in case of the
Symfony demo app. Laravel's result is very similar, but I don't remember
about the exact numbers.
Regards:
Máté
Hi Nikita,
I performed a few other benchmarks in order to provide a little bit more
insights into the performance aspect of the RFC. My latest measurement is
using the same setup as the previous
one, although I made a few smaller changes in the process (e.g. changing
the order of the tests). So at the end, I got the following results:
- Laravel demo app: +0.1%
- Symfony demo app: +0.43%
- bench.php: +0.4%
- custom concat bench: +3.89%
The results are comparing the "literals" branch with the commit in master
on which Joe's work is based. As far as I remember, the simpler variant of
is_literals had a 0.1% slowdown
in case of Symfony, and 2.09% in case of the custom concat benchmark.
Personally, I think even the current numbers are acceptable, especially
because PHP 8.1 is going to be
roughly 30% faster than PHP 8.0 according to my benchmark in case of the
Symfony demo app. Laravel's result is very similar, but I don't remember
about the exact numbers.
Thanks Máté! While I'm not really happy with the technical implications,
your numbers clearly show that there is barely any impact in practice. I've
dropped my "no" vote for that reason.
Regards,
Nikita
permanent interned strings, which will be predominantly (or entirely?)
non-literal. At least it doesn't look to me like names of internal
functions/classes/etc are considered literal. I think this may be
problematic, in that now the permanent non-literal interned string in
the
function/class tables will be different from the per-request literal
interned string used by user code to reference it.
Just wondering, if this is a major performance problem, is there a
compelling reason why the internal names couldn't be marked literal
(even if it's technically "wrong")? It seems unlikely to get a function
or class name accidentally in a SQL query and even less likely that user
input was involved.
--
Lauri Kenttä
is there a compelling reason why the internal names couldn't be marked
literal
(even if it's technically "wrong")?
Hi Lauri,
Just again quickly noting we’re only talking about ~0.43% difference,
nothing major in any way.
But we wanted to ensure developers didn't wonder why something was seen as
a literal in one case, but not in another (we wanted to ensure
consistency), because most people don’t know much about how the internals
of PHP work - especially optimisations which are supposed to be kind of
‘invisible’ to developers. This change also helped when a couple of other
compiler optimisations happen:
https://wiki.php.net/rfc/is_literal#compiler_optimisations
Craig
Injection Vulnerabilities remain at the top of the OWASP "Top 10 Web
Application Security Risks".
It’s important to remember that Injection Vulnerabilities don't just affect
the developer, but rather the data of potentially thousands of people using
the website/system.
These can even occur when using libraries. Take this example from CakePHP,
where the developer has dangerously included user data into the SQL:
$users->find()->where(['age >= ' . $_GET['age']]);
By distinguishing strings from a trusted developer, from strings that may
be attacker controlled, libraries can ensure values that go directly into
the SQL, HTML, CLI, etc have not been "Injected" with user data.
PHP is now lagging behind other languages, where Java and Go can already
test for developer defined strings (it's also being implemented in
JavaScript).
is_literal() is a simple and minor change that simply utilises a currently
unused flag on strings to mark whether the string was written by the
developer. It requires no rewriting of code by the developer to work, no
grand visionary overhaul of the PHP language, with only a 0.43% difference
in speed that is too small to measure with normal internet/database
variability. It’s just a basic but effective way of being able to warn
about and locate Injection Vulnerabilities (and therefore providing a way
for libraries to directly educate developers).
The vote for this RFC ends on Monday the 19th of July, 7:30pm UK time and
6:30pm UTC, and needs your support.
https://wiki.php.net/rfc/is_literal
The following link provides more examples of these mistakes, based on code
I’ve found on production servers. They show how similar they are to the
examples found in the libraries official documentation, and how easy it is
for a developer to make a small tweak that ends up being very dangerous:
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php
I have created 3 example libraries you can experiment with, to see what
is_literal() can do:
https://github.com/craigfrancis/php-is-literal-rfc/tree/main/examples
I'm happy to take questions on and off list.
Vote ends on Monday the 19th of July, 7:30pm UK time and 6:30pm UTC.
https://wiki.php.net/rfc/is_literal
Thanks,
Craig
Just another day, and another injection vulnerability (please patch):
https://woocommerce.com/posts/critical-vulnerability-detected-july-2021/
If only escaping wasn't being used, so user values did not get included in
certain strings :-)
diff -r
woocommerce.5.5.0/includes/data-stores/class-wc-webhook-data-store.php
woocommerce.5.5.1/includes/data-stores/class-wc-webhook-data-store.php
280c280
< $search = ! empty( $args['search'] ) ? "AND name
LIKE '%" .
$wpdb->esc_like( sanitize_text_field( $args['search'] ) ) . "%'" : '';
$search = ! empty( $args['search'] ) ? $wpdb->prepare( "AND
name
LIKE %s", '%' . $wpdb->esc_like( sanitize_text_field(
$args['search'] ) ) . '%' ) : '';
The vote for the is_literal RFC ends on Monday the 19th of July, 7:30pm UK
time and 6:30pm UTC, and needs your support.
short of a bug in esc_like(), i don't even see the vulnerability issue in
that code?
that sanitize call looks like a data corruption issue and i bet it fails to
search for binary data, but i don't see the critical vulnerability?
On Fri, 16 Jul 2021 at 21:24, Hans Henrik Bergan divinity76@gmail.com
wrote:
short of a bug in esc_like(), i don't even see the vulnerability issue in
that code?
Sorry Hans, I copied the wrong diff.
There were only 2 changes from woocommerce 5.5.0 to 5.5.1.
Like you I was wondering what that diff was doing before posting - I'm
fairly sure it's just to be consistent with the other lines (which all use
$wpdb->prepare).
The diff I should have copied is:
diff -r
woocommerce.5.5.0/packages/woocommerce-blocks/src/StoreApi/Utilities/ProductQueryFilters.php
woocommerce.5.5.1/packages/woocommerce-blocks/src/StoreApi/Utilities/ProductQueryFilters.php
86c86,92
< $attributes_to_count = array_map( 'wc_sanitize_taxonomy_name',
$attributes );
$attributes_to_count = array_map(
function( $attribute ) {
$attribute = wc_sanitize_taxonomy_name( $attribute );
return esc_sql( $attribute );
},
$attributes
);
In context $attributes_to_count
simply goes to:
$attributes_to_count_sql = 'AND term_taxonomy.taxonomy IN ("' . implode(
'","', $attributes_to_count ) . '")';
Where the the esc_sql() is basically a call to mysqli_real_escape_string()
,
which explains why it needs risky quotes in/around implode.
Craig
oh thanks, now the vulnerability is clear. (i would still complain on that
as a pull request though, using double quotes for strings is just a
horrible idea, it's not compliant with ISO sql, and it depends on the MySQL
server it's running on not having @@SQL_MODE=ANSI_QUOTES enabled which
changes double-quotes to be ISO-compliant instead of mysql-borky (double
quotes for strings only work on MySQL because of a MySQL quirk, ISO SQL say
double quotes is meant for identifiers just like ` ), and it's not portable
across different SQL servers, and all of these problems go away if you just
use the sensible alternative: single quotes - seriously someone should
complain to whoever wrote that, so at least he doesn't do the same in the
future
i can tell from only that diff that, at least as of 5.5.1, woocommerce is
not compatible with @@SQL_MODE=ANSI_QUOTES :p
)
On Sat, 17 Jul 2021 at 03:45, Craig Francis craig@craigfrancis.co.uk
wrote:
On Fri, 16 Jul 2021 at 21:24, Hans Henrik Bergan divinity76@gmail.com
wrote:short of a bug in esc_like(), i don't even see the vulnerability issue in
that code?Sorry Hans, I copied the wrong diff.
There were only 2 changes from woocommerce 5.5.0 to 5.5.1.
Like you I was wondering what that diff was doing before posting - I'm
fairly sure it's just to be consistent with the other lines (which all use
$wpdb->prepare).The diff I should have copied is:
diff -r
woocommerce.5.5.0/packages/woocommerce-blocks/src/StoreApi/Utilities/ProductQueryFilters.php
woocommerce.5.5.1/packages/woocommerce-blocks/src/StoreApi/Utilities/ProductQueryFilters.php
86c86,92
< $attributes_to_count = array_map( 'wc_sanitize_taxonomy_name',
$attributes );$attributes_to_count = array_map(
function( $attribute ) {
$attribute = wc_sanitize_taxonomy_name( $attribute );
return esc_sql( $attribute );
},
$attributes
);In context
$attributes_to_count
simply goes to:$attributes_to_count_sql = 'AND term_taxonomy.taxonomy IN ("' . implode(
'","', $attributes_to_count ) . '")';Where the the esc_sql() is basically a call
tomysqli_real_escape_string()
, which explains why it needs risky quotes
in/around implode.Craig
On Sat, 17 Jul 2021 at 08:59, Hans Henrik Bergan divinity76@gmail.com
wrote:
i can tell from only that diff that, at least as of 5.5.1, woocommerce is
not compatible with @@SQL_MODE=ANSI_QUOTES :p
Yep, and I did that years ago - I preferred to use single quotes for
strings in PHP (so variables stood out), and double quotes for SQL.
Just for fun, NO_BACKSLASH_ESCAPES
is a good way to mess with people (I
believe the "real" escaping method can catch this, but I wouldn't trust it).
When it comes to escaping in general, my favourite mistake is:
$sql = 'WHERE id = ' . $mysqli->real_escape_string($_GET['id']);
Where the assumed to be number hasn't been quoted at all :-)
PDO::quote() does work a bit better (by adding the quotes itself), but I
still prefer its documentation - noting how the quoted string is only
"theoretically safe", and that "you are strongly recommended to use
PDO::prepare() to prepare SQL statements with bound parameters".
https://www.php.net/manual/en/pdo.quote.php
Craig
On Fri, Jul 16, 2021 at 2:47 AM Craig Francis craig@craigfrancis.co.uk
wrote:
Just another day, and another injection vulnerability (please patch):
https://woocommerce.com/posts/critical-vulnerability-detected-july-2021/
If only escaping wasn't being used, so user values did not get included in
certain strings :-)diff -r
woocommerce.5.5.0/includes/data-stores/class-wc-webhook-data-store.php
woocommerce.5.5.1/includes/data-stores/class-wc-webhook-data-store.php
280c280
< $search = ! empty( $args['search'] ) ? "ANDname
LIKE '%" .
$wpdb->esc_like( sanitize_text_field( $args['search'] ) ) . "%'" : '';$search = ! empty( $args['search'] ) ? $wpdb->prepare( "AND
name
LIKE %s", '%' . $wpdb->esc_like( sanitize_text_field(
$args['search'] ) ) . '%' ) : '';
On Sat, Jul 17, 2021 at 3:45 AM Craig Francis craig@craigfrancis.co.uk
wrote:
On Fri, 16 Jul 2021 at 21:24, Hans Henrik Bergan divinity76@gmail.com
wrote:short of a bug in esc_like(), i don't even see the vulnerability issue in
that code?Sorry Hans, I copied the wrong diff.
There were only 2 changes from woocommerce 5.5.0 to 5.5.1.
Like you I was wondering what that diff was doing before posting - I'm
fairly sure it's just to be consistent with the other lines (which all use
$wpdb->prepare).
I don't think so. Looking at
https://developer.wordpress.org/reference/functions/sanitize_text_field/ and
https://developer.wordpress.org/reference/classes/wpdb/esc_like/
you can see that they don't escape single quotes,
so there was indeed an SQL injection vulnerability in that code.
(Which is [one of the reasons] why I avoid WordPress [and especially
third-party themes / plugins] as much as possible.)
--
Guilliam Xavier
On Mon, 19 Jul 2021 at 12:51, Guilliam Xavier guilliam.xavier@gmail.com
wrote:
there was indeed an SQL injection vulnerability in that code.
Yep, you're right, there was an issue in there as well.
esc_like() also needs to use esc_sql() for the value to be added directly
to the SQL string.
By changing to $wpdb->prepare(), assuming a literal string for the $query
argument, then esc_sql() would have been used automatically (technically
escape_by_ref).
Shall we just put it down as another example of escaping going wrong, why
Parameterised Queries work, and how Taint Checking isn't really a solution
(as the value was escaped, ish).
And if you want another fun one, not that anyone should be using inline
JavaScript, but esc_js() doesen't escape single quotes:
$variable = '' onmouseover=alert(1) a='';
$html = "<a href='#' onclick='alert( \"" . esc_js( $variable ) . "\" );'>Link</a>";
https://developer.wordpress.com/themes/escaping/#javascript
Craig
Hey Craig,
On Mon, Jul 5, 2021 at 8:15 PM Craig Francis craig@craigfrancis.co.uk
wrote:
Hi Internals,
I have opened voting on https://wiki.php.net/rfc/is_literal for the
is-literal function.The vote closes 2021-07-19
The proposal is to add the function is_literal(), a simple way to identify
if a string was written by a developer, removing the risk of a variable
containing an Injection Vulnerability.This implementation is for literal strings ONLY (after discussion over
allowing integers) and, thanks to the amazing work of Joe Watkins, now
works fully with compiler optimisations, interned strings etc.Craig
I ended up voting "no" on this.
Even though I used to work on Doctrine ORM for years, and we had a private
discussion around this, my belief is that this is not a runtime problem,
but rather a type-level issue with tainted/untainted input/output.
This has been brought up in the discussion phase by Tyson Andre, as far as
I remember.
There is good progress in taint analysis in Psalm, specifically:
- https://psalm.dev/articles/detect-security-vulnerabilities-with-psalm
- https://psalm.dev/docs/security_analysis/
There's also the issue, from my end, that I would like the ecosystem to
pick up static analysis more, rather than people adding more runtime
roadblocks to their code, leading to further complexity in error analysis.
This is my personal push for putting more problems into build-time rather
than runtime.
You could call it a political choice to push a tool like psalm, instead of
the language having a runtime construct to further "duck-type safety".
In addition to that, a mechanism to un-taint values is missing, but it is
not a concept symmetrical to is_literal()
.
Greets,
Marco Pivetta
my belief is that this is not a runtime problem, but rather a type-level
issue with tainted/untainted input/output.
Thank you for the feedback Marco,
As you appreciate, I don’t believe we can get every PHP developer to use
Static Analysis. It’s an extra step that developers with less time, energy,
or care, will not setup and use.
Putting something in the base language, means that libraries can just use
it, and people using the sites/systems of rushed or lazier developers will
have these checks helping keep their data secure. Data breeches can have
life-changing consequences for people, Injection Vulnerabilities are one of
the biggest causes of them, and since we have the ability for libraries to
warn all developers about these mistake, we should.
At the moment our house can catch on fire and we don’t even have a smoke
alarm. This is the smoke alarm. And there are reasons why it’s builders and
landlords that have to install them, and we don’t rely on the tenants going
and sorting them out themselves. Because if they don’t, for the best or the
worse reasons, either way there are severe consequences to everybody.
In regards to Taint Checking, it has a significant problem as it creates a
false sense of security, hence these examples in the RFC:
$sql = 'SELECT * FROM users WHERE id = ' . $db->real_escape_string($id); //
INSECURE
$html = "<img src=" . htmlentities($url) . " alt='' />"; // INSECURE
$html = "<a href='" . htmlentities($url) . "'>..."; // INSECURE
Fortunately Psalm has just implemented the is_literal() concept, so those
developers who do use Psalm can protect themselves from these issues:
https://github.com/vimeo/psalm/releases/tag/4.8.0
In addition to that, a mechanism to un-taint values is missing,
That’s the main flaw with Taint Checking, because it’s not possible to mark
something as safe without knowing about the context. As in, developers use
an escaping function (to mark as untainted), think the value is now “safe”,
and incorrectly use that value in a way that causes a security
vulnerability.
is_literal() simplifies this problem considerably, by just identifying
developer defined strings, and instead using libraries to handle user
values.
Craig
Related to the general topic of injection attacks, I was considering
submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
FALSE, since this mistakenly can lead people to believe that using prepared
statements with PDO and MySQL protects against injection attacks. In fact,
this is only the case if they create the PDO object with the option
specified as false. I'm not aware however to reasoning for enabling prepare
emulation by default for MySQL. I would assume it's a performance choice,
but how long ago was this choice made and is it worth revisiting? Would
this be something that requires its own RFC? It's a single line change.
Jordan
On Sat, Jul 17, 2021 at 9:48 AM Craig Francis craig@craigfrancis.co.uk
wrote:
my belief is that this is not a runtime problem, but rather a type-level
issue with tainted/untainted input/output.Thank you for the feedback Marco,
As you appreciate, I don’t believe we can get every PHP developer to use
Static Analysis. It’s an extra step that developers with less time, energy,
or care, will not setup and use.Putting something in the base language, means that libraries can just use
it, and people using the sites/systems of rushed or lazier developers will
have these checks helping keep their data secure. Data breeches can have
life-changing consequences for people, Injection Vulnerabilities are one of
the biggest causes of them, and since we have the ability for libraries to
warn all developers about these mistake, we should.At the moment our house can catch on fire and we don’t even have a smoke
alarm. This is the smoke alarm. And there are reasons why it’s builders and
landlords that have to install them, and we don’t rely on the tenants going
and sorting them out themselves. Because if they don’t, for the best or the
worse reasons, either way there are severe consequences to everybody.In regards to Taint Checking, it has a significant problem as it creates a
false sense of security, hence these examples in the RFC:$sql = 'SELECT * FROM users WHERE id = ' . $db->real_escape_string($id); //
INSECURE$html = "<img src=" . htmlentities($url) . " alt='' />"; // INSECURE
$html = "<a href='" . htmlentities($url) . "'>..."; // INSECURE
Fortunately Psalm has just implemented the is_literal() concept, so those
developers who do use Psalm can protect themselves from these issues:https://github.com/vimeo/psalm/releases/tag/4.8.0
In addition to that, a mechanism to un-taint values is missing,
That’s the main flaw with Taint Checking, because it’s not possible to mark
something as safe without knowing about the context. As in, developers use
an escaping function (to mark as untainted), think the value is now “safe”,
and incorrectly use that value in a way that causes a security
vulnerability.is_literal() simplifies this problem considerably, by just identifying
developer defined strings, and instead using libraries to handle user
values.Craig
Related to the general topic of injection attacks, I was considering
submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
FALSE, since this mistakenly can lead people to believe that using prepared
statements with PDO and MySQL protects against injection attacks. In fact,
this is only the case if they create the PDO object with the option
specified as false. I'm not aware however to reasoning for enabling prepare
emulation by default for MySQL. I would assume it's a performance choice,
but how long ago was this choice made and is it worth revisiting? Would
this be something that requires its own RFC? It's a single line change.Jordan
There's some BC-breaks to be aware of when switching emulated prepares.
One example I know of is that when using emulated prepares you can reuse
the same placeholder (as in the following example), but with emulated
prepares disabled this does not work.
$sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
'%') OR column2 LIKE CONCAT('%', :searchValue, '%');";
$params = [
"searchValue" => "test",
];
With emulated prepares disabled you have to use:
$sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
'%') OR column2 LIKE CONCAT('%', :searchValue2, '%');";
$params = [
"searchValue" => "test",
"searchValue2" => "test",
];
That sounds like something that would require both a deprecation and an RFC
for the change then, even if the actual change in the source is small.
It still may be worth exploring, since this surely gives a large number of
people false confidence in protection against injection attacks, as nearly
every available tutorial on using PDO in PHP declares that simply using
prepared statements protects you from injection attacks.
Related to the general topic of injection attacks, I was considering
submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
FALSE, since this mistakenly can lead people to believe that using
prepared
statements with PDO and MySQL protects against injection attacks. In
fact,
this is only the case if they create the PDO object with the option
specified as false. I'm not aware however to reasoning for enabling
prepare
emulation by default for MySQL. I would assume it's a performance choice,
but how long ago was this choice made and is it worth revisiting? Would
this be something that requires its own RFC? It's a single line change.Jordan
There's some BC-breaks to be aware of when switching emulated prepares.
One example I know of is that when using emulated prepares you can reuse
the same placeholder (as in the following example), but with emulated
prepares disabled this does not work.$sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
'%') OR column2 LIKE CONCAT('%', :searchValue, '%');";
$params = [
"searchValue" => "test",
];With emulated prepares disabled you have to use:
$sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
'%') OR column2 LIKE CONCAT('%', :searchValue2, '%');";
$params = [
"searchValue" => "test",
"searchValue2" => "test",
];
There's some BC-breaks to be aware of when switching emulated prepares.
One example I know of is that when using emulated prepares you can reuse
the same placeholder (as in the following example), but with emulated
prepares disabled this does not work.$sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
'%') OR column2 LIKE CONCAT('%', :searchValue, '%');";
$params = [
"searchValue" => "test",
];
This, and do note that from a performance point of view, disabling emulated
prepares is not innocuous : most of the time, you effectively send twice as
many queries to the database : one prepare, one execute.
There is a small performance improvement in using prepared statements if
you're executing the same statement many times; but in all other cases,
you're spending twice as much time waiting for the database.
Are there documented SQL injection opportunities when using emulated
prepares? I'm not aware of any.
— Benjamin
Are there documented SQL injection opportunities when using emulated
prepares? I'm not aware of any.
This was from my reading of the actual source, which of course may be
flawed. It appeared that if emulated prepares were used the values were
escaped and then passed as strings as part of the query, the same as if it
had been concatenated and wrapped in real_escape_string. I hadn't gone too
far in actually debugging it yet to find out how it behaved under different
circumstances as I was still trying to figure out how "small" of a change
this was from the perspective of internals.
On Sun, Jul 18, 2021 at 3:38 PM Benjamin Morel benjamin.morel@gmail.com
wrote:
There's some BC-breaks to be aware of when switching emulated prepares.
One example I know of is that when using emulated prepares you can reuse
the same placeholder (as in the following example), but with emulated
prepares disabled this does not work.$sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
'%') OR column2 LIKE CONCAT('%', :searchValue, '%');";
$params = [
"searchValue" => "test",
];This, and do note that from a performance point of view, disabling
emulated prepares is not innocuous : most of the time, you effectively send
twice as many queries to the database : one prepare, one execute.
There is a small performance improvement in using prepared statements
if you're executing the same statement many times; but in all other cases,
you're spending twice as much time waiting for the database.Are there documented SQL injection opportunities when using emulated
prepares? I'm not aware of any.— Benjamin
Good morning,
Are there documented SQL injection opportunities when using emulated
prepares? I'm not aware of any.This was from my reading of the actual source, which of course may be
flawed. It appeared that if emulated prepares were used the values were
escaped and then passed as strings as part of the query, the same as if it
had been concatenated and wrapped in real_escape_string. I hadn't gone too
far in actually debugging it yet to find out how it behaved under different
circumstances as I was still trying to figure out how "small" of a change
this was from the perspective of internals.
I also don't think there is any left over possible SQL injection in
any of the core DB extensions (PDO or 'native'). It will indeed do not
prevent inserting invalid data but if there were any actual SQL
injection left, I am very confident we would have known it by now. :)
--
Pierre
@pierrejoye | http://www.libgd.org
On Sun, Jul 18, 2021 at 4:42 AM Jordan LeDoux jordan.ledoux@gmail.com
wrote:
Related to the general topic of injection attacks, I was considering
submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
FALSE, since this mistakenly can lead people to believe that using prepared
statements with PDO and MySQL protects against injection attacks. In fact,
this is only the case if they create the PDO object with the option
specified as false. I'm not aware however to reasoning for enabling prepare
emulation by default for MySQL. I would assume it's a performance choice,
but how long ago was this choice made and is it worth revisiting? Would
this be something that requires its own RFC? It's a single line change.Jordan
Please do refrain from spreading this FUD. While there are certain
tradeoffs between choosing emulated or native prepared statements, security
considerations do not factor into it. There's a very narrow window where
emulated prepared statements can lead to incorrect escaping (it involves
picking an exotic non-ASCII-compatible charset, not specifying it in the
connection DSN, and switching to it at runtime), but it's not something you
can hit by accident.
Regards,
Nikita
On Sat, Jul 17, 2021 at 9:48 AM Craig Francis craig@craigfrancis.co.uk
wrote:
On Sat, 17 Jul 2021 at 4:05 pm, Marco Pivetta ocramius@gmail.com
wrote:my belief is that this is not a runtime problem, but rather a
type-level
issue with tainted/untainted input/output.Thank you for the feedback Marco,
As you appreciate, I don’t believe we can get every PHP developer to use
Static Analysis. It’s an extra step that developers with less time,
energy,
or care, will not setup and use.Putting something in the base language, means that libraries can just use
it, and people using the sites/systems of rushed or lazier developers
will
have these checks helping keep their data secure. Data breeches can have
life-changing consequences for people, Injection Vulnerabilities are one
of
the biggest causes of them, and since we have the ability for libraries
to
warn all developers about these mistake, we should.At the moment our house can catch on fire and we don’t even have a smoke
alarm. This is the smoke alarm. And there are reasons why it’s builders
and
landlords that have to install them, and we don’t rely on the tenants
going
and sorting them out themselves. Because if they don’t, for the best or
the
worse reasons, either way there are severe consequences to everybody.In regards to Taint Checking, it has a significant problem as it creates
a
false sense of security, hence these examples in the RFC:$sql = 'SELECT * FROM users WHERE id = ' . $db->real_escape_string($id);
//
INSECURE$html = "<img src=" . htmlentities($url) . " alt='' />"; // INSECURE
$html = "<a href='" . htmlentities($url) . "'>..."; // INSECURE
Fortunately Psalm has just implemented the is_literal() concept, so those
developers who do use Psalm can protect themselves from these issues:https://github.com/vimeo/psalm/releases/tag/4.8.0
In addition to that, a mechanism to un-taint values is missing,
That’s the main flaw with Taint Checking, because it’s not possible to
mark
something as safe without knowing about the context. As in, developers
use
an escaping function (to mark as untainted), think the value is now
“safe”,
and incorrectly use that value in a way that causes a security
vulnerability.is_literal() simplifies this problem considerably, by just identifying
developer defined strings, and instead using libraries to handle user
values.Craig
Thanks Nikita, that's good to know. I'm still familiarizing myself with the
source right now, so I apologize if this is something that commonly gets
spread as false information, I honestly was exploring the workings of
injection protection in the source after following this discussion, but
that's why I asked. My intention was not to spread FUD, I just thought I'd
found something that slipped through the cracks.
Cheers,
Jordan
On Sun, Jul 18, 2021 at 4:42 AM Jordan LeDoux jordan.ledoux@gmail.com
wrote:Related to the general topic of injection attacks, I was considering
submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
FALSE, since this mistakenly can lead people to believe that using
prepared
statements with PDO and MySQL protects against injection attacks. In fact,
this is only the case if they create the PDO object with the option
specified as false. I'm not aware however to reasoning for enabling
prepare
emulation by default for MySQL. I would assume it's a performance choice,
but how long ago was this choice made and is it worth revisiting? Would
this be something that requires its own RFC? It's a single line change.Jordan
Please do refrain from spreading this FUD. While there are certain
tradeoffs between choosing emulated or native prepared statements, security
considerations do not factor into it. There's a very narrow window where
emulated prepared statements can lead to incorrect escaping (it involves
picking an exotic non-ASCII-compatible charset, not specifying it in the
connection DSN, and switching to it at runtime), but it's not something you
can hit by accident.Regards,
NikitaOn Sat, Jul 17, 2021 at 9:48 AM Craig Francis craig@craigfrancis.co.uk
wrote:
On Sat, 17 Jul 2021 at 4:05 pm, Marco Pivetta ocramius@gmail.com
wrote:my belief is that this is not a runtime problem, but rather a
type-level
issue with tainted/untainted input/output.Thank you for the feedback Marco,
As you appreciate, I don’t believe we can get every PHP developer to use
Static Analysis. It’s an extra step that developers with less time,
energy,
or care, will not setup and use.Putting something in the base language, means that libraries can just
use
it, and people using the sites/systems of rushed or lazier developers
will
have these checks helping keep their data secure. Data breeches can have
life-changing consequences for people, Injection Vulnerabilities are
one of
the biggest causes of them, and since we have the ability for libraries
to
warn all developers about these mistake, we should.At the moment our house can catch on fire and we don’t even have a smoke
alarm. This is the smoke alarm. And there are reasons why it’s builders
and
landlords that have to install them, and we don’t rely on the tenants
going
and sorting them out themselves. Because if they don’t, for the best or
the
worse reasons, either way there are severe consequences to everybody.In regards to Taint Checking, it has a significant problem as it
creates a
false sense of security, hence these examples in the RFC:$sql = 'SELECT * FROM users WHERE id = ' .
$db->real_escape_string($id); //
INSECURE$html = "<img src=" . htmlentities($url) . " alt='' />"; // INSECURE
$html = "<a href='" . htmlentities($url) . "'>..."; // INSECURE
Fortunately Psalm has just implemented the is_literal() concept, so
those
developers who do use Psalm can protect themselves from these issues:https://github.com/vimeo/psalm/releases/tag/4.8.0
In addition to that, a mechanism to un-taint values is missing,
That’s the main flaw with Taint Checking, because it’s not possible to
mark
something as safe without knowing about the context. As in, developers
use
an escaping function (to mark as untainted), think the value is now
“safe”,
and incorrectly use that value in a way that causes a security
vulnerability.is_literal() simplifies this problem considerably, by just identifying
developer defined strings, and instead using libraries to handle user
values.Craig
Hi Internals,
I have opened voting on https://wiki.php.net/rfc/is_literal for the
is-literal function.
This RFC has been rejected; with 10 votes in favour, and 23 against.
I'd like to thank everyone who has been involved in this project, in
particular Joe Watkins for creating the implementation and helping me
though this process, Máté Kocsis for the independent performance tests,
Rowan Francis for helping me with the RFC text and emails, Derick Rethans
for covering this on the PHP Internals News podcast, and for everyone who
has discussed this implementation and provided their input (especially
Larry Garfield, Scott Arciszewski, and Rowan Tommins).
And thank you to Matthew Brown for adding the 'literal-string' type to
Psalm:
https://github.com/vimeo/psalm/releases/tag/4.8.0
As an aside, only 4 of 23 'no' voters provided any comment as to why they
voted that way on the mailing list, which I feel undermines the point of
the Request For Comment process, with an additional 5 responding personally
off-list after prompting. This makes it harder (or impossible) for points
to be discussed and addressed.
Craig
Hi Craig Francis,
As an aside, only 4 of 23 'no' voters provided any comment as to why they
voted that way on the mailing list, which I feel undermines the point of
the Request For Comment process, with an additional 5 responding personally
off-list after prompting. This makes it harder (or impossible) for points
to be discussed and addressed.
-
My earlier comments about static analysis, and on behavior depending on whether opcache is enabled
-
This might prevent certain optimizations in the future. For example, currently, 1-byte strings are all interned to save memory.
If is_literal was part of php prior to proposing that optimization, then that optimization may be rejected. -
PHP's
string
type is used both for (hopefully) valid unicode strings and for low level operations on literal byte arrays (e.g. cryptogrophy).
It seems really, really strange for a type system to track trustedness for a low level primitive to track byte arrays. (the php 6 unicode string refactoring failed)Further, if this were to be extended in the future beyond the original proposal (e.g. literal strings that are entirely digits are automatically interned or marked as trusted),
this might open previously safe code acting on byte arrays to side channel issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack) -
Internal functions and userland polyfills for those functions may unintentionally differ significantly for the resulting taintedness,
e.g. base64_decode in userland being built up byte by byte would end up being possibly untainted? -
The fact that 1-byte strings are almost always interned seems like a noticeable inconsistency (though library authors can deal with it once they're aware of it), though for it to become an issue a library may need to take multiple strings as input
(e.g. a contrived example"echo -- " . $trustedPrefix . shell_escape($notTrusted)
for $trustedPrefix of "'" (or "\n") and $notTrusted of "; evaluate command" -
Including it in core would make it harder to remove later if it interfered with opcache or jit work, or to migrate code to alternative interpreters for php if those were ever implemented (if frameworks were to extensively depend on is_literal)
-
Tracking whether a string is untrusted is a definition only suitable for a few (extremely common) formats for php. But for less common features, even stringified integers may be a problem (e.g. binary file formats, etc)
This is relatively minor given that php is typically used in a web programming context with json or html or js/css output
I'd think is_interned()/intern_string() is much closer to tracking something that corresponds with php's internals (e.g. and may allow saving memory in long-running processes which receive duplicate strings as input), though the 10 people who wanted fully featured trustedness checking would probably want is_literal instead
-
Serializing and unserializing data would lose information about trustedness of inputs, unpredictably (e.g.
unserialize()
in php 8.0 interns array keys).There's no (efficient) way to change trusted strings to untrusted or vice versa, though there are inefficient workarounds (modifying a byte and restoring it to stop trusting it, imploding single characters to create a trusted string)
This may done implicitly in frameworks using APCu/memcached/redis as a cache
(I definitely don't think the serialization data format should track is_literal())
-
Future refactorings, optimizations or deoptimizations (or security fixes) to
unserialize()
, etc. may unexpectedly break code using is_literal that throw instead of warn (more bug reports, discourage users from upgrading, etc.) -
This RFC adds an unknown amount of future work for php-src and PECLs to intuitively support mapping trusted inputs to trusted outputs or vice versa - less commonly used or unmaintained functions may not behave as expected for a while
-
https://pecl.php.net/package/taint is available already for a use case with some overlap for setups that need this
Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an interned string) would make sense to expose in a PECL as a regular extension
(not a zend_extension
) if is_interned also fails.
Unlike the zend_extension for https://www.php.net/manual/en/function.is-tainted.php ,
something simple may be possible without needing the performance hit and
future conflicts with XDebug that I assume https://www.php.net/manual/en/function.is-tainted.php would be prone to.
(https://pecl.php.net/package/taint seems to use a separate bit to track this. The latest release of the Taint pecl fixes XDebug compatibility)
- Other languages, such as Java, have exposed this for memory management purposes (rather than security) though it's rarely used directly or in frameworks, e.g. https://docs.oracle.com/javase/10/docs/api/java/lang/String.html#intern()
(https://docs.python.org/3/library/sys.html#sys.intern)
Thanks,
Tyson
As an aside, only 4 of 23 'no' voters provided any comment as to why they
voted that way on the mailing list, which I feel undermines the point of
the Request For Comment process, with an additional 5 responding personally
off-list after prompting. This makes it harder (or impossible) for points
to be discussed and addressed.
- My earlier comments about static analysis, and on behavior depending on whether opcache is enabled
Thanks Tyson,
Just to check I've not missed anything, are you referring to your comments from March 2020?
https://news-web.php.net/php.internals/109192 https://news-web.php.net/php.internals/109192
I mentioned your email in the RFC, as I agree that developers should use Static Analysis, but I think it's highly unlikely we will be able to get most PHP developers to use it:
https://wiki.php.net/rfc/is_literal#static_analysis https://wiki.php.net/rfc/is_literal#static_analysis
As to the OPcache, Joe's implementation already works with its optimisations, so the results are consistent.
https://wiki.php.net/rfc/is_literal#compiler_optimisations https://wiki.php.net/rfc/is_literal#compiler_optimisations
- This might prevent certain optimizations in the future. For example, currently, 1-byte strings are all interned to save memory.
If is_literal was part of php prior to proposing that optimization, then that optimization may be rejected.
Like the OPcache optimisations, the 1-byte strings and other existing optimisations work with Joe's implementation. I cannot comment or predict any future optimisation work, but this argument could be made for any change to PHP.
PHP's
string
type is used both for (hopefully) valid unicode strings and for low level operations on literal byte arrays (e.g. cryptogrophy).
It seems really, really strange for a type system to track trustedness for a low level primitive to track byte arrays. (the php 6 unicode string refactoring failed)Further, if this were to be extended in the future beyond the original proposal (e.g. literal strings that are entirely digits are automatically interned or marked as trusted),
this might open previously safe code acting on byte arrays to side channel issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack)
We're just adding a flag to say if the string was written by the developer, which is key for identifying Injection Vulnerabilities (it's similar to the Interned flag).
With cryptography and timing attacks, the only part affected would be during string concatenation, which has already got several optimisations that introduce variability in timings (e.g. concatenating variables containing 'a' with 'b' is about twice as slow than 'a' with '').
We did have a discussion about integer support, but the way integers/floats/booleans are currently stored in memory would require a big change to note if those values were defined in the developers PHP source code. We also explored the idea of allowing all integers, and while they cannot lead to an Injection Vulnerability, the consensus was that a developer defined string is easier to understand:
https://wiki.php.net/rfc/is_literal#integer_values https://wiki.php.net/rfc/is_literal#integer_values
- Internal functions and userland polyfills for those functions may unintentionally differ significantly for the resulting taintedness,
e.g. base64_decode in userland being built up byte by byte would end up being possibly untainted?
All of these functions would return a non-literal string, as the values were not written by the developer (as in, directly defined in their source code).
- The fact that 1-byte strings are almost always interned seems like a noticeable inconsistency (though library authors can deal with it once they're aware of it), though for it to become an issue a library may need to take multiple strings as input
(e.g. a contrived example"echo -- " . $trustedPrefix . shell_escape($notTrusted)
for $trustedPrefix of "'" (or "\n") and $notTrusted of "; evaluate command"
1-byte strings have already been addressed in the implementation.
Libraries need to take user values separately from developer defined strings, because developers frequently make mistakes:
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4 https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4
Especially when it comes to escaping values:
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4 https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4
In your command example, the library could copy how Parameterised Queries work in SQL.
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/cli-basic.php https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/cli-basic.php
- Including it in core would make it harder to remove later if it interfered with opcache or jit work, or to migrate code to alternative interpreters for php if those were ever implemented (if frameworks were to extensively depend on is_literal)
This is why Joe was very careful with the implementation. Overall the approach is simple, where it's just providing libraries with the key bit of information to identify their primary source of Injection Vulnerabilities (when a developer has incorrectly included user data).
Tracking whether a string is untrusted is a definition only suitable for a few (extremely common) formats for php. But for less common features, even stringified integers may be a problem (e.g. binary file formats, etc)
This is relatively minor given that php is typically used in a web programming context with json or html or js/css output
I'd think is_interned()/intern_string() is much closer to tracking something that corresponds with php's internals (e.g. and may allow saving memory in long-running processes which receive duplicate strings as input), though the 10 people who wanted fully featured trustedness checking would probably want is_literal instead
The implementation is pretty close to matching the Interned flag, but isn't exactly the same, as the Interned flag can be unpredictable for normal PHP developers (e.g. the chr()
function), and it could be changed in the future (e.g. Joe noted how there was a suggestion to intern keys while JSON decoding or unserializing).
Serializing and unserializing data would lose information about trustedness of inputs, unpredictably (e.g.
unserialize()
in php 8.0 interns array keys).There's no (efficient) way to change trusted strings to untrusted or vice versa, though there are inefficient workarounds (modifying a byte and restoring it to stop trusting it, imploding single characters to create a trusted string)
This may done implicitly in frameworks using APCu/memcached/redis as a cache
(I definitely don't think the serialization data format should track is_literal())
Agreed, serializing and unserializing (or any other modifications) does not set the literal flag.
As to providing an easy way to mark a non-literal string to a literal - in short, we do not want to recreate the biggest flaw of Taint Checking, where naive developers would see this as a way to mark escaped strings as "safe", giving them a false sense of security that these strings can now be trusted.
- Future refactorings, optimizations or deoptimizations (or security fixes) to
unserialize()
, etc. may unexpectedly break code using is_literal that throw instead of warn (more bug reports, discourage users from upgrading, etc.)
The implementation already includes several tests, and output from functions like unserialize()
do not set the literal flag.
- This RFC adds an unknown amount of future work for php-src and PECLs to intuitively support mapping trusted inputs to trusted outputs or vice versa - less commonly used or unmaintained functions may not behave as expected for a while
This shouldn't be an issue, as the flag is only being used to identify literal strings. If new strings are being created or modified, they are no longer considered literal strings (the flag is off by default).
- https://pecl.php.net/package/taint is available already for a use case with some overlap for setups that need this
Libraries need something that's available to everyone (we all make mistakes; and the developers most likely to introduce these vulnerabilities are unlikely to install an extension, or use Static Analysis).
And with the Taint extension in particular, it has really helped me experiment with this concept, but that approach does give a false sense of security, which is why this RFC did not re-create it:
https://wiki.php.net/rfc/is_literal#taint_checking https://wiki.php.net/rfc/is_literal#taint_checking
Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an interned string) would make sense to expose in a PECL as a regular
extension
(not azend_extension
) if is_interned also fails.
Unlike the zend_extension for https://www.php.net/manual/en/function.is-tainted.php ,
something simple may be possible without needing the performance hit and
future conflicts with XDebug that I assume https://www.php.net/manual/en/function.is-tainted.php would be prone to.
(https://pecl.php.net/package/taint seems to use a separate bit to track this. The latest release of the Taint pecl fixes XDebug compatibility)
As noted above, we couldn't expose the Interned flag directly, as it's a little unpredictable. That's why we added a new flag - one that at first glance might look the same, but handles the edge cases.
Thanks again,
Craig
On Mon, 19 Jul 2021 at 19:59, Craig Francis craig@craigfrancis.co.uk
wrote:
On Mon, 5 Jul 2021 at 19:14, Craig Francis craig@craigfrancis.co.uk
wrote:I have opened voting on https://wiki.php.net/rfc/is_literal for the
is-literal function.This RFC has been rejected; with 10 votes in favour, and 23 against.
[...]
And thank you to Matthew Brown for adding the 'literal-string' type to
Psalm:
https://github.com/vimeo/psalm/releases/tag/4.8.0
FYI: The 'literal-string' type has now been added to PHPStan, thanks
to Ondřej Mirtes:
https://github.com/phpstan/phpstan/releases/tag/0.12.97
Obviously I'd still like libraries to be able to protect everyone from
introducing Injection Vulnerabilities (as the majority of programmers don't
use static analysis), but that's for another day.
Craig
Le 7 sept. 2021 à 11:49, Craig Francis craig@craigfrancis.co.uk a écrit :
Obviously I'd still like libraries to be able to protect everyone from
introducing Injection Vulnerabilities (as the majority of programmers don't
use static analysis), but that's for another day.
Hi,
We all want to protect from injection vulnerability, but I think there are better way than is_literal.
One way is to use templates, an area where PHP is ironically lagging behind. I suggest looking at JS tagged templates:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
For example:
$qb->select('u')
->from('User', 'u')
->where('u.id = ' . $_GET['id']); // INSECURE
could be written as
<?php
$qb->exec SELECT u FROM User u WHERE u.id = %{ $_GET['id'] }
?>
where the part between %{ ... } is transformed into an SQL literal string (with delimiters "...", not just “escaping”) when it is a string; into the SQL expression NULL
when it is null; into an SQL subexpression if it is an object (provided by the library) that represents a well-formed SQL subexpression, etc.
—Claude
Le 7 sept. 2021 à 11:49, Craig Francis craig@craigfrancis.co.uk a écrit :
Obviously I'd still like libraries to be able to protect everyone from
introducing Injection Vulnerabilities (as the majority of programmers don't
use static analysis), but that's for another day.Hi,
We all want to protect from injection vulnerability, but I think there are better way than is_literal.
One way is to use templates, an area where PHP is ironically lagging behind. I suggest looking at JS tagged templates:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
For example:
$qb->select('u')
->from('User', 'u')
->where('u.id = ' . $_GET['id']); // INSECURE
could be written as<?php
$qb->execSELECT u FROM User u WHERE u.id = %{ $_GET['id'] }
?>where the part between %{ ... } is transformed into an SQL literal string (with delimiters "...", not just “escaping”) when it is a string; into the SQL expression
NULL
when it is null; into an SQL subexpression if it is an object (provided by the library) that represents a well-formed SQL subexpression, etc.—Claude
Resending from on-list address because I’m an idiot. Apologies for the dupe Claude/Craig.
Hi Claude,
I had my share of issues with Craig’s PR, but I think the original goal of it was a good and useful concept - provide developers (mostly lib authors, but its not like it couldn’t be used by end developers too) a way to know that a string came from something hard coded in a php file.
A ‘tagged template’ like that doesn’t help solve the problem in any way that parameterised queries can’t already do, and if you want to make it more ’templated’ like that, you could implement the same thing already by passing a printf-compatible template and the arguments to a function/method.
None of that helps solve what the is_literal
function (or potential type hint) would help with: when the part of the query that needs to be substituted, is something that cannot be parameterised at the SQL level (i.e. a column name) you really don’t want that to accept user input of any kind.
Cheers
Stephen
We all want to protect from injection vulnerability, but I think there are
better way than is_literal.One way is to use templates, an area where PHP is ironically lagging
behind. I suggest looking at JS tagged templates:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates
Hi Claude,
Posting on-list, as I've not had a reply (was confirming I've not missed
anything).
I have looked at JavaScript Tagged Templates before, and while they could
be made to work (ish), I don't believe they are better than the
is_literal()
proposal to protect against Injection Vulnerabilities:
-
It would require developers and libraries to re-write all of their
existing code to use Tagged Templates. -
If we copied JavaScript, the methods/functions can still be called
incorrectly:
function template(html, ...values) {
console.log(html, values);
}
template<p>Hi ${name}<p>
;
template([<p>Hi ${name}<p>
]); // Wrong
template(['<p>Hi ', name, '<p>']); // Wrong
PHP could provide a way for Libraries to check the developer has used a
Tagged Template, but that's basically what the is_literal()
proposal
does. With JavaScript, this is why isTemplateObject()
is being developed,
and Trusted Types might get fromLiteral()
.
-
Libraries would not be able to use Tagged Templates and easily support
older versions of PHP. -
The backtick character is already used for
shell_exec()
like
functionality.
Craig