Hi Internals,
We're going back to the original is_literal() proposal.
https://wiki.php.net/rfc/is_literal
This means that integers, which we cannot flag if they came from the
developer, will not be considered as part of the "literal" definition.
This helps us avoid the naming issue, and trying to define a concept that's
a bit vague (strings from the developer, or integers from anywhere)... and
while I’m still of the belief that integers would help adoption, I can also
see the appeal of something that's easier to understand, especially for a
supermajority.
(My ideal solution would be to have a primary ‘is_literal’ vote, with a
secondary question of adding integer support, but I’ve checked and the
multiple-question system here is for minor implementation details only e.g.
names.)
But either way it’s important that we address how Injection Vulnerabilities
occur. This simple flagging-and-checking literals system that has been
proven to work in other languages, will give libraries an easy way to check
that certain sensitive values have only come from the developer.
Considering we need a two-thirds majority vote, we do need to keep this
straightforward, avoiding any concerns over the variables contents.
Thanks for all your feedback so far, and I hope you can continue to show
your support for this on here, to show RFC voters this should be added to
PHP.
Craig
We're going back to the original is_literal() proposal.
Nice work! chr()
returning literal strings is somewhat questionable,
but otherwise this is looking very good.
--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com
Craig,
We're going back to the original is_literal() proposal.
The RFC still contains carrying the string literal flag through string
concatenation, which is still not a good idea.
And it still doesn't say what people are meant to do when there is an error.
Dan Ackroyd wrote:
Please can you go into some detail about what you think people are
meant to do when they detect a non-literal used where a literal is
expected?
Craig Francis wrote:
By using a simple function, it allows the library to handle
the situation however it likes.
That's not an answer to the question.
Just saying "you can handle it however you like" skips over the core
problem with RFC, which is that by supporting carrying the literal
flag through string concatenations,
It allows silly mistakes to slip through and make it to production. As
per https://news-web.php.net/php.internals/114858:
Assume that both UserPreferences and getInfoPanel code is covered by
unit tests. The developer who made that change would run the tests,
see the tests pass and then push their code live. At which point, it
would cause an error in production, as the string returned would not
pass the is_literal check.
For is_literal checks, what's going to happen is:
- programmer makes a mistake, that isn't caught by a test.
- programmer deploys site, and entries about non-literal strings start
being written to log files. - someone then needs to remember to check the log files, and pull out
the appropriate entries, and make a ticket for them to be
investigated. - someone then needs to pick up that ticket, investigate the cause of
the non-literal string being used, and trace it back to which
module/library is causing it, then update the ticket for someone on
that team to fix it. - someone on that team needs to have that ticket prioritised before
they start work on it.
I'd really like anyone who is promoting this RFC to answer the
question from https://news-web.php.net/php.internals/115010 :
Why are you prioritising speed of adoption, over reducing the cost of
using this feature for projects over say the next 5 or 10 years?
It's going to take more time to track down an error than it will be to
start using the feature.
I had planned to use this RFC to prevent XSS attacks, and in it's
current state I'm going to get a notice of "There's some userdata in
an inappropriate place. Have fun finding it!" because the information
about where that user data is in the string will be lost.
It's still bizarre to me that the RFC is proposing something that
allows mistakes to slip through to production.
cheers
Dan
Ack
btw, email or ticketing systems are not an appropriate place to log
this type of problem. They could happen on every page load, and so
generate hundreds of errors per second. As amusing as it is to look
back at the times that I have generated 10,000 error emails per second
in the past, I don't want to repeat that experience.
Also, I'm going to cross-post this, as it's unreasonable to expect
people to read all of the different threads.
We're going back to the original is_literal() proposal.
Please stop adding my name as a contributor to this RFC.
I explicitly removed my name from the document yesterday when I first
realised it was there.
I don't support the RFC as it is. I'd support a similar RFC but not this one.
sincerely,
Dan
Ack
The idea behind is_literal() is of good intention, but as they say the road to hell is paved with good intentions.
The RFC proposes to add an internal "literal" flag to a string, the is_literal() function, and nothing else.
Further the RFC states a vision to get "libraries to start using is_literal() to check their inputs." Again, that comes from a great intention.
The problem lies with the fact that library developer who choose to disallow non-literal strings will offer solutions when a use-case literally (pun-intended) cannot produce a literal string.
Sure, most leading library developers will offer a solution, but many of the long-tail library developers will not either because it will add scope and/or because those library developers don't have to skill and/or experience to do so.
So what will those users of those libraries do when faced with a required to only use literal strings? They will find a workaround so they can get their jobs done. And that workaround is really simple:
function make_literal(string $non_literal):string {
$literal = '';
for( $i = 0; $i< strlen($non_literal); $i++ ){
$literal .= chr(ord($non_literal[$i]));
}
return $literal;
}
You can see it in action 3v4l.org http://3v4l.org/ here[1] and for posterity on gist.github.com http://gist.github.com/ here[2].
Once developers start bypassing the is_literal() check then all those good intentions will be moot, and many who think they are secure from injection attacks will be vulnerable:
$sql = 'SELECT * FROM foo WHERE id=' . make_literal( $_GET['id']);
$result = mysqli_query($conn, $sql);
So what am I suggesting we do?
-
We postpone passing this is_literal() RFC until we have collectively addressed how userland developers will be able to handle non-literals in SQL, HTML, etc. when their use-cases require non-literals.
-
We could also go ahead and add the internal "literal" flag to a string so that if someone wants to use it to write an extension to add an is_literal() function in the mean time it would be available, but not yet standardized into PHP.
Thank you in advance for your consideration.
-Mike
[1] https://3v4l.org/oCBp7#focus=rfc.literals https://3v4l.org/oCBp7#focus=rfc.literals
[2] https://gist.github.com/mikeschinkel/b9abd4178db461568b813269bc936c18 <https://gist.github.com/mikeschinkel/b9abd4178db461568b813269bc936c18
Just a quick reply at the moment, but what happens if a db library had a
method to set what was in the LIMIT part of a query, you did the MySQL
thing of “30, 10” to get paginated results, and then one day the library
changed to require the argument to be an integer?
The idea behind is_literal() is of good intention, but as they say the
road to hell is paved with good intentions.The RFC proposes to add an internal "literal" flag to a string, the
is_literal() function, and nothing else.Further the RFC states a vision to get "libraries to start using
is_literal() to check their inputs." Again, that comes from a great
intention.The problem lies with the fact that library developer who choose to
disallow non-literal strings will offer solutions when a use-case literally
(pun-intended) cannot produce a literal string.Sure, most leading library developers will offer a solution, but many of
the long-tail library developers will not either because it will add scope
and/or because those library developers don't have to skill and/or
experience to do so.So what will those users of those libraries do when faced with a required
to only use literal strings? They will find a workaround so they can get
their jobs done. And that workaround is really simple:function make_literal(string $non_literal):string {
$literal = '';
for( $i = 0; $i< strlen($non_literal); $i++ ){
$literal .= chr(ord($non_literal[$i]));
}
return $literal;
}You can see it in action 3v4l.org http://3v4l.org/ here[1] and for
posterity on gist.github.com http://gist.github.com/ here[2].Once developers start bypassing the is_literal() check then all those good
intentions will be moot, and many who think they are secure from injection
attacks will be vulnerable:$sql = 'SELECT * FROM foo WHERE id=' . make_literal( $_GET['id']);
$result = mysqli_query($conn, $sql);So what am I suggesting we do?
We postpone passing this is_literal() RFC until we have collectively
addressed how userland developers will be able to handle non-literals in
SQL, HTML, etc. when their use-cases require non-literals.We could also go ahead and add the internal "literal" flag to a string
so that if someone wants to use it to write an extension to add an
is_literal() function in the mean time it would be available, but not yet
standardized into PHP.Thank you in advance for your consideration.
-Mike
[1] https://3v4l.org/oCBp7#focus=rfc.literals <
https://3v4l.org/oCBp7#focus=rfc.literals>
[2] https://gist.github.com/mikeschinkel/b9abd4178db461568b813269bc936c18
<https://gist.github.com/mikeschinkel/b9abd4178db461568b813269bc936c18
Just a quick reply at the moment, but what happens if a db library had a method to set what was in the LIMIT part of a query, you did the MySQL thing of “30, 10” to get paginated results, and then one day the library changed to require the argument to be an integer?
You didn't provide enough details on your hypothetical for a definitive answer, so let's fill in some details so I can provide an answer (but if theses details do not fit the use-case you envision, you'll need to be more specific.)
This db library you speak of is a SQL query builder. Using it would look like this (code examples derived from [1]):
$builder = new MySqlBuilder();
$query = $builder->select()->setTable('user')->limit('30,10');
The version where the above code works is version 2 and there are no type hint on method parameters. Then they release version 3 is which they add an int
type hint on the limit() method, and they introduce an offset() method. This is the scenario I will use to answer your question.
I upgrade to version 3 and my code breaks. I inspect the error, figure out the problem and then I change my code to look like this:
$query = $builder->select()->setTable('user')->offset(30)->limit(10);
Problem solved.
That said, please note the difference between your LIMIT scenario and the is_literal() scenario are three (3) fold:
-
If added to PHP is_literal() will be promoted as the panacea to solve all injection vulnerabilities in (almost?) all the "What's new in PHP" blog posts and then many of the PHP library developers who read these articles will rush to add is_literal() checks to their libraries without fully understanding and addressing the ramifications. And that over-zealousness will take time to recognize and thus recover from.
-
Your LIMIT scenario does not have a coincidental concurrent PR blitz that is_literal() will get meaning that even if some db libraries do constrain LIMIT it will only be a few at a time at most. So no more a problem than any other backward compatibility break in a library.
-
Finally and most importantly, there is an easy fix in userland for your LIMIT scenario as I illustrated. For many use-cases where is_literal() may be added as a gatekeeper there would be no easy fix, except to use a technique that your RFC describes as "clearly the developer doing something wrong."
-Mike
[1] https://github.com/nilportugues/php-sql-query-builder
The idea behind is_literal() is of good intention, but as they say the road to hell is paved with good intentions.
The RFC proposes to add an internal "literal" flag to a string, the is_literal() function, and nothing else.
Further the RFC states a vision to get "libraries to start using is_literal() to check their inputs." Again, that comes from a great intention.
The problem lies with the fact that library developer who choose to disallow non-literal strings will offer solutions when a use-case literally (pun-intended) cannot produce a literal string.
Sure, most leading library developers will offer a solution, but many of the long-tail library developers will not either because it will add scope and/or because those library developers don't have to skill and/or experience to do so.
So what will those users of those libraries do when faced with a required to only use literal strings? They will find a workaround so they can get their jobs done. And that workaround is really simple:
function make_literal(string $non_literal):string {
$literal = '';
for( $i = 0; $i< strlen($non_literal); $i++ ){
$literal .= chr(ord($non_literal[$i]));
}
return $literal;
}You can see it in action 3v4l.org http://3v4l.org/ <http://3v4l.org/ http://3v4l.org/> here[1] and for posterity on gist.github.com http://gist.github.com/ <http://gist.github.com/ http://gist.github.com/> here[2].
Once developers start bypassing the is_literal() check then all those good intentions will be moot, and many who think they are secure from injection attacks will be vulnerable:
$sql = 'SELECT * FROM foo WHERE id=' . make_literal( $_GET['id']);
$result = mysqli_query($conn, $sql);So what am I suggesting we do?
We postpone passing this is_literal() RFC until we have collectively addressed how userland developers will be able to handle non-literals in SQL, HTML, etc. when their use-cases require non-literals.
We could also go ahead and add the internal "literal" flag to a string so that if someone wants to use it to write an extension to add an is_literal() function in the mean time it would be available, but not yet standardized into PHP.
Thank you in advance for your consideration.
-Mike
[1] https://3v4l.org/oCBp7#focus=rfc.literals https://3v4l.org/oCBp7#focus=rfc.literals <https://3v4l.org/oCBp7#focus=rfc.literals https://3v4l.org/oCBp7#focus=rfc.literals>
[2] https://gist.github.com/mikeschinkel/b9abd4178db461568b813269bc936c18 https://gist.github.com/mikeschinkel/b9abd4178db461568b813269bc936c18<https://gist.github.com/mikeschinkel/b9abd4178db461568b813269bc936c18 <https://gist.github.com/mikeschinkel/b9abd4178db461568b813269bc936c18