Hi,
I've written up my suggestion for a is_literal() function:
https://wiki.php.net/rfc/is_literal
Any feedback would be appreciated.
Craig
Hi,
I've written up my suggestion for a is_literal() function:
https://wiki.php.net/rfc/is_literal
Any feedback would be appreciated.
Craig
While I appreciate the intent, without an untaint() or equivalent I fear its usefulness will be limited, or else it will get overused and thus cut off numerous entirely valid situations.
Eg, there's plenty of very good reasons to put a template string into the database rather than a file literal. Or to build an SQL query dynamically in ways that an is_literal check would not allow, at least not without an absurdly complex query builder.
Without a way to flag "yes, I know this was built dynamically but I've vetted it, it's OK" on a value, I fear such a check will either be unuseful or counter-productive.
--Larry Garfield
Eg, there's plenty of very good reasons to put a template string into the
database rather than a file literal. Or to build an SQL query dynamically
in ways that an is_literal check would not allow, at least not without an
absurdly complex query builder.
Thanks Larry,
I think the examples I've provided should cover the issues that typically
get raised.
The main ones tend to be "WHERE x IN (?,?,?)" and "ORDER BY variable",
where the current work arounds get a bit risky (such as string escaping),
but please let me know if I've missed any.
Craig
Hi,
I've written up my suggestion for a is_literal() function:
https://wiki.php.net/rfc/is_literal
Any feedback would be appreciated.
Craig
While I appreciate the intent, without an untaint() or equivalent I fear
its usefulness will be limited, or else it will get overused and thus cut
off numerous entirely valid situations.Eg, there's plenty of very good reasons to put a template string into the
database rather than a file literal. Or to build an SQL query dynamically
in ways that an is_literal check would not allow, at least not without an
absurdly complex query builder.Without a way to flag "yes, I know this was built dynamically but I've
vetted it, it's OK" on a value, I fear such a check will either be unuseful
or counter-productive.--Larry Garfield
Any feedback would be appreciated.
The RFC should probably define what a literal is.
cheers
Dan
Ack
On Sat, 21 Mar 2020 at 19:14, Craig Francis craig@craigfrancis.co.uk
wrote:Any feedback would be appreciated.
The RFC should probably define what a literal is.
Good point Dan, I've added it to the intro.
Please let me know if it needs more work (I'm under the impression that PHP
already defines these as literals, hence the name).
Hi,
I've written up my suggestion for a is_literal() function:
https://wiki.php.net/rfc/is_literal
Any feedback would be appreciated.
Craig
This seems very similar to the taint extension. How does it differ from ext-taint, and have you considered working together with ext-taint?
https://www.php.net/manual/en/intro.taint.php
The name is_literal() is too ambiguous to me. As Dan suggested, the RFC should define what a literal is.
Cheers,
Ben
This seems very similar to the taint extension
Hi Ben,
Yes, it is similar; I hope I've covered the differences under the "Taint
Checking" heading (please let me know if that could be improved).
https://wiki.php.net/rfc/is_literal#taint_checking
As to the name, it's to work alongside functions such as
is_int()
, is_string()
, etc - is that a good enough reason?
Craig
On Mar 21, 2020, at 14:13, Craig Francis craig@craigfrancis.co.uk
wrote:Hi,
I've written up my suggestion for a is_literal() function:
https://wiki.php.net/rfc/is_literal
Any feedback would be appreciated.
Craig
This seems very similar to the taint extension. How does it differ from
ext-taint, and have you considered working together with ext-taint?https://www.php.net/manual/en/intro.taint.php
The name is_literal() is too ambiguous to me. As Dan suggested, the RFC
should define what a literal is.Cheers,
Ben
As to the name, it's to work alongside functions such as
is_int()
,is_string()
, etc - is that a good enough reason?
I think it could cause confusion since int and string are value types,
- literal is not a type as such, it merely describes the way the value
was created.
I'm under the impression that PHP already defines these as literals
Does anyone know if that is correct? Does PHP remember how the string
variable / constant was created?
On Sat, Mar 21, 2020 at 3:26 PM Craig Francis craig@craigfrancis.co.uk
wrote:As to the name, it's to work alongside functions such as
is_int()
,is_string()
, etc - is that a good enough reason?I think it could cause confusion since int and string are value types,
- literal is not a type as such, it merely describes the way the value
was created.
I'm happy to use a different name; but I should add that is_numeric()
isn't
really a type, there are other functions such as is_writable()
, and the
taint extension uses is_tainted().
I'm under the impression that PHP already defines these as literals
Does anyone know if that is correct? Does PHP remember how the string
variable / constant was created?
I've talked to Paul Dragoonis and Derick Rethans recently (they both kindly
did talks at PHP-SW); when I mentioned it to Paul, I was told that's where
I should start looking, and that was the correct terminology; and Derick
helped confirm some of these ideas (but we were walking to the pub at the
time).
And while I keep trying, I don't know enough about C, or the internals of
PHP.
I'm happy to use a different name; but I should add that
is_numeric()
isn't really a type, there are other functions such asis_writable()
, and the taint extension uses is_tainted().
Right, good points. However, to my logic, whether or not the value was
created from a literal is not something you can infer from the value
itself, it needs an accompanying flag or something.
I'll suggest is_from_literal() as a more precise formulation.
I've talked to Paul Dragoonis and Derick Rethans recently (they both kindly did talks at PHP-SW); when I mentioned it to Paul, I was told that's where I should start looking, and that was the correct terminology; and Derick helped confirm some of these ideas (but we were walking to the pub at the time).
Sounds like you were having fun though :-)
And while I keep trying, I don't know enough about C, or the internals of PHP.
Appreciate the effort.
I'll suggest is_from_literal() as a more precise formulation.
Good suggestion, I've added it to the list of Open Issues (if we can
determine the different ways this can be achieved, without affecting
performance, then we can see what people prefer for the name).
Sounds like you were having fun though :-)
I was; although I must confess, I didn't recognise Derick until the talk
began (then, suddenly, the voice from the PHP Internals podcast).
On Sat, Mar 21, 2020 at 4:58 PM Craig Francis craig@craigfrancis.co.uk
wrote:I'm happy to use a different name; but I should add that
is_numeric()
isn't really a type, there are other functions such asis_writable()
, and
the taint extension uses is_tainted().Right, good points. However, to my logic, whether or not the value was
created from a literal is not something you can infer from the value
itself, it needs an accompanying flag or something.I'll suggest is_from_literal() as a more precise formulation.
I've talked to Paul Dragoonis and Derick Rethans recently (they both
kindly did talks at PHP-SW); when I mentioned it to Paul, I was told that's
where I should start looking, and that was the correct terminology; and
Derick helped confirm some of these ideas (but we were walking to the pub
at the time).Sounds like you were having fun though :-)
And while I keep trying, I don't know enough about C, or the internals
of PHP.Appreciate the effort.
Hi Craig,
https://github.com/laruence/taint#taint notes that
"Please note that do not enable this extension in product(ion) env, since it will slowdown your app."
-
That repo already provides is_tainted() http://docs.php.net/is_tainted
A fork of that repo would theoretically allow implementing is_literal() as described in the RFC - is that the implementation plan?
-
The slowdown would be a large problem if this feature was always on.
And if it can be implemented as a PECL module, that would be more preferable to me than a core module of php.
If it was in core, having to support that feature may limit optimizations or implementation changes that could be done in the future.
If it's implemented in the same way as taint (i.e. cannot be used in combination with XDebug, blackfire, newrelic, etc),
that would also be a problem for including it in core.
If it wasn't, then it'd slow down concatenation, calls, etc. even when the application didn't need is_literal.
I also imagine that whether or not opcache was enabled is likely to affect whether or not
something ends up being a literal or not
(e.g. opcache can evaluate functions such as str_repeat()
for literals at compile time)
https://github.com/laruence/taint/blob/master/taint.c seems to already support a whitelist (php_taint_override_func),
so that isn't insurmountable for functions,
but it's possible if ($local === 'literal') { process($local); }
would only satisfy is_literal() with opcache enabled.
Related projects (static analysis instead of runtime checks, though):
It's also worth noting that vimeo/psalm
had an in progress way to detect some ways in which tainted strings may be used by applications, based on a paper by Facebook.
(https://cacm.acm.org/magazines/2019/8/238344-scaling-static-analyses-at-facebook/fulltext (for HHVM, though))
https://github.com/vimeo/psalm/issues/611#issuecomment-515153305 - but it isn't completed or usable yet, as far as I can tell.
Wikimedia also created https://gerrit.wikimedia.org/g/mediawiki/tools/phan/SecurityCheckPlugin/ , but that's currently beta.
Both would have ways they fail to catch every way an argument could be passed to a function (e.g. unanalyzable dynamic/framework calls)
- Tyson
Hi Tyson,
Thanks for your email, I really appreciate it.
I'm not a C developer, but if I was to have a go at implementing
is_literal()
, I probably would start with the existing taint extension.
But you're right, that approach can cause performance issues, and doesn't
play well with XDebug (much to my annoyance).
I'm hoping that someone with more C and PHP Internals experience might have
a better solution, as there was some mention of the existing engine knowing
about literals being semi-special, and there might be (I'm really showing
my ignorance here) a better way to work with the compiler.
As to using PECL; my hope is that we can use this to help educate
developers about the dangers of mixing variables with commands. While they
won't think to use this function themselves, it can be used by frameworks
to ensure mistakes aren't being made (and who knows, maybe one day
functions like mysqli_query()
could use this as well).
As to the related projects / static analysis; while I do appreciate them,
similar tools have been available for years (admittedly some with huge
price tags), and I don't think they have really solved these problems. That
said, I have added a note in the RFC.
Thanks again,
Craig
Hi Craig,
https://github.com/laruence/taint#taint notes that
"Please note that do not enable this extension in product(ion) env, since
it will slowdown your app."
That repo already provides is_tainted() http://docs.php.net/is_tainted
A fork of that repo would theoretically allow implementing is_literal()
as described in the RFC - is that the implementation plan?The slowdown would be a large problem if this feature was always on.
And if it can be implemented as a PECL module, that would be more
preferable to me than a core module of php.
If it was in core, having to support that feature may limit
optimizations or implementation changes that could be done in the future.If it's implemented in the same way as taint (i.e. cannot be used in
combination with XDebug, blackfire, newrelic, etc),
that would also be a problem for including it in core.
If it wasn't, then it'd slow down concatenation, calls, etc. even when the
application didn't need is_literal.I also imagine that whether or not opcache was enabled is likely to affect
whether or not
something ends up being a literal or not
(e.g. opcache can evaluate functions such asstr_repeat()
for literals at
compile time)
https://github.com/laruence/taint/blob/master/taint.c seems to already
support a whitelist (php_taint_override_func),
so that isn't insurmountable for functions,
but it's possibleif ($local === 'literal') { process($local); }
would
only satisfy is_literal() with opcache enabled.Related projects (static analysis instead of runtime checks, though):
It's also worth noting that
vimeo/psalm
had an in progress way to detect
some ways in which tainted strings may be used by applications, based on a
paper by Facebook.
(
https://cacm.acm.org/magazines/2019/8/238344-scaling-static-analyses-at-facebook/fulltext
(for HHVM, though))
https://github.com/vimeo/psalm/issues/611#issuecomment-515153305 - but it
isn't completed or usable yet, as far as I can tell.Wikimedia also created
https://gerrit.wikimedia.org/g/mediawiki/tools/phan/SecurityCheckPlugin/
, but that's currently beta.
Both would have ways they fail to catch every way an argument could be
passed to a function (e.g. unanalyzable dynamic/framework calls)
- Tyson