Hello. I was thinking about adding an escaped output operator to PHP
and found this feature request: https://bugs.php.net/bug.php?id=62574. I
think this is quite a necessary feature. There are a lot of projects that
are written without a templating engine, and there are frameworks without
a built-in templating engine by default. All these projects still need new
code to be written. Usually it is rather simple to switch to a new version
of the language, but it is almost impossible to convert a large number of
templates to a templating engine.
Most output code prints properties of database entities, and only in some
cases is it necessary to concatenate HTML into a string and then print it
unescaped. An escaped output operator would be useful. Also, we do not
output data into the void or into a simple text file, but into an HTML
document, which has a certain format (markup). It is also logical to have
both forms, escaped and unescaped.
I want to suggest the operator "<?~ $str ?>", which will automatically wrap
the output in htmlspecialchars(). It is mentioned in the feature request
above. It is quite easy to type, and there is only a small chance of
writing "<?= ?>" instead by mistake.
PHP 7 brings new operators and other changes. I think a new echo operator
could be added as well. I can implement it myself.
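For comparison, here is a minimal sketch of what the proposed operator would abbreviate in today's PHP. The helper name `e()` is hypothetical, and the `ENT_QUOTES` flag and UTF-8 charset are assumptions; the proposal itself only says "wrap output in htmlspecialchars()".

```php
<?php
// Hypothetical helper standing in for the proposed escaped echo tag.
// Assumption: ENT_QUOTES and UTF-8 are the intended flags and charset.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$str = '<b>Tom & "Jerry"</b>';

// The plain short echo tag would print $str verbatim ...
$unescaped = $str;
// ... while the escaped form has to be spelled out or wrapped in a helper.
$escaped = e($str);
```

In a template this would be written as `<?= e($str) ?>`, which is the form the proposed `<?~ $str ?>` is meant to shorten.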
Hi!
This has been discussed on the list a number of times. The main issue with
this kind of proposal is that escaping is context-dependent. E.g.
htmlspecialchars() would not help you in many scenarios - e.g. it won't
protect you from XSS if you ever place user-controlled data in HTML
attributes. Having an operator for each of the possible contexts does not
really look feasible, and having it for only one of them and not the
others would mislead people into thinking this operator is generic
and can be used in all contexts safely.
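A small sketch of the attribute problem described above. The two attacker-controlled values are made-up examples, and the unquoted attribute and the `href` context are assumptions chosen for illustration:

```php
<?php
// Assumed attacker-controlled values (hypothetical examples).
$value = 'x onmouseover=alert(1)';
$url   = 'javascript:alert(1)';

// Neither string contains &, <, >, " or ', so htmlspecialchars() changes nothing.
$escapedValue = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
$escapedUrl   = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// Unquoted attribute: the space ends the value and starts a new attribute.
$input = '<input value=' . $escapedValue . '>';
// URL attribute: the scheme is still "javascript:" after escaping.
$link  = '<a href="' . $escapedUrl . '">profile</a>';
```

Both results still carry the payload, which is the sense in which one escaping function cannot cover every context.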
--
Stas Malyshev
smalyshev@gmail.com
You can simply add the context to the current output operator:
<?=html $str ?>
<?=attr $str ?>
<?=text $str ?> (=strip_tags)
<?=js $str ?>
<?=css $str ?>
Regards
Thomas
Stanislav Malyshev wrote on 17.06.2016 22:14:
This has been discussed on the list a number of times. Main issue with
this kind of proposal is that escaping is context-dependent.
you can simply add the context to the current output operator:
<?=html($str) ?>
<?=attr($str) ?>
<?=text($str) ?> (=strip_tags)
<?=js($str) ?>
<?=css($str) ?>
Look at that. Add a couple of parens and it's completely implementable in
userland now, with no language changes required.
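A minimal sketch of what such userland helpers could look like. The function names come from the list above; the bodies, flags and charset are assumptions, and `css()` is left out because PHP has no built-in CSS escaper:

```php
<?php
// Hypothetical userland helpers matching the names used in the thread.

function html(string $s): string
{
    // Element content: escape &, <, >, " and '.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

function attr(string $s): string
{
    // Assumption: the attribute value is always written inside quotes.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

function text(string $s): string
{
    // "=strip_tags" as suggested above; the result is still escaped for HTML.
    return html(strip_tags($s));
}

function js($value): string
{
    // JSON literal with <, >, &, ' and " hex-escaped, for use inside a script block.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}
```

With these loaded, the templates above work unchanged, e.g. `<?=html($str) ?>`.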
Sure, you can implement that in userland, but people either don't do it or
make it too complicated, so every day you get code with unescaped output.
Regards
Thomas
Ryan Pallas wrote on 18.06.2016 00:27:
Add a couple parens and it's completely implementable in userland now with no language changes required.
e.g. it won't protect you from XSS if you ever place user-controlled data
in HTML attributes.
As I've found, such an XSS can occur in code like this:
$xss = "');your_code_here();//";
<div onmouseover="alert('<?php echo htmlspecialchars($xss, ENT_QUOTES, 'UTF-8') ?>')">
I think this is more of an architectural problem than an escaping problem.
This is a very special case, for when we really need it.
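A short sketch of why the escaped value above still breaks out of the string. Here `html_entity_decode()` is only a stand-in for the browser, which decodes entities in attribute values before the script runs:

```php
<?php
$xss = "');your_code_here();//";

// What the template above writes into the onmouseover attribute.
$escaped   = htmlspecialchars($xss, ENT_QUOTES, 'UTF-8');
$attribute = "alert('" . $escaped . "')";

// The HTML parser decodes entities in attribute values before the
// JavaScript engine sees them; html_entity_decode() simulates that step.
$seenByJs = html_entity_decode($attribute, ENT_QUOTES, 'UTF-8');
```

After decoding, the handler reads `alert('');your_code_here();//')`, so the injected call runs as a separate statement.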
If you are in a HTML context you need different escaping than you need in
a CSS or JS block.
For JS it's better to use json_encode(). And I've never come across
CSS+PHP output; that is some special case.
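A sketch of the json_encode() approach applied to the earlier onmouseover example. The `JSON_HEX_*` flags and the extra htmlspecialchars() call for the attribute layer are assumptions added here, not something stated in the message:

```php
<?php
$xss = "');your_code_here();//";

// Step 1: build a JavaScript string literal. The JSON_HEX_* flags are an
// assumption so that quotes and angle brackets are hex-escaped as well.
$literal = json_encode($xss, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

// Step 2: the literal sits inside an HTML attribute, so escape it for HTML too.
$attribute = 'alert(' . htmlspecialchars($literal, ENT_QUOTES, 'UTF-8') . ')';
$div = '<div onmouseover="' . $attribute . '">hover</div>';
```

The value now stays inside one string literal, so the handler only shows it in the alert box.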
would be misleading people into thinking this operator is generic and can
be used in all contexts safely.
I don't think many programmers would think so. Anyway, this can be
mentioned in the documentation.
The escaping should also be aware of the content encoding.
For special cases - e.g. when we use one encoding and need to output a
value in another encoding - htmlspecialchars() can still be used.
Sure you can implement that in userland, but people don't do it or make
it too complicated,
so you get every day code with unescaped stuff.
Yes. This is the main problem.
Almost every echo statement outputs data from the database: usually an
entity property if an ORM is used, or an array key if not. I'm not
talking about a fully functional escaping operator for all cases, just for
the most common case: outputting a value into an HTML document. If we have
a shortcut for "<?php echo $value; ?>" then we also need a shortcut for
"<?php echo htmlspecialchars($value, ENT_QUOTES); ?>", because PHP is a
web programming language. I think this operator can make many projects
safer.
2016-06-18 3:32 GMT+05:00 Thomas Bley mails@thomasbley.de:
Sure you can implement that in userland, but people don't do it or make it
too complicated,
so you get every day code with unescaped stuff.
Add a couple parens and its completely implementable in userland
If we could autoload functions, I bet that's what everyone would be doing.
At the moment, no one is able to commit to that pattern, because it
doesn't scale - you can't just keep adding to a list of global
functions (and files) that get aggressively loaded whenever you render
a view, even if each view uses only one or two of them...
So in practice, you minimally end up with something like this:
<?php use My\Stuff\EscapeFunctions as e; ?>
<?=e::html($str) ?>
<?=e::attr($str) ?>
<?=e::text($str) ?>
...
But that isn't really practical either, since you can only cram so
many functions into the same class - at which point you start adding
more classes...
<?php use My\Stuff\EscapeFunctions as e; ?>
<?php use My\Stuff\OtherFunctions as o; ?>
<?=e::html($str) ?>
<?=o::stuff(...) ?>
It quickly gets ugly, messy and confusing.
Then I start thinking about crazy solutions like tokenizing the
template file first and dynamically adding require_once statements for
any functions discovered being used, which would be more convenient,
but quite overly complex for such a small problem - and we're still
talking about occupying the global namespace with lots of functions.
And so you likely end up accepting that it's ugly and inconvenient,
and you resign yourself to use-statements and static methods, or
fully-static classes, which I've taken to referring to as
"pseudo-namespaces", since we're really abusing classes as a kind of
namespace for functions, just so we can get them to autoload.
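A minimal sketch of such a "pseudo-namespace" class. The class and method names are the ones used in the templates above; the method bodies are assumptions, since the message only names them:

```php
<?php
namespace My\Stuff;

// A class used only to group static functions, so that a class autoloader
// (for example Composer's) can load them on first use.
// Assumption: all three helpers escape for HTML with ENT_QUOTES and UTF-8.
final class EscapeFunctions
{
    public static function html(string $s): string
    {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    }

    public static function attr(string $s): string
    {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    }

    public static function text(string $s): string
    {
        return self::html(strip_tags($s));
    }
}
```

A template then pulls it in with `use My\Stuff\EscapeFunctions as e;` and calls `e::html($str)`, exactly as in the snippets above.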
Functions just aren't all that convenient or useful in PHP, because
they largely depend on manual use of require_once, which feels really
ugly and old-fashioned (since everything else autoloads like it's
supposed to) - and it isn't even always possible, since, for example,
you can't (reliably) know where a Composer package is located relative
to your project or package; it depends on whether your project is
currently the root package (e.g. under test) or an installed package
in the vendor-folder.
I really like pure functions - they're neat, simple and predictable.
In JavaScript (and other languages) I always use functions first and
resort to classes only when there's a real clear benefit. In PHP, I
feel like I'm almost always forced into using classes for everything,
mainly because that's what works best in PHP and creates the least
rub.
This has been bothering me for many years - and I wish that I could
propose a solution, but I really don't have any ideas.
Can we do something to improve and encourage the use of functions in PHP?
Rasmus Schultz rasmus@mindplay.dk wrote on Sat, 18 June 2016, 17:44:
<?php use My\Stuff\EscapeFunctions as e; ?>
<?php use My\Stuff\OtherFunctions as o; ?>
<?=e::html($str) ?>
<?=o::stuff(...) ?>
It quickly gets ugly, messy and confusing.
Did you know that you can alias namespaces, too?
<?php use My\Stuff\Escape as esc; ?>
<?=esc\html($str)?>
You can always add more functions to a namespace, even spread across
multiple files.
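A sketch of the namespace-alias variant. The function bodies and the `App\View` namespace are assumptions for illustration; two namespaces share one file here only to keep the example self-contained:

```php
<?php
namespace My\Stuff\Escape;

// Hypothetical escaping functions living in a namespace instead of a class.
// Unlike classes, these are not autoloaded: the defining file must be
// included explicitly (or listed under Composer's "files" autoload key).
function html(string $s): string
{
    return \htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

function attr(string $s): string
{
    return \htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

namespace App\View; // second namespace, standing in for the template file

use My\Stuff\Escape as esc;

$str = '<em>"quoted"</em>';
$out = esc\html($str);
```

The alias keeps template calls short, but it does not address the loading concern raised earlier, because functions still have to be included by hand.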
Pro-userland: quick reminder that a composer update is much quicker than
a full system PHP version upgrade.
I'd rather rely on an escaping package written in PHP, easier to maintain
and quicker to upgrade, than something that will likely use some obscure
shared library (or the PHP binary itself) that may not be upgraded for
weird reasons (it's shared, remember?).
I know that you put a lot of effort in security maintenance, but it's still
easier to deal with this stuff in userland in any case, and most templating
languages in common frameworks already inject helpers in the script context
in order to achieve quick, effective and context-aware (no automatic
context detection) escaping.
Marco Pivetta
Guys, wait please :) I am not suggesting an escaping package for all
contexts and all cases. That is not what I described in my first letter.
My point is that the main job of the echo operator "<?= ?>" is to output an
unknown value from a database into an HTML environment. So in all these
places we have to copy-paste the call to htmlspecialchars() to prevent XSS.
There are many projects written on custom engines, frameworks, or CMSs
that do not have any templating engine, and there is no possibility of
rewriting many working PHP templates to Twig, Smarty, or something else.
I suggest a new simple operator "<?~ ?>" which will automatically wrap the
output value in htmlspecialchars(). It is intended specifically for HTML,
not for XML or JS. It does not require any php.ini settings, new classes or
constants. The reason for implementing it is the same as for implementing
the "??", "<=>", or "<?= ?>" operators: to make common, frequent operations
better, decrease copy-paste, and increase security. I can implement it
myself and send a patch.
What do you think?
2016-06-19 12:59 GMT+05:00 Marco Pivetta ocramius@gmail.com:
Pro-userland: quick reminder that a composer update is much quicker than
a full system PHP version upgrade.
My point is that the main job of echo operator "<?= ?>" is output an
unknown value from database to an HTML environment. So in all this places
we should copy-paste the call of htmlspecialchars() to prevent XSS.
The majority of XSS problems are created because the free-format input
INTO the application is not correctly handled. Simply banging
htmlspecialchars() around totally unmanaged text is NOT the solution;
correct filtering of the inputs is where this should be handled.
I'm sure all of you see various attempts at XSS and SQL injection in
your log files. About 20% of my overnight traffic is people trying to
'get in', but because I do not allow raw text to get through, all it
results in is errors in the log files.
The packages that we have had problems cleaning up have tried using the
'clean the output' approach, but this STILL left holes which can only be
fixed by cleaning the input ...
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
This basically means that you lack basic understanding of how escaping and
user input are to be handled.
Most apps out there are about getting a bunch of text from the user, then
rendering it somewhere else in the app.
Cleaning user input just leads to frustration and a big mess in most
scenarios, which is why we're all talking about escaping output instead.
This is not "cleaning" either, it's escaping, which is a non-destructive
and reversible operation (which is why it works so well).
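A small sketch of the "non-destructive and reversible" point. The sample strings are made up, and strip_tags() is used here only as one example of destructive cleaning:

```php
<?php
$raw = 'Tom & Jerry say "1 < 2"';

// Escaping: every special character is replaced by an entity ...
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
// ... and decoding the entities gives the original string back.
$restored = htmlspecialchars_decode($escaped, ENT_QUOTES);

// "Cleaning" by contrast throws information away: once the tags are
// stripped, the original input cannot be reconstructed from the result.
$cleaned = strip_tags('<b>bold</b> text');
```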
Marco Pivetta
Well we have to disagree ... simply expecting htmlspecialchars() to fix
all your problems without proper handling of the input text is 'the big
mess', and there is NO need to simply slap htmlspecialchars() onto
properly built data, so the idea that <?= should automatically add it is
totally pointless!
--
Lester Caine - G8HFL
Let me tell you a story.
Once upon a time, WordPress decided to escape user input to protect against
XSS attacks. Then this happened: https://klikki.fi/adv/wordpress2.html
(Stored XSS via MySQL Column Truncation vulnerability.)
Escaping against XSS attacks should happen on output, not on input. Dead
stop.
You MAY cache the escaped output for performance gains, but keep the
unescaped data in the database in case you need to adjust your escaping
strategy without mangling existing data.
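A minimal sketch of "store raw, escape on output". The array stands in for a database row, and the function names `toHtml()` and `toJson()` are hypothetical:

```php
<?php
// Stand-in for a database row: the value is stored exactly as the user typed it.
$row = ['about_me' => 'I like <script>alert("xss")</script> & PHP'];

// Hypothetical per-context output functions; escaping happens only here.
function toHtml(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

function toJson(string $raw): string
{
    return json_encode(
        $raw,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$htmlOut = toHtml($row['about_me']);
$jsonOut = toJson($row['about_me']);
```

Because the stored value is untouched, the same row can serve an HTML page and a JSON response, and either escaping strategy can be changed later without migrating data.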
Further reading:
https://paragonie.com/blog/2015/06/preventing-xss-vulnerabilities-in-php-everything-you-need-know
Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises https://paragonie.com/
From your story, Scott, it looks like the failure was bad input filtering,
not input filtering in general. If sites are really trying to be secure,
they should follow both Lester's and your ideas: filter on input and
escape on output.
Given your second link, the better suggestion is to stop taking raw HTML.
Assuming user-generated HTML is ever safe to re-render in an output page
has been a bad idea for years. eBay/PayPal once thought that stripping all
letters and numbers from JavaScript was enough to make it safe; it wasn't.
Somebody used just things like (){}[]=+ to build functional attack scripts.
While a simple method of output escaping seems like a good idea, I agree
with the others who point out that this is one of those security systems
where getting it 90% correct is worse than not doing anything at all.
Things like this will cause people to be blindsided when the uncaught
escapes cause the next major security problem.
Walter
On Sun, Jun 19, 2016 at 10:28 AM, Scott Arciszewski scott@paragonie.com
wrote:
Escaping against XSS attacks should happen on output, not on input. Dead
stop.
--
The greatest dangers to liberty lurk in insidious encroachment by men of
zeal, well-meaning but without understanding. -- Justice Louis D. Brandeis
Lester
there is NO need to simply slap htmlspecialchars() onto properly built data
There are many cases when user data can contain quotes or other HTML
entities.
Walter
where getting it 90% correct is worse that not doing anything at all.
Things like this will cause people to be blindsided when the uncaught
escapes
cause the next major security problem.
Why do you think so? What real problems can happen if there is a short
operator for htmlspecialchars()?
2016-06-19 22:48 GMT+05:00 Walter Parker walterp@gmail.com:
From your story Scott, it looks like the failure was bad input filtering,
not input filtering in general.
Lester
<img title="<?= $book['title'] ?>" />
// $book['title'] = 'When we say "Hello"';
<div><?= $user['about_me'] ?></div>
// $user['about_me'] = 'I am a programmer. I like to write <script>alert("xss")</script> in "About me" field';
there is NO need to simply slap htmlspecialchars() onto
properly built data
There are many cases when user data can contain quotes or other html
entities.
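
A minimal sketch of the two cases above, using the sample values from the message and plain string concatenation in place of a template. It only illustrates what htmlspecialchars() changes for these two values; it makes no claim about other output contexts.

```php
<?php
// Sample values adapted from the message above (illustrative data only).
$book = ['title' => 'When we say "Hello"'];
$user = ['about_me' => 'I like to write <script>alert("xss")</script>'];

// Without escaping, the quote inside the title ends the attribute early
// and the script tag reaches the browser as markup.
$rawImg = '<img title="' . $book['title'] . '" />';
$rawDiv = '<div>' . $user['about_me'] . '</div>';

// With escaping, both values stay inside their attribute or text node.
$safeImg = '<img title="' . htmlspecialchars($book['title'], ENT_QUOTES, 'UTF-8') . '" />';
$safeDiv = '<div>' . htmlspecialchars($user['about_me'], ENT_QUOTES, 'UTF-8') . '</div>';

echo $rawImg, "\n", $safeImg, "\n", $rawDiv, "\n", $safeDiv, "\n";
```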
( Cut moan about top posting and duplicating sigs and I use plain text
for any email archive )
Now ... I want to add content that includes
<script>alert("xss")</script> it needs to be in the format
&lt;script&gt;alert(&quot;xss&quot;)&lt;script&gt; so that it never
appears in the 'dangerous' format, but if $user['about_me'] is
designated a simple text string, then any attempt to add
<script>alert("xss")</script> via an input should be blocked! The input
processing of text needs to understand what it is expecting to receive
and process it accordingly, so if the content is material such as email
messages it can be correctly processed for storage by escaping if
necessary. The fun comes when you are looking for content such as "About
me" AFTER the data has been sanitised. In this case the search term
needs to be processed as well so &quot;About me&quot; ... so again one
needs to know just what state the data is in and my input process
converts ' to &#39; as well to be safe when using single quotes.
Of course there are very good reasons why messages and comments should be
limited to simple text. Many Wordpress/Joomla/etc problems would have
been prevented if the trend to use HTML for everything had not started.
Stripping any tags and just leaving the raw text is ideal for comment fields
which can be the target for scammers where uncontrolled access may be
required. And if there is a limit on field size in the database, the
same restriction should apply to the data entry ... If the data is
expanded by the sanitising process that also needs to be taken into
account, along with multi byte characters.
Lester Caine lester@lsces.co.uk wrote on Sun, 19 June 2016, 22:03:
Now ... I want to add content that includes
<script>alert("xss")</script> it needs to be in the format
&lt;script&gt;alert(&quot;xss&quot;)&lt;script&gt; so that it never
appears in the 'dangerous' format, but if $user['about_me'] is
designated a simple text string, then any attempt to add
<script>alert("xss")</script> via an input should be blocked!
No, it shouldn't be blocked. It should just be escaped on output. What if
that's a comment to a tech blog, where we talk about these things instead
of trying to find a vulnerability?
The input
processing of text needs to understand what it is expecting to receive
and process it accordingly, so if the content is material such as email
messages it can be correctly processed for storage by escaping if
necessary. The fun comes when you are looking for content such as "About
me" AFTER the data has been sanitised. In this case the search term
needs to be processed as well so &quot;About me&quot; ...
One more reason not to escape on input.
so again one
needs to know just what state the data is in and my input process
converts ' to &#39; as well to be safe when using single quotes.
What if you suddenly start to output it in JSON or plain text format?
Suddenly you need a different escaping.
You really shouldn't escape on input, as your input doesn't know where it's
used.
What you should do on input is validation, so decide whether it's in the
right format. But if it doesn't validate, you reject it and don't even save
it.
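
To make the point about context concrete, here is a minimal sketch of one stored string being prepared for several destinations with standard PHP functions. The variable names, the sample value and the search.php path are made up for illustration; the only claim is that each destination needs its own encoding, applied at output time.

```php
<?php
// Hypothetical value, stored unescaped exactly as the user typed it.
$aboutMe = 'I like <script>alert("xss")</script> & "quotes"';

// HTML text or quoted-attribute context.
$forHtml = htmlspecialchars($aboutMe, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// JSON response context.
$forJson = json_encode(['about_me' => $aboutMe]);

// URL query-string context (search.php is a made-up path).
$forUrl = 'search.php?q=' . rawurlencode($aboutMe);

// Plain-text context, for example a text/plain e-mail body: no escaping at all.
$forText = $aboutMe;

echo $forHtml, "\n", $forJson, "\n", $forUrl, "\n", $forText, "\n";
```

Had the value been HTML-escaped before storage, the JSON, URL and plain-text outputs would all carry entity text such as &amp;quot; that their consumers do not decode.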
Now ... I want to add content that includes
<script>alert("xss")</script> it needs to be in the format
&lt;script&gt;alert(&quot;xss&quot;)&lt;script&gt; so that it never
appears in the 'dangerous' format, but if $user['about_me'] is
designated a simple text string, then any attempt to add
<script>alert("xss")</script> via an input should be blocked!
No, it shouldn't be blocked. It should just be escaped on output. What if
that's a comment to a tech blog, where we talk about these things instead
of trying to find a vulnerability?
Re-read what I wrote!
You should ALWAYS sanitise simple text such as short descriptions, and
even user names and other simple text fields and I would always do that
with strings like $user['about_me'] ... '<?~' creates a false sense of
security when users should be educated as to the risks that NOT
validating data can create. Such as overflowing field sizes and creating
text which internally can cause problem even before outputting to a
browser ... such as quotes in combined strings.
( Rowan sums up the output side nicely ... )
2016-06-20 11:12 GMT+02:00 Lester Caine lester@lsces.co.uk:
Re-read what I wrote!
I read it and I fundamentally disagree with that.
You should ALWAYS sanitise simple text such as short descriptions, and
even user names and other simple text fields and I would always do that
with strings like $user['about_me'] ...
'<?~' creates a false sense of
security
You're right. But that is the case because it doesn't obey the output context.
It's not because it escapes on output.
when users should be educated as to the risks that NOT
validating data can create. Such as overflowing field sizes and creating
text which internally can cause problem even before outputting to a
browser
Data validation is a totally different topic and not what this thread is
about.
such as quotes in combined strings.
Where's that an issue?
( Rowan sums up the output side nicely ... )
where getting it 90% correct is worse than not doing anything at all.
Things like this will cause people to be blindsided when the uncaught
escapes cause the next major security problem.
Why do you think so? What real problems can happen if there will be a
short operator for htmlspecialchars()?
What could happen is this getting sold/documented as a general purpose
security feature:
"Use '<?~' and it will solve your XSS and other escaping problems with
outputting HTML that was stored in a DB." What it solves is a subset,
which is escaping characters stored in data that have special meanings in
HTML. My concern is that the remaining security issues might get overlooked or
ignored because '<?~' is considered good enough. There are issues with
htmlspecialchars, UTF-8 and certain language-specific characters (non
English). There were also issues with quotes in the past.
Walter
you can never avoid people writing things incorrectly, just look at code using addslashes()
instead of mysql_real_escape_string() ...
Regards
Thomas
"Use '<?~' and it will solve your XSS and other escaping problems with
outputting HTML that was stored in a DB."
I don't think this is a good phrase for documentation. This form should be
considered exactly equivalent to htmlspecialchars, taking into account any
language- and encoding-specific issues, and this should be pointed out in the
documentation. This is a shortcut for a frequent operation, like '??' for an
isset() check. And it can really improve security, not in 90% but in about
99.9999% of cases.
Good, then we do agree, as what I said was what I DID NOT want to see in
the documentation.
This should be documented as shortcut for <? echo htmlspecialchars(string)
?>. It should be further pointed out that while this will be useful in
catching many XSS and other HTML issues, it will not catch all of them, so
care and attention to proper data hygiene is still required.
Walter
Good, then we do agree, as what I said was what I DID NOT want to see in
the documentation.
This should be documented as shortcut for <? echo htmlspecialchars(string)
?>. It should be further pointed out that while this will be useful in
catching many XSS and other HTML issues, it will not catch all of them, so
care and attention to proper data hygiene is still required.
Walter
There will never be a way to make this operator useful to a majority of
users or use cases; similar ideas have been discussed many times in the
past.
If we get annotations then you might be able to hook something in from
userland transparently that understands your specific context and
application. This would be much more feasible IMO.
- Davey
Davey, could you give an example? As I see it in this discussion, all the
specific use cases are associated with output into a JS or URL context. But
these are not the majority of use cases. Also, HTML escaping should not be used
there; json_encode() or urlencode() should be used instead.
And it can really improve security, not in 90% but about 99.9999%
cases.
I think you are rather overstating how much of a "special edge case" it
is to echo a variable into other contexts like URLs, or JS. It doesn't
need to be anything fancy, just an innocent-looking snippet like this:
<ul>
<?php foreach ( $things as $thing ) { ?>
<li><a href="/things/<?= $thing['name'] ?>" onclick="show_popup('<?= $thing['name'] ?>');"><?= $thing['name'] ?></a>
<?php } ?>
</ul>
There are three different escape mechanisms needed there; if there is a
shorthand for one, do you think it will be more likely or less that
people will get the other two right?
Regards,
Rowan Collins
[IMSoP]
<ul>
<?php foreach ( $things as $thing ) { ?>
<li><a href="/things/<?= $thing['name'] ?>" onclick="show_popup('<?= $thing['name'] ?>');"><?= $thing['name'] ?></a>
<?php } ?>
</ul>
There are three different escape mechanisms needed there; if there is a
shorthand for one, do you think it will be more likely or less that people
will get the other two right?
I have to agree with that - assigning special syntax to one kind of
escape-function gives that function an elevated status, which could
easily encourage neglect and oversight.
I do wish that we had an obvious, consistently-named set of
web-related escape/encode functions for use in plain PHP templates,
like html(), attr(), js(), etc... having to type and read
htmlspecialchars() and json_encode() while you're trying to visually
parse a template is really inconvenient.
That's all it is though, inconvenience. Nice to have, not must have.
I'd be much more interested in a general solution to the problem of
being unable to (or at least strongly demotivated from) using actual
namespaced functions in this and many other cases - that's a missing
feature and a more general problem, whereas in my opinion an operator
or shorter function-names are just a work-around...
(and please, nobody say "use a template engine" - I am using a
template engine, it's called PHP!)
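
A rough sketch of what such short helpers could look like in userland. The names html(), attr() and js() come from the wish above and are not built into PHP; the flag choices are assumptions, and JSON_THROW_ON_ERROR needs PHP 7.3 or later. The example then applies them to the link from the earlier snippet, using a made-up value.

```php
<?php
// Hypothetical helper names borrowed from the message above; PHP has no built-in html(), attr() or js().
function html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

function attr(string $text): string
{
    // Only meant for quoted attribute values; unquoted attributes need stricter rules.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

function js($value): string
{
    // JSON literal with <, >, &, ' and " hex-escaped.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Illustrative value only.
$thing = ['name' => 'Say "Hello")'];

// URL path segment inside an attribute, JS literal inside an attribute, and plain text.
$link = '<a href="/things/' . attr(rawurlencode($thing['name'])) . '"'
      . ' onclick="show_popup(' . attr(js($thing['name'])) . ');">'
      . html($thing['name']) . '</a>';

echo $link, "\n";
```

Even with short names, the template author still has to pick and nest the right helper for each position, which is the inconvenience being discussed.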
(and please, nobody say "use a template engine" - I am using a
template engine, it's called PHP!)
My PHP is augmented with Smarty so I know which are template files and
which are program code :)
My PHP is augmented with Smarty so I know which are template files and
which are program code :)
I name my template files "*.view.php", so I know which is which.
I also head off every file with /** @var MyViewModel $view */ for IDE
support and inspections with CS/MD/phan, etc.
If you're curious: https://github.com/mindplay-dk/kisstpl
I quit smarty many years ago, and I will not use a template engine,
ever again - I don't want to write my views in an entirely different
language in favor of some convenience; especially when that
convenience comes with another whole slew of inconveniences...
2016-06-20 17:51 GMT+02:00 Rasmus Schultz rasmus@mindplay.dk:
I quit smarty many years ago, and I will not use a template engine,
ever again - I don't want to write my views in an entirely different
language in favor of some convenience; especially when that
convenience comes with another whole slew of inconveniences...
Well, until you want to share your templates in multiple languages like a
version in JS to use it directly in the frontend.
Davey
- https://marc.info/?t=145851323800001&r=1&w=2 - automatic template escaping
- https://marc.info/?t=135082660600002&r=1&w=2 - this one even proposed the same syntax!
- https://marc.info/?t=144225546000001&r=1&w=2 - tainted variables also "solves" this problem
These discussions and the arguments against are all about a
super-universal escaping operator, and the point that the escaping method
depends on context. The third discussion is about a somewhat different thing;
the second discussion is closer to my proposal.
I suggest an operator for one specific context, HTML markup, because this is
the most frequently used context. This is shown in the example below.
Rowan
I think you are rather overstating how much of a "special edge case" it
is to echo a variable into other contexts like URLs,
or JS. It doesn't need to be anything fancy, just an innocent-looking
snippet like this:
<ul>
<?php foreach ( $things as $thing ) { ?>
<li><a href="/things/<?= $thing['name'] ?>" onclick="show_popup('<?= $thing['name'] ?>');"><?= $thing['name'] ?></a>
<?php } ?>
</ul>
There are three different escape mechanisms needed there; if there is a
shorthand for one,
do you think it will be more likely or less that people will get the
other two right?
Actually, htmlspecialchars() is needed in all three cases:
<?php $thing = ['name' => 'Say "Hello")']; ?>
<a
href="/things/<?= htmlspecialchars(urlencode($thing['name'])) ?>"
onclick="alert(<?= htmlspecialchars(json_encode($thing['name']),
ENT_QUOTES) ?>); return false"
>
<?= htmlspecialchars($thing['name']) ?>
</a>
You can leave out htmlspecialchars around urlencode only because
urlencode already encodes all special characters in its own way.
Imagine that urlencode did not encode quotes - what function should we
call on its result?
That's why I say this is a very common case. The main purpose of PHP is
web programming and generating HTML (hypertext preprocessor, yes).
The very fact that there have been many discussions about it indicates that
it is a necessary feature.
Actually, htmlspecialchars() is needed in all three cases:
...
You may not write htmlspecialchars together with urlencode just because
urlencode encodes all special characters with its own way.
So, not needed in all 3 cases then...
Imagine that urlencode does not encode quotes - what function should we
call for its result?
Ideally, an escape filter that performs both functions; if the aim is to make things easier, I shouldn't need to think about the need to nest two escape functions. If I still have to use non-obvious combinations of magic syntax plus function calls, the claim of "secure by default" doesn't really stand up. The ~ becomes nothing more than an alias that I still need to remember when to deploy.
<script>$('[data-thing-id="<?~ $thing['name'] ?>"]').on('click', function(){doThing('<?~ $thing['name'] ?>');});</script>
I'm pretty sure the tempting syntax is actively harmful in that situation...
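
A small sketch of why HTML escaping is the wrong tool inside a script block, assuming the proposed operator behaves like htmlspecialchars(). The sample values are made up; doThing() is the JavaScript function from the snippet above. The json_encode() flags shown are one common choice for embedding a literal in inline script, not the only one.

```php
<?php
// Illustrative values only.
$name   = "O'Reilly & Sons";
$tricky = 'abc\\';   // a value ending in a single backslash

// HTML escaping inside <script>: the browser does not decode entities there,
// so the JavaScript string contains the literal text "&#039;" and "&amp;".
$wrong = "doThing('" . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "');";

// The backslash is untouched by htmlspecialchars(), so it escapes the closing
// quote and breaks the JavaScript string literal.
$wrongTricky = "doThing('" . htmlspecialchars($tricky, ENT_QUOTES, 'UTF-8') . "');";

// Encoding for the JavaScript context instead.
$flags       = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$right       = 'doThing(' . json_encode($name, $flags) . ');';
$rightTricky = 'doThing(' . json_encode($tricky, $flags) . ');';

echo $wrong, "\n", $wrongTricky, "\n", $right, "\n", $rightTricky, "\n";
```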
The fact itself, that there were many discussions about it, indicates that
it is a necessary feature.
Popularity is not the same thing as necessity. More relevantly, even when we agree on the problem, the simple solution isn't always the best, sometimes it pays to think a bit more broadly about the problem space. Larry's escaper registration is one example of that.
HackLang's XHP is another - rather than thinking about escaping as an action, it gives the compiler richer knowledge of the structure, so it can "know" the right escape syntax. If the compiler could look at my previous example and recognise the attribute, URL, script, and text contexts itself, then you really would have security-by-default. Unfortunately, that too is tricky to generalise - what is the correct escape method for an attribute named "data-my-action"...?
Regards,
--
Rowan Collins
[IMSoP]
If you're curious: https://github.com/mindplay-dk/kisstpl
https://github.com/bitweaver ... couple of thousand templates with my
personal extensions ... which I would not even consider rewriting any
time soon. Moving from Smarty2 to 3 was bad enough ... and we still keep
hitting missed bits.
In many of the PHP template engines, there are multiple "filters"
available with a specific syntax, and a way to add more. You have to
specify which one you want, because only you know the context.
Twig (and possibly others?) lets you set a default escaper that can be
overridden case by case as needed, including by a "none" option.
Rather than try to make print statements themselves into a more secure
template layer (Sorry, Rasmus, that ship has long since sailed, even
Drupal gave up on PHPTemplate), perhaps it would be more useful to look
at the needs of the various template engines (Twig, Smarty, Blade, etc.)
and see what the language can/should do to make it easier for those
engines to be made secure. That same underlying tooling, then, can be
exposed in a way that those still using PHP itself as a template engine
can leverage it more easily.
I'm not 100% certain what that would look like. My initial thought
(which is potentially a terrible idea), is some sort of callable
registration, akin to stream wrappers or autoloaders, where you could do
something vaguely like this:
register_escaper('html', function($var) { return htmlentities($var, ENT_QUOTES | ENT_HTML5); });
register_escaper('html_attrib', function($var) { ... });
register_escaper('raw', function($var) { return $var; });
set_default_escaper('html'); // I dislike a global flag like this, but it works for this demonstration.
<?php
print $foo; // no escaping, because BC.
printe $foo; // Run through html escaper
printe($foo, 'html_attrib'); // Run through html_attrib escaper
?>
<?= $foo; ?> // No escaping
<?~ $foo ?> // Run through html escaper
<?~html_attrib $foo ?> // Run through html_attrib escaper
And then Twig, Smarty, etc. can also leverage the registered escapers,
and add their own as appropriate.
That's off-the-cuff syntax that may be terrible, but the goal here is to
better enable template engines (whether thick ones like Twig or thin
ones like *.view.php) to do security better, rather than trying to Solve
It For All The Things(tm), which for security is a very dangerous
dead-end approach.
--Larry Garfield
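
The registration idea can be approximated in userland today. The following is only a sketch under stated assumptions: the Escapers class and the e() helper are hypothetical names, nothing here is a PHP built-in, and it uses htmlspecialchars() where the proposal above used htmlentities().

```php
<?php
// Hypothetical userland registry; register_escaper() and printe from the
// proposal above do not exist in PHP.
final class Escapers
{
    /** @var array<string, callable> */
    private static $escapers = [];

    public static function register(string $name, callable $fn): void
    {
        self::$escapers[$name] = $fn;
    }

    public static function escape(string $value, string $name): string
    {
        if (!isset(self::$escapers[$name])) {
            throw new InvalidArgumentException("Unknown escaper: $name");
        }
        return (self::$escapers[$name])($value);
    }
}

Escapers::register('html', function (string $v): string {
    return htmlspecialchars($v, ENT_QUOTES | ENT_HTML5 | ENT_SUBSTITUTE, 'UTF-8');
});
Escapers::register('url', 'rawurlencode');
Escapers::register('raw', function (string $v): string {
    return $v;
});

// Short template helper; the default is passed explicitly rather than kept in a global flag.
function e(string $value, string $escaper = 'html'): string
{
    return Escapers::escape($value, $escaper);
}

echo e('<b>bold</b>'), "\n";        // HTML escaper
echo e('a b&c', 'url'), "\n";       // URL escaper
echo e('<b>bold</b>', 'raw'), "\n"; // no escaping
```

In a template this would read `<?= e($foo) ?>` or `<?= e($foo, 'url') ?>`, which keeps the chosen context visible at each call site.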
A "filter" is just a function - the difference is just global state
indicating the current "default" function, which I think is a very bad
idea.
Just alias function calls as closures:
$html = function ($str) { return htmlspecialchars($str); }; // "default filter"
$attr = function ($str) { ... }; // "attribute filter"
Then call them:
<?= $html($foo) ?>
<?= $attr($foo) ?>
IMO, this is much clearer and simpler (and more flexible) than hiding
a function name in global state.
Incidentally, that's what Aura HTML does: https://github.com/auraphp/Aura.Html
Rather than inventing a new concept (filters) why not leverage a
well-known quantity like functions? If they could autoload, that would
be a more explicit, natural and still pretty convenient way to
accomplish the same thing.
Hmm. Here's a thought.
You can already mix use-statements, right? I mean, you can "use"
either class/interface name or a namespace name.
At the time when PHP encounters the use-statement, it doesn't actually
decide what you imported, it just creates a file-wide alias for a
name.
That's why you can have a class and namespace with the same name and
"use" them both with one use-statement.
What if that worked for functions too?
use MyNamespace\fun;
fun(); // -> MyNamespace\fun()
This makes it convenient to import and call namespaced functions.
Okay, but to the real problem, autoloading functions... what if
spl_autoload_register() was triggered for missing functions as well as
for missing classes?
I know, I know - BC break, but think about it... with Composer there's
no problem? it'll just attempt to autoload "MyNamespace/fun.php" and
give up. If you were to follow that naming convention, it would
probably just work without any modifications at all.
What's funny is that the spl_autoload_register() documentation page
doesn't in fact even say that it's for classes, hehehe ;-)
Okay, so I'm only half serious - It would probably be cleaner to add a
dedicated spl_autoload_func_register() or something?
<ul> <?php foreach ( $things as $thing ) { ?> <li><a href="/things/<?= $thing['name'] ?>" onclick="show_popup('<?= $thing['name'] ?>');"><?= $thing['name'] ?></a> <?php } ?> </ul>There are three different escape mechanism needed there; if there is a
shorthand for one, do you think it will be more likely or less that
people
will get the other two right?I have to agree with that - assigning special syntax to one kind of
escape-function gives that function an elevated status, which could
easily encourage neglect and oversight.I do wish that we had an obvious, consistently-named set of
web-related escape/encode functions for use in plain PHP templates,
like html(), attr(), js(), etc... having to type and read
htmlspecialchars()
andjson_encode()
while you're trying to visually
parse a template is really inconvenient.That's all it is though, inconvenience. Nice to have, not must have.
I'd be much more interested in a general solution to the problem of
being unable to (or at least strongly demotivated from) using actual
namespaced functions in this and many other cases - that's a missing
feature and a more general problem, whereas in my opinion an operator
or shorter function-names are just a work-around...

(and please, nobody say "use a template engine" - I am using a
template engine, it's called PHP!)

In many of the PHP template engines, there are multiple "filters" available
with a specific syntax, and a way to add more. You have to specify which
one you want, because only you know the context.

Twig (and possibly others?) lets you set a default escaper that can be
overridden case by case as needed, including by a "none" option.

Rather than try to make print statements themselves into a more secure
template layer (Sorry, Rasmus, that ship has long since sailed, even Drupal
gave up on PHPTemplate), perhaps it would be more useful to look at the
needs of the various template engines (Twig, Smarty, Blade, etc.) and see
what the language can/should do to make it easier for those engines to be
made secure. That same underlying tooling, then, can be exposed in a way
that those still using PHP itself as a template engine can leverage it more
easily.

I'm not 100% certain what that would look like. My initial thought (which
is potentially a terrible idea), is some sort of callable registration, akin
to stream wrappers or autoloaders, where you could do something vaguely like
this:

register_escaper('html', function($var) { return htmlentities($var, ENT_QUOTES | ENT_HTML5); });
register_escaper('html_attrib', function($var) { ... });
register_escaper('raw', function($var) { return $var;});

set_default_escaper('html'); // I dislike a global flag like this, but it
                             // works for this demonstration.

<?php
print $foo; // no escaping, because BC.
printe $foo; // Run through html escaper
printe($foo, 'html_attrib'); // Run through html_attrib escaper
?>

<?= $foo; ?> // No escaping
<?~ $foo ?> // Run through html escaper
<?~html_attrib $foo ?> // Run through html_attrib escaper

And then Twig, Smarty, etc. can also leverage the registered escapers, and
add their own as appropriate.

That's off-the-cuff syntax that may be terrible, but the goal here is to
better enable template engines (whether thick ones like Twig or thin ones
like *.view.php) to do security better, rather than trying to Solve It For
All The Things(tm), which for security is a very dangerous dead-end
approach.

--Larry Garfield
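
The registration idea above can be approximated in userland today, without new syntax. The following is a minimal sketch, not the proposal itself: the Escapers class and the printe() function are hypothetical names, and register_escaper(), set_default_escaper() and the "<?~" tag do not exist in PHP.

```php
<?php
// Userland stand-in for the hypothetical register_escaper() idea.
final class Escapers
{
    /** @var array<string, callable> registered escapers, keyed by name */
    private static $escapers = [];

    /** @var string|null name of the escaper used when none is given */
    private static $default = null;

    public static function register(string $name, callable $escaper): void
    {
        self::$escapers[$name] = $escaper;
    }

    public static function setDefault(string $name): void
    {
        if (!isset(self::$escapers[$name])) {
            throw new InvalidArgumentException("Unknown escaper: $name");
        }
        self::$default = $name;
    }

    public static function escape($value, ?string $name = null): string
    {
        $name = $name ?? self::$default;
        if ($name === null || !isset(self::$escapers[$name])) {
            throw new InvalidArgumentException('No escaper registered for this call');
        }
        return (self::$escapers[$name])((string) $value);
    }
}

Escapers::register('html', function (string $var): string {
    return htmlentities($var, ENT_QUOTES | ENT_HTML5, 'UTF-8');
});
Escapers::register('raw', function (string $var): string {
    return $var;
});
Escapers::setDefault('html');

/** Plays the role of the proposed "printe": echo through a named or default escaper. */
function printe($value, ?string $escaper = null): void
{
    echo Escapers::escape($value, $escaper);
}

printe('<em>user input</em>');          // goes through the default 'html' escaper
echo "\n";
printe('<em>trusted markup</em>', 'raw'); // explicitly unescaped
echo "\n";
```

A template engine could register its own escapers in the same registry, which is the sharing the message above is after; the global default remains as questionable here as it is in the original sketch.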
Further reading:
https://paragonie.com/blog/2015/06/preventing-xss-vulnerabilities-in-php-everything-you-need-know
Thanks!
Minor issue:
| If you failed to specify ENT_QUOTES and attacker simply needs to pass
| " onload="malicious javascript code as a value to that form field and
| presto, instant client-side code execution.
That's not correct, unless ENT_NOQUOTES would have been specified. The
default of htmlspecialchars() is to escape double-quotes, but to leave
single-quotes alone.
--
Christoph M. Becker
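
To make the quote handling concrete, here is a small sketch that passes each flag explicitly. The default described above (ENT_COMPAT) applied to the PHP versions current at the time of this thread; from PHP 8.1 the default flags became ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, so explicit flags keep the behaviour independent of the version.

```php
<?php
$value = 'a "double" and a \'single\' quote';

// ENT_COMPAT: double quotes escaped, single quotes left alone.
echo htmlspecialchars($value, ENT_COMPAT, 'UTF-8'), "\n";

// ENT_QUOTES: both kinds of quote escaped.
echo htmlspecialchars($value, ENT_QUOTES, 'UTF-8'), "\n";

// ENT_NOQUOTES: neither kind escaped, so unsafe inside any quoted attribute.
echo htmlspecialchars($value, ENT_NOQUOTES, 'UTF-8'), "\n";
```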
I think it's best to create a rfc and put it to vote: https://wiki.php.net/rfc/howto
Having <?~ makes it a lot easier to do code reviews.
I also think majority of use cases is <?~, other parts can use json_encode(),
filter_var() and other filters/escapers.
Regards
Thomas
Михаил Востриков wrote on 19.06.2016 10:38:
Guys, wait please) I am not suggesting an escaping package for all contexts
and all cases. That is not what I described in my first letter. My point is
that the main job of the echo operator "<?= ?>" is to output an unknown value
from the database into an HTML environment. So in all these places we have to
copy-paste the call of htmlspecialchars() to prevent XSS. There are many
projects which are written on custom engines, frameworks, or CMSs, and they
do not have any templating engine, and there is no possibility to rewrite many
working PHP templates to Twig, or Smarty, or something else.

I suggest a new simple operator "<?~ ?>" which will automatically wrap the
output value in htmlspecialchars(). It is intended specifically for HTML, not
for XML or JS. It does not require any php.ini settings, new classes or
constants. The reason for implementing it is the same as for implementing the
"??", "<=>", or "<?= ?>" operators - to make common, frequent operations
easier, decrease copy-paste, and increase security. I can implement it
myself and send a patch.

What do you think?
2016-06-19 12:59 GMT+05:00 Marco Pivetta ocramius@gmail.com:
Rasmus Schultz rasmus@mindplay.dk wrote on Sat, 18 June 2016, 17:44:
Did you know that you can alias namespaces, too?
<?php use My\Stuff\Escape as esc; ?>
<?=esc\html($str)?>

You can always add more functions to a namespace, even spread across
multiple files.

Pro-userland: quick reminder that a composer update is much quicker than
a full system PHP version upgrade.

I'd rather rely on an escaping package written in PHP, easier to maintain
and quicker to upgrade, than something that will likely use some obscure
shared library (or the PHP binary itself) that may not be upgraded for
weird reasons (it's shared, remember?).

I know that you put a lot of effort in security maintenance, but it's
still easier to deal with this stuff in userland in any case, and most
templating languages in common frameworks already inject helpers in the
script context in order to achieve quick, effective and context-aware (no
automatic context detection) escaping.

Marco Pivetta
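
The namespace-alias approach from the message above can be sketched as follows. The helper bodies are an assumption (the thread only shows the call site), and the two namespaces share one file here purely to keep the example self-contained; normally the functions would live in their own file, which still has to be loaded by hand or via Composer.

```php
<?php
// Assumed implementation of the My\Stuff\Escape helpers referenced above.
namespace My\Stuff\Escape;

function html(string $str): string
{
    return \htmlspecialchars($str, \ENT_QUOTES | \ENT_SUBSTITUTE, 'UTF-8');
}

function attr(string $str): string
{
    // Only safe for quoted attribute values.
    return \htmlspecialchars($str, \ENT_QUOTES | \ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical template namespace; App\View is not from the thread.
namespace App\View;

use My\Stuff\Escape as esc;

// The alias keeps call sites short while the functions stay namespaced.
echo '<p title="', esc\attr('say "hi"'), '">', esc\html('1 < 2'), "</p>\n";
```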
Please give me RFC karma. My wiki account is "michael-vostrikov". I plan to
create an RFC for this feature.
2016-06-19 21:09 GMT+05:00 Thomas Bley mails@thomasbley.de:
I think it's best to create a rfc and put it to vote:
https://wiki.php.net/rfc/howto

Having <?~ makes it a lot easier to do code reviews.
On Sun, Jun 19, 2016 at 6:53 PM, Михаил Востриков <
michael.vostrikov@gmail.com> wrote:
Please give me RFC karma. My wiki account is "michael-vostrikov". I plan to
create an RFC for this feature.
hi,
I've just granted you with rfc karma on the wiki.
--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Did you know that you can alias namespaces, too?
Yes
You can always add more functions to a namespace even spread across multiple files
Same problem: no autoloading.
You would have to add require_once statements - which, as said, is not
really possible with Composer packages...
Rasmus Schultz rasmus@mindplay.dk wrote on Sat, 18 June 2016, 17:44:
Add a couple parens and its completely implementable in userland
If we could autoload functions, I bet that's what everyone would be doing.
At the moment, no one is able to commit to that pattern, because it
doesn't scale - you can't just keep adding to a list of global
functions (and files) that get aggressively loaded whenever you render
a view, even if each view uses only one or two of them...

So in practice, you minimally end up with something like this:
<?php use My\Stuff\EscapeFunctions as e; ?>
<?=e::html($str) ?>
<?=e::attr($str) ?>
<?=e::text($str) ?>
...

But that isn't really practical either, since you can only cram so
many functions into the same class - at which point you start adding
more classes...

<?php use My\Stuff\EscapeFunctions as e; ?>
<?php use My\Stuff\OtherFunctions as o; ?>
<?=e::html($str) ?>
<?=o::stuff(...) ?>

It quickly gets ugly, messy and confusing.

Did you know that you can alias namespaces, too?
<?php use My\Stuff\Escape as esc; ?>
<?=esc\html($str)?>

You can always add more functions to a namespace even spread across
multiple files.

Then I start thinking about crazy solutions like tokenizing the
template file first and dynamically adding require_once statements for
any functions discovered being used, which would be more convenient,
but quite overly complex for such a small problem - and we're still
talking about occupying the global namespace with lots of functions.

And so you likely end up accepting that it's ugly and inconvenient,
and you resign yourself to use-statements and static methods, or
fully-static classes, which I've taken to referring to as
"pseudo-namespaces", since we're really abusing classes as a kind of
namespace for functions, just so we can get them to autoload.

Functions just aren't all that convenient or useful in PHP, because
they largely depend on manual use of require_once, which feels really
ugly and old-fashioned (since everything else autoloads like it's
supposed to) - and it isn't even always possible, since, for example,
you can't (reliably) know where a Composer package is located relative
to your project or package; it depends on whether your project is
currently the root package (e.g. under test) or an installed package
in the vendor-folder.

I really like pure functions - they're neat, simple and predictable.
In Javascript (and other languages) I always use functions first and
resort to classes only when there's a real clear benefit. In PHP, I
feel like I'm almost always forced into using classes for everything,
mainly because that's what works best in PHP and creates the least
rub.

This has been bothering me for many years - and I wish that I could
propose a solution, but I really don't have any ideas.

Can we do something to improve and encourage the use of functions in PHP?
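
The asymmetry described above can be observed directly: an autoloader registered with spl_autoload_register() is consulted for an unknown class, but not for an unknown function. The sketch below uses a hypothetical EscapeFunctions class and declares it inside the autoloader only to stay self-contained; a real autoloader would require the class file instead.

```php
<?php
$requested = []; // every name the autoloader was asked for

spl_autoload_register(function (string $class) use (&$requested): void {
    $requested[] = $class;
    if ($class === 'EscapeFunctions') {
        // Stand-in for "require the file that defines this class".
        final class EscapeFunctions
        {
            public static function html(string $str): string
            {
                return htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
            }
        }
    }
});

// Unknown class: the autoloader runs and the static "pseudo-namespace" call works.
echo EscapeFunctions::html('<b>bold</b>'), "\n";

// Unknown function: no autoloader is consulted, the call simply fails.
try {
    html('<b>bold</b>');
} catch (Error $e) {
    echo get_class($e), ': ', $e->getMessage(), "\n";
}

// Only the class name shows up in the autoloader's log.
var_dump($requested);
```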
On Sat, Jun 18, 2016 at 12:27 AM, Ryan Pallas derokorian@gmail.com wrote:

On Fri, Jun 17, 2016 at 2:23 PM, Thomas Bley mails@thomasbley.de wrote:

you can simply add the context to the current output operator:
<?=html($str) ?>
<?=attr($str) ?>
<?=text($str) ?> (=strip_tags)
<?=js($str) ?>
<?=css($str) ?>

Look at that. Add a couple parens and its completely implementable in
userland now with no language changes required.

Regards
Thomas

Stanislav Malyshev wrote on 17.06.2016 22:14:
Hi!
Most of output code is an output of properties of database entities, and
only in some cases it's needed to concatenate HTML into string and then
print it with unescaped output. Escaped output operator can be useful. Also
we output data not into the void and not into simple text file, but into
HTML-document which has a certain format (markup). Also this is logical -
to have both forms, escaped and unescaped.

This has been discussed on the list a number of times. Main issue with
this kind of proposals is that escaping is context-dependent. E.g.
htmlspecialchars() would not help you in many scenarios - e.g. it won't
protect you from XSS if you ever place user-controlled data in HTML
attributes. Having operator for each of the possible contexts does not
really looks feasible, and having it for only one of them and not the
others would be misleading people into thinking this operator is generic
and can be used in all contexts safely.

--
Stas Malyshev
smalyshev@gmail.com
You can always add more functions to a namespace even spread across
multiple files

Same problem: no autoloading.
You would have to add require_once statements - which, as said, is not
really possible with Composer packages...
You should look at packages that already do this:
https://github.com/nikic/iter/blob/5527ca489bf151ceef17622f1c89114640f522d2/composer.json#L16
Ref: https://getcomposer.org/doc/04-schema.md#files
Marco Pivetta
I am well familiar with this approach, and it does not scale - not
only would you be aggressively loading every installed view-helper
anytime you render a view, you would even be loading them when you're
not rendering a view.
I'm afraid the best we could do at this point, without changing the
language, is to establish a convention for autoloading functions
(and/or namespaces of functions) from files, based on static analysis
of template files.
But that is pretty complex - on the organizational side, it requires
developers to agree on and adopt a convention, and on the technical
side, you need static analysis and thereby most likely a cache layer
as well.
It's all possible, but most people aren't going to put up with this
much complexity for something this simple.
Hmm. What if we could import static methods into file scope and use
them as functions?
use My\Namespace::my_function;
my_function(); // <-- effectively My\Namespace::my_function()
This would leverage auto-loading at least... I mean, it's still
effectively just abusing classes as pseudo-namespaces, so there is
that - but it would work with e.g. Composer right away, and probably
with many existing static classes?
Yeah, it's still ugly...
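
The "use My\Namespace::my_function;" import above is hypothetical syntax. Something close to its effect can be had with existing features by binding a static method to a local variable, which still goes through the class autoloader. A minimal sketch, with ViewHelpers as an assumed helper class:

```php
<?php
// Hypothetical static helper class standing in for an autoloaded one.
final class ViewHelpers
{
    public static function html(string $str): string
    {
        return htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }
}

// Bind the static method to a short local name once, at the top of the template.
// Closure::fromCallable() is available from PHP 7.1; on PHP 8.1+ the same
// binding can be written as: $html = ViewHelpers::html(...);
$html = Closure::fromCallable([ViewHelpers::class, 'html']);

echo '<p>', $html('<script>alert(1)</script>'), "</p>\n";
```

The call site needs the "$" sigil, so this is a variable holding a closure rather than a real function import, and it has to be repeated in every template scope.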
Ugly, but brilliant! +1
David
Add a couple parens and its completely implementable in userland
If we could autoload functions, I bet that's what everyone would be doing.
FWIW, there is a respective RFC draft[1] "lying around". See also
https://bugs.php.net/72459.
[1] https://wiki.php.net/rfc/function_autoloading
--
Christoph M. Becker
beauty! when can we have that?? :-)
beauty! when can we have that?? :-)
Maybe never, but at least somebody would have to pursue the RFC. See
also the related discussion from 2013, starting with
http://news.php.net/php.internals/68693.
--
Christoph M. Becker
Hi Thomas,
you can simply add the context to the current output operator:
<?=html $str ?>
<?=attr $str ?>
<?=text $str ?> (=strip_tags)
<?=js $str ?>
<?=css $str ?>
We need <?=uri $str ?> in addition. If we adopt this, we must document
clearly that LDAP, SQL, etc are not supported.
I like this idea a lot. Output context is clear and explicit.
It may be better to treat "<?= $str" as "<?php echo
htmlspecialchars($str)" rather than "<?php echo $str", but this change
would be for PHP 8.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
It may be better to treat "<?= $str" as "<?php echo
htmlspecialchars($str)" rather than "<?php echo $str", but this change
would be for PHP 8.
No, that would be highly inadvisable. As it's been pointed out, people
use PHP templating for things besides just HTML. And if you made this
change, then <?= could no longer be used in echoing out javascript, XML,
RSS feeds, making custom API responses, etc.
Eli
--
| Eli White | http://eliw.com/ | Twitter: EliW |
Hi,
the issue is that things have to be escaped dependent on the context. If
you are in a HTML context you need different escaping than you need in a
CSS or JS block. The escaping should also be aware of the content encoding.
All that makes it difficult for PHP to directly support such an operator.
You can always alias "e" or something like that to be your default escape
function.
Regards, Niklas
Михаил Востриков michael.vostrikov@gmail.com wrote on Fri, 17 June
2016, 21:29:
Hello. I was thinking about a presence of escaped output operator in PHP
and found this feature request: https://bugs.php.net/bug.php?id=62574. I
think this is quite necessary feature. There are a lot of projects which is
written without templating engine, and there are frameworks without
built-in templating engine by default. All this projects require to write
the code. Usually it is rather simple to switch to new version of language,
but it is almost impossible to switch many and many templates on a
templating engine.

Most of output code is an output of properties of database entities, and
only in some cases it's needed to concatenate HTML into string and then
print it with unescaped output. Escaped output operator can be useful. Also
we output data not into the void and not into simple text file, but into
HTML-document which has a certain format (markup). Also this is logical -
to have both forms, escaped and unescaped.

I want to suggest the operator "<?~ $str ?>", which will automatically wrap
output in htmlspecialchars(). It is mentioned in the feature request above.
It is quite easy to type, and there is a small possibility to write "<?=
?>" instead.

In PHP 7 there are new operators and other changes. I think, new echo
operator also can be added. I can implement it myself.
using the default encoding from php.ini's default_charset should be no
problem, htmlspecialchars() already does it if the encoding parameter is
not provided.
Regards
Thomas
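
As a sketch of the encoding point in this exchange: a short escape helper (the "e" alias suggested above, a hypothetical name here) can pin the encoding explicitly instead of relying on default_charset, and can opt into substituting invalid byte sequences. Without ENT_SUBSTITUTE or ENT_IGNORE, htmlspecialchars() returns an empty string for input that is not valid in the given encoding.

```php
<?php
/** Hypothetical short escape helper; the UTF-8 default is an assumption. */
function e(string $value, string $encoding = 'UTF-8'): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, $encoding);
}

// The encoding htmlspecialchars() falls back to when none is passed.
echo 'default_charset: ', ini_get('default_charset'), "\n";

// 0xC3 starts a two-byte UTF-8 sequence that is never completed here.
$broken = "caf\xC3(";

// No substitute flag: the whole result is an empty string.
var_dump(htmlspecialchars($broken, ENT_QUOTES, 'UTF-8'));

// With ENT_SUBSTITUTE the invalid byte is replaced and the rest survives.
var_dump(e($broken) !== '');
```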
Niklas Keller wrote on 17.06.2016 22:31:
Hi,
the issue is that things have to be escaped dependent on the context. If
you are in a HTML context you need different escaping than you need in a
CSS or JS block. The escaping should also be aware of the content encoding.
All that makes it difficult for PHP to directly support such an operator.

You can always alias "e" or something like that to be your default escape
function.

Regards, Niklas
Thomas, are you actually reading and understanding what the others are
saying?
You seem to be answering questions that have not been asked or giving the
simple, easy and wrong answer.
Walter
using the default encoding from php.ini's default_charset should be no
problem, htmlspecialchars() already does it if the encoding parameter is
not provided.

Regards
Thomas

--
The greatest dangers to liberty lurk in insidious encroachment by men of
zeal, well-meaning but without understanding. -- Justice Louis D. Brandeis