I'd like to submit an RFC (with a pull request) for adding auto-escaping to
the php language.
We at iFixit.com have used PHP for nearly a decade to run our website.
Several years ago, we abandoned the Smarty templating engine and used php
files directly as templates. This worked, but was a bit unsafe and made it
too easy to leave user submitted content unescaped. Several years ago we
switched to using a modified version of PHP that included auto-escaping and
it has been working great. In the process of preparing to use php 7, I've
re-implemented the changes against the master branch.
I'd like to gauge interest in this before I formally submit an RFC. Here's
a somewhat better description that I've attached to a pull request on our
internal fork of php.
Pull request on internal fork: https://github.com/iFixit/php-src/pull/14
Background
PHP doesn't have any mechanism to inject logic between templating
and final output. There is no way to filter or alter the content
that comes from code in templates like: <?= $someVar ?>
To use PHP as a robust templating language, we must inject some
logic between templates and their output. We have chosen to make
all <?=, echo, and print statements subject to an optional
trip through the internal function php_escape_html_entities.
The functionality can be toggled with ini_set('__auto_escape')
and configured with __auto_escape_flags and __auto_escape_exempt_class
(see commit
https://github.com/iFixit/php-src/commit/2dae5d16436ce37856f6e00ca2a1b3009bb1f7ed
for info about the class name based auto-escaping exemption).
Methodology
T_ECHO (echo, <?=) and T_PRINT (print) now both emit a
ZEND_AST_ECHO_ESCAPE node in the syntax tree.
That's compiled to a function which emits a ZEND_ECHO_ESCAPE op code.
The op code interpretation is a dupe of ZEND_ECHO except with some
if() statements that switch the underlying function from zend_write
to zend_write_escape based on the ini settings.
zend_write_escape is a new function pointer that points to
php_escape_write.
php_escape_write is a new function that passes its string argument
through php_escape_html_entities() (with __auto_escape_flags) before
calling the underlying php_output_write.
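The write path described above can be approximated in userland. This is only a sketch, not the patch itself: escaping_write() is a hypothetical name standing in for php_escape_write, htmlspecialchars() stands in for the internal php_escape_html_entities() call, and the default flag value is an assumed example of what __auto_escape_flags might hold.

```php
<?php
// Hypothetical userland stand-in for php_escape_write: escape, then write.
// $flags plays the role of __auto_escape_flags (the default is an assumption).
function escaping_write(string $text, int $flags = ENT_QUOTES | ENT_SUBSTITUTE): string
{
    $escaped = htmlspecialchars($text, $flags, 'UTF-8');
    echo $escaped;      // the patch would hand this to php_output_write instead
    return $escaped;    // returned only so the result is easy to inspect
}
```

With ENT_QUOTES in the flags, both single and double quotes are converted, which matters when the value ends up inside a quoted attribute.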
Use
This functionality allows us to safely use php straight as a
templating language with no template compilation step (as many
other templating libraries have).
See the included tests for more usage information.
Exempt Class
It is useful to allow some utility functions and helpers to produce
html and have it passed straight through in the template (without
being double-encoded). We accomplish this by tagging strings
as being HTML.
class HtmlString implements JsonSerializable {
    protected $html = '';
    public function __construct($html) {
        $this->html = $html;
    }
    public function __toString() {
        return (string)$this->html;
    }
    public function jsonSerialize() {
        return $this->html;
    }
}
The auto-escaping system can be configured with an ini setting:
__auto_escape_exempt_class="HtmlString"
which allows instances of HtmlString to pass straight through a
template without being modified (skipping the html_entities call).
Helper functions can now return html safely and consumers don't have
to care if it is HTML or not because the auto-escaping system knows
what to do.
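To make the exemption concrete, here is a userland sketch of the check. The function names auto_escape() and bold() are hypothetical and exist only for illustration; the actual patch performs the equivalent test inside the engine, and the HtmlString class here is a trimmed copy of the one above.

```php
<?php
// Trimmed copy of the HtmlString wrapper from the post.
final class HtmlString
{
    private $html;
    public function __construct(string $html) { $this->html = $html; }
    public function __toString(): string { return $this->html; }
}

// Hypothetical helper mimicking the exempt-class check in userland.
function auto_escape($value, string $exemptClass = 'HtmlString'): string
{
    if ($value instanceof $exemptClass) {
        return (string) $value;          // already HTML: pass through untouched
    }
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// A helper that builds markup tags its result as HTML at the source.
function bold(string $text): HtmlString
{
    return new HtmlString('<b>' . auto_escape($text) . '</b>');
}
```

A plain string passed to auto_escape() is escaped, while the result of bold() passes through once, so the markup it generated is not double-encoded.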
Thanks for your consideration!
Daniel Beardsley
I'd like to submit an RFC (with a pull request) for adding auto-escaping to
the php language.
T_ECHO (echo, <?=) and T_PRINT (print) now both emit a
ZEND_AST_ECHO_ESCAPE node in the syntax tree.
Interesting approach, I assume an explicit echo $foo; takes the
normal ZEND_ECHO route then?
zend_write_escape is a new function pointer that points to
php_escape_write.
Pluggable, good.
The auto-escaping system can be configured with an:
__auto_escape_exempt_class="HtmlString"
I wonder if allowing the classes to declare themselves as exempt (or
self-escapable) might be a better approach.
e.g.
class Foo implements HtmlEscapable {
    public function htmlEscape() {
        return htmlentities($this->whatever);
    }
}
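A slightly fuller sketch of that interface idea, in userland. HtmlEscapable is not an existing PHP interface; it, the UserLink class, and the emit() function are hypothetical names used only to show how an output hook could defer to the object.

```php
<?php
// Hypothetical interface: objects that know how to render themselves as HTML.
interface HtmlEscapable
{
    public function htmlEscape(): string;
}

final class UserLink implements HtmlEscapable
{
    private $name;
    public function __construct(string $name) { $this->name = $name; }

    public function htmlEscape(): string
    {
        // The object escapes its own data for each context it uses.
        $text = htmlspecialchars($this->name, ENT_QUOTES, 'UTF-8');
        return '<a href="/user/' . rawurlencode($this->name) . '">' . $text . '</a>';
    }
}

// Hypothetical output hook: defer to the object, otherwise escape as text.
function emit($value): string
{
    return $value instanceof HtmlEscapable
        ? $value->htmlEscape()
        : htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}
```

Compared with a single exempt class name in an ini setting, any number of classes can opt in this way, and each one remains responsible for its own escaping.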
Which allows instances of HtmlString to pass straight through a
template without being modified (skipping the html_entities call).
IME once you provide an escape hatch, said hatch WILL be used. It's
not a question of IF.
For my part, I'd toss the idea of XHP (
https://docs.hhvm.com/hack/XHP/introduction ) back into consideration
over something like this.
This approach has the smell of magic quotes which we got rid of for
very good reason. XHP is much more explicit in separating markup from
data and relies far less (not at all when you do it right) on escape
hatches.
-Sara
I agree XHP really is the right solution for this problem. It enables HTML
to be created structurally and composably with a concise inline syntax,
just like JSX/React does for JavaScript, and just like LINQ does for SQL in
C#. It's much better than passing around snippets of HTML as strings that
can easily break.
I agree XHP really is the right solution for this problem. It enables HTML
to be created structurally and composably with a concise inline syntax, just
like JSX/React does for JavaScript, and just like LINQ does for SQL in C#.
It's much better than passing around snippets of HTML as strings that can
easily break.
Yeah, XHP is a great way to go, but it's so different from traditional
templating, and carries enough of a performance penalty, that it's hard for
pre-existing large projects to adopt.
Hi - I’m currently the maintainer of facebook/xhp-lib; as XHP’s been mentioned a few times here, if someone’s interested in making a PHP7 version:
- while I won’t be able to directly work on a PHP7 port, I’ll be happy to help in any other way (eg any questions about design/decisions)
- the main branch is Hack-only; the PHP-5 compatible stuff is here: https://github.com/facebook/xhp-lib/tree/1.x - https://github.com/facebookarchive/xhp-php5-extension
- the performance of XHP under HHVM is usually not a practical issue; it seems likely to be less of an issue under PHP7. A good way to test could be to desugar some tests to plain PHP (https://github.com/facebookarchive/xhp-php5-extension/blob/master/xhp/xhpize.cpp), then run them under PHP5 and PHP7; this way you can measure without having to port the extension first
- there are workarounds for when it is a real issue, eg https://github.com/hhvm/user-documentation/blob/master/src/site/xhp/APCCachedRenderable.php - here, the performance was an insignificant part of real requests, but the navigation integration tests were effectively a worst-case microbenchmark and test run time is important to us.
While facebook/xhp-lib must remain Hack because of the nature of its async support, I’d be happy to work with a separate PHP7 library to avoid any unnecessary differences - hopefully to the point where we could have a shared ‘provides’ in composer. One option would be for the bulk of the files to be transpiled (HHVM provides ‘h2tp’ for this), but have core classes like :x:element written separately to strip out Async stuff.
Regards,
- Fred
T_ECHO (echo, <?=) and T_PRINT (print) now both emit a
ZEND_AST_ECHO_ESCAPE node in the syntax tree.
Interesting approach, I assume an explicit echo $foo; takes the
normal ZEND_ECHO route then?
No, looking at the code and tests: echo, print, and <?= all end up
parsing to a ZEND_AST_ECHO_ESCAPE which emits a
ZEND_ECHO_ESCAPE op code when compiled. Only inline
HTML still compiles as ZEND_ECHO.
Which allows instances of HtmlString to pass straight through a
template without being modified (skipping the html_entities call).
IME once you provide an escape hatch, said hatch WILL be used. It's
not a question of IF.
For sure, mistakes can be made with any system, but this helps
dangerous code look more wrong: new HtmlString($username)
is obviously wrong. And it makes the correct things require little
to no extra code: <?= $username ?> is always safe.
For my part, I'd toss the idea of XHP (
https://docs.hhvm.com/hack/XHP/introduction ) back into consideration
over something like this.
XHP is pretty sweet, but I imagine there are a decent number of people
that don't consider using it because it is such a departure from traditional
templating.
This approach has the smell of magic quotes which we got rid of for
very good reason. XHP is much more explicit in separating markup from
data and relies far less (not at all when you do it right) on escape
hatches.
Huh, I don't see similarities to magic quotes at all. That had to do with
attempting to sanitize input data (plenty of problems with that). All
templating systems have a means of making the default output
mechanism perform escaping and a means of preventing that
escaping; this adds the same for PHP templates.
This approach has the smell of magic quotes which we got rid of for
very good reason. XHP is much more explicit in separating markup from
data and relies far less (not at all when you do it right) on escape
hatches.
Huh, I don't see similarities to magic quotes at all. That had to do with
attempting to sanitize input data (plenty of problems with that). All
templating systems have a means of making the default output
mechanism perform escaping and a means of preventing that
escaping; this adds the same for PHP templates.
Not the default (PHP) output, but their default behavior when no specific
escape method (or filter, or whatever else) has been specified.
This is a huge difference from what is proposed here.
Not sure about having such features in the core. It does sound like trying
to solve a real issue but using the wrong solution or in the wrong place.
Cheers,
Pierre
This approach has the smell of magic quotes which we got rid of for
very good reason. XHP is much more explicit in separating markup from
data and relies far less (not at all when you do it right) on escape
hatches.
Huh, I don't see similarities to magic quotes at all. That had to do with
attempting to sanitize input data (plenty of problems with that). All
templating systems have a means of making the default output
mechanism perform escaping and a means of preventing that
escaping; this adds the same for PHP templates.
The similarity is that magic quotes assumed that the input data was going
to be embedded within an SQL query without escaping, and therefore needed
escaping. Of course that's an invalid assumption, the input data could be
re-rendered, processed in some arbitrary way, written to a file, sent in an
email, to another web service, etc etc.
This feature makes a similar assumption about output, rather than input.
Specifically, it assumes that the output is HTML, and what is being echoed
hasn't already been escaped and therefore needs to be escaped. Of course
that's an invalid assumption, command line scripts do echo/print of plain
text, and I've seen PHP scripts generate JSON (eg a web service),
JavaScript, CSS and plain text via the output buffer. Not to mention
anything could do
ob_start();
// ...
echo $blah;
// ...
$foo = ob_get_clean();
or
ob_start();
// ...
?>...<?= $blah ?>...<?php
// ...
$foo = ob_get_clean();
and have an expectation about $foo.
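A small illustration of that broken expectation, using JSON as the non-HTML output. This is only a sketch: the htmlspecialchars() call simulates what an HTML auto-escaping layer would do to the echoed bytes, since stock PHP has no __auto_escape setting.

```php
<?php
// Capture echoed output, as in the ob_start() examples above.
ob_start();
echo json_encode(['name' => 'Tom & "Jerry"']);
$raw = ob_get_clean();

// Simulate an HTML auto-escaping layer applied to the same bytes.
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

$decodedRaw = json_decode($raw, true);          // round-trips to the array
$decodedEscaped = json_decode($escaped, true);  // escaped text is not valid JSON
```

The captured string still decodes when written unchanged, while the escaped copy has its quotes replaced by entities and no longer parses as JSON.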
The similarity is that magic quotes assumed that the input data was going to
be embedded within an SQL query without escaping, and therefore needed
escaping. Of course that's an invalid assumption, the input data could be
re-rendered, processed in some arbitrary way, written to a file, sent in an
email, to another web service, etc etc.
This feature makes a similar assumption about output, rather than input.
Specifically, it assumes that the output is HTML, and what is being echoed
hasn't already been escaped and therefore needs to be escaped.
True, but the difference is that safety is the default instead of
the exception. Every system has an assumption. It's better that
mistakes about escaping cause double-escaped html than
an XSS hole.
that's an invalid assumption, command line scripts do echo/print of plain
text, and I've seen PHP scripts generate JSON (eg a web service),
JavaScript, CSS and plain text via the output buffer. Not to mention
anything could do
I'm sorry, I wasn't clear in the RFC. This feature is meant to be
turned on only during template rendering (imagine you have a Template
class):
function render() {
    ini_set('__auto_escape', '1');
    require $this->templatePath;
    ini_set('__auto_escape', '0');
}
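One detail worth noting about a render() like this: if the template throws, the setting stays switched on. A sketch of a variant that restores the previous state with try/finally follows. Stock PHP has no __auto_escape setting, so the hypothetical AutoEscape::$enabled flag stands in for it here, and a callable stands in for the required template file, purely to keep the sketch runnable.

```php
<?php
// Stand-in for the proposed __auto_escape ini setting (hypothetical).
final class AutoEscape
{
    public static $enabled = false;
}

final class Template
{
    private $body;
    public function __construct(callable $body) { $this->body = $body; }

    public function render(): string
    {
        $previous = AutoEscape::$enabled;
        AutoEscape::$enabled = true;
        ob_start();
        try {
            ($this->body)();             // stands in for: require $this->templatePath;
            return ob_get_contents();
        } finally {
            ob_end_clean();
            AutoEscape::$enabled = $previous;   // restored even if the template throws
        }
    }
}
```

Saving and restoring the previous value, rather than forcing it back to off, also keeps nested render() calls from switching escaping off for the outer template.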
I think having the behaviour of language features depend in an incompatible
way on a global runtime setting is a bad idea because it creates nonlocal
effects and means code cannot be reliably composed. Effectively, every
function and method will have an implicit assumption about whether or not
it is supposed to be called "during templating" i.e. with __auto_escape set
to 0 or 1. If you are very careful to separate your "templating" code
from the rest of your code and not to call either from the other, I guess
it would work, but it creates a burden on the programmers I'd rather they
not have. Without this setting, I know I always need to do <?= to_html(
$text ) ?>. Easy. But now to figure out whether I need to escape my HTML or
not I have to traverse the call graph to try to figure out what the value
of __auto_escape is going to be at runtime. Eugh.
I think having the behaviour of language features depend in an incompatible
way on a global runtime setting is a bad idea because it creates nonlocal
effects and means code cannot be reliably composed.
This is probably the best argument against this RFC. Though how often
that issue would come up... I have no idea. It hasn't yet in our usage
(2 years) but we don't use that many external libraries during template
rendering, at least none that have used their own templating.
A potential solution is to create a new syntax like <?E= or something,
but that of course has even more challenges and backward
incompatibility, unless it's done in an extension... hmm. You would then
have to prevent usage of <?= in templates with a pre-commit hook
or add the check to your CI build.
Effectively, every
function and method will have an implicit assumption about whether or not it
is supposed to be called "during templating" i.e. with __auto_escape set to
0 or 1. If you are very careful to separate your "templating" code from the
rest of your code and not to call either from the other, I guess it would
work, but it creates a burden on the programmers I'd rather them not have.
I think I fail to see the burden. You write code as normal, it's always safe
to <?= $anything ?> so no thought required there.
When some template helper function generates html,
it must tag it as such upon returning: return html($someHtmlStr)
Perhaps I've just never had the need to use PHP's templating
features to generate something other than HTML during our
HTML template rendering phase.
Without this setting, I know I always need to do <?= to_html( $text ) ?>.
Easy.
Sure, but that's a lot of ugly syntax and you'd have to enforce usage of
that function with a regex in a pre-commit hook or a CI build to prevent
dangerous mistakes.
But now to figure out whether I need to escape my HTML or not I have
to traverse the call graph to try to figure out what the value of __auto_escape
is going to be at runtime. Eugh.
Huh? I think I'm missing something, or my description wasn't clear enough.
The point of auto is that you don't need to escape anything.
Templates shouldn't need to include new HtmlString() nor
htmlspecialchars(). Functions that generate HTML simply return an
HtmlString object. The template will pass them straight through.
Our methodology has been to mark content as HTML at the source
(when it is generated, in small bits) and the downstream (the templates)
don't have to care and can safely echo anything.
Hi!
True, but the difference is that safety is the default instead of
the exception. Every system has an assumption. It's better that
This sounds as if the major assumption is that there's some procedure ("the
safety") that can render any output safe. This could not be more
wrong. Escaping is highly context-dependent, and without knowing
specific details of the context it is impossible to do proper escaping.
I do not see how setting one flag could provide the proper context.
Moreover, one template may include multiple contexts.
I'm sorry, I wasn't clear in the RFC. This feature is meant to only be
turned on during template rendering (imagine you have a Template
class):
function render() {
    ini_set('__auto_escape', '1');
    require $this->templatePath;
    ini_set('__auto_escape', '0');
}
I think there's an assumption here that templates only exist in one context,
or at least allow user data in only one context. This is not true, of
course. But if it were true, this code would be trivial to make safe:
function render() {
    ob_start();
    require $this->templatePath;
    echo magic_security_filter(ob_get_clean());
}
--
Stas Malyshev
smalyshev@gmail.com
Hi Daniel,
The issue is "Escaping is done on a specific context".
I understand your proposal is focused on HTML escaping. However,
setting names like __auto_escape_exempt_class are not a good choice. It has to be
__auto_html_escape_exempt_class at least, because it is for HTML escaping.
In addition, HTML consists of multiple contexts:
- HTML context that requires HTML escape
- URI context that requires URI escape
- JavaScript context (embedded JavaScript strings, for example) that
  requires JavaScript string escape, etc.
  e.g. http://blog.ohgaki.net/javascript-string-escape (Sorry, it's
  my blog and written in Japanese. You may try a translation service, or you
  should be able to understand the PHP code at least.)
- CSS context that requires CSS escape.
  e.g. https://developer.mozilla.org/ja/docs/Web/API/CSS/escape
- And so on
Dealing with the HTML context only would be problematic even if it works for many cases.
Escaping must be done depending on context. Multiple contexts may also
apply. HTML-context-only escaping would not work well. Applying proper
escapes to variables in HTML is a very complex task.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
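The context list above can be illustrated with one helper per context. This is a rough sketch using standard PHP functions; the helper names (esc_html, esc_url_segment, esc_js_value) are hypothetical, and the sketch covers only three of the contexts mentioned, not CSS.

```php
<?php
// One helper per output context; all three names are hypothetical.
function esc_html(string $s): string
{
    // HTML text and quoted attribute values.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

function esc_url_segment(string $s): string
{
    // Percent-encode for a URI path segment, then HTML-escape for the attribute.
    return esc_html(rawurlencode($s));
}

function esc_js_value($value): string
{
    // JSON literal with <, >, &, ' and " emitted as \uXXXX escapes,
    // so none of those characters appear raw in the output.
    return json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}
```

The same input yields three different outputs, which is the point being made: a single HTML-entity pass is the right answer only for the first context.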
Wouldn't this __auto_escape setting effectively break libraries that depend
on it being on or off?
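People often write code to generate HTML like this:
ob_start();
?>
<div>some HTML <?= escape_html($other_text) ?></div>
<div>more HTML <?= $other_html ?></div>
<!-- etc -->
<?php
$html = ob_get_clean();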
If that code is in a library, it can't be used with this setting enabled.
That could become a real pain point for the whole PHP ecosystem.
Hi Daniel,
On Mon, Mar 21, 2016 at 7:11 AM, Daniel Beardsley daniel@ifixit.com wrote:
I'd like to submit an RFC (with a pull request) for adding auto-escaping to
the php language.
Wouldn't this __auto_escape setting effectively break libraries that depend
on it being on or off?
The setting was meant to be turned on only during template
rendering. So, yes, if outside code that also uses templating, but is
unaware of auto-escaping, runs during your template rendering,
things will be double-escaped.
In our project, we've never run into this problem. Most php
libraries we use aren't in the business of producing strings
with php templates. And if they are, we haven't called them
during template rendering.
People often write code to generate HTML like this:
ob_start();
?>
<div>some HTML <?= escape_html($other_text) ?></div>
<div>more HTML <?= $other_html ?></div>
<!-- etc -->
<?php
$html = ob_get_clean();
If that code is in a library, it can't be used with this setting enabled.
That could become a real pain point for the whole PHP ecosystem.
Issue is "Escaping is done on a specific context".
I understand your proposal is focused on HTML escaping. However,
setting names like __auto_escape_exempt_class is not good choice. It has to be
__auto_html_escape_exempt_class at least because it is for HTML escaping.
Yes, the ini settings have poor names and can totally be changed.
In addition, HTML consists of multiple contexts
- HTML context that requires HTML escape
- URI context that requires URI escape
- JavaScript context, embedded JavaScript strings for example, that
  requires JavaScript string escape, etc.
  e.g. http://blog.ohgaki.net/javascript-string-escape (Sorry. It's
  my blog and written in Japanese.
  You may try translation service or you should be able to understand
  PHP code at least)
- CSS context that requires CSS escape.
  e.g. https://developer.mozilla.org/ja/docs/Web/API/CSS/escape
- And so on
You are right. Though not all those problems are serious:
- HTML attributes:
  Use ENT_QUOTES so that content is escaped well enough
  for use in quoted attributes (still need quotes though).
- URI escaping:
  Does anyone really use <?= ?> or echo when generating a URI?
- JavaScript:
  Good point, though I would say it's fairly rare to create JavaScript
  code using a php template with variables. The most we ever do
  in our app is <?= json_encode($someArray) ?>
- Everything else:
  I think the better solution here is to simply let the user control this.
  Provide an ini setting that allows a custom output function to be set
  so the user could control what happens to unsafe strings and what
  the exemptions are. I'm considering doing this. This was html-only
  at the beginning because we only created this for html templates
  and were able to call the internal php functions directly, giving
  us nearly no performance penalty.
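A rough userland sketch of what a user-controlled output function with a class-based exemption might look like. Every name here (OutputFilter, TrustedHtml) is hypothetical and not part of the patch; in the actual proposal the engine would invoke the filter for echo, print and <?= rather than the explicit OutputFilter::apply() call used below.

```php
<?php
// Hypothetical stand-in for a configurable output hook. Nothing here exists
// in PHP itself; it only models "let the user decide what happens to output".
final class OutputFilter
{
    /** @var callable|null */
    private static $filter = null;

    public static function set(?callable $filter): void
    {
        self::$filter = $filter;
    }

    public static function apply($value): string
    {
        if (self::$filter === null) {
            return (string) $value; // hook disabled: plain echo behaviour
        }
        return (string) (self::$filter)($value);
    }
}

/** Marker class for strings that are already safe HTML (hypothetical). */
final class TrustedHtml
{
    private $html;

    public function __construct(string $html)
    {
        $this->html = $html;
    }

    public function __toString(): string
    {
        return $this->html;
    }
}

// The user-supplied policy: exempt one class, HTML-escape everything else.
OutputFilter::set(function ($value): string {
    if ($value instanceof TrustedHtml) {
        return (string) $value;
    }
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
});

echo OutputFilter::apply('<b>user text</b>'), "\n";              // &lt;b&gt;user text&lt;/b&gt;
echo OutputFilter::apply(new TrustedHtml('<b>markup</b>')), "\n"; // <b>markup</b>
```

Because the policy is an ordinary callable, a different context (JSON, plain text) could swap in a different filter without changing the templates.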
Daniel Beardsley wrote on 21/03/2016 06:35:
You are right. Though not all those problems are serious:
- URI escaping:
Does anyone really use <?= ?> or echo when generating a uri?
- Javascript:
Good point, though I would say it's fairly rare to create javascript
code using a php template with variables. The most we ever do
in our app is <?= json_encode($someArray) ?>
I've done both of these in the past (using Smarty, in my case); here's
some example uses:
<a href="/products/<?= $product['category'] ?>/<?= $product['id'] ?>"><?= $product['name'] ?></a>
<script>var debug_session_id = '<?= get_debug_session_id() ?>';</script>
Now, I'm not saying there aren't better ways of doing these things, but
people absolutely do it like this, and a hook into something as
fundamental as "echo" can't really rely on "it's quite rare" as an
excuse for not accounting for them.
Regards,
Rowan Collins
[IMSoP]
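The two examples above sit in a URL path and in a JavaScript string, where HTML escaping alone does not give the right result. The following sketch shows one way each context is commonly handled with functions that exist in PHP today; the sample values are made up for illustration.

```php
<?php
// Illustrative values only; the variable names are assumptions.
$category  = 'tools/saws & blades';
$sessionId = "abc'; alert(1); //";

// HTML escaping leaves "/" untouched, so the value would add a path segment.
$htmlOnly = htmlspecialchars($category, ENT_QUOTES, 'UTF-8');

// URL path segment: percent-encode first, then HTML-escape for the attribute.
$pathSegment = htmlspecialchars(rawurlencode($category), ENT_QUOTES, 'UTF-8');

// JavaScript string: json_encode with the JSON_HEX_* flags yields a quoted
// literal in which <, >, &, ' and inner " are written as \uXXXX escapes.
$jsLiteral = json_encode(
    $sessionId,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

echo $htmlOnly, "\n";    // tools/saws &amp; blades
echo $pathSegment, "\n"; // tools%2Fsaws%20%26%20blades
echo $jsLiteral, "\n";
```

Note that an HTML-escaped value placed inside a script element would not be decoded by the browser, so entities such as &#039; would end up in the JavaScript string verbatim.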
Honestly, as it stands this is a pretty terrible idea.
- It has a huge potential for introducing BC breaks.
- I have some code somewhere which uses output buffering and echo to
  write cached copies of html pages to disk. This would break that.
- Writing out html like structures when running as a cli shouldn't be
  affected; but it probably would be.
- Several systems store html templates in a database and echo them.
  Possible breaking change here
- Relying on an ini setting for security is a bad idea: we did that with
  magic quotes and look how that turned out.
- Ini setting changes at runtime cannot be relied upon. (Think shared
  hosting providers who might switch this on (or off) globally and deny
  changes to it at runtime)
- Already mentioned but there is more to escaping than just HTML
If you decide to pursue this further try using
declare(this_is_a_template=true) at the top of each template file to enable
this behaviour instead of an ini setting. This then applies on a per-file
basis and sidesteps numerous issues.
Hi,
basically I agree with you. I see the issue, but I don't think this
is the solution (it might have been a solution if introduced 20 years
ago, making it "secure by default" and letting users opt out where needed,
but now it might lead to BC hell).
But a comment here:
- Relying on an ini setting for security is a bad idea: we did that
  with magic quotes and look how that turned out.
One can't fully compare the two: magic_quotes happened before the script
started, so the setting was outside the control of the script. With
this feature it is under the control of the script. You can call ini_set()
at the beginning of the script to enforce what your app needs (while
writing libraries which generate output in a portable way will be
harder). With magic_quotes the only way out was those foreach ($_GET)
{ stripslashes } loops, which often had bugs (recursion related).
johannes
"Rowan Collins" wrote in message news:56EFE897.3070804@gmail.com...
I think the whole idea of trying to execute some application logic after the
data has been sent to the templating engine is wrong. I don't use Smarty but
I do use XSLT as my templating engine. This means that I have to copy all
the relevant data to an XML document before I perform the XSL
transformation. There is no need for any application code to be executed in
the transformation process simply because I executed that code BEFORE it was
copied to the XML document.
You should try executing your application logic BEFORE you send your data to
Smarty, then you won't have to bend the templating system to do something it
was not designed to do.
--
Tony Marston
On Mon, 21 Mar 2016 07:35:46 +0100, Daniel Beardsley daniel@ifixit.com
wrote:
The escape context could be detected (e.g. the Latte template engine has
supported context-aware escaping for years –
https://latte.nette.org/en/#toc-context-aware-escaping) but the logic is
quite complex for inclusion in PHP core.
Also, relying on an ini setting is evil as it may break existing libraries.
You would need to introduce a different way to manage the setting, e.g.
introduce a new language construct require_template which would turn this
behavior on for a single included file, or a new declare(template_mode=1)
which you would need to write at the beginning of each template.
Regards,
Jan Tvrdik
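Neither require_template nor declare(template_mode=1) exists in PHP, so the per-file idea can only be approximated in userland. The sketch below is one such approximation under stated assumptions: the hypothetical render_template() helper wraps string variables in a hypothetical EscapedOnOutput class so that escaping applies to exactly one included file and to nothing else.

```php
<?php
// Hypothetical userland approximation of a per-template opt-in. It scopes
// escaping to one included file instead of relying on a global ini setting.

/** Wraps a string so that echoing it emits the HTML-escaped form. */
final class EscapedOnOutput
{
    private $raw;

    public function __construct(string $raw)
    {
        $this->raw = $raw;
    }

    public function __toString(): string
    {
        return htmlspecialchars($this->raw, ENT_QUOTES, 'UTF-8');
    }
}

/** Renders one template file with every top-level string variable wrapped. */
function render_template(string $templateFile, array $vars): string
{
    $wrapped = array_map(function ($value) {
        return is_string($value) ? new EscapedOnOutput($value) : $value;
    }, $vars);

    // Isolate the template's variable scope inside a closure.
    $render = function (string $__file, array $__vars): string {
        extract($__vars);
        ob_start();
        include $__file;
        return ob_get_clean();
    };

    return $render($templateFile, $wrapped);
}

// Demo: write a throwaway template, render it, then clean up.
$file = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($file, '<p><?= $name ?></p>');
$html = render_template($file, ['name' => '<script>alert(1)</script>']);
unlink($file);

echo $html, "\n"; // <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

This only covers strings passed in at the top level; values produced inside the template (function calls, array elements) are not wrapped, which is part of why an engine-level construct was being discussed.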
Sorry, I have to... TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡
H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝
S̨̥̫͎̭ͯ̿̔̀ͅ
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags,
fwiw.
No, detecting context is not possible unless you are in a very strict and
inflexible context, such as an XML-based templating engine.
Marco Pivetta
Why do you assume that Latte parser is limited by regexp ability to parse
HTML?
Because it is:
https://github.com/nette/latte/blob/19b759b550caaad75ca0dee5f0d85f9ffb59c845/src/Latte/Parser.php#L124
Daniel,
This is a really interesting idea! However, I'm unsure whether it's wise
to bring this feature in without having the community test and validate it
first. Would it be possible to release this as an extension first so we
can gauge its stability and desirability in "the real world"?
As far as the implementation goes, one thing I don't like is the complexity
involved to output unescaped HTML. I'd strongly prefer to do something
like <?=raw($someVariable) ?> than having to instantiate a special class
every time I need to output some raw HTML.
Also, I know some templating systems (like Twig) allow you to specify
different escaping strategies:
http://twig.sensiolabs.org/doc/filters/escape.html Would this proposed
feature have any similar functionality?
Best regards,
Colin
On Tue, 22 Mar 2016 14:01:09 +0100, Craig Duncan php@duncanc.co.uk
wrote:
Why do you assume that Latte parser is limited by regexp ability to
parse HTML?
Because it is:
No. That argument would only be valid if the parser consisted only of a
single regexp. When you combine PHP code with PCRE you lose nothing of
PHP's Turing completeness and its ability to parse HTML.
That being said, I'm not claiming that the Latte parser is a 100 % correct
HTML parser (neither are most existing HTML parsers in the world). I'm saying
that it could be (to the extent of what's possible to statically analyze,
i.e. if an editor could highlight the code properly).
Regards,
Jan Tvrdik
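The point about combining PCRE with PHP code can be illustrated with a toy example. This is not how Latte works; it is only a sketch in which a regular expression serves as the lexer and a PHP array serves as the parser's stack, which is enough to check tag nesting (something a single classic regular expression cannot do). The function name is made up.

```php
<?php
// Sketch only: a regular expression acts as the lexer, a PHP array as the
// parser's stack. Void elements such as <br> without a slash and attribute
// values containing ">" are deliberately not handled.
function tags_are_balanced(string $html): bool
{
    preg_match_all(
        '~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>~',
        $html,
        $tokens,
        PREG_SET_ORDER
    );

    $open = [];
    foreach ($tokens as [, $closing, $name, $selfClosing]) {
        $name = strtolower($name);
        if ($selfClosing === '/') {
            continue;                // <br/> opens and closes at once
        }
        if ($closing === '') {
            $open[] = $name;         // opening tag: push
        } elseif (array_pop($open) !== $name) {
            return false;            // closing tag does not match the top
        }
    }
    return $open === [];             // leftovers mean unclosed tags
}

var_dump(tags_are_balanced('<div><p>hi</p></div>')); // bool(true)
var_dump(tags_are_balanced('<div><p>hi</div></p>')); // bool(false)
```

A real template parser would track far more state (attributes, script and style blocks, comments), but the division of labour between regex tokens and surrounding code is the same.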
This is a really interesting idea! However, I'm unsure whether it's wise
to bring this feature in without having the community test and validate it
first. Would it be possible to release this as an extension first so we
can gauge its stability and desirability in "the real world"?
It is possible! While the implementation here adds a whole new opcode
and ast kind, the reality is that it could be implemented as an
extension by overriding the ZEND_ECHO opcode (see pecl/operator for
examples of how to do this).
-Sara
Hi Daniel,
When I write scripts that need to behave the same independently of the
value of mbstring.func_overload then I have to remember to be careful
with the functions it affects. It's a drag. I resent having to write
things like mb_strlen($str, '8bit') to get a byte-count knowing that the
scripts would be cleaner and easier to understand if I could dictate the
value of mbstring.func_overload (or if it had never been invented).
Would your proposal have any sort of similar effect? I mean, would it
complicate the task of HTML-escaping output when the scripts need to
work the same regardless of the '__auto_escape' ini setting?
Tom