[RFC] New operator for context-dependent escaping

9 years ago by Thomas Bley

It's not that difficult to write a static analyser that detects
instances of "<?=" not followed by "h(" or "e(" or whatever.

<?* and <?= are the same for all applications, whereas h() is user-defined. So if you use h() or e(), you need to write a different analyzer for every application.
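
For illustration only, here is a minimal sketch of the kind of analyser being discussed, built on PHP's tokenizer. The function name findUnescapedEchoTags() and the default allow-list of h and e are assumptions for this example, not part of the RFC; the allow-list parameter is exactly the per-application part referred to above.

```php
<?php
declare(strict_types=1);

/**
 * Returns the line numbers of every "<?=" tag whose expression does not
 * start with a call to one of the allowed escaping functions.
 * The names h and e are placeholders taken from this thread.
 */
function findUnescapedEchoTags(string $source, array $allowed = ['h', 'e']): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $lines = [];

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if (!is_array($token) || $token[0] !== T_OPEN_TAG_WITH_ECHO) {
            continue;
        }

        // Skip whitespace between the echo tag and the expression.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }

        // The expression counts as escaped only if it begins with "name(".
        $first = $tokens[$j] ?? null;
        $next = $tokens[$j + 1] ?? null;
        $escaped = is_array($first)
            && $first[0] === T_STRING
            && in_array($first[1], $allowed, true)
            && $next === '(';

        if (!$escaped) {
            $lines[] = $token[2]; // line on which the echo tag starts
        }
    }

    return $lines;
}
```

This only checks the first call in the expression, so namespaced calls or a space before the parenthesis would be reported as unescaped.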

Surely the feature gets most of its value from what you don't need to
do - which is why I think it's bizarre that the current version doesn't
even have a built-in HTML escaper at all.

I think it's no problem to have a follow-up RFC defining some default escapers.

It's not possible for multiple frameworks or libraries to declare
different escape handlers in your proposal, either.

Not sure I get your point?

public function render($template) {
    set_escape_handler(['SomeClass', 'methodName']);
    ob_start();
    include $template;
    $content = ob_get_clean();
    restore_escape_handler();
    return $content;
}
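
Since set_escape_handler() and restore_escape_handler() exist only in the proposal, the set/restore behaviour can be approximated in userland. The sketch below is an assumption about how such a stack might behave; the class EscapeHandlerStack and its methods are hypothetical names, and the two handlers are arbitrary stand-ins for two libraries.

```php
<?php
declare(strict_types=1);

/**
 * Userland stand-in for the proposed set_escape_handler() /
 * restore_escape_handler() pair. All names here are hypothetical.
 */
final class EscapeHandlerStack
{
    /** @var list<callable> */
    private static array $handlers = [];

    public static function set(callable $handler): void
    {
        self::$handlers[] = $handler;
    }

    public static function restore(): void
    {
        array_pop(self::$handlers);
    }

    public static function escape(string $value, string $context = 'html'): string
    {
        if (self::$handlers === []) {
            throw new LogicException('No escape handler registered');
        }
        // The most recently set handler wins.
        $handler = self::$handlers[array_key_last(self::$handlers)];
        return $handler($value, $context);
    }
}

// An outer renderer installs an HTML escaper.
EscapeHandlerStack::set(fn (string $v, string $c): string => htmlspecialchars($v, ENT_QUOTES, 'UTF-8'));
$outer = EscapeHandlerStack::escape('<b>');

// A nested library temporarily installs its own handler.
EscapeHandlerStack::set(fn (string $v, string $c): string => rawurlencode($v));
$inner = EscapeHandlerStack::escape('<b>');
EscapeHandlerStack::restore();

// After restore, the outer handler is active again.
$outerAgain = EscapeHandlerStack::escape('<b>');
```

The same template expression produces different output depending on which handler is on top of the stack when it runs.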

You could equally say, "with <?=e()?> you have to define an e()
function". The main effort is remembering to use the right syntax, which
you have to do either way.

The thing here is that people can use <?= without e() and save coding time.

Security cannot be optional; see:

git clone https://github.com/phpmyadmin/phpmyadmin.git
cd phpmyadmin
git log | grep -i xss | wc -l
206

Regards
Thomas

Rowan Collins wrote on 24.07.2016 18:39:

<?* $str ?>

instead of

<?=h($str)?>

Benefits include static code analyzers, grepping for "<?=" in code reviews, etc.

It's not that difficult to write a static analyser that detects
instances of "<?=" not followed by "h(" or "e(" or whatever.

Having function names with single characters is bad taste and only useful for
obfuscating.

And having a token "*" that calls a different function in every
application is somehow less obfuscated?

Using multiple frameworks or libraries, it's not possible to redeclare
functions with the same name.

It's not possible for multiple frameworks or libraries to declare
different escape handlers in your proposal, either.

The big difference is:
With <?*, you have to define an escaping function, with <?= it's optional.

You could equally say, "with <?=e()?> you have to define an e()
function". The main effort is remembering to use the right syntax, which
you have to do either way.

Surely the feature gets most of its value from what you don't need to
do - which is why I think it's bizarre that the current version doesn't
even have a built-in HTML escaper at all.

Regards,

Rowan Collins
[IMSoP]

9 years ago by Rowan Collins

It's not that difficult to write a static analyser that detects
instances of "<?=" not followed by "h(" or "e(" or whatever.

<?* and <?= are the same for all applications, whereas h() is user-defined. So if you use h() or e(), you need to write a different analyzer for every application.

This argument is only valid if the RFC includes an implementation, not
just a syntax. As currently proposed, not even the syntax would be the
same for all applications, as part of it is hand-waved as up to whoever
writes the escape callback.

Surely the feature gets most of its value from what you don't need to
do - which is why I think it's bizarre that the current version doesn't
even have a built-in HTML escaper at all.

I think it's no problem to have a follow-up RFC defining some default escapers.

In my opinion, they are central to the feature, not an optional extra.

It's not possible for multiple frameworks or libraries to declare
different escape handlers in your proposal, either.

Not sure I get your point?

public function render($template) {
    set_escape_handler(['SomeClass', 'methodName']);
    ob_start();
    include $template;
    $content = ob_get_clean();
    restore_escape_handler();
    return $content;
}

OK, so I can dynamically redefine the same syntax to mean different
things at different times, within the same application. I'm not entirely
sure that's a particularly good thing.

You could equally say, "with <?=e()?> you have to define an e()
function". The main effort is remembering to use the right syntax, which
you have to do either way.

The thing here is that people can use <?= without e() and save coding time.

People can use <?= instead of <?* and save learning the difference. Lazy
people will still be lazy.

Yes, there's a very slight effort saved by it being shorter, but at the
cost of a minimum PHP version, an extra thing to learn, etc.

Security cannot be optional, see.

Then why is absolutely everything in the current RFC optional and
configurable to the Nth degree?

Regards,

--
Rowan Collins
[IMSoP]

9 years ago by Thomas Bley

Then why is absolutely everything in the current RFC optional and
configurable to the Nth degree?

It's one handler: set_escape_handler() (N=1)

Currently, every framework has its own methods for escaping. To bring these together, set_escape_handler() is a good choice, similar to set_error_handler().

OK, so I can dynamically redefine the same syntax to mean different
things at different times, within the same application. I'm not entirely
sure that's a particularly good thing.

It's the same thing with set_error_handler(), set_exception_handler(), spl_autoload_register(), error_reporting(), etc.; this concept is proven to work.
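
As a concrete reference for the comparison, this is how the existing set_error_handler() / restore_error_handler() pair stacks handlers in PHP today. The handler bodies and the $log variable are made up for the example; only the two built-in functions and trigger_error() are real API.

```php
<?php
declare(strict_types=1);

$log = [];

// First handler: stays active until another one is set on top of it.
set_error_handler(function (int $errno, string $errstr) use (&$log): bool {
    $log[] = "outer: $errstr";
    return true; // handled; skip PHP's built-in handler
});

// Second handler: temporarily replaces the first one.
set_error_handler(function (int $errno, string $errstr) use (&$log): bool {
    $log[] = "inner: $errstr";
    return true;
});

trigger_error('first', E_USER_WARNING);   // goes to the inner handler
restore_error_handler();                  // pops back to the outer handler
trigger_error('second', E_USER_WARNING);  // goes to the outer handler
restore_error_handler();                  // back to PHP's default handling
```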

In my opinion, they are central to the feature, not an optional extra.

Maybe you can join the RFC and provide the implementation?

Regards
Thomas

Rowan Collins wrote on 24.07.2016 19:41 (message shown in full above).

9 years ago by Rowan Collins

Then why is absolutely everything in the current RFC optional and
configurable to the Nth degree?

It's one handler: set_escape_handler() (N=1)

Currently, every framework has its own methods for escaping. To bring these together, set_escape_handler() is a good choice, similar to set_error_handler().

It's not set_escape_handler() that I'm concerned about, it's how you
actually use it in the templates. At the moment, the only thing the RFC
actually asserts about the escape handler is "it's a function with two
arguments". Frameworks are free to write all sorts of weird shit:

<?* $foo, 'html' ?>
<?* 'html', $foo ?>
<?* $foo, 'text/html' ?>
<?* $foo, [$this, 'escaper'] ?>
<?* $foo, $this ?>
<?* $foo, 'js | html' ?>
<?* $foo, 'js + html' ?>
<?* $foo, ['js', 'html'] ?>
<?* $foo, '[js, html]' ?>
<?* $foo, '{js, html}' ?>
<?* $foo, 'html(UTF-8)' ?>
<?* $foo, 'UTF-8' ?>
<?* [$foo, $bar, $baz], 'ul > li' ?>
etc
etc

If you want to provide something that will be the same in all
frameworks, then you've got to actually provide it.
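
To make the concern concrete, here is a sketch of one possible two-argument callback with a fixed argument order and a closed set of context names. The function name escapeFor() and the context names 'html', 'url' and 'js' are assumptions chosen for illustration; the RFC as discussed here does not specify any of them.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical two-argument escape callback: value first, context name second.
 * Unknown context names are rejected instead of being silently passed through.
 */
function escapeFor(string $value, string $context = 'html'): string
{
    return match ($context) {
        'html' => htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
        'url'  => rawurlencode($value),
        // Produces a quoted JavaScript string literal that is safe inside a script block.
        'js'   => json_encode(
            $value,
            JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
        ),
        default => throw new InvalidArgumentException("Unknown escape context: $context"),
    };
}
```

Another framework could just as legitimately pick a different argument order or different context names, which is the portability problem described above.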

OK, so I can dynamically redefine the same syntax to mean different
things at different times, within the same application. I'm not entirely
sure that's a particularly good thing.

It's the same thing with set_error_handler(), set_exception_handler(), spl_autoload_register(), error_reporting(), etc.; this concept is proven to work.

OK, fair enough. I'm not sure it's really a killer feature, though. The
fact that I can't easily redefine "function e()" is no more of a problem
here than anywhere else in the language.

In my opinion, they are central to the feature, not an optional extra.

Maybe you can join the RFC and provide the implementation?

The implementation I'm talking about is hardly complex, just some
default arguments to htmlspecialchars(). Or that would be the case, if
we didn't need to provide one escape callback to handle all possible
arguments, rather than registering for a specific strategy name.
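
A rough sketch of the alternative hinted at here, where escapers are registered per strategy name and a default HTML escaper ships built in. EscaperRegistry and its method names are hypothetical, and the htmlspecialchars() flags are one reasonable choice of defaults rather than anything the RFC prescribes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical registry keyed by strategy name, with a built-in 'html' default.
 */
final class EscaperRegistry
{
    /** @var array<string, callable> */
    private array $escapers = [];

    public function __construct()
    {
        // Built-in default: htmlspecialchars() with fixed arguments.
        $this->escapers['html'] = static fn (string $v): string =>
            htmlspecialchars($v, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }

    public function register(string $name, callable $escaper): void
    {
        $this->escapers[$name] = $escaper;
    }

    public function escape(string $value, string $name = 'html'): string
    {
        if (!isset($this->escapers[$name])) {
            throw new InvalidArgumentException("No escaper registered for: $name");
        }
        return ($this->escapers[$name])($value);
    }
}

$registry = new EscaperRegistry();
$registry->register('url', 'rawurlencode'); // a library adds its own strategy
```

With this shape, each library registers only the strategies it owns, and no single callback has to interpret every possible second argument.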

Regards,

--
Rowan Collins
[IMSoP]

9 years ago by Thomas Bley

Frameworks are free to write all sorts of weird shit:

With set_escape_handler(), the "weird shit" is in one place and can be quickly verified. Today the "weird shit" is spread over all templates.
Normally the problem is not fixing the frameworks; most of the work is fixing code that uses the frameworks in the wrong way.

Regards
Thomas

Rowan Collins wrote on 24.07.2016 20:29 (message shown in full above).