[RFC] New operator for context-dependent escaping

8 years ago by Thomas Bley

If I see it correctly, this is just a framework for defining callbacks for an escaping operator, without an implementation of "html" and "js"?
Not sure if this helps.

Regards
Thomas

Michael Vostrikov wrote on 16.07.2016 17:33:

Hello.
I have created an RFC for a context-dependent escaping operator.
https://wiki.php.net/rfc/escaping_operator

Initial discussion was here: http://marc.info/?t=146619199100001

At first, I wanted to add a call to a special function like
escaper_call($str, $context), which would perform HTML escaping by default and
could be replaced by a separate extension for extended work with contexts.
But then I came up with a better variant.

Main idea.

The operator has the following form:

<?* $str ?>
<?* $str, 'html' ?>
<?* $str, 'js | html' ?>

Both expressions can be of any type that can be converted to a string. The
second expression is optional.

I changed the '~' sign because it is not present on keyboard layouts for some
European languages. Also, '~' does not give any error on previous versions
of PHP with short tags enabled, because there it is recognized as a bitwise
operation.

The operator is compiled into the following AST:

echo PHPEscaper::escape(first_argument, second_argument);

Don't forget that we already have a special operator for one function:
backticks and shell_exec(). The new operator is compiled very similarly.

There is a default implementation of the class 'PHPEscaper'. It has 4
static methods:

PHPEscaper::escape($string, $context = 'html');
PHPEscaper::registerHandler($context, $escaper_function);
PHPEscaper::unregisterHandler($context);
PHPEscaper::getHandlers();

The method PHPEscaper::escape($string, $context) splits $context by the '|'
delimiter, trims all parts, and then calls the registered handler for
every context in a chain.
'html' is the default value for context, and it has special handling:
if there is no handler for the 'html' context, it calls
htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE).
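A rough userland approximation of the dispatch just described follows. It is a sketch, not the C patch: the method names come from the RFC, throwing \Exception for an unknown context follows Marcio's description of the patch, and registering json_encode for 'js' is only an example.

```php
<?php
// Userland sketch of the PHPEscaper behavior described in the RFC.
// The proposal itself is implemented in C; this only mirrors the description.
class PHPEscaper
{
    /** @var array<string, callable> context name => escaper */
    private static $handlers = [];

    public static function escape($string, $context = 'html'): string
    {
        $result = (string) $string;
        // 'js | html' becomes ['js', 'html']; handlers run left to right.
        foreach (array_map('trim', explode('|', (string) $context)) as $name) {
            if (isset(self::$handlers[$name])) {
                $result = (string) call_user_func(self::$handlers[$name], $result);
            } elseif ($name === 'html') {
                // Built-in fallback when no 'html' handler is registered.
                $result = htmlspecialchars($result, ENT_QUOTES | ENT_SUBSTITUTE);
            } else {
                throw new Exception("Unknown escaping context '$name'");
            }
        }
        return $result;
    }

    public static function registerHandler(string $context, callable $escaper_function): bool
    {
        if (isset(self::$handlers[$context])) {
            return false; // RFC draft: an existing handler is not replaced
        }
        self::$handlers[$context] = $escaper_function;
        return true;
    }

    public static function unregisterHandler(string $context): void
    {
        unset(self::$handlers[$context]);
    }

    public static function getHandlers(): array
    {
        return self::$handlers;
    }
}

PHPEscaper::registerHandler('js', 'json_encode');
echo PHPEscaper::escape('Say "hi"', 'js | html'), "\n";
// &quot;Say \&quot;hi\&quot;&quot;
```

Because handlers run left to right, 'js | html' first JSON-encodes the value and then HTML-escapes the result, which is the order needed, for example, for a JavaScript value placed inside an HTML attribute.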

We can use it like this:

<?php
// anywhere in application
PHPEscaper::registerHandler('html', ['MyEscaper', 'escapeHtml']);
PHPEscaper::registerHandler('js', function ($str) {
    return json_encode($str);
});
?>
<?* $str, 'js | html' ?>

And even more.
In the AST, 'PHPEscaper' is registered as a not fully qualified name
(ZEND_NAME_NOT_FQ).
This allows us to use namespaces and autoloading:

<?php use MyEscaper as PHPEscaper; ?>
<?* $str, 'js | html' ?>

MyEscaper::escape($str, 'js | html') will be called.
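The aliasing trick relies on ordinary name resolution: an unqualified class name in a static call is resolved through use imports at compile time. The sketch below writes that call by hand, since the proposed operator does not exist; MyEscaper and its tagged output format are illustrative assumptions.

```php
<?php
// Alias an arbitrary class to the unqualified name the operator would use.
use MyEscaper as PHPEscaper;

// Hypothetical replacement escaper; it tags its output so the call is visible.
class MyEscaper
{
    public static function escape($string, $context = 'html'): string
    {
        return 'MyEscaper(' . $context . '): '
            . htmlspecialchars((string) $string, ENT_QUOTES | ENT_SUBSTITUTE);
    }
}

// Written by hand here: the unqualified name resolves through the alias.
echo PHPEscaper::escape('<b>', 'js | html'), "\n";
// MyEscaper(js | html): &lt;b&gt;
```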

In this way we get autoloading, multiple contexts, HTML escaping by
default, and full control and customization.
This is not an operator for one function; there is just one default
implementation.

My first goal is to draw attention to the problem of security and
HTML escaping. The exact implementation is secondary.

This small change can really improve security and make development easier
in many applications.

What do you think: maybe it would also be good to create an official poll
about this feature to learn the community's opinion about it?

8 years ago by Michael Vostrikov

If I see it correctly, this is just a framework for defining callbacks for
an escaping operator, without an implementation of "html" and "js"?
Not sure if this helps.
There is a default escaping for HTML. If there is no registered handler for
the 'html' context, it calls htmlspecialchars($str, ENT_QUOTES |
ENT_SUBSTITUTE).

8 years ago by Dan Ackroyd

Hi Michael,

I'm more than slightly sceptical about this RFC's chances, but to give
you some feedback, this bit is a bad pattern:

bool PHPEscaper::registerHandler(string $context, callable $escaper_function)

Registers a new handler for a given context. If a handler for this context is already
registered, it returns false; on successful registration it returns true.

It would be better to return the previous handler, or NULL if one
wasn't set, in the same way as set_error_handler() does, and for the
same reasons.
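A minimal sketch of Dan's suggestion, assuming a hypothetical EscaperRegistry class that is not part of the RFC: registering returns the previous handler or null, and passing null removes the handler, similar to how set_error_handler() reports and resets handlers. That lets a library install its own escaper and put the old one back afterwards.

```php
<?php
// Hypothetical registry illustrating set_error_handler()-style semantics.
final class EscaperRegistry
{
    /** @var array<string, callable> */
    private static $handlers = [];

    // Returns the handler that was registered before, or null if none was.
    public static function registerHandler(string $context, ?callable $handler): ?callable
    {
        $previous = self::$handlers[$context] ?? null;
        if ($handler === null) {
            unset(self::$handlers[$context]);
        } else {
            self::$handlers[$context] = $handler;
        }
        return $previous;
    }

    public static function get(string $context): ?callable
    {
        return self::$handlers[$context] ?? null;
    }
}

// Application default.
EscaperRegistry::registerHandler('html', 'htmlspecialchars');

// A library swaps in its own escaper, renders, then restores the old one.
$previous = EscaperRegistry::registerHandler('html', function ($s) {
    return htmlspecialchars($s, ENT_QUOTES | ENT_HTML5);
});
// ... library rendering would happen here ...
EscaperRegistry::registerHandler('html', $previous);
```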

Also, shouldn't these just be functions? Why is there a class involved
when it only has static functions/state?

cheers
Dan

8 years ago by Michael Vostrikov

It would be better to return the previous handler, or NULL if one
wasn't set, in the same way as set_error_handler() does, and for the same
reasons.

Well, maybe you are right.
But I thought this is not a real use case: usually we don't need multiple
handlers for a certain context. I.e. we don't need to encode single quotes in
HTML one way in one template and a different way in another.
The use case would be the following: if there is no handler for 'my_context',
register this handler. If some library really needs to replace an existing
handler of the application, it can call unregisterHandler() directly.
Also, Twig has no such behavior, and applications using it work well.
I tried not to invent new mechanisms of usage.

Also, shouldn't these just be functions? Why is there a class involved
when it only has static functions/state?

I didn't want to add many related items to the global namespace. And with a
class we can use autoloading and fully replace the implementation, unlike
with functions. It is possible to disable the extension and use our own class.

8 years ago by Michael Vostrikov

Is there any specific reason why we're using a class instead of
functions to register a callable the same way it's done for exception
handling or error handling? Hacking non FQN resolutions to inject another
escaper ...

I would not call it 'hacking' :) This is exactly the same as if we wrote
'PHPEscaper::escape()' manually in PHP context.
I didn't want to add many related items to the global namespace, and with a
class it is possible to use autoloading.

Is there any rationale why we're using strings separated by '|' to pass
context instead of an array? Ex.:
<?* $str, ['js', 'html'] ?>

Yes, I thought about an array. It can be supported on a par with strings.
Strings just look more similar to escaping in template engines.
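Supporting both forms could be as simple as normalizing them to one list before dispatch. The helper name normalize_escape_context() is hypothetical, and dropping empty segments is an added assumption beyond the trimming the RFC describes.

```php
<?php
// Hypothetical helper: turns 'js | html' or ['js', 'html'] into ['js', 'html'].
function normalize_escape_context($context): array
{
    $parts = is_array($context) ? $context : explode('|', (string) $context);
    // Trim each part, as the RFC does for the string form.
    $parts = array_map(function ($p) {
        return trim((string) $p);
    }, $parts);
    // Assumption: silently skip empty segments such as those in 'url||html'.
    return array_values(array_filter($parts, function ($p) {
        return $p !== '';
    }));
}

var_dump(normalize_escape_context('js | html') === normalize_escape_context(['js', 'html']));
// bool(true)
```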

I think the default implementation should throw a more specific
exception in case of unknown context (\PHP\EscapeException?)

This line on your patch is unnecessary

You are right, thanks. This commit is a proof of concept; if the RFC is accepted, I
will prepare a patch with more correct code.

8 years ago by Dan Ackroyd

If some library really needs to replace an existing handler of the application, it can call unregisterHandler() directly.

But then there is no way to restore the previous handler.

usually we don't need multiple handlers for certain context.

Stuff that is added to the core needs to cover all use-cases, not just
the typical ones.

I tried not to invent new mechanisms of usage.

Then please copy the set_error_handler behaviour.

cheers
Dan

8 years ago by Marcio Almada

2016-07-18 16:29 GMT-04:00 Dan Ackroyd danack@basereality.com:

On 17 July 2016 at 04:49, Michael Vostrikov michael.vostrikov@gmail.com
wrote:

If some library really needs to replace an existing handler of the application,
it can call unregisterHandler() directly.

But then there is no way to restore the previous handler.

usually we don't need multiple handlers for certain context.

Stuff that is added to the core needs to cover all use-cases, not just
the typical ones.

In what real-world case would multiple escape handlers for the same context
be a thing? I only see multiple handlers for the same context as risky,
since double escaping can, in some contexts, lead to security issues.

I tried not to invent new mechanisms of usage.

Then please copy the set_error_handler behaviour.

100% agree with that.

cheers

Dan

8 years ago by Marcio Almada

Hi,

A few possible RFC improvements:

1. Is there any specific reason why we're using a class instead of
functions to register a callable, the same way it's done for exception
handling or error handling? Hacking non-FQN resolution to inject another
escaper implementation, as in "<?php use MyEscaper as PHPEscaper; ?>", was a
creative idea, but it seems inconsistent with the preferable "PHP way" to
handle these edge cases.

2. Is there any rationale why we're using strings separated by '|' to pass
context instead of an array? Ex.:

<?* $str, ['js', 'html'] ?>

3. I think the default implementation should throw a more specific
exception in case of an unknown context (\PHP\EscapeException?);
currently it throws \Exception. The reason is that log level and
handling may be more severe when escaping fails.
Userland implementations should be encouraged to throw the same specific
exception too (the documentation could enforce that).
4. This line in your patch is unnecessary:
https://github.com/michael-vostrikov/php-src/commit/571cd7c88488a08c82b10f0c3af559881f1a2951#diff-7eff82c2c5b45db512a9dc49fb990bb8R274
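A sketch of Marcio's suggestion of a more specific exception, assuming a hypothetical EscapeException class in the global namespace (the thread only floats \PHP\EscapeException as a name) and a hypothetical escape_with() helper. A dedicated type lets callers catch escaping failures separately from other errors and log them at a higher level.

```php
<?php
// Hypothetical dedicated exception type for escaping failures.
class EscapeException extends RuntimeException
{
}

// Hypothetical helper: applies one registered escaper or fails loudly.
function escape_with(array $handlers, string $value, string $context): string
{
    if (!isset($handlers[$context])) {
        throw new EscapeException("No escaper registered for context '$context'");
    }
    return (string) $handlers[$context]($value);
}

$handlers = [
    'html' => function ($s) {
        return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE);
    },
];

$failure = null;
try {
    echo escape_with($handlers, '<b>', 'css');
} catch (EscapeException $e) {
    // Caught separately from generic exceptions, e.g. to log at a high severity.
    $failure = $e->getMessage();
}
```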

In general, this RFC looks better than expected, considering the previous
discussions. Please keep improving :)

Best,
Márcio.

8 years ago by lauri.kentta@gmail.com

2. Is there any rationale why we're using strings separated by '|' to pass
context instead of an array? Ex.:

<?* $str, ['js', 'html'] ?>

Multiple arguments would make the syntax even cleaner:
<?* $str, 'js', 'html' ?>

--
Lauri Kenttä

8 years ago by Michael Vostrikov

Multiple arguments would make the syntax even cleaner:
<?* $str, 'js', 'html' ?>

I thought about it. Multiple arguments do not allow runtime modification
(and make the parser more complex).
Something like this:
<?php
$context = [];
if ($field->name == 'url') $context[] = 'url';
$context[] = 'html';
?>

<div data-my-attr="<?* $field->value, implode('|', $context) ?>"></div>

8 years ago by Niklas Keller

2016-07-17 12:35 GMT+02:00 Michael Vostrikov michael.vostrikov@gmail.com:

Multiple arguments would make the syntax even cleaner:
<?* $str, 'js', 'html' ?>

I thought about it. Multiple arguments do not allow runtime modification
(and make the parser more complex).
Something like this:
<?php
$context = [];
if ($field->name == 'url') $context[] = 'url';
$context[] = 'html';
?>
<div data-my-attr="<?* $field->value, implode('|', $context) ?>"></div>

Context should be defined where the variable is printed. Otherwise, when you
move the variable from HTML text to an attribute or add it somewhere else,
the context doesn't match anymore.

Regards, Niklas

8 years ago by Michael Vostrikov

Context should be defined where the variable is printed. Otherwise, when you
move the variable from HTML text to an attribute or add it somewhere else,
the context doesn't match anymore.

Well, maybe, but HTML is an outer context, and it can be combined with other
contexts depending on the task. We cannot know all possible tasks. A single
variable is just more flexible.

8 years ago by Rasmus Schultz

I've read your RFC, and I think this is a strange feature.

All it is, really, is a registry for functions, and syntactic sugar for
calling those functions - it's unnecessary, it's more global state you have
to manage, and it's the kind of superficial convenience that will end up
breeding more complexity.

What's also strange is that the ability to call functions in this registry
hinges on syntax. What if I want to call the registered functions from
within code?

ob_start();
?><* $text *><?
$html = ob_get_clean();

Yikes.

To quote a few phrases from the RFC:

Both variants <?= h($something) ?> and <?= $something ?> work well.

This is so true - and the whole syntactic convenience line of thinking
really should end with that.

Also there is a problem with function autoloading.

I maintain that this is the real problem, and perhaps the only problem -
all this RFC does is provide a stop-gap solution. What we should really be
talking about is implementing the RFC that addresses the existing gap in
an existing feature of the language.

Your arguments don't make sense to me. It's somehow easier to choose
between two different characters * and = versus electing to call a function
or not? I don't see how - it still requires an active choice, and I don't
believe there's any (sound) way around that.

All this RFC changes is the syntax - not the problem.

Addition of a feature like this will affect even those who don't use it -
we all collaborate in teams, and most of us contribute to open source
projects... a feature like this will bring global state, side-effects and
many other interesting problems even to those who don't elect to use it,
when they inherit or consume code that does.

The poll doesn't make a whole lot of sense either, because you're asking
specifically about the proposed feature, rather than asking in general
about the problem. This doesn't prompt people to think about the problem -
it prompts them to consider the proposed solution. It's easy enough to look
at this on the surface and think "sure, that solves it" - reasoning about
the impact on the language, or deeper problems not directly relating to
this on the surface, requires more of an involvement than just a quick
click on a radio button.

More than 90% of output data is data from the DB and must be HTML-encoded

Yet, you argue we need a function registry for all kinds of other escape
operations to address the other 10%. I can't follow this line of thinking.
If the 90% use case is HTML escaping (with UTF-8 encoding, as is likely
true) then maybe I could accept the addition of syntax just for that.
Maybe.

I would still be much more concerned about the limited usefulness of
functions in general, which could be more generally addressed by solving
autoloading.

I view this RFC as a huge distraction: if we implement it, addressing only
that one use case for functions (templates), we're more likely to put off the
deeper issues for even longer.

Please, let's focus on improving the language in general - rather than
improving one isolated use-case.

8 years ago by Michael Vostrikov

All it is, really, is a registry for functions, and syntactic sugar for
calling those functions - it's unnecessary, it's more global state you have
to manage and it's the kind of superficial convenience that will end up
breeding more complexity.

A registry of functions is exactly how escaping is performed in Symfony and
Twig.
https://github.com/symfony/symfony/blob/f29d46f29b91ea5c30699cf6bdb8e65545d1dd26/src/Symfony/Component/Templating/PhpEngine.php#L421
https://github.com/twigphp/Twig/blob/f0a4fa678465491947554f6687c5fca5e482f8ec/lib/Twig/Extension/Core.php#L1039

What's also strange is that the ability to call functions in this registry
hinges on syntax. What if I want to call the registered functions from
within code?
ob_start();
?><* $text *><?
$html = ob_get_clean();

Sorry, I don't understand what you mean in this example. You can call
your escapers by name or by callable value from the array of handlers.

<?php

function myHtmlEscaper($str) {
    return htmlspecialchars($str, ENT_QUOTES | ENT_HTML5 |
        ENT_DISALLOWED | ENT_SUBSTITUTE);
}

PHPEscaper::registerHandler('html', 'myHtmlEscaper');
$text = '"Test"';

// --------

ob_start();
?><* $text *><?php // do you mean a call of escaper here?
$html = ob_get_clean();
var_dump($html);

// string(11) "<* $text *>"

// --------

ob_start();
?><?* $text ?><?php
$html = ob_get_clean();
var_dump($html);

// string(16) "&quot;Test&quot;"

// --------

$escapers = PHPEscaper::getHandlers();
$htmlEscaperCallable = $escapers['html'];
ob_start();
echo $htmlEscaperCallable($text);
$html = ob_get_clean();
var_dump($html);

// string(16) "&quot;Test&quot;"

?>

Both variants <?= h($something) ?> and <?= $something ?> work well.
This is so true - and the whole syntactic convenience line of thinking
really should end with that.

The wrong and unsafe variant should not work well.

Also there is a problem with function autoloading.
I maintain that this is the real problem, and perhaps the only problem -
all this RFC does, is provide a stop-gap solution.

This RFC is not related to function autoloading; it just does not have this
problem. The code becomes the same as if we wrote PHPEscaper::escape()
manually, and static calls have no problem with autoloading.
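The autoloading difference can be seen directly: spl_autoload_register() is consulted for unknown classes but never for unknown functions. In this sketch LazyEscaper and lazy_escape() are made-up names, and eval() only stands in for the require a real autoloader would perform.

```php
<?php
$autoloaded = [];

// Records every class the engine asks for and defines LazyEscaper on demand.
spl_autoload_register(function ($class) use (&$autoloaded) {
    $autoloaded[] = $class;
    if ($class === 'LazyEscaper') {
        // A real autoloader would require a file; eval() keeps this self-contained.
        eval('class LazyEscaper { public static function escape($s) {'
            . ' return htmlspecialchars((string) $s, ENT_QUOTES | ENT_SUBSTITUTE); } }');
    }
});

// The static call triggers the autoloader, so the class loads lazily.
echo LazyEscaper::escape('<i>'), "\n";

// An unknown function is never routed to the autoloader; the call just fails.
$functionError = null;
try {
    lazy_escape('<i>');
} catch (Error $e) {
    $functionError = get_class($e);
}
```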

It's somehow easier to choose between two different characters * and =
versus electing to call a function or not?

The unsafe variant 'not to call a function' is a shorter subset of the safe
variant 'call a function'. The safe variant requires additional actions. With
the new operator, the safe variant is as easy as the unsafe one.
And we don't have to choose: the <?* ?> operator should be used everywhere,
except for the 1-2% of cases where we have ready-made HTML.
And if we accidentally use it for HTML, this will not be an XSS; it will just
show double-encoded content, which will be noticeable.
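The double-encoding claim is easy to check with htmlspecialchars() alone: escaping ready-made HTML a second time turns tags and entities into visible text rather than live markup. The sample string is arbitrary.

```php
<?php
// Ready-made, already safe HTML that was mistakenly passed through the escaper.
$ready = '<b>Tom &amp; Jerry</b>';
$escaped = htmlspecialchars($ready, ENT_QUOTES | ENT_SUBSTITUTE);
echo $escaped, "\n";
// &lt;b&gt;Tom &amp;amp; Jerry&lt;/b&gt;
```

A browser renders this as the literal text <b>Tom &amp; Jerry</b>, so the mistake shows up on the page instead of becoming an injection point.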

All this RFC changes is the syntax - not the problem.

OK, what do you think the problem is? As I see it, the problem is correct
HTML escaping and XSS. And it can be solved the same way as in template
engines. I don't suggest bringing the whole template engine into PHP, only
the escaping mechanism. Function autoloading is another problem.

Addition of a feature like this will affect even those who don't use it

It does not affect any existing code. It is not a replacement for the <?= ?>
operator.

a feature like this will bring global state, side-effects and many other
interesting problems

The default implementation of PHPEscaper is not required. It is possible to
fully remove it and use a custom PHPEscaper written in PHP. Or just don't use
this operator and don't worry about escaping.
OK, could you please give an example of the problem?

when they inherit or consume code that does

I think this is the same as if they inherit or consume code that uses Twig
or Smarty.

you're asking specifically about the proposed feature, rather than asking
in general about the problem

The proposed feature solves the problem of HTML escaping. I asked both about
whether the problem exists and about a solution.
Also, in the previous discussion the first answer I got was that "Main issue
with this kind of proposals is that escaping is context-dependent". This
issue is mentioned in many other discussions, and I suggested a solution
for this problem too.

And that's why I asked about an official poll in the first message of this
discussion. This is a serious change, and we need to know the community's
opinion about it.

reasoning about the impact on the language, or deeper problems requires
more of an involvement

And that is what an RFC is for, isn't it? Could you give an example of an
impact that could cause problems?

More than 90% of output data is data from the DB and must be HTML-encoded
Yet, you argue we need a function registry for all kinds of other escape
operations to address the other 10%

Hm, wait please. Initially I suggested an operator specifically for HTML
escaping. I tried to demonstrate that this is a special context and it
deserves its own operator. I got the answer that there are many other
contexts and making an operator for one context is a bad idea. OK, I
suggested a solution which allows both HTML escaping and the use of other
contexts. Let's decide which is more suitable: many contexts or one
context.

you argue we need a function registry

The function registry is just the default implementation of the PHPEscaper
class. It can be used out of the box and provides HTML escaping by default.
The default implementation can be fully removed from the project and replaced
with a custom implementation, e.g. with a limited set of methods. There is
just one static call, which does not differ from a function call but allows
autoloading.

which could be more generally addressed by solving autoloading

I can only repeat that the problem is not that we don't have a function. We
have a function and can write our own functions. Maybe there are minor issues
with direct usage of a static class method, but they can be solved by defining
a function in the global namespace which calls this method, as is done
in some frameworks.

Please, let's focus on improving the language in general - rather than
improving one isolated use-case.

Why do you call 90% of output data 'one isolated use case'? :) I can only
mention here the words which were shown to me when I created the RFC:

Quoting Rasmus:
PHP is and should remain:
1) a pragmatic web-focused language
2) a loosely typed language
3) a language which caters to the skill-levels and platforms of a wide
range of users

HTML escaping is a very pragmatic task.

8 years ago by Rasmus Schultz

Registry of functions - is exactly how escaping is performed in Symfony
and Twig.

For one, that does not mean it's a good idea.

For another, the registry in Symfony (at least, I don't know about Twig) is
inside an instance - it's not global state.

Do you get my point that a reference to a closure is state? And if it's
global state, that's extremely bad - the entire PHP community is fighting
like hell to avoid that, with PSR-7 and layers of abstraction on top of,
well, everything, in order to make code testable.

Catering to different skill levels is no excuse.

HTML escaping is, yes, a very pragmatic task - it's also solved already,
with htmlspecialchars() ... the main problem you appear to be solving, is
that htmlspecialchars() is too long and ugly and inconvenient, which, okay,
it is - but adding a global registry for that is overkill, and the whole
problem would go away if you could simply autoload functions:

<h1><?= html($title) ?></h1>

That's not ugly or inconvenient. The only problem is you can't package your
html() function (and install it with e.g. Composer) because PHP can't
autoload functions.
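For comparison, the packaged helper Rasmus has in mind could look like this sketch; the name html() comes from his example, while the flags and the explicit UTF-8 charset are assumptions. Today such a file is usually loaded through Composer's "files" autoload entry, which includes it eagerly on every request rather than loading the function on first use.

```php
<?php
// Guarded against redeclaration if another package already defines html().
if (!function_exists('html')) {
    // Escapes a value for HTML text and quoted attribute contexts.
    function html($value): string
    {
        return htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }
}

echo html('<h1>"Title"</h1>'), "\n";
// &lt;h1&gt;&quot;Title&quot;&lt;/h1&gt;
```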

Functions such as HTML escaping, or any other kind of escaping, do not
belong in a registry - you shouldn't swap out those functions at all, they
need to work precisely as specified, so the caller knows precisely what the
result is, because only the caller can know the context and intent.

I'm not going to go much deeper into it than that, sorry - I don't have
time...

8 years ago by Marcio Almada

2016-07-18 16:03 GMT-04:00 Rasmus Schultz rasmus@mindplay.dk:

Registry of functions - is exactly how escaping is performed in Symfony
and Twig.

For one, that does not mean it's a good idea.

For another, the registry in Symfony (at least, I don't know about Twig) is
inside an instance - it's not global state.

Do you get my point that a reference to a closure is state? And if it's
global state, that's extremely bad - the entire PHP community is fighting
like hell to avoid that, with PSR-7 and layers of abstraction on top of,
well, everything, in order to make code testable.

Catering to different skill levels is no excuse.

Just a small rant on the global state discussion.

Even though the API for *_error_handler() and *_exception_handler()
manages global state, this is not the biggest of the issues if we are
talking about language-level hooks. If there is something that should be
allowed to manage global state by design, it is the programming language you're
working in (when you declare a function foo(){}, you're creating state
somewhere). The point is that it should be possible to manage the global
state with as much isolation as possible. So code like the following should
be possible:

class MyTemplatingEngineRender {
    function render(Template $template, array $data) {
        $old_handlers = set_escape_handlers(['html' => $this->htmlEscaper,
            'xml' => $this->xmlEscaper, 'js' => $this->jsEscaper]);
        // logic to render the templates and get the output
        set_escape_handlers($old_handlers);
        // OR
        restore_escape_handlers();
        // return the rendered template ready for response
    }
}
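One way to get the isolation Márcio describes is for the setter to return the previous handler table and for the renderer to restore it in a finally block, so the application's handlers come back even if a template throws. The functions sketch_set_escape_handlers() and sketch_escape() and the class SketchRenderer are hypothetical stand-ins for set_escape_handlers()/restore_escape_handlers(), which do not exist in PHP; templates are modelled here as callables that return a string.

```php
<?php
// Hypothetical process-wide table of escape handlers (context => callable).
// It is deliberately global: it models the global state under discussion.
$GLOBALS['sketch_escape_handlers'] = [];

// Replace the whole table and hand back the previous one, so the caller can
// restore it later (the set_escape_handlers() shape from the example above).
function sketch_set_escape_handlers(array $handlers): array
{
    $previous = $GLOBALS['sketch_escape_handlers'];
    $GLOBALS['sketch_escape_handlers'] = $handlers;
    return $previous;
}

function sketch_escape(string $value, string $context): string
{
    $handler = $GLOBALS['sketch_escape_handlers'][$context] ?? null;
    if ($handler === null) {
        throw new InvalidArgumentException("Unknown context '$context'");
    }
    return (string) $handler($value);
}

final class SketchRenderer
{
    // $template is any callable that takes the data array and returns a string.
    public function render(callable $template, array $data): string
    {
        $old = sketch_set_escape_handlers([
            'html' => fn (string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE),
        ]);
        try {
            return $template($data);
        } finally {
            // Runs both on normal return and when the template throws.
            sketch_set_escape_handlers($old);
        }
    }
}
```

The try/finally pair plays the role of the explicit set_escape_handlers($old_handlers) call in the example above, without relying on the template finishing cleanly.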

I'm not arguing that we should add global state as the first option for every
issue, but sometimes it's just unavoidable. For this RFC specifically, it
seems doable.

HTML escaping is, yes, a very pragmatic task - it's also solved already,
with htmlspecialchars() ... the main problem you appear to be solving, is
that htmlspecialchars() is too long and ugly and inconvenient, which, okay,
it is - but adding a global registry for that is overkill, and the whole
problem would go away if you could simply autoload functions:
<h1><?= html($title) ?></h1>

Agree with that, making functions easier to use seems more appealing to me.

Cheers,
Márcio.

8 years ago by Michael Vostrikov

Rasmus

Do you get my point that a reference to a closure is state? And if it's
global state, that's extremely bad - the entire PHP community is fighting
like hell to avoid that, with PSR-7 and layers of abstraction on top of,
well, everything, in order to make code testable.
What is the difference from the autoload stack? That is global state too.

the main problem you appear to be solving, is that htmlspecialchars() is
too long and ugly and inconvenient,
Sorry, this IS NOT the main problem. I have repeated this many times, in the
discussion and in the RFC.

the registry in Symfony (at least, I don't know about Twig) is inside an
instance - it's not global state.
You can use your own implementation of PHPEscaper, without a registry.

so the caller knows precisely what the result is, because only the caller
can know the context and intent.
The caller can use its own implementation. PHPEscaper is just the default
implementation for those who don't want to deal with it.

Dan

But then there is no way to restore the previous handler.
Why? You can get a callable value from getHandlers() and store it in a
variable.
Escaping does not require a stack like SPL autoload. We don't need to
encode an apostrophe as '&#039;' in one part of a template and as '&apos;'
in another. If we need to prepare e.g. some XML template, we can 'use
MyXMLEscaper as PHPEscaper' and implement any encoding we need there.

Stuff that is added to the core needs to cover all use-cases, not just
the typical ones.
What would such a use case be, could you give an example? And yes, it is
possible to write your own implementation, with a stack.

Then please copy the set_error_handler behaviour
set_error_handler was invented for, hm, error handling, as I think. The use
case for escaping handlers is "if there is no handler for 'js', define this
handler". With behavior like set_error_handler(), registerHandler() will
just overwrite the existing handler, and we will have to check existing
handlers before calling registerHandler(), instead of just checking the
returned result.
Ok, I will change this behavior.

Rasmus and Marcio

adding a global registry for that is overkill, and the whole
problem would go away if you could simply autoload functions:
<h1><?= html($title) ?></h1>

Agree with that, making functions easier to use seems more appealing to
me.

Why are you talking about autoloading?
We can already define a global function h() which will call any function we
want. For escaping this is not a problem, unlike a big set of functions
from some namespace (e.g. specific math functions). So function
autoloading is a separate problem.
The advice to use a global function was already given back in 2002, here:
https://bugs.php.net/bug.php?id=16007. But the problem with HTML escaping
is still present.
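For comparison, the kind of global helper mentioned above takes only a few lines. This is a sketch: the name h() and the flags are assumptions (they mirror the RFC's htmlspecialchars fallback, plus an explicit UTF-8 charset), and the function_exists() guard only avoids a redeclaration error if a framework already defines h().

```php
<?php
// Hypothetical short global wrapper around htmlspecialchars().
// Flags are passed explicitly so every call site behaves the same,
// independent of the PHP version's default flags.
if (!function_exists('h')) {
    function h($value): string
    {
        // ENT_QUOTES: escape both " and '.
        // ENT_SUBSTITUTE: replace invalid UTF-8 with U+FFFD instead of returning ''.
        return htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }
}
```

In a template this is used as `<h1><?= h($title) ?></h1>`. Passing the flags explicitly matters because before PHP 8.1 the default was ENT_COMPAT, which leaves single quotes unescaped.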

8 years ago by Michael Vostrikov

A couple of thoughts.

Let's just remove the default implementation of PHPEscaper. It is not hard to
define your own class, in the global namespace or with 'use'.

Maybe allow passing an unlimited number of arguments? E.g.
<?* $str, 'my_context', 'param', 'another_param' ?>
This can be used for all context-dependent text transformations - not just
for escaping, but also for translation, date and number formatting, etc.
<?* $message, 'translate', $messageCategory ?>

Maybe use ':' as the special sign? "<?: ?>" is easier to type.

8 years ago by Michael Vostrikov

This can be used for all context-dependent text transformations
On the other hand, this is already possible with
<?* $str, 'translate\category | html' ?>. And a list of arguments will not be
needed if the second argument is an array. Ok, never mind.

8 years ago by michal.brzuchalski@gmail.com

You are creating a weird, quite complex syntax that is unneeded most of the
time. Just use escaping functions or any other escaper, or simply a template
engine, as most people do!
On 19 Jul 2016 21:52, "Michael Vostrikov" michael.vostrikov@gmail.com
wrote:

This can be used for all context-dependent text transformations
On the other hand, this is already possible with
<?* $str, 'translate\category | html' ?>. And a list of arguments will not be
needed if the second argument is an array. Ok, never mind.

8 years ago by Michael Vostrikov

You are creating a weird, quite complex syntax that is unneeded most of the
time. Just use escaping functions or any other escaper, or simply a template
engine, as most people do!

I explained the reasons for implementing this operator in the previous
discussion and in the "Problem description" section of the RFC. There are
many applications without a template engine. 90% of the output data in these
applications comes from the database and should be HTML-escaped. This is not a
"most of the time unneeded" case. The number of similar discussions shows that
the problem really exists.

8 years ago by Rowan Collins

Operator has the following form:

<?* $str ?>
<?* $str, 'html' ?>
<?* $str, 'js | html' ?>
[...]
Operator is compiled into the following AST:

echo PHPEscaper::escape(first_argument, second_argument);

Hi Michael,

I'm coming around to the need for (or at least value of) this operator,
but I'm not keen on the details of the current draft.

For this operator to meet its stated aims it needs to be simple,
obvious, and easier to get right than wrong; and the current proposal
feels like it fails on all counts.

The trick with the magic class name and namespace aliasing is neat,
but feels likely to confuse a lot of users.

The basic <?* $foo ?> syntax seems OK, but sticking the escaping types
after the output makes it hard to spot what's going on with anything
other than a simple variable, e.g.
<?* $this->renderView($thing->getViewName(), 'html'), 'js' ?>

Because it's not obviously part of the <?* operator, someone might
think the escape parameter can be used elsewhere:
<?php echo '<blink>oops</blink>', 'html' ?> Operators don't normally have a
list of arguments.

The fact that the escape filter is a string, or any kind of
expression, compounds this. What happens if you mistype the argument?
<?* $foo, 'hmtl' ?> Or with the current proposal's use of '|', what if
you get that syntax wrong? <?* $foo, 'html || js' ?> <?* $foo, 'html, js' ?>

Now, I'm not saying this just to demolish the current proposal, but if
the idea is to make escaping second nature, I think the syntax needs to
be much more obvious, and much more "special". (If we make it too
flexible, we're basically inventing a new templating language, and we're
not actually short of those.)

I quite like Mathieu's <?[$escaper...]= $data ?> suggestion, but not the
free-form callable part (if you're mentioning the whole function name,
you can just call it already).

How about using a compound like <?*= and putting the list of filters up
front?

<?*= $foo ?> (defaulting to HTML escape)
<?*html= $foo ?>
<?*js= $foo ?> (JS escape only; not sure if this should encode as JSON,
or just a JS safe string; maybe <?*json= $foo ?> as well / instead...)
<?*jshtml= $foo ?> (JS escape and then HTML escape)

The common escape types should be built in, with maybe a function to
override them and register additional types, similar to
stream_wrapper_register. The biggest use case for this is people who
aren't using a framework, which would have either a templating engine
or a bunch of escaping helpers already, so customising the definitions
is going to be the exception, not the rule.

The biggest barrier, though, remains that <?= $something ?> will still
work, and will still be used in examples people see, so people will
still be in the habit of using the unsafe operator. It's pretty marginal
whether remembering to reach for <?* $something ?> is actually any
easier than remembering to reach for <?= h($foo) ?>

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by Michael Vostrikov

sticking the escaping types after the output makes it hard to spot what's
going on with anything other than a simple variable. e.g. <?*
$this->renderView($thing->getViewName(), 'html'), 'js' ?>

In Twig escapers and filters are also written after a variable, and this is
not confusing for many users.
{{ render(thing.viewName, 'html') | escape('js') | somefilter }}

Because it's not obviously part of the <?* operator, someone might think
the escape parameter can be used elsewhere: <?php echo
'<blink>oops</blink>', 'html' ?> Operators don't normally have a list of
arguments.

Just a default argument, nothing complicated.
If they write echo like this, they will notice 'oopshtml' and then correct
this construction.
Binary operators have 2 arguments (add($a, $b), escape($string, $context)),
'for' has 3 arguments.
And this is the reason why I want it to be a call of a function with a
constant name. This is very clear - <?* $str, $context ?> turns into
some_escape_function($str, $context).

What happens if you mistype the argument? <?* $foo, 'hmtl' ?> Or with the
current proposal's use of '|', what if you get that syntax wrong? <?* $foo,
'html || js' ?> <?* $foo, 'html, js' ?>

Exception: Unknown context 'hmtl'.
Exception: Unknown context ''.
Exception: Unknown context 'html, js'.

If the handlers for these contexts are not set, of course.

if you're mentioning the whole function name, you can just call it already
<?*html= $foo ?>
Do you mean function autoloading? What is the difference from the non-FQ
name 'PHPEscaper' then? And how would I use an escaper like [$this, 'html']?

JS escape only; not sure if this should encode as JSON, or just a JS safe
string; maybe <?*json= $foo ?> as well / instead...
This looks unclear to me - why can't I use json for strings, and what if
my variable is sometimes an array, sometimes a string?

The biggest use case for this is people who aren't using a framework
... so customising the definitions is going to be the exception, not the
rule
They can set up their escapers once, that is not a problem, but the problem
is e.g. the default flags for HTML escaping.
Customization is required.

If we make it too flexible, we're basically inventing a new templating
language
We cannot forbid customization, so any custom escaper is a kind of new
templating language.
The operator must be simple to use. If someone wants to create a new
templating language in their application, let them. It will be in the
application, not in PHP.

The trick with the magic class name and namespace aliasing is neat, but
feels likely to confuse a lot of users

Yes, I have to agree. Maybe a better way is to make it similar to
set_error_handler() - not per context as it is in the RFC, but for the
'escape' callable.

<?php

// somewhere in application

set_escape_handler(function($string, $context = 'html'){
    ...
});

// or

set_escape_handler([$this, 'escape']);

?>

<?* $myValue, $myContext ?>

Libraries can save and restore the original handler when rendering their
templates. If a library meets an unknown context during its work, it can
call the original handler from inside its own handler. Frameworks and CMSs
can provide an internal syntax for registering custom handlers from modules
and libraries.
The reason for creating the non-FQ name 'PHPEscaper' was the possibility of
using a custom handler in some library without worrying about the
application's handler. But maybe this will bring more problems than it
solves...

We cannot use a stack like spl_autoload, because an escaping function can
only return a string, not true or false.
We could use a by-reference flag, "html($str, $context, &$handled)", return
an array [$str, $handled], or throw and catch exceptions, which can reduce
performance. All these variants look inappropriate.

So, it seems the easiest way is set_escape_handler().
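A sketch of that single-callable design, assuming a setter that (like set_error_handler()) returns the previously installed callable. It shows the delegation described above: a library installs its own handler for a context it owns ('url' here), forwards every other context to the original handler, and restores the original when it is done. SketchEscapeHandler and sketch_library_render() are hypothetical names, and the 'url' context is only an example.

```php
<?php
// Hypothetical holder for the single, process-wide escape callable.
final class SketchEscapeHandler
{
    /** @var callable|null the currently installed handler */
    private static $current = null;

    // Like set_error_handler(): install a new handler, return the old one.
    public static function set(?callable $handler): ?callable
    {
        $previous = self::$current;
        self::$current = $handler;
        return $previous;
    }

    // What <?* $value, $context ?> would call under this design.
    public static function escape($value, string $context = 'html'): string
    {
        if (self::$current === null) {
            throw new LogicException('No escape handler installed');
        }
        return (string) (self::$current)($value, $context);
    }
}

// Application handler: only knows the 'html' context.
SketchEscapeHandler::set(function ($value, string $context): string {
    if ($context !== 'html') {
        throw new InvalidArgumentException("Unknown context '$context'");
    }
    return htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE);
});

// Library: handles 'url' itself and delegates everything else.
function sketch_library_render(string $query): string
{
    $previous = null;
    $previous = SketchEscapeHandler::set(
        function ($value, string $context) use (&$previous): string {
            if ($context === 'url') {
                return rawurlencode((string) $value);
            }
            if ($previous === null) {
                throw new InvalidArgumentException("Unknown context '$context'");
            }
            return $previous($value, $context);
        }
    );
    try {
        return '<a href="/search?q=' . SketchEscapeHandler::escape($query, 'url') . '">'
            . SketchEscapeHandler::escape($query) . '</a>';
    } finally {
        // Put the application's handler back, even if rendering failed.
        SketchEscapeHandler::set($previous);
    }
}
```

After sketch_library_render() returns, the 'url' context is unknown again, because only the application's handler is installed.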

8 years ago by Rowan Collins

sticking the escaping types after the output makes it hard to spot what's
going on with anything other than a simple variable. e.g. <?*
$this->renderView($thing->getViewName(), 'html'), 'js' ?>

In Twig escapers and filters are also written after a variable, and this is
not confusing for many users.
{{ render(thing.viewName, 'html') | escape('js') | somefilter }}

Sure, but in Twig, Smarty, etc filters aren't just a special syntax for
doing escaping, they're a fundamental part of the language. In essence,
they're just a different way of writing a function call, a bit like
Sara's pipe operator, which incidentally would allow this:

<?= render($thing->viewName, 'html') |> escape($$, 'js') |>
somefilter($$) ?>

Because it's not obviously part of the <?* operator, someone might think
the escape parameter can be used elsewhere: <?php echo
'<blink>oops</blink>', 'html' ?> Operators don't normally have a list of
arguments.

Just a default argument, nothing complicated.
If they write echo like this, they will notice 'oopshtml' and then correct
this construction.
Binary operators have 2 arguments (add($a, $b), escape($string, $context)),
'for' has 3 arguments.

Binary operators have a left-hand operand and a right-hand operand;
unary operators have an operand immediately before or immediately after
their symbol. You don't write +$a, $b you write $a + $b. The only
exception is the ternary decision operator, $a ? $b : $c, which still
doesn't use a comma.

Functions, on the other hand, have a name, and a pair of brackets. So
+($a, $b) looks a bit like a function call with a funky name; that would
give us <?*( $foo, 'html') ?>.

I can see your logic in defining it this way, it just doesn't look like
anything else in the language, and that's a bad thing for people using
it properly.

And this is the reason why I want it to be a call of function with constant
name. This is very clear - <?* $str, $context ?> turns into
some_escape_function($str, $context).

The more you compare it to a function call, the less I understand how it
gains over just defining a function e() and writing <?= e( $foo, 'html') ?>

What happens if you mistype the argument? <?* $foo, 'hmtl' ?> Or with the
current proposal's use of '|', what if you get that syntax wrong? <?* $foo,
'html || js' ?> <?* $foo, 'html, js' ?>

Exception: Unknown context 'hmtl'.
Exception: Unknown context ''.
Exception: Unknown context 'html, js'.

If the handlers for these contexts are not set, of course.

But if these are run-time errors, how is the clever syntax helping
people get it right here? If you can pass a variable as the escaping
method, define arbitrary escaping functions, etc, you can't even write a
strict static analyser.

if you're mentioning the whole function name, you can just call it already
<?*html= $foo ?>
Do you mean function autoloading? What is the difference from the non-FQ
name 'PHPEscaper' then? And how would I use an escaper like [$this, 'html']?

You've merged two unrelated lines from my e-mail here, so I'm not sure
what your question is. By the first line, I meant that if you write
"<?[$this, 'html'] $foo ?>" you might as well just write "<?=
$this->html($foo) ?>". The syntax has gained you nothing but a minimum
version of PHP and some head-scratching from new users.

JS escape only; not sure if this should encode as JSON, or just a JS safe
string; maybe <?*json= $foo ?> as well / instead...
This looks unclear to me - why can't I use json for strings, and what if
my variable is sometimes an array, sometimes a string?

'foo' is valid JSON - there is much hot air about the difference between
a "JSON value", a "JSON document", etc, but long story short, it is
perfectly fine to say <?= json_encode('hello world') ?>

On the other hand, if I have an array and ask for it to be HTML-escaped,
nothing iterates the array for me, it will just print "Array". So if I
ask for it to be "JS-escaped", why should it magically produce a JSON
array? Not to mention the fact that PHP "arrays" cannot be losslessly
represented as either an object or an array in JS (they have both
arbitrary keys and well-defined order, JS makes you choose one or the
other).
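Both points can be checked with the real json_encode(): a bare string encodes to a valid JSON value, while a PHP array becomes a JSON array only when its keys are 0..n-1 in order, and a JSON object otherwise. The labels and sample values below are arbitrary illustrations. (A JavaScript engine also iterates integer-like object keys in ascending order, so an object such as {"1":"a","0":"b"} does not keep the PHP order once it reaches JS.)

```php
<?php
// Sample values; the labels are arbitrary.
$examples = [
    'string'        => 'hello world',
    'list'          => ['a', 'b'],
    'non-list keys' => [1 => 'a', 0 => 'b'],
    'string keys'   => ['x' => 1, 'y' => 2],
];

// json_encode() emits a JSON array only for keys 0..n-1 in order;
// anything else becomes a JSON object with string keys.
$encoded = array_map('json_encode', $examples);

foreach ($encoded as $label => $json) {
    echo str_pad($label, 15), $json, PHP_EOL;
}
```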

The biggest use case for this is people who aren't using a framework
... so customising the definitions is going to be the exception, not the
rule
They can set up their escapers once, that is not a problem, but the problem
is e.g. the default flags for HTML escaping.
Customization is required.

Customizability is required, yes, but it absolutely should not be
mandatory. If people are using a framework already, they will already
have a method of doing this. This feature is really only useful for
people who are relying on absolutely minimum framework code, who want
something to work "out of the box".

If I have to write "register_escape_handler(function($string,
$mode='html') { .... })" I might as well just write "function e($string,
$mode='html') { .... }". There's not even a question of autoloading,
because nothing is going to autoload the procedural code that runs
"register_escape_handler" anyway.

If we make it too flexible, we're basically inventing a new templating
language
We cannot forbid customization, so any custom escaper is a kind of new
templating language.
The operator must be simple to use. If someone wants to create a new
templating language in their application, let them. It will be in the
application, not in PHP.

I think we're in agreement here - simplicity is key. :)

The trick with the magic class name and namespace aliasing is neat, but
feels likely to confuse a lot of users

Yes, I have to agree. Maybe a better way is to make it similar to
set_error_handler() - not per context as it is in the RFC, but for the
'escape' callable.

Personally, I think having a separate handler registered for each
context argument, like streams, makes more sense. I don't see the use
case for customising how contexts are combined, passing variable
contexts around, or anything else that is gained by one callback with
two parameters.

Again, focusing on simplicity:

<?php
set_escape_handler('html', 'htmlspecialchars');
set_escape_handler('json', 'json_encode');
?>
<?htmljson= $foo ?>

becomes:

<?php echo htmlspecialchars(json_encode($foo)); ?>

And, as I say, the common escape handlers should be defined by default,
just like a whole bunch of stream types are.

Regards,

Rowan Collins
[IMSoP]

8 years ago by Ryan Pallas

On Fri, Jul 22, 2016 at 1:31 AM, Michael Vostrikov <
michael.vostrikov@gmail.com> wrote:

The trick with the magic class name and namespace aliasing is neat, but
feels likely to confuse a lot of users

Yes, I have to agree. Maybe a better way is to make it similar to
set_error_handler() - not per context as it is in the RFC, but for the
'escape' callable.

<?php
// somewhere in application

set_escape_handler(function($string, $context = 'html'){
    ...
});

// or

set_escape_handler([$this, 'escape']);
?>

<?* $myValue, $myContext ?>

Which begs the question, if you can verify that the call to
set_escape_handler comes before the template, then can you also be sure
that a function definition will come before it, and just call a function
directly?

8 years ago by Michael Vostrikov

The more you compare it to a function call, the less I understand how it
gains over just defining a function e() and writing <?= e( $foo, 'html') ?>
I might as well just write "function e($string, $mode='html') { .... }"
they will already have a method of doing this
Yes, and they have to write a call to it everywhere. A new operator can
remove that.
I talked about this in previous messages, when I explained why function
autoloading is a separate problem - it is not a problem to create a
function, the problem is copy-pasting it into 90% of the places where data
is output.

The goal is to remove copy-paste for HTML escaping (so it will become
automatic) and the possible XSS when this copy-paste is missed. If we can get
it to work with other contexts, that will be just a nice addition.

If you can pass a variable as the escaping method
The second argument is not an escaping method. It is a context. The escaping
function can handle this context however it wants.

On the other hand, if I have an array and ask for it to be HTML-escaped,
nothing iterates the array for me, it will just print "Array". So if I ask
for it to be "JS-escaped", why should it magically produce a JSON array?
Yes, I agree, I mentioned this in the RFC - JSON is not escaping, it is
encoding in a special notation.
This is one of the reasons why I disagreed with the need to support
multiple contexts.
So, the question comes up again - do we really need multiple contexts?

Again, focusing on simplicity:
<?php
set_escape_handler('html', 'htmlspecialchars');
set_escape_handler('json', 'json_encode');
?>
<?htmljson= $foo ?>
becomes:
<?php echo htmlspecialchars(json_encode($foo)); ?>

Yes, PHPEscaper from the RFC works that way - PHPEscaper::registerHandler().
But I think runtime definition with a second variable is the more flexible way.
There are external contexts (HTML is one of them) and internal task-dependent
contexts which can be combined with HTML. We cannot know all possible tasks.
And we come back again to the pipe operator and Twig-like syntax.

Which begs the question, if you can verify that the call to
set_escape_handler comes before the template, then can you also be sure
that a function definition will come before it, and just call a function
directly?
Sorry, I'm not sure I understand your question. What do you mean by 'call a
function directly'?
If you mean <?= $this->escape($myValue, $myContext) ?>, then the goal is to
remove this copy-paste.

8 years ago by Mathieu Rochette

The more you compare it to a function call, the less I understand how it
gains over just defining a function e() and writing <?= e( $foo, 'html') ?>
I might as well just write "function e($string, $mode='html') { .... }"
they will already have a method of doing this
Yes, and they have to write a call to it everywhere. A new operator can
remove that.
I talked about this in previous messages, when I explained why function
autoloading is a separate problem - it is not a problem to create a
function, the problem is copy-pasting it into 90% of the places where data
is output.

The goal is to remove copy-paste for HTML escaping (so it will become
automatic) and the possible XSS when this copy-paste is missed. If we can get
it to work with other contexts, that will be just a nice addition.
I missed that in the RFC. I'm conflicted about this one. You say:
"More than 90% of output data is data from the DB and must be HTML-encoded."
I have no idea how you came up with this; even in the applications and
websites I'm working on that don't use a template engine, this is far from
the truth.
Especially now that more and more web applications are consuming JSON
APIs, the backend often produces mostly JSON & XML. To that you can add
CSV and PDF, sometimes used for reporting, invoicing, etc. Then there are
JS ads.

so, if I'm used to only using <?* $data ?> for HTML, what will make me
think about setting a different context in other places? I'll
probably end up with HTML-encoded data in my CSV files instead.

If you can pass a variable as the escaping method
The second argument is not an escaping method. It is a context. The escaping
function can handle this context however it wants.

On the other hand, if I have an array and ask for it to be HTML-escaped,
nothing iterates the array for me, it will just print "Array". So if I ask
for it to be "JS-escaped", why should it magically produce a JSON array?
Yes, I agree, I mentioned this in the RFC - JSON is not escaping, it is
encoding in a special notation.
This is one of the reasons why I disagreed with the need to support
multiple contexts.
So, the question comes up again - do we really need multiple contexts?
I don't think "json" escaping should produce an array; the escape should
be on the output. If I have <?= [] ?> the output is Array; if I have
<?* [], 'json' ?> the output should be "Array".

I don't get why it's not escaping for you. If I have a template for a JS
file, I think I should be able to escape data like this:

console.log(<?* $row['data'], 'json' ?>);

What is the difference from HTML here? The escape mechanism only works
on strings. Obviously, if I want to output a JSON object I should manually
call <?= json_encode(['data' => $row['data']]) ?>

and finally, I'll say it again: if I want to output a JavaScript string
inside a script tag in an HTML file, for me that's two nested contexts,
and the escaping should be no different from the other cases
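For the JavaScript-inside-a-script-element case, a handler can lean on json_encode() with the JSON_HEX_* flags, which replace <, >, &, ' and " with \u00XX escapes, so the encoded literal cannot contain a raw </script>. This is a sketch under assumptions: sketch_js_string() is a hypothetical name, the value is cast to a string first (matching the "escaping works on strings" view above), and JSON_THROW_ON_ERROR requires PHP 7.3 or later.

```php
<?php
// Hypothetical 'json'/'js' context handler: turns any scalar into a
// JavaScript string literal that is safe to place inside <script>...</script>.
function sketch_js_string($value): string
{
    return json_encode(
        (string) $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Hostile sample data that would otherwise close the script element.
$row = ['data' => "</script><script>alert('x')</script>"];
echo '<script>console.log(' . sketch_js_string($row['data']) . ');</script>', PHP_EOL;
```

HTML entity escaping would not help inside a script element, because browsers do not decode character references there; this is one concrete way the two nested contexts differ from escaping for HTML text.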

Again, focusing on simplicity:
<?php
set_escape_handler('html', 'htmlspecialchars');
set_escape_handler('json', 'json_encode');
?>
<?htmljson= $foo ?>
becomes:
<?php echo htmlspecialchars(json_encode($foo)); ?>
Yes, PHPEscaper from the RFC works that way - PHPEscaper::registerHandler().
But I think runtime definition with a second variable is the more flexible way.
There are external contexts (HTML is one of them) and internal task-dependent
contexts which can be combined with HTML. We cannot know all possible tasks.
And we come back again to the pipe operator and Twig-like syntax.

Which begs the question, if you can verify that the call to
set_escape_handler comes before the template, then can you also be sure
that a function definition will come before it, and just call a function
directly?
Sorry, I'm not sure I understand your question. What do you mean by 'call a
function directly'?
If you mean <?= $this->escape($myValue, $myContext) ?>, then the goal is to
remove this copy-paste.

8 years ago by Michael Vostrikov

I'm conflicted about this one. You say: "More than 90% of output data is
data from the DB and must be HTML-encoded."
I have no idea how you came up with this; even in the applications and
websites I'm working on that don't use a template engine, this is far from
the truth. Especially now that more and more web applications are consuming
JSON APIs, the backend often produces mostly JSON & XML.

This is from my experience and from the poll: 35% + 23% of people work 'with
projects with template rendering in PHP where template engines are not used'
always or quite often. And in such applications there are many constructions
like <?= h($entity->property) ?>; values containing HTML, or constructions
like <?= json_encode() ?>, are the exception and quite rare.
Also, HTML escaping can be used in XML templates.
Of course, for a JSON API this operator is useless, as it is for
applications with a template engine.

I don't think "json" escaping should produce an array

JSON is object notation. It would be unclear why I wrote
<?* ['a' => 'b'], 'json' ?> but didn't get this value encoded in JSON. But you
are right: if the array is cast to a string, this is escaping.

8 years ago by David Rodrigues

The idea is good, but I think it is not practical in general.

First point: we would have to define each new identifier that could be used.
It does not make clear what this identifier does, or even how it should
work when a package redefines what it does.

For instance:

// file1.php
set_escape_handler('e', 'html_entities_encode');

// file2.php
set_escape_handler('e', 'my_own_encode');

// file3.php
<?* $value, 'e' ?>

If file1.php includes file3.php, it should use the first implementation.
If file2.php does that, it will run the second implementation. I can't
control where it comes from.
If file1.php includes file2.php, what should happen? An error? An override?

You can solve that by using a namespace, but then the syntax would have
to be something like:

<?* $value, \Namespace::e ?>

But if you do that, it is better to call it directly.

Second point: it can be confused with the echo syntax, because echo
accepts comma-separated arguments, like in <?php echo 1, 2, 3; ?>

Third point: the second argument is a single comma-separated string,
instead of each argument being passed separately.
It is also a string, instead of an identifier like in <?* $value, escape ?>

Fourth point: this will create problems for IDEs. With a joined string
(<?* $value, 'escape, json' ?>) it is hard to identify each item; a split
string (<?* $value, 'escape', 'json' ?>) can be confused with a user string
(if you see what I mean); an identifier is better, but it can conflict
with constants, and even if we ignore that, the IDE will have trouble
identifying where you defined it. Let's suppose that you have defined it
with a generic function, like:

function create_escape($escape, $callback) {
    set_escape_handler($escape, function () use ($callback) {
        return doSomethingWith($callback);
    });
}

create_escape('e', 'my_own_encode');

Fifth point: you can't pass arguments to each escape to change what it
does, so I need to define each possibility (which could be a lot). For
instance: imagine that I have an escape that actually does a "clamp",
keeping $value no less than min and no more than max. It should receive
two arguments (min and max) and an optional one (inclusive). Currently I
could do it like: <?= clamp($value, 5, 25, true) ?>. How could you do
that in your case?

Sixth point: current escape methods seem more efficient and don't require
a new operator, like: <?php e($value); ?>
But they can be improved with Sara's suggestion of the pipe v2 (chain?)
operator, like <?php $value~>e($$); ?>

In conclusion, I think that if PHP creates something like this, it would be
better for PHP to implement its own template engine system, making this
operation faster. But that could bloat PHP with a template option that you
don't want to use (for instance, I like Blade, you may like Twig, someone
else may like PHP's native templates).

I think a solution to some of the problems above is to create an exclusive
scope in which you apply your 'escapers', for instance:

$escape = new SplEscaper;
$escape->support('e', function () { ... });
$escape->require('myfile.php');

In this case, it will require myfile.php and apply your escapers based
on this instance.
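The scoped idea can be sketched without any new syntax: the handlers live on an object that is passed to the template explicitly, so two renderers can give 'e' different meanings without touching global state. ScopedEscaper, support(), escape() and render() are hypothetical names; render() takes a callable in place of the $escape->require('myfile.php') step above.

```php
<?php
// Hypothetical instance-scoped escaper: no global registry involved.
final class ScopedEscaper
{
    /** @var array<string, callable> escaper name => handler */
    private array $handlers = [];

    public function support(string $name, callable $handler): self
    {
        $this->handlers[$name] = $handler;
        return $this;
    }

    public function escape($value, string $name): string
    {
        if (!isset($this->handlers[$name])) {
            throw new InvalidArgumentException("Unknown escaper '$name'");
        }
        return (string) ($this->handlers[$name])($value);
    }

    // Runs a template callable with this escaper passed in explicitly.
    public function render(callable $template): string
    {
        return (string) $template($this);
    }
}

// Two independent scopes that both define 'e', with different meanings.
$html = (new ScopedEscaper())->support('e', fn ($v) => htmlspecialchars((string) $v, ENT_QUOTES | ENT_SUBSTITUTE));
$upper = (new ScopedEscaper())->support('e', fn ($v) => strtoupper((string) $v));

$template = fn (ScopedEscaper $esc): string => '<p>' . $esc->escape('<b>hi</b>', 'e') . '</p>';
echo $html->render($template), PHP_EOL;
echo $upper->render($template), PHP_EOL;
```

Because each template receives its escaper as an argument, the file1.php/file2.php conflict described earlier cannot arise: each caller decides which definition of 'e' applies.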

8 years ago by Rowan Collins

Fifth point: you can't pass arguments to each escape to change what it
does, so I need to define each possibility (which could be a lot). For
instance: imagine that I have an escape that actually does a "clamp",
keeping $value no less than min and no more than max. It should receive
two arguments (min and max) and an optional one (inclusive). Currently I
could do it like: <?= clamp($value, 5, 25, true) ?>. How could you do
that in your case?

I think that's drifting a long way away from the idea of "simple escape
syntax", and into "building a templating engine". I don't see any need
for this list to concern itself with such a task, because Smarty, and
Twig, and many others already exist for that job. "clamp()" is a data
transform, but it's not an "escape" in any sense that I can think of.

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by Rowan Collins

The more you compare it to a function call, the less I understand how it
gains over just defining a function e() and writing <?= e( $foo, 'html') ?>

I might as well just write "function e($string, $mode='html') { .... }"
they will already have a method of doing this
Yes, and they have to write a call to it everywhere. The new operator can
remove it.

This is the part I don't get. How does "using an operator everywhere"
remove the effort of "using a function everywhere"? It's the same effort
in both cases.

The goal is to remove copy-paste for HTML escaping (so it will become
automatic)
...
the problem is having to copy-paste it in 90% of the places where data is output.

If somebody can't type "e(" and ")" without copying and pasting, then
they're going to have a hard time writing any meaningful code.

On the other hand, if I have an array and ask for it to be HTML-escaped,
nothing iterates the array for me, it will just print "Array". So if I ask
for it to be "JS-escaped", why should it magically produce a JSON array?
Yes, I agree, I mentioned this in the RFC - JSON is not escaping, it is
encoding in a special notation.
This is one of the reasons why I disagreed with the need to support
multiple contexts.
So, the question comes up again - do we really need multiple contexts?

The reason multiple contexts are needed, in my opinion, is the false
sense of security of saying "we have built-in escaping", but actually
meaning "we have built-in HTML escaping". If the idea of this feature is
to make good escaping habits second-nature to users who don't have a
templating system, then there's a responsibility to remind them that not
all contexts are created equal.

As for what "JS-escaped" should mean, why not just backslash-escape
quote marks, like you'd ampersand escape double quotes in HTML?

<?php $foo = "World's End"; ?>
var foo='<?*js= $foo ?>';

var foo='World\'s End';
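A minimal sketch of that backslash-escaping approach, assuming the value ends up inside a quoted JavaScript string literal. escape_js_string() is a hypothetical helper built on PHP's addcslashes(); it covers backslashes, quote marks and line breaks only, and would not, for example, stop a "</script>" sequence from closing an inline script block.

```php
<?php
// Hypothetical helper: backslash-escape a value for use inside a quoted
// JavaScript string literal. Only backslashes, quote marks and line breaks
// are handled; it does not guard against "</script>" in an inline script.
function escape_js_string(string $value): string
{
    // addcslashes() puts a backslash before each listed character and
    // writes \n and \r as C-style escape sequences.
    return addcslashes($value, "\\'\"\n\r");
}

$foo = "World's End";
echo "var foo='", escape_js_string($foo), "';\n";
```

This prints the var foo='World\'s End'; line from the example above.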

Yes, PHPEscaper from the RFC works that way - PHPEscaper::registerHandler().
But I think runtime definition with a second variable is a more flexible way.
There are external contexts (HTML is one of them) and internal task-dependent
contexts which can be combined with HTML. We cannot know all possible tasks.
And we come back again to the pipe operator and Twig-like syntax.

More flexible to what end? Why do I need to be able to dynamically
define arbitrarily complex expressions as the filter name?

Note that what is defined in the RFC currently is not similar to
Twig/Smarty, because it views the parsing of the | as internal to the
callback, not part of the escaping syntax itself. It allows you to
define an escape callback that instead uses "," as the separator, and I
don't see why that would ever be necessary.

If you have a new context, give it a name, and register it. As I say,
this is how streams work - you don't register a callback that filters
all filepath strings, you register a prefix for your particular stream type.
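A small sketch of the "give it a name and register it" model, loosely following how stream_wrapper_register() refuses a protocol that already has a handler. EscapeContexts and its 'html' and 'csv' escapers are hypothetical, chosen only to illustrate named contexts; they are not part of the RFC.

```php
<?php
// Hypothetical registry of named escaping contexts. Like
// stream_wrapper_register(), registering a name that is already taken
// fails, and using an unregistered name is an error rather than a silent
// fallback.
final class EscapeContexts
{
    /** @var array<string, callable> */
    private static array $contexts = [];

    public static function register(string $name, callable $escaper): bool
    {
        if (isset(self::$contexts[$name])) {
            return false;
        }
        self::$contexts[$name] = $escaper;
        return true;
    }

    public static function escape($value, string $name): string
    {
        if (!isset(self::$contexts[$name])) {
            throw new InvalidArgumentException("Unknown escape context '$name'");
        }
        return (self::$contexts[$name])($value);
    }
}

// A new context gets a name and is registered once, e.g. at bootstrap.
EscapeContexts::register('html', fn ($v) => htmlspecialchars((string) $v, ENT_QUOTES | ENT_SUBSTITUTE));
EscapeContexts::register('csv', fn ($v) => '"' . str_replace('"', '""', (string) $v) . '"');

echo EscapeContexts::escape('Say "hi"', 'csv'), "\n";
```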

Again, the stated aim of this RFC is to make correct escaping easier
than incorrect escaping. That means providing really obvious syntax, out
of the box, for doing the right thing.

Regards,

Rowan Collins
[IMSoP]

8 years ago by Mathieu Rochette

The more you compare it to a function call, the less I understand how it
gains over just defining a function e() and writing <?= e( $foo, 'html') ?>

I might as well just write "function e($string, $mode='html') { .... }"
they will already have a method of doing this
Yes, and they have to write a call to it everywhere. The new operator can
remove it.

This is the part I don't get. How does "using an operator everywhere"
remove the effort of "using a function everywhere"? It's the same
effort in both cases.
I too think that's the same effort; however, using an escape operator has
one advantage imho: you can use a linter to check that all output passes
through an escaper.
It can still be wrong, but its absence can be detected.

Thinking about escaping reminded me of ob_start(): it takes an
optional output_callback that's called on the output data. What if the
escape operator was not limited to short tags and worked the same way
(except for the buffering part)? It would ensure that any output within an
"escape context" would be escaped, e.g.:

<!DOCTYPE html>
<html>
<body>
<h1><?['html']: $this->title() ?></h1>

<p>
    <?php['html']:
        echo $this->article()->summary();
    ?>
    <button
        onclick="document.getElementById(<?['js', 'html_attr']: "article-".$this->article()->id() ?>).style.display = 'block';"
    >read more</button>
</p>

<article id="article-<?['html_attr']: $this->article()->id() ?>" style="display: none">
<?php[]: // this says: this one should not be escaped
    echo $this->article()->htmlContent();
?>
</article>

</body>
</html>

And if it's explained that it works on the output string data, there is no
doubt that <?['json']: [] ?> should result in "Array".
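Today's output buffering can approximate this idea, at the cost of the buffering the proposal wants to avoid. The sketch below is a hypothetical with_escaping() helper that captures whatever $body echoes and passes it through $escaper before printing it.

```php
<?php
// Hypothetical helper approximating an "escape block" with output
// buffering: everything $body echoes is captured and passed through
// $escaper before being printed. Unlike the proposal, it does buffer.
function with_escaping(callable $escaper, callable $body): void
{
    ob_start();
    try {
        $body();
    } catch (Throwable $e) {
        ob_end_clean(); // discard partial, unescaped output
        throw $e;
    }
    echo $escaper((string) ob_get_clean());
}

$html = fn (string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE);

with_escaping($html, function () {
    echo '<script>alert("hi")</script>';
});
echo "\n";
```

Because the escaper only ever sees the final output string, echoing an array inside such a block prints "Array" (with an "Array to string conversion" diagnostic), which matches the behaviour described above.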

The goal is to remove copy-paste for HTML escaping (so it will become
automatic)
...
the problem is having to copy-paste it in 90% of the places where data is output.

If somebody can't type "e(" and ")" without copying and pasting, then
they're going to have a hard time writing any meaningful code.
I think it means that it's repeated all over the place, not necessarily
that it's not written by hand. That said, I'd rather have it repeated
all over, to make sure you think about the correct context at all times.


Regards,

--
Mathieu Rochette

8 years ago by Michael Morris

Not replying to anyone in particular or quoting anything, so I'll start
afresh. This is on topic, though I'm going to step a bit outside the
box.

PHP was a template engine at inception. It still is to some degree - hence
the braceless syntax. The argument can be made that while the language has
evolved, the template engine capabilities of PHP are severely lagging. This
is why Smarty and Twig exist - and something has gone awry when people are
writing template engines inside of a template engine.

There exists output that needs to be escaped, usually through
htmlentities(). Having PHP auto-escape the content is often desirable. In a
given file, though, it is very unlikely that more than one filter technique
will be needed. So instead of a new operator, why not use the existing
declare mechanism to declare a filter?

declare('filter=htmlentities');

With this on, the function set for the filter will be invoked on the output of
any echo statement or its short tag form. When raw output is still
needed, allow print() to output the content, bypassing any declared filters
for the file.

This alone would help a great deal, but in addition to this I think having
contextual require statements could also be useful. The most obvious
example:

html_require 'path/to/template/file.phtml';

Contextual file import also opens the door to pulling in code files that
aren't supposed to be echoing anything out, such as class files in most
frameworks. Consider this possibility:

php_require 'path/to/my/classfile.php';

This statement would throw a parse error if there are any <?php ?> or <?=
?> tags at all, except a <?php at the very beginning of the file (allowed so
that users of older IDEs can work with the file and keep color context
editing intact). This statement has the potential to let the engine
parse these files a bit faster, since inline_html tokens wouldn't need to be
tracked. Also, I think there is a security upside in prohibiting files that
shouldn't be outputting anything from doing so.

In closing, I see the need and do think it needs to be filled, but I'm not
sure a new operator with multiple arguments is the way to go.

8 years ago by Michael Vostrikov

For instance:
// file1.php
set_escape_handler('e', 'html_entities_encode');
// file2.php
set_escape_handler('e', 'my_own_encode');
// file3.php
<?* $value, 'e' ?>

If file1.php includes file3.php, it should use the first implementation.
If file2.php does that, it will run the second implementation. I can't
control where it comes from.
If file1.php includes file2.php, what should happen? An error? An override?

How is that different from a function e()? What should happen there - an
error or an override?
And as I wrote in my previous message:
"Maybe a better way is to make it similar to set_error_handler() - not
per context as it is in the RFC, but for the 'escape' callable." So in your
example, set_escape_handler() would be used as
"set_escape_handler('my_own_handler')". Whether you raise an error or
override is up to you.

It is a string too, instead of an identifier like in <?* $value, escape ?>
... IDE will have problem by identify where you have defined it

It should not be an identifier or a single function name, because then we
could not use closures or object methods ($this->escape) for escaping.
The context should be an expression, as it is in template engines. So there
are no problems with IDEs.

Currently I could do it like this: <?= clamp($value, 5, 25, true) ?>.
How would you do that in your case?

This is not an escaping task. This is logic (business logic or
presentation logic).

current escaping methods seem to be more efficient and don't require a new
operator, like: <?php e($value); ?>

The problem is not that we don't have a function, the problem is that we
must copy-paste it everywhere, and if we forget to do it, we will get an
XSS.

This is the part I don't get. How does "using an operator everywhere"
remove the effort of "using a function everywhere"? It's the same effort in
both cases.

It is "using an operator everywhere" versus "using an operator plus a function
everywhere", especially when the operator on its own works fine but is unsafe.

If somebody can't type "e(" and ")" without copying and pasting, then
they're going to have a hard time writing any meaningful code.

What difference does it make how he wrote 'e()'? It may be 'ctrl-c-ctrl-v',
'ctrl-insert-shift-insert', or 'e-shift-(-)'. The result is the same: this is
copied code.

More flexible to what end? Why do I need to be able to dynamically define
arbitrarily complex expressions as the filter name?

Compared to writing escapers statically. Twig allows passing a
context as a variable, so why should the escaping mechanism in PHP be
specially restricted? We don't know all possible tasks which may require
additional escaping together with HTML.

With this on, the function set for the filter will be invoked on the output of
any echo statement or its short tag form.
When raw output is still needed, allow print() to output the content,
bypassing any declared filters for the file.

This would require a lot of changes in the language. Currently 'print',
'echo', <?= $a, $b ?> and inline HTML like <div></div> all output a value
via the echo opcode.

8 years ago by Michael Vostrikov

I have written many messages already. I think the purpose of this operator
is clear.
In this discussion I have come to understand what I would like to use.

You suggest very hard and complex solutions:

<?jshtml= $str ?>
<?php['html']: ?>
$escape = new SplEscaper; $escape->support('e', function () { ... });
declare('filter=htmlentities');

This is not what I wanted to suggest.

I have rewritten the RFC a little. There are no tricks with ZEND_NAME_NOT_FQ,
no magic constants, and no problems with autoloading. The
solution is small, simple, and customizable.
https://wiki.php.net/rfc/escaping_operator

There are 3 functions:
callable|null set_escape_handler(callable $handler)
bool restore_escape_handler()
escape_handler_call(mixed $string, mixed $context)

They work similarly to set_error_handler() / restore_error_handler().

Operator is compiled into the following AST:
echo escape_handler_call(first_argument, second_argument);

The function escape_handler_call() just passes the given arguments to the
user-defined handler. The second argument is not required. If no handler
is set, it throws an exception. There is no default handler for any context, to
prevent 'built-in' incorrect behaviour of <?* $str ?> constructions in non-HTML
contexts like CSV. It is not hard to create a handler once. The default
context can be set in it as the default value of the second argument.

set_escape_handler(function($str, $context = 'html') {
...
});
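To make the described semantics concrete, here is a userland emulation of the three functions, under the assumption that they behave like set_error_handler()/restore_error_handler() with a stack of handlers. It is only a sketch: in the RFC these would be built into the engine, and the <?* tag would compile to the escape_handler_call() call.

```php
<?php
// Userland emulation of the RFC's three proposed functions (an assumption:
// the RFC implements them in the engine). A stack of handlers makes nested
// set/restore pairs behave like set_error_handler()/restore_error_handler().
final class EscapeHandlerStack
{
    /** @var callable[] */
    public static array $handlers = [];
}

function set_escape_handler(callable $handler): ?callable
{
    $stack = EscapeHandlerStack::$handlers;
    $previous = $stack ? $stack[count($stack) - 1] : null;
    EscapeHandlerStack::$handlers[] = $handler;
    return $previous; // the previously active handler, or null
}

function restore_escape_handler(): bool
{
    array_pop(EscapeHandlerStack::$handlers);
    return true;
}

function escape_handler_call($string, $context = null)
{
    $stack = EscapeHandlerStack::$handlers;
    if (!$stack) {
        // As described in the RFC: no handler set means an exception.
        throw new LogicException('No escape handler has been set');
    }
    $handler = $stack[count($stack) - 1];
    // Pass the context only when one was given, so that the handler's own
    // default value for its second parameter applies otherwise.
    return func_num_args() > 1 ? $handler($string, $context) : $handler($string);
}

// Bootstrap code: one handler whose default context is 'html'.
set_escape_handler(function ($str, $context = 'html') {
    if ($context === 'html') {
        return htmlspecialchars((string) $str, ENT_QUOTES | ENT_SUBSTITUTE);
    }
    throw new InvalidArgumentException('Unsupported context: ' . var_export($context, true));
});

// Roughly what the proposed tag <?* $title would compile to:
$title = 'Tom & Jerry';
echo escape_handler_call($title), "\n";
```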

What is under discussion:

Starting sign.
The last one is more comfortable to type.

<?* $a, $b ?>
<?: $a, $b ?>

Separator sign.
Maybe it should differ from the standard <?= $a, $b ?> syntax, to prevent
mistakes like <?= $a, 'html' ?> instead of <?* $a, 'html' ?>. '|' won't
give an error, but it looks more like escaping in template engines.

<?* $a , $b ?>
<?* $a | $b ?>
<?* $a |> $b ?>
<?: $a : $b ?>

Whether to wrap the functions in a class or namespace (fully qualified), so
as not to clutter up the global namespace:

set_escape_handler()
restore_escape_handler()
escape_handler_call()

PHPEscaper::setEscapeHandler()
PHPEscaper::restoreEscapeHandler()
PHPEscaper::escapeHandlerCall()

And also any names in the source code or implementation details, without
changing the main algorithm.

What is not under discussion:

Built-in contexts.
Because escape_handler_call() is not an escaper itself, but just a helper
to call the user-defined escaper, it should not handle any contexts. This
prevents 'built-in' incorrect behaviour of <?* $str ?> constructions in
non-HTML contexts like CSV.

Multiple arguments.
<?* $a, 'js', 'html' ?>
I think it is enough that the second argument can be of any type, e.g. an
array.

Complicated syntax like <?htmljs= $str ?>.
If we allow custom handlers, then we need runtime processing, so the
example above cannot be compiled into
<?= htmlspecialchars(json_encode($str)) ?>
directly, and it will be something like
<?= escape_handler_call(escape_handler_call($str, 'js'), 'html') ?>
I.e. we need to pass the context as a second argument anyway, so why not
let the user do it?

If someone wants a more complex solution or a built-in template engine, he
can create another RFC and suggest his own implementation.

8 years ago by Rowan Collins

Operator is compiled into the following AST:
echo escape_handler_call(first_argument, second_argument);

I'm sorry, but this is now so simple it undermines its own argument for
existing.

There is no default handler for any context, to
prevent 'built-in' incorrect behaviour of <?* $str ?> constructions in non-HTML
contexts like CSV. It is not hard to create a handler once. The default
context can be set in it as the default value of the second argument.

So it is now mandatory to have some bootstrap file somewhere that
defines and registers the escape function? How is that different from
writing, right now, at the top of your bootstrap file:

function e($str, $context = 'html') {
...
}

You are effectively offering a way of aliasing a particular function to
the magic name "*", and everything else is still down to the user.

Complicated syntax like <?htmljs= $str ?>.

I have no idea why that is "complicated syntax", but your proposal isn't:

<?: $str | 'html | js' ?>

Or even:

<?: $str | ['html', 'js'] ?>

In your proposal, part of the syntax won't even be standard between
different people's code (and yes, the '|' in 'html | js' is syntax, even
if it's not parsed until run-time).

Is it just that you don't like the escape strategy coming first? I
suggested it that way just to make it stand out more, but this would be
entirely equivalent (assuming we could find an appropriate separator):

<?* $str : html : js ?>

If we allow custom handlers, then we need runtime processing, so the
example above cannot be compiled into
<?= htmlspecialchars(json_encode($str)) ?>
directly, and it will be something like
<?= escape_handler_call(escape_handler_call($str, 'js'), 'html') ?>

Yes, this is exactly how all template languages I've ever seen do it.
Once you unroll the if / switch / lookup table, the code run under your
proposal would be something like this:

$temp = $str;
$temp = json_encode($temp);
$temp = htmlspecialchars($temp);

I don't really see how one is any better than the other.
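The unrolled version above can be written as a loop over a list of named contexts, applied innermost first. escape_chain() and its 'js' and 'html' entries are illustrative assumptions: 'js' here produces a JSON string literal with HTML-sensitive characters hex-escaped, and 'html' is htmlspecialchars().

```php
<?php
// Hypothetical sketch: apply a list of named contexts in order, innermost
// first, i.e. the unrolled $temp = ...; $temp = ...; sequence as a loop.
function escape_chain($value, array $contexts): string
{
    $escapers = [
        // JSON string literal; characters significant to HTML are hex-escaped.
        'js' => fn ($v) => json_encode(
            (string) $v,
            JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
        ),
        'html' => fn ($v) => htmlspecialchars((string) $v, ENT_QUOTES | ENT_SUBSTITUTE),
    ];

    foreach ($contexts as $context) {
        if (!isset($escapers[$context])) {
            throw new InvalidArgumentException("Unknown context '$context'");
        }
        $value = $escapers[$context]($value); // each step's output feeds the next
    }
    return (string) $value;
}

$foo = "World's End";
// Like the unrolled version: first 'js', then 'html'.
echo '<button onclick="alert(' . escape_chain($foo, ['js', 'html']) . ')">Go</button>', "\n";
```

Swapping the order to ['html', 'js'] would leave the JSON string's own double quotes unescaped inside the onclick="..." attribute, which is why the order of the contexts matters.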

I.e. we need to pass the context as a second argument anyway, so why not
let the user do it?

Because we're trying to make it easier for the user, not harder. Why
make them handle the nesting, sanity-checking, and control flow of
multiple filters, rather than building them into the syntax from the start?

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by Thomas Bley

<?: $a, $b ?>

PHP already uses ?: for the short ternary operator, so users may get a bit confused by using it for escaping.

<?* $a | $b ?>

This allows multiple interpretations:

<?* $a | $b ?> meaning $a in context $b
<?* $a | $b ?> meaning ($a | $b) in the default context 'html'

<?* $a |> $b ?>

|> may be used by the Pipe Operator RFC, if its vote is successful.

if ($context == 'html') {

This is bad coding style, since $context = 0 gives unexpected HTML escaping (in PHP 7, 0 == 'html' evaluates to true). The following expressions would be equivalent:

<?* $str, 'html' ?>
<?* $str, 0 ?>

Please use:
if ($context === 'html') {
if ($context === 'js') {
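One way to avoid this class of bug in a handler is PHP 8's match expression, which compares with ===. (PHP 8.0 also changed 0 == 'html' to false, but loose comparison still treats true == 'html' as true.) The sketch below is hypothetical: escape_for_context() is not part of the RFC, and its 'js' branch is an illustrative choice.

```php
<?php
// Hypothetical escape handler (PHP 8.0+) that dispatches on $context with
// match, which uses strict (===) comparison: 0, true or null cannot fall
// into the 'html' branch by accident, and unknown contexts are rejected.
function escape_for_context($str, $context = 'html'): string
{
    return match ($context) {
        'html' => htmlspecialchars((string) $str, ENT_QUOTES | ENT_SUBSTITUTE),
        'js' => json_encode(
            (string) $str,
            JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
        ),
        default => throw new InvalidArgumentException(
            'Unknown escape context: ' . var_export($context, true)
        ),
    };
}

echo escape_for_context('<a href="x">'), "\n";
```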

<?* $a, 'js', 'html' ?>

Currently we cannot use set_escape_handler(function($str, ...$context = 'html') since variadic parameters cannot have a default value. So having the second argument be of any type should be fine.

Maybe add an example of using escape handler callbacks in frameworks:

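public function render($template, $vars) {
    $this->setVars($vars);
    set_escape_handler(['SomeClass', 'methodName']);
    ob_start();
    try {
        include $template;
        return ob_get_clean();
    } catch (\Throwable $e) {
        ob_end_clean(); // discard the partial output
        throw $e;
    } finally {
        restore_escape_handler(); // always restore the previous handler
    }
}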

In total a good rfc everybody should be happy with.

Regards
Thomas

Michael Vostrikov wrote on 24.07.2016 11:48:

8 years ago by Christoph Becker

In total a good rfc everybody should be happy with.

I'm not happy (to put it mildly) with the RFC as it stands. To start
with, the RFC speaks of an operator where start-tags[1] are actually meant.
Using the word "operator" is rather confusing in this context.

Then the RFC states that the new operator is compiled into the following
AST:

| echo escape_handler_call(first_argument, second_argument);

But what happens to additional code, e.g.

<?* $str, 'html', 42 ?>
<?* $str, 'html'; echo 42 ?>

Contrast that to the language specification which explains:

| If <?= is used as the start-tag, the Engine proceeds as if the
| statement-list started with echo statement.

Simple, yet precise.

Anyhow, even if these formal issues are addressed, I still don't see
the benefit of being able to write

<?* $str ?>

instead of

<?=h($str)?>

The argument that h() might be forgotten is moot, because it's similarly
easy to accidentally write = instead of *, and both forms allow for
equally well (semi-)automatic verification that all output is escaped.

[1]
https://github.com/php/php-langspec/blob/master/spec/04-basic-concepts.md#program-structure

--
Christoph M. Becker

8 years ago by Thomas Bley

<?* $str ?>

instead of

<?=h($str)?>

benefits are using static code analyzers, grep "<?=" for code reviews, etc.
Having function names with single characters is bad taste and only useful for obfuscating.
Using multiple frameworks or libraries, it's not possible to redeclare functions with the same name.

The big difference is:
With <?*, you have to define an escaping function, with <?= it's optional.

Regards
Thomas

Christoph Becker wrote on 24.07.2016 17:54:

8 years ago by Thomas Bley

The big difference is:
With <?*, you have to define an escaping function, with <?= it's optional.

A few minutes ago, security updates for CVE-2016-2040 were published:

https://github.com/phpmyadmin/phpmyadmin/commit/edffb52884b09562490081c3b8666ef46c296418
https://github.com/phpmyadmin/phpmyadmin/commit/75a55824012406a08c4debf5ddb7ae41c32a7dbc
https://github.com/phpmyadmin/phpmyadmin/commit/aca42efa01917cc0fe8cfdb2927a6399ca1742f2

Regards
Thomas

Thomas Bley wrote on 24.07.2016 18:21:

8 years ago by Rowan Collins

<?* $str ?>

instead of

<?=h($str)?>
benefits are using static code analyzers, grep "<?=" for code reviews, etc.

It's not that difficult to write a static analyser that detects
instances of "<?=" not followed by "h(" or "e(" or whatever.
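As a rough illustration of such an analyser, the hypothetical function below uses PHP's own tokenizer (token_get_all(), from the tokenizer extension that is enabled by default) to report every <?= tag whose expression does not begin with a call to an allowed escaping function. The allowed names h and e are assumptions; a real check would also need to handle namespaced calls and other edge cases.

```php
<?php
// Hypothetical lint: list the line of every <?= tag whose expression does
// not start with a call to one of the allowed escaping functions. It works
// on tokens from PHP's tokenizer, so text inside PHP strings is not
// mistaken for a tag. Namespaced calls such as \h() are simply reported.
function find_unescaped_echo_tags(string $source, array $allowed = ['h', 'e']): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);

    // Index of the first token at or after $k that is not whitespace.
    $skipWhitespace = function (int $k) use ($tokens, $count): int {
        while ($k < $count && is_array($tokens[$k]) && $tokens[$k][0] === T_WHITESPACE) {
            $k++;
        }
        return $k;
    };

    $lines = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_OPEN_TAG_WITH_ECHO) {
            continue;
        }
        $name = $skipWhitespace($i + 1);
        $paren = $skipWhitespace($name + 1);
        $escaped = isset($tokens[$name], $tokens[$paren])
            && is_array($tokens[$name])
            && $tokens[$name][0] === T_STRING
            && in_array(strtolower($tokens[$name][1]), $allowed, true)
            && $tokens[$paren] === '(';
        if (!$escaped) {
            $lines[] = $token[2]; // line on which the tag starts
        }
    }
    return $lines;
}

$template = '<h1><?= h($title) ?></h1>' . "\n"
    . '<p><?= $body ?></p>' . "\n"
    . '<p><?=e($note)?></p>' . "\n";

print_r(find_unescaped_echo_tags($template)); // reports line 2 only
```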

Having function names with single characters is bad taste and only useful for obfuscating.

And having a token "*" that calls a different function in every
application is somehow less obfuscated?

Using multiple frameworks or libraries, it's not possible to redeclare functions with the same name.

It's not possible for multiple frameworks or libraries to declare
different escape handlers in your proposal, either.

The big difference is:
With <?*, you have to define an escaping function, with <?= it's optional.

You could equally say, "with <?=e()?> you have to define an e()
function". The main effort is remembering to use the right syntax, which
you have to do either way.

Surely the feature gets most of its value from what you don't need to
do - which is why I think it's bizarre that the current version doesn't
even have a built-in HTML escaper at all.

Regards,

Rowan Collins
[IMSoP]

8 years ago by Christoph Becker

<?* $str ?>

instead of

<?=h($str)?>

benefits are using static code analyzers, grep "<?=" for code reviews, etc.

Well, something like grep -P '<\?=(?!h[(])' seems to be a viable
alternative.

Having function names with single characters is bad taste and only useful for obfuscating.

Cryptic "operators", however, are not?

The big difference is:
With <?*, you have to define an escaping function, with <?= it's optional.

But you still have to remember to use <?* instead of <?= and use the
proper escaping function.

Actually, I'm not really interested in discussing the current RFC (the
discussion is already rather lengthy, and has started to go in circles
long ago). I just wanted to give an explanation why I would vote
against it.

--
Christoph M. Becker

8 years ago by Thomas Bley

But you still have to remember to use <?* instead of <?= and use the
proper escaping function.

I see no problem if companies make a rule not to deploy code containing "<?=". I've seen similar rules for eval() and other functions.
Using the proper escaping function is surely another challenge which can be a source of security bugs; maybe someone will come up with a fuzzing tool to detect these things.

I just wanted to give an explanation why I would vote
against it.

I'm not sure if it is a good thing to vote against security enhancements.

Regards
Thomas

Christoph Becker wrote on 24.07.2016 18:52:

8 years ago by Christoph Becker

I'm not sure if it is a good thing to vote against security enhancements.

Most certainly, it is not. :-)

--
Christoph M. Becker

8 years ago by Michael Vostrikov

PHP today is a programming language, and applications and libraries can
be and are written in that programming language.

PHP has <?= ?> and <?php ?> tags; everything outside these tags is treated
as HTML. We need either to remove these tags and use PHP as a programming
language only, or to improve how these tags are used, because the <?= ?>
tag by itself, without additional handling, causes XSS vulnerabilities.

Trying to build default functionality that would compete with a modern
templating engine like Twig would be a lot of effort, and to what end? A
kind of language nationalism, that "PHP does it all"?

This operator (or tag) is intended for applications that are already
written without a template engine but are still being developed and
need new code. There are also frameworks and CMSs without a built-in
template engine, and people start new projects using them. This operator
can also be useful for junior programmers who know PHP but don't know any
template engine yet.

register_escape_handler('foo', [$this, 'escape']);
<?*foo= $something ?>
Where's the problem?
If you mean you want to be able to pass an actual callable as the context

No problems with the code; I was answering "IDE will have problem by identify
where you have defined it". I did not mean a callable as a context.

<?= will still "work good but be unsafe"
But people will be able to stop using it altogether. They could even add a
rule about it to their coding style guides.

it doesn't really matter if you say the incantation to output a variable
is "<?= e($" followed by the variable name and ")?>", or "<?* $" followed
by the variable name and "?>".

It does matter. He can try removing the 'e' as unnecessary and see that it
still works fine.
With the old operator he can write unsafe code without any additional action.
With the new operator he would have to explicitly set a 'raw' context or
something similar.
This is why template engines escape HTML by default.

One is 3 characters shorter, but that is the sole difference in terms of
effort.
No. The difference is that you cannot write unsafe code by removing 3
characters. The length of the code or of the function name is not the
motivation for this RFC; I have said this many times.

Huh? Is the word "I" copied in this e-mail, because the English language
requires me to write it more than once? And if "e(" is "copied code", how
is the "*" in "<?*" not also "copied code"?

<?* ?> is one action in the source code; <?= e() ?> is two actions. It is the
same as if you had to call the constructor manually every time: new
MyClass->__construct(). Would that be better code? Maybe we should remove the
automatic constructor call?)

Twig allows you to register a named "strategy" to a single callable,
exactly as I am suggesting:
http://twig.sensiolabs.org/doc/filters/escape.html#custom-escapers This is
much more useful than a single callback that has to handle all possible
strategies.

As I understand it, the problem is that this is a registry with global
state, as Rasmus said.
In Twig this is not a global registry; it is stored in an object of the
'Core' class. And yes, there is a single callback, twig_escape_filter(),
which handles all possible strategies.
The first variant of this RFC was a registry. But actually, people don't
need a registry, especially one with built-in escapers; they ask for an easy
way to call an escaper (htmlspecialchars() in the feature requests).

Also, which strategies are needed depends on the task. Even for
htmlspecialchars() a different set of flags could be used. Let the user
choose how to escape HTML. This needs to be done only once during
application development.

But this could still be done without allowing arbitrary expressions, or
embedding syntax inside the strategy argument:
<?$strategyhtml= $text ?>

Sorry, I don't understand. Why is $strategy not an 'arbitrary expression'?
And why do we need such complex parsing logic, when the result would be the
same as html($text, $strategy)?

If they're doing something complex, they can implement their own way of
doing it - probably by writing a templating engine, or using one of the
many that already exist.

This can be done with the new operator described in the RFC. It does not
require many changes to the PHP source code or to application source code.
Why deliberately restrict its functionality?

So it is now mandatory to have some bootstrap file somewhere that defines
and registers the escape function? How is that different from writing,
right now, at the top of your bootstrap file:
function e($str, $context = 'html') { ... }

It is different, because this function must be called manually everywhere,
and each time a call is missed, there is a potential XSS vulnerability. The
new operator is a simple way to call user-defined escapers automatically.

Complicated syntax like <?htmljs= $str ?>.
I have no idea why that is "complicated syntax", but your proposal isn't:
<?: $str | 'html | js' ?>
Or even:
<?: $str | ['html', 'js'] ?>

That is "complicated syntax" because it requires more changes to the syntax
parser than the operator described in the RFC. More changes mean more
complexity. And I am not suggesting multiple arguments.

In your proposal, part of the syntax won't even be standard between
different people's code
There is no aim to invent a new global standard. There is no standard for
naming escaping functions either; they differ between different people's code.

Is it just that you don't like the escape strategy coming first?
I was talking about flexibility, not about placement.

I.e. we need to pass the context as a second argument anyway, so why not
allow the user to do it.
Because we're trying to make it easier for the user, not harder.

Why is a restriction easier? You decide to forbid passing a context as a
variable, and the user

Why make them handle the nesting, sanity-checking, and control flow of
multiple filters, rather than building them into the syntax from the start?

Because which flags should be passed to htmlspecialchars() depends entirely
on the application. So the user must first unregister the built-in handler
and then register their own handler.
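The registry behaviour described in the RFC can be approximated in userland: handlers are registered per context, the context string is split on '|' and trimmed, the handlers are applied in order, and 'html' falls back to htmlspecialchars(). This is only a sketch. The class name UserlandEscaper and the choice to throw on an unknown context are assumptions, not part of the RFC. Under this model, using different HTML flags simply means registering an 'html' handler, and unregistering it restores the fallback:

```php
<?php
// Userland sketch of the escaper registry discussed in this thread.
// UserlandEscaper is a hypothetical name; this is not the RFC's PHPEscaper.
final class UserlandEscaper
{
    /** @var array<string, callable> */
    private static array $handlers = [];

    public static function registerHandler(string $context, callable $handler): void
    {
        self::$handlers[$context] = $handler;
    }

    public static function unregisterHandler(string $context): void
    {
        unset(self::$handlers[$context]);
    }

    public static function getHandlers(): array
    {
        return self::$handlers;
    }

    // Splits e.g. "js | html" on '|', trims each part, and applies the
    // handlers left to right, so each context's output feeds the next one.
    public static function escape($value, string $context = 'html'): string
    {
        $result = (string) $value;
        foreach (array_map('trim', explode('|', $context)) as $name) {
            if (isset(self::$handlers[$name])) {
                $result = (string) (self::$handlers[$name])($result);
            } elseif ($name === 'html') {
                // Fallback when no 'html' handler has been registered.
                $result = htmlspecialchars($result, ENT_QUOTES | ENT_SUBSTITUTE);
            } else {
                // Assumption: an unknown context is an error, not a pass-through.
                throw new InvalidArgumentException("No escaper registered for context '$name'");
            }
        }
        return $result;
    }
}
```

For example, after registering a 'js' handler such as `fn ($s) => json_encode($s, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT)`, the call `UserlandEscaper::escape($str, 'js | html')` produces a JavaScript string literal that is then HTML-escaped for use inside an attribute.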

8 years ago by Michael Vostrikov

if ($context == 'html') {
this is bad coding style since $context = 0 gives unexpected html
escaping.

I know, it was just an example)
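Under PHP 7's comparison rules, `0 == 'html'` is true because the string is converted to a number (PHP 8 changed this). A dispatcher built on `==` can therefore silently take the HTML branch when it receives an integer context. The following minimal sketch uses strict comparison instead. The function escapeForContext() and its 'url' branch are hypothetical:

```php
<?php
// Hypothetical dispatcher. Strict comparison (===) means a non-string
// context such as 0 can never match 'html' through type juggling.
function escapeForContext(string $value, $context): string
{
    if ($context === 'html') {
        return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE);
    }
    if ($context === 'url') {
        return rawurlencode($value);
    }
    // Unknown contexts fail loudly instead of falling through.
    throw new InvalidArgumentException('Unsupported context: ' . var_export($context, true));
}
```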

The RFC speaks of operator, where actually start-tags[1] are meant, to
start with.
Using the word operator is rather confusing in this context.

Technically yes, but there is the echo operator, so it can be considered a
special construct for using the echo operator. I don't think the exact word
is very important here.

But what happens to additional code, e.g.
<?* $str, 'html', 42 ?>
<?* $str, 'html'; echo 42 ?>
This is a new operator with new syntax. It will give a parse error.

Contrast that to the language specification which explains:
| If <?= is used as the start-tag, the Engine proceeds as if the
| statement-list started with echo statement.
Simple, yet precise.

Is <?= '1'; echo '2' ?>, which outputs '12', simple? It does not seem clear
to me.

I still don't see the benefit of being able to write
<?* $str ?>
instead of
<?=h($str)?>

With the new operator you cannot output an unsafe value. It will either be
escaped or not be output at all.

a few minutes ago, security updates for CVE-2016-2040 were published:

https://github.com/phpmyadmin/phpmyadmin/commit/edffb52884b09562490081c3b8666ef46c296418

https://github.com/phpmyadmin/phpmyadmin/commit/75a55824012406a08c4debf5ddb7ae41c32a7dbc

https://github.com/phpmyadmin/phpmyadmin/commit/aca42efa01917cc0fe8cfdb2927a6399ca1742f2

Good examples, thanks. This is what I'm trying to explain.

It's not possible for multiple frameworks or libraries to declare
different escape handlers in your proposal, either.

It works similarly to set_error_handler(). Is it possible to declare
different error handlers? I think so.
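set_error_handler() returns the previously installed handler. This is what lets several libraries coexist: each one can install its own handler and later restore the old one. The sketch below shows an escape handler that follows the same pattern. The Escaping class and its methods are hypothetical, and the htmlspecialchars() default used when no handler is installed is an assumption:

```php
<?php
// Hypothetical escape-handler API modeled on set_error_handler():
// installing a handler returns the previous one so it can be restored.
final class Escaping
{
    /** @var callable|null */
    private static $handler = null;

    public static function setHandler(?callable $handler): ?callable
    {
        $previous = self::$handler;
        self::$handler = $handler;
        return $previous;
    }

    public static function escape(string $value, string $context = 'html'): string
    {
        if (self::$handler === null) {
            // Assumed default when nothing is installed.
            return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE);
        }
        return (self::$handler)($value, $context);
    }
}
```

A library would typically call `$previous = Escaping::setHandler($mine);` before rendering and `Escaping::setHandler($previous);` afterwards. This mirrors how code pairs set_error_handler() with restore_error_handler().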

with <?=e()?> you have to define an e() function
Or you can just write it without e(), i.e. you don't have to.

which is why I think it's bizarre that the current version doesn't even
have a built-in HTML escaper at all.
This argument is only valid if the RFC includes an implementation, not
just a syntax.

OK, if it contains a default escape handler that can be fully unregistered
and replaced with a custom one, would that be a better variant? I will add a
separate vote on this option.

But:

In my opinion, they are central to the feature, not an optional extra.
If a user wants to use different flags for htmlspecialchars(), they will
have to unregister the built-in handler anyway.

OK, so I can dynamically redefine the same syntax to mean different
things at different times, within the same application. I'm not entirely
sure that's a particularly good thing.

As I understand it, you can do the same in Twig; the setEscaper() function
does not perform any checks.
https://github.com/twigphp/Twig/blob/f0a4fa678465491947554f6687c5fca5e482f8ec/lib/Twig/Extension/Core.php#L29

Then why is absolutely everything in the current RFC optional and
configurable to the Nth degree?
All this RFC contains is an escape handler. As we agreed, customization is
required.

Frameworks are free to write all sorts of weird shit:
And? You can do the same in Twig. Is it a bad template engine?

OK. Let me just ask you: why have people kept asking the same question ever
since PHP was created? Why are almost all the feature requests mentioned in
the RFC about an easy way to call htmlspecialchars()? You can vote yes or no;
I just want to get an official result on this feature. I think it can be
considered an official answer to the community, to those people in the
community who would like to use a default escaping mechanism in PHP.

8 years ago by michal.brzuchalski@gmail.com

Previously you wrote about using PHP as a language only. There was an RFC,
https://wiki.php.net/rfc/script_only_include, about disallowing opening tags
in require statements. Personally I'd love to see it in PHP: it could limit
the effect of features like the operator we're talking about to just
templates.

On 26 Jul 2016 15:16, "Michael Vostrikov" michael.vostrikov@gmail.com
wrote:

8 years ago by Christoph Becker

The RFC speaks of operator, where actually start-tags[1] are meant, to
start with.
Using the word operator is rather confusing in this context.

Technically yes, but there is the echo operator, so it can be considered a
special construct for using the echo operator. I don't think the exact word
is very important here.

In my opinion, the wording is quite important in this case. An operator
is supposed to be usable "anywhere" in PHP code, but most certainly you
don't want to allow something like:

<?php
$value = read_value_from_db();
<?$value?>
?>

But what happens to additional code, e.g.
<?* $str, 'html', 42 ?>
<?* $str, 'html'; echo 42 ?>

This is a new operator with new syntax. It will give a parse error.

So you want to invent your own mini-language for this "operator"? That
might require a lot of effort. Compare that with the implementation of
<?=[1], i.e. it is just a shortcut for <?php echo.

I suggest considering something similar: invent a new statement
(say, echoe, for the sake of giving an example), and make <?* a
shortcut for <?php echoe. Then define the syntax and semantics of the
echoe statement.
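One way to read this suggestion, sketched in userland: if `<?*` were a shortcut for `<?php echoe`, then `<?* $title ?>` would behave like `<?php echoe($title); ?>`. The function below is a hypothetical stand-in for such a statement. A real statement would be built into the language rather than defined as a function. This sketch implements only the 'html' context:

```php
<?php
// Hypothetical userland stand-in for the suggested `echoe` statement.
// Only the 'html' context is implemented in this sketch.
function echoe($value, string $context = 'html'): void
{
    if ($context !== 'html') {
        throw new InvalidArgumentException("Unsupported context '$context'");
    }
    echo htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE);
}
```

Defining the semantics this way keeps the tag itself trivial, exactly like `<?=`. All the questions about arguments and contexts then move into the definition of the statement.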

[1]
https://github.com/php/php-src/blob/php-7.0.9/Zend/zend_language_scanner.l#L1791-L1794

--
Christoph M. Becker

8 years ago by Rowan Collins

OK. Let me just ask you: why have people kept asking the same question ever
since PHP was created? Why are almost all the feature requests mentioned in
the RFC about an easy way to call htmlspecialchars()? You can vote yes or no;
I just want to get an official result on this feature. I think it can be
considered an official answer to the community, to those people in the
community who would like to use a default escaping mechanism in PHP.

Hi Michael,

I think you and I are mostly going in circles at this point, so I'm
going to refrain from blow-by-blow responses and sum up my thinking on
this RFC.

Overall, I think there is some merit to the idea, but I think the detail
is important.

The aim in my mind would be to make escaping easier to do right, for
people who aren't already using a framework or templating engine with
its own solution.

- Without an actual implementation, the feature wouldn't be useful to
  those people.
- Configurability should be a long way down the list of priorities, for
  the same reason.
- I think contexts other than HTML should be included to remind users
  that they exist, but HTML could be the default.
- Contexts should be stackable/nestable, without the user writing any
  extra code.
- The syntax should be easy to read as well as easy to write. How easy
  it is to implement is a low priority.

The current implementation doesn't seem to share these priorities; it
feels like a building block for framework developers, who probably have
their own solutions already.

A few mentions have been made of Twig, which is known for its
comprehensive escaping support; it goes a lot further than the fact that
"|e" is an alias for "|escape('html')":

- you can define automatic escaping for a whole file or a block within a
  file
- there is an extra filter to skip the automatic escaping (not the same
  as unescaping)
- the above can be done with any "context", but the default is HTML
- a "context" is not just the argument to a single all-powerful "escape"
  function; you can register a new context by name, without reimplementing
  any of the existing functionality
- other template functions can say that their output shouldn't be
  escaped, or that their input should be pre-escaped
- other functionality of the system is aware of these notions, and
  designed to behave sensibly

I don't think there's any way PHP can ever reach that level of
sophistication, because most of the language knows nothing about
"context"; the feature we build in is only ever going to be a simple
short-hand for some basic function calls.

In many ways, defining a built-in function e($string, $context) would
fulfil most of the above. A dedicated syntax might make it a little
easier to type, and could handle nested contexts more elegantly. The
ability to register additional contexts and take advantage of the syntax
and nesting could be a simple addition. Any more complicated than that,
and you're fighting a losing battle against dedicated templating engines.

That's my opinion, anyway. It is just an opinion, and you're free to
disagree with it, but hopefully my reasoning is clear.

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by Rowan Collins

It is a string too, instead of an identifier like in <?* $value, escape ?>
... IDE will have problem by identify where you have defined it
It should not be an identifier or a single function name, because then we
could not use closures or object methods ($this->escape) for escaping.
The context should be an expression, as it is in template engines. So there
are no problems with the IDE.

register_escape_handler('foo', [$this, 'escape']);

<?*foo= $something ?>

Where's the problem?

If you mean you want to be able to pass an actual callable as the
context, what would be the point? Why would I ever write this:

<?* $something, [$this, 'escape'] ?>

when I could just write this:

<?= $this->escape($something) ?>

This is the part I don't get. How does "using an operator everywhere"
remove the effort of "using a function everywhere"? It's the same effort in
both cases.

"using an operator everywhere" and "using an operator + function
everywhere, especially if the operator itself works good but is unsafe".

Sorry, I still don't get it. <?= will still "work good but be unsafe" no
matter how the correctly-escaped version looks. When somebody's typing
code into their PHP "template" file, they've got to remember which
symbols to type; they don't care if those symbols are an operator, a
function, or a magic incantation. Look at the WordPress documentation,
it talks about "tags", which any programmer immediately recognises as
function calls.

So to a novice writing templates, it doesn't really matter if you say
the incantation to output a variable is "<?= e($" followed by the
variable name and ")?>", or "<?* $" followed by the variable name and
"?>". One is 3 characters shorter, but that is the sole difference in
terms of effort.

If somebody can't type "e(" and ")" without copying and pasting, then
they're going to have a hard time writing any meaningful code.

What difference does it make how he wrote 'e()'? It may be 'ctrl-c-ctrl-v',
'ctrl-insert-shift-insert', or 'e-shift-(-)'. The result is the same: this is
copied code.

Huh? Is the word "I" copied in this e-mail, because the English language
requires me to write it more than once? And if "e(" is "copied code",
how is the "*" in "<?*" not also "copied code"?

I get it, if you are talking about having to type "htmlspecialchars()"
the whole time, but I stand by my assertion that anyone put off by
typing "e()" is beyond hope.

More flexible to what end? Why do I need to be able to dynamically define
arbitrarily complex expressions as the filter name?

For the case when we write escapers statically. Twig allows passing a
context as a variable, so why deliberately restrict the escaping
mechanism in PHP? We don't know all the possible tasks that may require
additional escaping together with HTML.

Twig allows you to register a named "strategy" to a single callable,
exactly as I am suggesting:
http://twig.sensiolabs.org/doc/filters/escape.html#custom-escapers This
is much more useful than a single callback that has to handle all
possible strategies.

You're right that Twig allows you to use a variable as the escaping
strategy, although it warns that doing so defeats the intelligence of
its auto-escaping mechanism. But this could still be done without
allowing arbitrary expressions, or embedding syntax inside the strategy
argument:

<?$strategyhtml= $text ?>

We don't need to handle all possible things that anyone might ever want
to do. If they're doing something complex, they can implement their own
way of doing it - probably by writing a templating engine, or using one
of the many that already exist.

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by Rowan Collins

PHP was a template engine at inception. [...] something has gone awry when people are
writing template engines inside of a template engine.

At its inception, PHP was a handful of scripts including access logging
and a guestbook form. You can take a look here:
https://github.com/phplang/php-past/tree/0246ebc1bf5ae2e945d28961f975717774e6d287
Yes, at a stretch, it was a rudimentary template engine; but in a sense,
so is any programming language that has variables, output, and some form
of string interpolation.

Smarty was created in 2000, and its first ever README makes clear
reference to older templating engines:
https://github.com/smarty-php/smarty/tree/497badbe646f73703b7130609e9ffe2cbd23fa42
When exactly did things go awry?

In other words, that ship has sailed. The idea that everything that PHP
has ever had a mediocre version of built in, should be forever embedded
in and maintained by the default distribution, is absurd. PHP today is a
programming language, and applications and libraries can be and are
written in that programming language.

Trying to build default functionality that would compete with a modern
templating engine like Twig would be a lot of effort, and to what end? A
kind of language nationalism, that "PHP does it all"?

Regards,

--
Rowan Collins
[IMSoP]