Of course you can make the discussion endless by mentioning escaping of all kinds of third party frameworks like jQuery, but that's a bit off-topic here.
As mentioned a few times in this thread, <?~ is an alias for <?= htmlspecialchars(, not more, not less.
<?~ is not about making PHP secure by default forever.
<?~ is not about removing json_encode(), urlencode(), etc.
<?~ is not about closing double quotes automatically.
Regards
Thomas
The funny thing is that even my mail-client drops the < script >... when I click reply :)
Rowan Collins wrote on 21.06.2016 00:00:
On 20 June 2016 17:40:05 GMT+01:00, "Михаил Востриков" michael.vostrikov@gmail.com wrote:

> Actually, htmlspecialchars() is needed in all three cases:
> ...
> You may not write htmlspecialchars together with urlencode just because
> urlencode encodes all special characters in its own way.

So, not needed in all 3 cases then...
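A minimal sketch of the point being argued here, using only standard PHP functions; the sample value is made up, and the route is borrowed from a later message in this thread. It shows that a single urlencode()d value contains nothing left for htmlspecialchars() to change, while the "&" separators of a complete URL still do.

```php
<?php
// A made-up value containing every character htmlspecialchars() would touch.
$value = 'a"b\'c<d>e&f';

// urlencode() percent-encodes all of them, so a later HTML escape is a no-op.
$encoded = urlencode($value);
var_dump(htmlspecialchars($encoded, ENT_QUOTES) === $encoded);

// The separators of a complete URL are a different matter: the literal "&"
// between parameters is still HTML-special inside an attribute value.
$url = '/my_route/?state=active&contains_text=' . $encoded;
echo htmlspecialchars($url, ENT_QUOTES), "\n";
```

So whether the HTML step is "needed" depends on whether one looks at a single encoded parameter or at the whole URL that ends up in the markup.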
> Imagine that urlencode does not encode quotes - what function should we
> call for its result?

Ideally, an escape filter that performs both functions; if the aim is to make
things easier, I shouldn't need to think about the need to nest two escape
functions. If I still have to use non-obvious combinations of magic syntax plus
function calls, the claim of "secure by default" doesn't really stand up. The ~
becomes nothing more than an alias that I still need to remember when to
deploy.

I'm pretty sure the tempting syntax is actively harmful in that situation...

> The fact itself, that there were many discussions about it, indicates
> that it is a necessary feature.

Popularity is not the same thing as necessity. More relevantly, even when we
agree on the problem, the simple solution isn't always the best, sometimes it
pays to think a bit more broadly about the problem space. Larry's escaper
registration is one example of that.

HackLang's XHP is another - rather than thinking about escaping as an action,
it gives the compiler richer knowledge of the structure, so it can "know" the
right escape syntax. If the compiler could look at my previous example and
recognise the attribute, URL, script, and text contexts itself, then you really
would have security-by-default. Unfortunately, that too is tricky to generalise
- what is the correct escape method for an attribute named "data-my-action"...?
Regards,
--
Rowan Collins
[IMSoP]
Hi!
> As mentioned a few times in this thread, <?~ is an alias for <?= htmlspecialchars(, not more, not less.
And that is exactly the problem. Inventing operators to alias one
invocation of one function with one specific set of parameters is not a
good idea, unless there is a VERY good reason to do it. And the case
for this specific piece of code to deserve its own operator is rather weak.
--
Stas Malyshev
smalyshev@gmail.com
> So, not needed in all 3 cases then...

So, we can still use the <?= operator.

>> Imagine that urlencode does not encode quotes - what function should we
>> call for its result?
> Ideally, an escape filter that performs both functions; if the aim is to
> make things easier

No. The second function really depends on the context, but the HTML context is
always present. The aim is to create a shortcut for HTML escaping, reduce
copy-paste, and increase security.

> I shouldn't need to think about the need to nest two escape functions.
> the claim of "secure by default" doesn't really stand up.

That describes a super-universal operator. I did not suggest "secure by
default". As Thomas said, this is just an alias, not more, not less.

> <script>$('[data-thing-id="<?~ $thing['name'] ?>"]').on('click',
> function(){doThing('<?~ $thing['name'] ?>')});</script>
> I'm pretty sure the tempting syntax is actively harmful in that
> situation...

You should not call htmlspecialchars inside script tags, even without the <?~
operator, because this is not an HTML context.
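A rough sketch of what this means in practice, assuming the value is emitted as a JavaScript literal with json_encode() and the standard JSON_HEX_* flags. The hostile sample value is made up; doThing() is the function name from the quoted example.

```php
<?php
// Made-up value that tries to break out of a <script> block.
$name = "</script><script>alert('x')</script>";

// HTML entities are not decoded inside <script>, so htmlspecialchars() is the
// wrong tool there. json_encode() with the JSON_HEX_* flags yields a JavaScript
// string literal in which <, >, &, ' and " from the data appear only as
// \uXXXX escapes.
$literal = json_encode(
    $name,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

echo "<script>doThing({$literal});</script>\n";
```

The literal still decodes to the original string on the JavaScript side, but the markup around it can no longer be closed early by the data.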
> HackLang's XHP is another - rather than thinking about escaping as an
> action
> If the compiler could look at my previous example and recognise the
> attribute, URL, script, and text contexts itself

This is a very complex solution, and it can cause performance issues. Also,
as I understand it, it just calls htmlspecialchars.
https://github.com/facebook/xhp-lib/blob/master/src/core/XHP.php#L68
https://github.com/facebook/xhp-lib/blob/master/src/html/Element.php#L122

> what is the correct escape method for an attribute named "data-my-action"

It should be HTML-encoded, because it is HTML markup.

> And that is exactly the problem. Inventing operators to alias one
> invocation of one function with one specific set of parameters is not a
> good idea, unless there is a VERY good reason to do it.

Calling htmlspecialchars is a very frequent case, and the specific set of
parameters (HTML context) is always present. Is that a very good reason?

> And the case for this specific piece of code to deserve its own operator
> is rather weak.

Why do you think so, why is it weak? As I showed, the HTML context is always
present, even if we write inline JavaScript in an 'onclick' attribute.
This is not another context; there are 2 contexts together, and there is no
need to determine it inside the compiler - one context is always there.
So, it deserves its own operator.
Let's summarize.
We must not call htmlspecialchars() in the following cases:

- Inside a <style> tag
This is a very specific case. CSS is usually stored in static files.

- Inside a <script> tag
This is a more frequent case, but it is usually used when we need to pass a
big object from PHP into JavaScript. This is not just escaping; it is
a special notation - JSON.
For small pieces of data JSON can also be used, but it's better to use
data-attributes. And there we need to call htmlspecialchars.

- Text files
Text files with PHP processing can be used in code generators. The standard <?=
operator is enough here.

Any other cases and their combinations are very specific. For XML,
htmlspecialchars is also enough.
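The data-attribute case mentioned in the summary can be sketched as follows. The $settings contents are hypothetical; the point is only the order of the two encodings and that the JSON survives the HTML layer.

```php
<?php
// Hypothetical settings array; the keys are made up for the example.
$settings = ['title' => 'Tom & "Jerry"', 'tags' => ['<a>', "it's"]];

// First encode for the inner (task-dependent) context, then for HTML.
$attr = htmlspecialchars(json_encode($settings), ENT_QUOTES | ENT_SUBSTITUTE);
echo '<div class="some-class" data-settings="', $attr, '"></div>', "\n";

// A browser undoes the HTML layer when it parses the attribute; the same
// step is simulated here to show that the JSON survives intact.
$roundTrip = json_decode(html_entity_decode($attr, ENT_QUOTES), true);
var_dump($roundTrip === $settings);
```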
So, my question is - can I create an RFC, or are there any arguments
against? I think I explained why all the arguments above are unsuitable.
2016-06-21 4:35 GMT+05:00 Stanislav Malyshev smalyshev@gmail.com:
> ...
Hello. I've published an article on the Russian technical site habrahabr.ru.
https://habrahabr.ru/post/304162/
There is a poll about introducing such an operator. About 60% of the
people who have projects without a template engine are "for" this operator.
And even half of those who don't also think that such an operator can be
useful.
I think you can use Google Translate to read it; common sense and code
examples should be understandable.
https://translate.google.com/translate?sl=en&tl=ru&js=y&prev=_t&hl=ru&ie=UTF-8&u=https%3A%2F%2Fhabrahabr.ru%2Fpost%2F304162%2F&edit-text=&act=url
Current results:

How often do you work with projects that render templates in PHP
without a template engine?
35% (163) Always
22% (104) Quite often
18% (86) Quite rarely
25% (117) Almost never
470 people voted. 116 people abstained.

Do you think such an operator would be useful?
56% (264) Yes
44% (207) No
471 people voted. 121 people abstained.

I don't use PHP template rendering ...
51% (147) and I think that such an operator is not needed
49% (139) but I think that such an operator will come in handy
286 people voted. 247 people abstained.

Screenshot in Russian:
https://habrastorage.org/files/675/9ac/883/6759ac8834044ef0b5a09163c791f376.png
60% are "for" this operator; the projects of the other 40% will not be affected.
I think this is a good reason to create an RFC and discuss it at a more
global level.
2016-06-21 9:51 GMT+05:00 Михаил Востриков michael.vostrikov@gmail.com:
> ...
I've tried to gather all the arguments for and against.
To be clear: I suggest a new operator like '<?~ $value ?>' which is
equivalent to <?= htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE) ?>.
It is only for the HTML context. The flag combination is taken from the most
popular frameworks - Symfony, Zend, Yii, and Twig. Of course, the exact form of
the operator and the default flags are implementation details.
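For reference, the stated equivalence can be approximated today with a userland helper. This is only a sketch: the name e() is an assumption made for illustration, and the explicit 'UTF-8' argument is added here and is not part of the proposal. In a template it would be used as `<?= e($name) ?>` where the proposal would write `<?~ $name ?>`.

```php
<?php
// Hypothetical userland helper; the name e() is an assumption for illustration.
// It applies the flag combination quoted in the proposal.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Made-up value with markup, both quote styles and an ampersand.
$name = '<b onclick="x(\'y\')">Tom & Jerry</b>';
echo e($name), "\n";

// ENT_SUBSTITUTE replaces an invalid UTF-8 sequence with U+FFFD
// instead of turning the whole result into an empty string.
echo e("caf\xE9"), "\n";
```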
- You can write a short function in userland.
The problem is not that we have no function. The problem is that the same
action is always repeated, and if we don't repeat it, that leads to
security problems. More than 90% of output data is data from the DB and must
be HTML-encoded.
There is no such problem with other contexts. If we don't call json_encode
when passing an array or object into JavaScript, this only breaks the
script, and it will be noticeable; there won't be security problems.
With the new operator we write either <?~ ?> or <?= ?>; they are mutually
exclusive, and we have to deliberately write one or the other. With a helper
function we have the same beginning, <?=, and then may or may not write the
helper function.
Also there is a problem with function autoloading.
- There is no place for such operators in the language.
There is no place for such operators in C++, or C#, or Java. But the most
popular language for web programming is exactly the place for such an operator.
- There are many other contexts
HTML is the external context, while the others are internal, task-dependent contexts.
HTML can be used together with other contexts.
The HTML context is the main context in every PHP file, and we write <?php at
the beginning to switch out of it.
Actually, on a web page we have 3 external contexts - HTML, the <script> tag,
and the <style> tag. PHP+CSS is usually not used. PHP+JS is not just escaping; it is encoding in a special notation. Escaping can be applied only to strings, but encoding can be applied to variables of any type.
Only urlencode is really escaping (and it can also be used together with other contexts). But urlencode is an internal context. If we construct a URL, e.g. for filters, we should encode every part.

    $filterUrl = '/my_route/?state=active';
    if ($postData['contains_text'])
        $filterUrl .= '&contains_text='.urlencode($postData['contains_text']);

When we write a data-attribute, the additional context is task-dependent, but the HTML context is always present.

    <div class="some-class"
        data-url="<?= htmlspecialchars('/my_route/?state=active&category_id=123') ?>"
        data-settings="<?= htmlspecialchars(json_encode($settings)) ?>"
    ></div>

- Other people will ask about an operator for another context
And you can say: we already added an operator for the main web context, because it is the most frequently used context. If you work a lot with other contexts, please use a template engine.
- You want to add a new operator just for your needs
It's not only my need for one project. I meet this problem in many projects without a template engine. The results of the poll, which I wrote about in my previous message, show that it matters not only to me. Some feature requests on http://bugs.php.net with the same question were created in 2002.
- Some people can use various flags, or use the third and fourth parameters of `htmlspecialchars()`.
How many such projects are there out of the total number of projects? The default flags can be set to be safe enough. The third parameter can be changed via ini_set. The fourth parameter is not required in many cases, except maybe when some people encode the data before saving it into the database. Also, the new operator is not a replacement for htmlspecialchars, so the function could still be used. It just does not look like a very good argument: "we use a special set of parameters, so you cannot add an operator which we could not use".
This problem with flags can be solved by adding a default value for them with PHP_INI_USER mode, and a getter and setter for it. But I'm not sure you think this is a good idea.
- Exact flags / Default flags
In popular frameworks the following flags are used:
Symfony - `ENT_QUOTES` | `ENT_SUBSTITUTE`
Yii - `ENT_QUOTES` | `ENT_SUBSTITUTE`
Zend - `ENT_QUOTES` | `ENT_SUBSTITUTE`
Twig - `ENT_QUOTES` | `ENT_SUBSTITUTE`
https://github.com/symfony/symfony/blob/f29d46f29b91ea5c30699cf6bdb8e65545d1dd26/src/Symfony/Component/Templating/PhpEngine.php#L421
https://github.com/yiisoft/yii2/blob/c370c17e93f364a843ed7c31e1e1f7fc8caef0a3/framework/helpers/BaseHtml.php#L104
https://github.com/zendframework/zend-escaper/blob/1a855b5f7074607b1260d85c5526a59b1ab36593/src/Escaper.php#L117
https://github.com/twigphp/Twig/blob/f0a4fa678465491947554f6687c5fca5e482f8ec/lib/Twig/Extension/Core.php#L1039
- Tilde sign
There is a good argument about the tilde sign: it is absent from keyboard layouts for some European languages. But that is an implementation detail. I think it should be a special sign which is typed with Shift and is located rather far from the "=" sign. Possible variants are "<?! ?>", "<?@ ?>", "<?^ ?>". The first variant looks most suitable.
I don't see any arguments why other operators can be implemented but this operator cannot. So, please tell me, what do you think about an RFC?
2016-06-29 21:39 GMT+05:00 Михаил Востриков <michael.vostrikov@gmail.com>:
> ...

I would prefer to have ENT_HTML5 included in the default flags, since normally all new HTML code is HTML5.
Maybe split the vote between <?~ and <?html=, which leaves the option of adding <?json=, <?csv= and other contexts later.
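On the ENT_HTML5 suggestion, a small sketch of the visible difference for htmlspecialchars(): with the flag set proposed earlier in the thread, the document type only changes how the apostrophe is written. The sample string is made up.

```php
<?php
$value = "it's <b>";

// Default document type (ENT_HTML401): the apostrophe becomes a numeric entity.
echo htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE), "\n";

// With ENT_HTML5 added, the named entity &apos; is used instead.
echo htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5), "\n";
```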
Regards
Thomas
Михаил Востриков wrote on 30.06.2016 21:35:
> ...
I wish you'd think about the bigger issue of autoloading functions,
which would solve this and many similar problems much more generally.
I mean, this:
<?html= $foo ?>
versus this:
<?= html($foo) ?>
What for?
I don't see the point in inventing new syntax, and introducing a new
concept, for what is effectively just a limited set of certain
specific functions.
We have functions already - rather than adding new features, we should
improve the features we already have instead, which benefits the
language as a whole, not just templates. Improving on functions is
long overdue...
What for?
<?html= $foo.$bar ?> is easy to verify
<?= html($foo).$bar ?> is not easy to verify
Regards
Rasmus Schultz wrote on 30.06.2016 22:27:
I wish you'd think about the bigger issue of autoloading functions,
which would solve this and many similar problems much more generally.

I mean, this:
<?html= $foo ?>
versus this:
<?= html($foo) ?>
What for?
I don't see the point in inventing new syntax, and introducing a new
concept, for what is effectively just a limited set of certain
specific functions.

We have functions already - rather than adding new features, we should
improve the features we already have instead, which benefits the
language as a whole, not just templates. Improving on functions is
long overdue...

I would prefer to have ENT_HTML5 as the default flag included, since normally
all new html code is html5.
Maybe split voting between <?~ and <?html= which gives a later option for
<?json=, <?csv and other contexts.

Regards
Thomas

Михаил Востриков wrote on 30.06.2016 21:35:
I've tried to gather all arguments for and against.

To be clear: I suggest a new operator like '<?~ $value ?>', which is
equivalent to <?= htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE) ?>.
It is only for the HTML context. The flag combination is taken from the most
popular frameworks - Symfony, Zend, Yii, and Twig. Of course, the exact form
of the operator and the default flags are implementation details.
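
To make the proposed equivalence concrete, here is a minimal userland sketch of what '<?~ $value ?>' would expand to. The helper name h() is an assumption borrowed from the informal usage elsewhere in this thread, and passing 'UTF-8' explicitly is also an assumption; the flags are the ones named above.

```php
<?php
// Hypothetical helper standing in for the proposed operator.
function h(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// ENT_QUOTES: both double and single quotes are converted,
// so the result is usable inside either kind of quoted attribute.
echo h('<a href="x" title=\'y\'>&</a>'), "\n";

// ENT_SUBSTITUTE: an invalid UTF-8 byte is replaced with U+FFFD
// instead of turning the whole result into an empty string.
echo h("caf\xE9"), "\n";
```

Without ENT_SUBSTITUTE, a single invalid byte makes htmlspecialchars() return an empty string, which is probably one reason the frameworks listed in this message all include that flag.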
- You can write a short function in userland.

The problem is not that we have no function. The problem is that the same
action is always repeated, and if we don't repeat it, it leads to
security problems. More than 90% of output data is data from the DB and must
be HTML-encoded.

There is no such problem with other contexts. If we don't call json_encode
when passing an array or object into JavaScript, this only breaks the
script, and it will be noticeable; there won't be security problems.

With the new operator we write either <?~ ?> or <?= ?>. They are mutually
exclusive, and we have to deliberately choose one or the other. With a helper
function we always start with the same <?= and then may or may not write the
helper call.

Also there is a problem with function autoloading.

- There is no place for such operators in the language.

There is no place for such operators in C++, or C#, or Java. But the most
popular language for web programming is exactly the place for such an operator.

- There are many other contexts

HTML is an external context, but the others are internal, task-dependent contexts.
HTML can be used together with other contexts.
The HTML context is the main context in every PHP file, and we write <?php at
the beginning to switch out of it.

Actually, on a web page we have 3 external contexts - HTML, the <script> tag,
and the <style> tag. PHP+CSS is usually not used. PHP+JS is not just escaping.
It is encoding in a special notation. Escaping can be applied only to strings,
but encoding can be applied to any type of variable.

Only urlencode is really escaping (and it can also be used together with other
contexts). But urlencode is an internal context. If we construct a URL, e.g.
for filters, we should encode every part.

    $filterUrl = '/my_route/?state=active';
    if ($postData['contains_text'])
        $filterUrl .= '&contains_text='.urlencode($postData['contains_text']);

When we write a data attribute, the additional context is task-dependent, but
the HTML context is always present.

    <div class="some-class"
        data-url="<?= htmlspecialchars('/my_route/?state=active&category_id=123') ?>"
        data-settings="<?= htmlspecialchars(json_encode($settings)) ?>"
    ></div>

- Other people will ask about an operator for another context

And you can say: We already added an operator for the main web context,
because it is the most frequently used context. If you have a lot of work
with other contexts, please use a template engine.

- You want to add a new operator just for your needs

It's not only my needs for one project. I meet this problem in many projects
without a template engine. The results of the poll, which I wrote about in my
previous message, show that it matters not only to me. Some feature requests
on http://bugs.php.net with the same question were created in 2002.

- Some people can use various flags, or use the third and fourth parameters of `htmlspecialchars()`.

How many such projects are there out of the total number of projects? Default
flags can be set to be safe enough. The third parameter can be changed via
ini_set. The fourth parameter is not required in many cases, except maybe when
some people encode the data before saving it into the database. Also, the new
operator is not a replacement for htmlspecialchars, so it could still be used.
It just doesn't look very good: "we use a special set of parameters, so you
cannot add an operator that we could not use."
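
The nesting of contexts described in the quoted message (URL parts and JSON as internal contexts, HTML as the external one) can be sketched as follows. The sample values of $postData and $settings and the helper name attr() are assumptions made only for illustration.

```php
<?php
// Hypothetical input data, standing in for $postData and $settings.
$postData = ['contains_text' => 'a&b "quoted"'];
$settings = ['mode' => 'list', 'title' => "O'Reilly <b>"];

// Internal context 1: each URL part is encoded on its own.
$filterUrl = '/my_route/?state=active';
if (!empty($postData['contains_text'])) {
    $filterUrl .= '&contains_text=' . urlencode($postData['contains_text']);
}

// Internal context 2: structured data is encoded as JSON.
$settingsJson = json_encode($settings);

// External context: whatever ends up in the attribute is HTML-escaped last.
function attr(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$html = '<div class="some-class"'
    . ' data-url="' . attr($filterUrl) . '"'
    . ' data-settings="' . attr($settingsJson) . '"'
    . '></div>';

echo $html, "\n";
```

The order matters: the inner encoding is applied first and the HTML escaping last, so a browser that decodes the attribute value gets back exactly the URL or JSON string that was built.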
What for?
<?html= $foo.$bar ?> is easy to verify
<?= html($foo).$bar ?> is not easy to verify
But a fixed version of <?html= is just a single subset of filtering,
while html($foo) can correctly handle differences. The reason it's not
easy to 'verify' is that it has flexibility in its operation, which
forcing a single configuration just to make some operations easy does
not provide.
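
As an illustration of the flexibility being argued for here, a userland helper can keep a default while still letting the caller vary the flags. This is only a sketch: the function name html() follows the examples in the thread, and the default flags and the 'UTF-8' encoding are assumptions.

```php
<?php
// Hypothetical userland helper: one default configuration,
// but flags and encoding can be overridden per call.
function html(
    string $value,
    int $flags = ENT_QUOTES | ENT_SUBSTITUTE,
    string $encoding = 'UTF-8'
): string {
    return htmlspecialchars($value, $flags, $encoding);
}

echo html("it's"), "\n";                                          // numeric entity (HTML 4.01 default)
echo html("it's", ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5), "\n"; // named entity under ENT_HTML5
echo html("it's", ENT_NOQUOTES), "\n";                            // quotes left unconverted
```

A fixed operator would correspond to only the first call; the other two are the kind of variation a single built-in configuration cannot express.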
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
On Thu, Jun 30, 2016 at 1:35 PM, Михаил Востриков <
michael.vostrikov@gmail.com> wrote:
- Other people will ask about operator for another context
And you can say: We already added an operator for the main web context,
because it is the most frequently used context. If you have a lot of work
with other contexts, please use template engine.
If your answer for the other contexts is to use a template engine, why is
the answer for this not to use a template engine? You want to add this
because people don't use a template engine, so what makes you think they will
use this when there is a valid solution already available?
If your answer for the other contexts is to use a template engine,
why is the answer for this not to use a template engine?
You want to add this because people don't use a template engine,
so what makes you think they will use this when there is a valid solution
already available?
Because it is almost impossible to add template engine in a big project
with PHP templates. But new version of language usually can easily be used.
Because this is a very frequent operation, and an error or inattention with
the current syntax can cause security problems.
Because other filters are task-dependent, not external-context-dependent.
Of course, we can have our own file format with PHP processing for our tasks,
and the external context won't be HTML, but then it is a task-dependent context.
Actually, HTML is also a task-dependent external context, and PHP was made
(and is used) specifically for this task.
And because for other contexts you may need a filter combination, so a single
operator will not be very useful.
But the last filter in a chain is our external context, and the external
context in very many cases is HTML/XML.
2016-07-01 1:42 GMT+05:00 Ryan Pallas derokorian@gmail.com:
On Thu, Jun 30, 2016 at 1:35 PM, Михаил Востриков <
michael.vostrikov@gmail.com> wrote:
- Other people will ask about operator for another context
And you can say: We already added an operator for the main web context,
because it is the most frequently used context. If you have a lot of work
with other contexts, please use template engine.

If your answer for the other contexts is to use a template engine, why
is the answer for this not to use a template engine? You want to add this
because people don't use a template engine, so what makes you think they will
use this when there is a valid solution already available?
Because it is almost impossible to add template engine in a big project
with PHP templates. But new version of language usually can easily be used.
I interpret "But new version of language usually can easily be used" as
in a new PHP version being installed on a server touted as being
"easier" than changing/replacing/adding a new template language component
with a framework?
I object to this. I can more easily add a new template to e.g. a Laravel
project (own parser, own extension, living next to existing blade
templates) than switch to a new PHP version on production servers.
- Markus
I can more easily add a new template to e.g. a Laravel
project (own parser, own extension, living next to existing blade
templates)
Your project already has a template engine, and the framework has common code
which works with such engines.
But how much time do you need to convert all existing templates to a new TE?
I mean the projects without a template engine, which work and are developed
every day.
Their number is rather large - various CMSs, projects with a custom core; Yii
and Zend don't have a TE by default.
In a big project there are a lot of PHP templates with <?= htmlspecialchars
?> or <?= h() ?> or <?= (int)$some_id ?> everywhere.
If we miss this somewhere, we could get an XSS.
2016-07-01 12:53 GMT+05:00 Markus Fischer markus@fischer.name:
Because it is almost impossible to add template engine in a big project
with PHP templates. But new version of language usually can easily be
used.

I interpret "But new version of language usually can easily be used" as
in a new PHP version being installed on a server touted as being
"easier" than changing/replacing/adding a new template language component
with a framework?

I object to this. I can more easily add a new template to e.g. a Laravel
project (own parser, own extension, living next to existing blade
templates) than switch to a new PHP version on production servers.
- Markus
On Fri, Jul 1, 2016 at 10:51 AM, Михаил Востриков <
michael.vostrikov@gmail.com> wrote:
In a big project there are a lot of PHP templates with <?= htmlspecialchars
?> or <?= h() ?> or <?= (int)$some_id ?> everywhere.
How will a new output operator help in this case? You still have to search
for <?=, <? echo, <? print occurrences and replace them with <?~.
Saying that one can forget to add <?= h() can be applied to <?~ as
well, you can miss <? echo somewhere in a template and get the same
result at the end.
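
The search described here can be roughly automated. The following is a heuristic sketch, not a complete audit tool: the function name findUnescapedEchoTags() and the allow-list of "safe" prefixes (htmlspecialchars, h, an (int) cast) are assumptions, and it only looks at short echo tags, not at <? echo or <? print.

```php
<?php
// Heuristic scan: report short-echo tags whose expression does not start
// with a wrapper from the allow-list. The allow-list is an assumption.
function findUnescapedEchoTags(string $template): array
{
    $safePrefix = '/^(htmlspecialchars\s*\(|h\s*\(|\(int\))/';
    $found = [];

    preg_match_all('/<\?=\s*(.*?)\s*\?>/s', $template, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[1] as [$expression, $offset]) {
        if (!preg_match($safePrefix, $expression)) {
            // Line number = number of newlines before the match, plus one.
            $line = substr_count($template, "\n", 0, $offset) + 1;
            $found[] = ['line' => $line, 'expression' => $expression];
        }
    }
    return $found;
}

$template = <<<'TPL'
<h1><?= h($title) ?></h1>
<p>#<?= (int)$id ?></p>
<div><?= $obj->new_text_column ?></div>
TPL;

print_r(findUnescapedEchoTags($template));
```

A regular expression cannot tell whether an expression is actually safe, so a tokenizer-based check would be more robust; the sketch only shows that the audit has to be done for either syntax.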
--
Thank you and best regards,
Eugene Leonovich
All,
Anybody can write an RFC and call a vote whenever they want within the
guidelines set forth for RFCs.
It would be much more productive to get the RFC written and to provide
suggestions on improvements (e.g. syntax choice, default options, ways to
customize), rather than battling against it. Or stay quiet and vote no. Or do
both.
I am personally against this idea as it stands but maybe there is a middle
ground and maybe some good can come of it. For example, autoloading
functions.
As a suggestion:
Perhaps the ability to register a default stream filter for the default
output buffer paired with file level declarations and/or context tagging
within blocks is a possible solution. So something like:

    declare(strict_types=1; output_filter_args=['label' => [ENT_QUOTES | ENT_SUBSTITUTE]]);
    // must be constant scalar expression?
    // Would be passed in directly as is, the choice for an array with
    // context aware keys/values is up to you

    register_output_filter(function($buffer, $label) { }, 'label');
    // should have similar API to spl autoloading, with multiple callback stack

    <?label:="string"; ?>

Something like this would start to solve some of the problems of context,
default arguments, etc.
I think functions to set the filter options might be better but using
declare makes it easier to limit to current file scope and ensures
consistent placement at top. Also I realize I said to use stream filters
and I've used a closure here. Stream filters are complex due to the whole
bucket brigade/continuous data stream thing, but have the advantage of
being much more performant and resource friendly. Maybe allow both types?
Simple callbacks for ease of use, stream filters for performance and
complex stuff.
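
The callback half of this idea can be approximated in userland today. The sketch below is not the proposed feature: register_output_filter() and the labelled tag syntax do not exist in PHP, and the class name OutputFilters and its methods are hypothetical.

```php
<?php
// Userland sketch only; OutputFilters is a hypothetical class that mimics
// a per-label stack of callbacks, similar in spirit to spl autoloading.
final class OutputFilters
{
    /** @var array<string, callable[]> label => stack of callbacks */
    private static array $filters = [];

    public static function register(callable $filter, string $label): void
    {
        self::$filters[$label][] = $filter;
    }

    // Runs the value through every callback registered for the label, in order.
    public static function apply(string $label, string $value): string
    {
        if (!isset(self::$filters[$label])) {
            throw new InvalidArgumentException("No output filter for '$label'");
        }
        foreach (self::$filters[$label] as $filter) {
            $value = $filter($value);
        }
        return $value;
    }
}

OutputFilters::register(
    fn(string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
    'html'
);
OutputFilters::register('rawurlencode', 'url');

echo OutputFilters::apply('html', '<b>"string"</b>'), "\n";
echo OutputFilters::apply('url', 'a b&c'), "\n";
```

In a template this would be written as an ordinary short echo tag around OutputFilters::apply('html', $value), which is more verbose than the labelled tag suggested above but needs no language change.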
TL;DR: what's your account? Let's give RFC karma and vote it down if you
don't want it.
- Davey
How will a new output operator help in this case?
You still have to search for <?=, <? echo, <? print occurrences and
replace them with <?~.
Saying that one can forget to add <?= h() can be applied to <?~ as well,
you can miss <? echo somewhere in a template and get the same result at
the end

No. <?= (int)$some_id ?>, <? echo htmlspecialchars() ?> and others are usually
safe in current code.
But we need to write new code for new functionality in a project.
Let's say we've added a new column to the database, and added output of it to
our template.
It is very easy to write <div><?= $obj->new_text_column ?></div> for
testing purposes and then leave it as is.
The problem is that <?= h($something) ?> and <?= $something ?> both work
fine; one is a subset of the other. But the second variant is unsafe. We can
forget to write that main part.
But we cannot forget to write '=' in '<?=' or '~' in '<?~', because it will
not work. It just will not output anything, and we will notice this.
Also, we need to choose which one to write; they are mutually exclusive. And we
have to write <?~ ?> almost everywhere; <?= ?> is needed only sometimes.
It would be much more productive to get the RFC written and to provide
suggestions on improvements
Thanks. My wiki account is 'michael-vostrikov'.
2016-07-01 16:46 GMT+05:00 Davey Shafik davey@php.net:
All,
Anybody can write an RFC and call a vote whenever they want within the
guidelines set forth for RFCs.

It would be much more productive to get the RFC written and to provide
suggestions on improvements (e.g. syntax choice, default options, ways to
customize), rather than battling against it. Or stay quiet and vote no. Or do
both.

I am personally against this idea as it stands but maybe there is a middle
ground and maybe some good can come of it. For example, autoloading
functions.

As a suggestion:
Perhaps the ability to register a default stream filter for the default
output buffer paired with file level declarations and/or context tagging
within blocks is a possible solution. So something like:

    declare(strict_types=1; output_filter_args=['label' => [ENT_QUOTES | ENT_SUBSTITUTE]]);
    // must be constant scalar expression?
    // Would be passed in directly as is, the choice for an array with
    // context aware keys/values is up to you

    register_output_filter(function($buffer, $label) { }, 'label');
    // should have similar API to spl autoloading, with multiple callback stack

    <?label:="string"; ?>

Something like this would start to solve some of the problems of context,
default arguments, etc.

I think functions to set the filter options might be better but using
declare makes it easier to limit to current file scope and ensures
consistent placement at top. Also I realize I said to use stream filters
and I've used a closure here. Stream filters are complex due to the whole
bucket brigade/continuous data stream thing, but have the advantage of
being much more performant and resource friendly. Maybe allow both types?
Simple callbacks for ease of use, stream filters for performance and
complex stuff.

TL;DR: what's your account? Let's give RFC karma and vote it down if you
don't want it.
- Davey