Hi,
Could htmlspecialchars()
use ENT_QUOTES
by default?
I recently worked on an example script, where I tried to keep it simple by
using htmlspecialchars directly, e.g.
echo "<img src='" . htmlspecialchars($url) . "'>";
I'd completely forgotten that single quotes are not escaped by default,
creating a XSS vulnerability, e.g.
$url = "/' onerror='alert(1)";
All the common frameworks I could find use ENT_QUOTES
to do this safely
(details below).
Christoph (cmb69) suggests this was done for HTML4 compatibility, with
older versions of PHP possibly having issues with numeric character
references (a quick search suggests PHP 5.4?).
PHP uses the numeric version ' with ENT_QUOTES, and it should continue
to do so - because the named version, ' was added in HTML5, but can
still cause problems with legacy parsers; for example Android 4, and the
one still in use by Microsoft Outlook (&/>/< was in the
original HTML spec, and " was added in HTML2).
I'd also be tempted to suggest ENT_SUBSTITUTE
should be included, as I
prefer to keep as much of the valid data (rather than losing everything),
but that's not as important as escaping the apostrophe by default.
Craig
WordPress uses ENT_QUOTES
(ish).
https://developer.wordpress.org/reference/functions/esc_html/
Laravel, with Blade, uses ENT_QUOTES:
https://github.com/illuminate/support/blob/master/helpers.php#L118
Symfony or Slim, with Twig, uses ENT_QUOTES
| ENT_SUBSTITUTE:
https://github.com/twigphp/Twig/blob/3.x/src/Extension/EscaperExtension.php#L243
CodeIgniter uses ENT_QUOTES
| ENT_SUBSTITUTE:
https://github.com/codeigniter4/CodeIgniter4/blob/develop/system/ThirdParty/Escaper/Escaper.php#L120
CakePHP uses ENT_QUOTES
| ENT_SUBSTITUTE:
https://github.com/cakephp/cakephp/blob/master/src/Core/functions.php#L67
YII uses ENT_QUOTES
| ENT_SUBSTITUTE:
https://github.com/yiisoft/yii2/blob/master/framework/helpers/BaseHtml.php#L111
Phalcon uses ENT_QUOTES:
https://github.com/phalcon/phalcon/blob/v5.0.x/src/Html/Escaper.php#L78
FuelPHP uses ENT_QUOTES:
https://github.com/fuel/core/blob/1.9/develop/config/config.php#L459
FWIW i'm surprised with the lack of ENT_DISALLOWED
, personally i use
tohtml(string $str):string{ return htmlentities($str, ENT_QUOTES
|
ENT_HTML401 | ENT_SUBSTITUTE
| ENT_DISALLOWED, 'UTF-8', true);}
On Sat, 26 Dec 2020 at 12:03, Craig Francis craig@craigfrancis.co.uk
wrote:
Hi,
Could
htmlspecialchars()
useENT_QUOTES
by default?I recently worked on an example script, where I tried to keep it simple by
using htmlspecialchars directly, e.g.echo "<img src='" . htmlspecialchars($url) . "'>";
I'd completely forgotten that single quotes are not escaped by default,
creating a XSS vulnerability, e.g.$url = "/' onerror='alert(1)";
All the common frameworks I could find use
ENT_QUOTES
to do this safely
(details below).Christoph (cmb69) suggests this was done for HTML4 compatibility, with
older versions of PHP possibly having issues with numeric character
references (a quick search suggests PHP 5.4?).PHP uses the numeric version ' with ENT_QUOTES, and it should continue
to do so - because the named version, ' was added in HTML5, but can
still cause problems with legacy parsers; for example Android 4, and the
one still in use by Microsoft Outlook (&/>/< was in the
original HTML spec, and " was added in HTML2).I'd also be tempted to suggest
ENT_SUBSTITUTE
should be included, as I
prefer to keep as much of the valid data (rather than losing everything),
but that's not as important as escaping the apostrophe by default.Craig
WordPress uses
ENT_QUOTES
(ish).https://developer.wordpress.org/reference/functions/esc_html/
Laravel, with Blade, uses ENT_QUOTES:
https://github.com/illuminate/support/blob/master/helpers.php#L118
Symfony or Slim, with Twig, uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/twigphp/Twig/blob/3.x/src/Extension/EscaperExtension.php#L243
CodeIgniter uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/codeigniter4/CodeIgniter4/blob/develop/system/ThirdParty/Escaper/Escaper.php#L120
CakePHP uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/cakephp/cakephp/blob/master/src/Core/functions.php#L67
YII uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/yiisoft/yii2/blob/master/framework/helpers/BaseHtml.php#L111
Phalcon uses ENT_QUOTES:
https://github.com/phalcon/phalcon/blob/v5.0.x/src/Html/Escaper.php#L78
FuelPHP uses ENT_QUOTES:
https://github.com/fuel/core/blob/1.9/develop/config/config.php#L459
On Sat, Dec 26, 2020 at 12:03 PM Craig Francis craig@craigfrancis.co.uk
wrote:
Hi,
Could
htmlspecialchars()
useENT_QUOTES
by default?I recently worked on an example script, where I tried to keep it simple by
using htmlspecialchars directly, e.g.echo "<img src='" . htmlspecialchars($url) . "'>";
I'd completely forgotten that single quotes are not escaped by default,
creating a XSS vulnerability, e.g.$url = "/' onerror='alert(1)";
All the common frameworks I could find use
ENT_QUOTES
to do this safely
(details below).Christoph (cmb69) suggests this was done for HTML4 compatibility, with
older versions of PHP possibly having issues with numeric character
references (a quick search suggests PHP 5.4?).PHP uses the numeric version ' with ENT_QUOTES, and it should continue
to do so - because the named version, ' was added in HTML5, but can
still cause problems with legacy parsers; for example Android 4, and the
one still in use by Microsoft Outlook (&/>/< was in the
original HTML spec, and " was added in HTML2).I'd also be tempted to suggest
ENT_SUBSTITUTE
should be included, as I
prefer to keep as much of the valid data (rather than losing everything),
but that's not as important as escaping the apostrophe by default.Craig
WordPress uses
ENT_QUOTES
(ish).https://developer.wordpress.org/reference/functions/esc_html/
Laravel, with Blade, uses ENT_QUOTES:
https://github.com/illuminate/support/blob/master/helpers.php#L118
Symfony or Slim, with Twig, uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/twigphp/Twig/blob/3.x/src/Extension/EscaperExtension.php#L243
CodeIgniter uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/codeigniter4/CodeIgniter4/blob/develop/system/ThirdParty/Escaper/Escaper.php#L120
CakePHP uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/cakephp/cakephp/blob/master/src/Core/functions.php#L67
YII uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/yiisoft/yii2/blob/master/framework/helpers/BaseHtml.php#L111
Phalcon uses ENT_QUOTES:
https://github.com/phalcon/phalcon/blob/v5.0.x/src/Html/Escaper.php#L78
FuelPHP uses ENT_QUOTES:
https://github.com/fuel/core/blob/1.9/develop/config/config.php#L459
I agree that we should switch the default to ENT_QUOTES. I also agree that
we should enable ENT_SUBSTITUTE
by default. I don't see any downside to
these two options.
Would you like to submit a PR?
Nikita
On Sat, Dec 26, 2020 at 12:03 PM Craig Francis craig@craigfrancis.co.uk
wrote:Hi,
Could
htmlspecialchars()
useENT_QUOTES
by default?[...]
I agree that we should switch the default to ENT_QUOTES. I also agree that
we should enableENT_SUBSTITUTE
by default. I don't see any downside to
these two options.Would you like to submit a PR?
Thanks Nikita,
I've submitted a PR, I think it follows the Coding Standards, and I've got
the tests passing, but feel free to edit/delete if it's not right.
https://github.com/php/php-src/pull/6583
If this is accepted, I can also look at the documentation updates.
Craig
Le 6 janv. 2021 à 16:46, Nikita Popov nikita.ppv@gmail.com a écrit :
On Sat, Dec 26, 2020 at 12:03 PM Craig Francis craig@craigfrancis.co.uk
wrote:Hi,
Could
htmlspecialchars()
useENT_QUOTES
by default?I recently worked on an example script, where I tried to keep it simple by
using htmlspecialchars directly, e.g.echo "<img src='" . htmlspecialchars($url) . "'>";
I'd completely forgotten that single quotes are not escaped by default,
creating a XSS vulnerability, e.g.$url = "/' onerror='alert(1)";
All the common frameworks I could find use
ENT_QUOTES
to do this safely
(details below).Christoph (cmb69) suggests this was done for HTML4 compatibility, with
older versions of PHP possibly having issues with numeric character
references (a quick search suggests PHP 5.4?).PHP uses the numeric version ' with ENT_QUOTES, and it should continue
to do so - because the named version, ' was added in HTML5, but can
still cause problems with legacy parsers; for example Android 4, and the
one still in use by Microsoft Outlook (&/>/< was in the
original HTML spec, and " was added in HTML2).I'd also be tempted to suggest
ENT_SUBSTITUTE
should be included, as I
prefer to keep as much of the valid data (rather than losing everything),
but that's not as important as escaping the apostrophe by default.Craig
WordPress uses
ENT_QUOTES
(ish).https://developer.wordpress.org/reference/functions/esc_html/
Laravel, with Blade, uses ENT_QUOTES:
https://github.com/illuminate/support/blob/master/helpers.php#L118
Symfony or Slim, with Twig, uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/twigphp/Twig/blob/3.x/src/Extension/EscaperExtension.php#L243
CodeIgniter uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/codeigniter4/CodeIgniter4/blob/develop/system/ThirdParty/Escaper/Escaper.php#L120
CakePHP uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/cakephp/cakephp/blob/master/src/Core/functions.php#L67
YII uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/yiisoft/yii2/blob/master/framework/helpers/BaseHtml.php#L111
Phalcon uses ENT_QUOTES:
https://github.com/phalcon/phalcon/blob/v5.0.x/src/Html/Escaper.php#L78
FuelPHP uses ENT_QUOTES:
https://github.com/fuel/core/blob/1.9/develop/config/config.php#L459
I agree that we should switch the default to ENT_QUOTES. I also agree that
we should enableENT_SUBSTITUTE
by default. I don't see any downside to
these two options.Would you like to submit a PR?
Nikita
For ENT_SUBSTITUTE, there has been https://bugs.php.net/bug.php?id=69450 https://bugs.php.net/bug.php?id=69450, but I don’t understand the objection in that bug report. Maybe there is some issue related to non-Unicode multibyte encodings?
—Claude
Le 6 janv. 2021 à 16:46, Nikita Popov nikita.ppv@gmail.com a écrit :
On Sat, Dec 26, 2020 at 12:03 PM Craig Francis craig@craigfrancis.co.uk
wrote:Hi,
Could
htmlspecialchars()
useENT_QUOTES
by default?I recently worked on an example script, where I tried to keep it simple by
using htmlspecialchars directly, e.g.echo "<img src='" . htmlspecialchars($url) . "'>";
I'd completely forgotten that single quotes are not escaped by default,
creating a XSS vulnerability, e.g.$url = "/' onerror='alert(1)";
All the common frameworks I could find use
ENT_QUOTES
to do this safely
(details below).Christoph (cmb69) suggests this was done for HTML4 compatibility, with
older versions of PHP possibly having issues with numeric character
references (a quick search suggests PHP 5.4?).PHP uses the numeric version ' with ENT_QUOTES, and it should continue
to do so - because the named version, ' was added in HTML5, but can
still cause problems with legacy parsers; for example Android 4, and the
one still in use by Microsoft Outlook (&/>/< was in the
original HTML spec, and " was added in HTML2).I'd also be tempted to suggest
ENT_SUBSTITUTE
should be included, as I
prefer to keep as much of the valid data (rather than losing everything),
but that's not as important as escaping the apostrophe by default.Craig
WordPress uses
ENT_QUOTES
(ish).https://developer.wordpress.org/reference/functions/esc_html/
Laravel, with Blade, uses ENT_QUOTES:
https://github.com/illuminate/support/blob/master/helpers.php#L118
Symfony or Slim, with Twig, uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/twigphp/Twig/blob/3.x/src/Extension/EscaperExtension.php#L243
CodeIgniter uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/codeigniter4/CodeIgniter4/blob/develop/system/ThirdParty/Escaper/Escaper.php#L120
CakePHP uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/cakephp/cakephp/blob/master/src/Core/functions.php#L67
YII uses
ENT_QUOTES
| ENT_SUBSTITUTE:https://github.com/yiisoft/yii2/blob/master/framework/helpers/BaseHtml.php#L111
Phalcon uses ENT_QUOTES:
https://github.com/phalcon/phalcon/blob/v5.0.x/src/Html/Escaper.php#L78
FuelPHP uses ENT_QUOTES:
https://github.com/fuel/core/blob/1.9/develop/config/config.php#L459
I agree that we should switch the default to ENT_QUOTES. I also agree that
we should enableENT_SUBSTITUTE
by default. I don't see any downside to
these two options.Would you like to submit a PR?
Nikita
For ENT_SUBSTITUTE, there has been https://bugs.php.net/bug.php?id=69450 https://bugs.php.net/bug.php?id=69450, but I don’t understand the objection in that bug report. Maybe there is some issue related to non-Unicode multibyte encodings?
—Claude
Only ISO-2022 encodings got bytes that can match symbols sanitized by
htmlspecialchars.
Bug objection insist that utf-8 parsing rules should be enacted by
sanitizing function and not by application which displays text. And PHP
code is enacting those rules in most unfriendly API way.
--
Tomas
Could
htmlspecialchars()
useENT_QUOTES
by default?
[...]
I'd also be tempted to suggestENT_SUBSTITUTE
should be included, as I prefer to keep as much of the valid data (rather than losing everything), but that's not as important as escaping the apostrophe by default.
For ENT_SUBSTITUTE, there has been https://bugs.php.net/bug.php?id=69450, but I don’t understand the objection in that bug report. Maybe there is some issue related to non-Unicode multibyte encodings?
Only ISO-2022 encodings got bytes that can match symbols sanitized by htmlspecialchars.
Bug objection insist that utf-8 parsing rules should be enacted by sanitizing function and not by application which displays text. And PHP code is enacting those rules in most unfriendly API way.
Does anyone have an example where ENT_SUBSTITUTE
could be used to create an issue? ideally a security issue, but anything will do.
With htmlspecialchars($user_value)
, I don't think it would matter if it ended with <multibyte start byte>, like the example from Rasmus (0xE0), because that end byte would be replaced by U+FFFD.
With htmlspecialchars($user_value . $system_value)
, if $user_value ends with <multibyte start byte>, it's possible some characters at the beginning of $system_value could be replaced. But I can't find a way to do that with UTF-8; and even if it was possible, I would have thought some characters being replaced by U+FFFD, would be a much better solution than everything being lost (noting that $system_value will not contain any HTML characters, because they are escaped as well).
echo '<p>' . htmlspecialchars($user . ' is lying to you') . '</p>';
With: '<p>ABC�s lying to you</p>'
Without: '<p></p>'
And, in both cases, the output is valid UTF-8, and shouldn't affect anything it's concatenated with (i.e. the HTML context).
Personally, I think every part of our processes (input, processing, and output) should do its best to handle encoding issues (incase something is missed). I believe ENT_SUBSTITUTE
is the best way to deal with it during output. I don't think it's realistic to expect every single PHP developer to check for invalid characters in every single bit of input.
That said (and just to make things even more complicated), considering this is HTML encoding, we could go even further and add ENT_DISALLOWED. As Hans noted, some characters, such as 0x01, can be seen as valid in general, but not valid for HTML (where Text Nodes "must not contain control characters other than space characters"). All browsers seem to handle these control characters (by ignoring them), so I'm not too worried, but if it makes things safer, why not?
Craig
Hi,
Le 26 déc. 2020 à 12:02, Craig Francis craig@craigfrancis.co.uk a écrit :
(...)
PHP uses the numeric version ' with ENT_QUOTES, and it should continue
to do so - because the named version, ' was added in HTML5, but can
still cause problems with legacy parsers; for example Android 4, and the
one still in use by Microsoft Outlook (&/>/< was in the
original HTML spec, and " was added in HTML2).(...)
I agree that — in addition to ENT_QUOTES
— ENT_HTML401 (which encodes quotes as ') is a better default when encoding, but I also think that ENT_HTML5 (which recognises both ' and ') is a better default when decoding. This is not just when $flags is missing, it is also when neither of ENT_HTML401, ENT_HTML5 or ENT_XML1 appears explicitly in the bitmask, i.e.:
htmlspecialchars($x, ENT_QUOTES); // should be equivalent to ENT_QUOTES
| ENT_HTML401
html_entity_decode($x, ENT_QUOTES); // should be equivalent to ENT_QUOTES
| ENT_HTML5
The difference between ENT_HTML401 and ENT_HTML5 and their practical effect (one of them more compatible when encoding and the other more compatible when decoding) is probably too subtle for most people, assuming they even know the existence of such flags. (In the codebase I’m taking care of, there is somewhere a comment that says “html_entity_decode does not decode '” followed by code handling manually that specific entity.)
—Claude
Hi,
Le 26 déc. 2020 à 12:02, Craig Francis craig@craigfrancis.co.uk a
écrit :(...)
PHP uses the numeric version ' with ENT_QUOTES, and it should
continue
to do so - because the named version, ' was added in HTML5, but can
still cause problems with legacy parsers; for example Android 4, and the
one still in use by Microsoft Outlook (&/>/< was in the
original HTML spec, and " was added in HTML2).(...)
I agree that — in addition to
ENT_QUOTES
— ENT_HTML401 (which encodes
quotes as ') is a better default when encoding, but I also think that
ENT_HTML5 (which recognises both ' and ') is a better default
when decoding.
That's a good point for decoding, if I saw:
echo html_entity_decode(' ' ' ')
I would expect it to decode both.
I'm tempted to update my PR to use ENT_HTML5 on html_entity_decode
and
htmlspecialchars_decode
, to avoid that issue.
But the tests show that it affects a few others, which I think are fine,
but should check what others think:
> DECODED, Form Feed
> NOT DECODED, Carriage Return
> NOT DECODED, Invalid character
> NOT DECODED, Invalid character
> NOT DECODED, Invalid character
> NOT DECODED, Invalid character
? > NOT DECODED, Not a character
? > NOT DECODED, Not a character
Craig