Newsgroups: php.internals
To: Claude Pache, Nikita Popov
Cc: Craig Francis, PHP internals
Date: Thu, 7 Jan 2021 11:29:39 +0200
In-Reply-To: <99C71641-5A5B-49C8-8D96-F0C080352B91@gmail.com>
Content-Language: en-US
Subject: Re: [PHP-DEV] ENT_COMPAT for htmlentities and htmlspecialchars
From: tokul@users.sourceforge.net (Tomas Kuliavas)

On 2021-01-07 11:00, Claude Pache wrote:
>
>> On 6 Jan 2021 at 16:46, Nikita Popov wrote:
>>
>> On Sat, Dec 26, 2020 at 12:03 PM Craig Francis
>> wrote:
>>
>>> Hi,
>>>
>>> Could htmlspecialchars() use ENT_QUOTES by default?
>>>
>>> I recently worked on an example script, where I tried to keep it simple by
>>> using htmlspecialchars directly, e.g.
>>>
>>> echo "";
>>>
>>> I'd completely forgotten that single quotes are not escaped by default,
>>> creating a XSS vulnerability, e.g.
>>>
>>> $url = "/' onerror='alert(1)";
>>>
>>> All the common frameworks I could find use ENT_QUOTES to do this safely
>>> (details below).
>>>
>>> Christoph (cmb69) suggests this was done for HTML4 compatibility, with
>>> older versions of PHP possibly having issues with numeric character
>>> references (a quick search suggests PHP 5.4?).
>>>
>>> PHP uses the numeric version &#039; with ENT_QUOTES, and it should continue
>>> to do so - because the named version, &apos; was added in HTML5, but can
>>> still cause problems with legacy parsers; for example Android 4, and the
>>> one still in use by Microsoft Outlook (&amp;/&gt;/&lt; was in the
>>> original HTML spec, and &quot; was added in HTML2).
>>>
>>> I'd also be tempted to suggest ENT_SUBSTITUTE should be included, as I
>>> prefer to keep as much of the valid data (rather than losing everything),
>>> but that's not as important as escaping the apostrophe by default.
>>>
>>> Craig
>>>
>>>
>>>
>>>
>>> WordPress uses ENT_QUOTES (ish).
>>>
>>> https://developer.wordpress.org/reference/functions/esc_html/
>>>
>>> Laravel, with Blade, uses ENT_QUOTES:
>>>
>>> https://github.com/illuminate/support/blob/master/helpers.php#L118
>>>
>>> Symfony or Slim, with Twig, uses ENT_QUOTES | ENT_SUBSTITUTE:
>>>
>>>
>>> https://github.com/twigphp/Twig/blob/3.x/src/Extension/EscaperExtension.php#L243
>>>
>>> CodeIgniter uses ENT_QUOTES | ENT_SUBSTITUTE:
>>>
>>>
>>> https://github.com/codeigniter4/CodeIgniter4/blob/develop/system/ThirdParty/Escaper/Escaper.php#L120
>>>
>>> CakePHP uses ENT_QUOTES | ENT_SUBSTITUTE:
>>>
>>> https://github.com/cakephp/cakephp/blob/master/src/Core/functions.php#L67
>>>
>>> YII uses ENT_QUOTES | ENT_SUBSTITUTE:
>>>
>>>
>>> https://github.com/yiisoft/yii2/blob/master/framework/helpers/BaseHtml.php#L111
>>>
>>> Phalcon uses ENT_QUOTES:
>>>
>>> https://github.com/phalcon/phalcon/blob/v5.0.x/src/Html/Escaper.php#L78
>>>
>>> FuelPHP uses ENT_QUOTES:
>>>
>>> https://github.com/fuel/core/blob/1.9/develop/config/config.php#L459
>>
>> I agree that we should switch the default to ENT_QUOTES. I also agree that
>> we should enable ENT_SUBSTITUTE by default. I don't see any downside to
>> these two options.
>>
>> Would you like to submit a PR?
>>
>> Nikita
>
> For ENT_SUBSTITUTE, there has been https://bugs.php.net/bug.php?id=69450 , but I don’t understand the objection in that bug report. Maybe there is some issue related to non-Unicode multibyte encodings?
>
> —Claude

Only the ISO-2022 encodings have bytes that can match the characters
sanitized by htmlspecialchars(). The objection in that bug report insists
that UTF-8 parsing rules should be enforced by the sanitizing function and
not by the application that displays the text. And the PHP code enforces
those rules in the most API-unfriendly way.

-- 
Tomas
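
For anyone following the thread, here is a minimal sketch of the two behaviours under discussion. It assumes PHP 7 or later (for the "\u{FFFD}" escape) and passes the flags and charset explicitly, so the results do not depend on the defaults of the PHP version in use. The $url value is the one from Craig's quoted example; $invalid is a made-up input that ends in an incomplete UTF-8 sequence.

```php
<?php
// Value from the quoted example: apostrophes, but no ", <, > or &.
$url = "/' onerror='alert(1)";

// ENT_COMPAT escapes double quotes only, so this string comes back unchanged.
$compat = htmlspecialchars($url, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES also escapes apostrophes, using the numeric reference &#039;.
$quotes = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// Hypothetical input: "\xC3" on its own is an incomplete UTF-8 sequence.
$invalid = "caf\xC3";

// Without ENT_SUBSTITUTE (or ENT_IGNORE) the whole result is an empty string.
$dropped = htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE the invalid sequence is replaced by U+FFFD
// and the valid part of the string is kept.
$kept = htmlspecialchars($invalid, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($compat === $url);            // bool(true)
var_dump($quotes);                     // "/&#039; onerror=&#039;alert(1)"
var_dump($dropped === '');             // bool(true)
var_dump($kept === "caf\u{FFFD}");     // bool(true)
```

The empty-string result in the third case is presumably the "unfriendly" part: the caller loses the valid text along with the invalid byte unless it asks for substitution.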