Newsgroups: php.internals
Xref: news.php.net php.internals:112794
Date: Thu, 7 Jan 2021 15:11:21 +0100
From: claude.pache@gmail.com (Claude Pache)
To: Craig Francis
Cc: PHP internals
Subject: Re: [PHP-DEV] ENT_COMPAT for htmlentities and htmlspecialchars

Hi,

> On 26 Dec 2020, at 12:02, Craig Francis wrote:
>
> (...)
> PHP uses the numeric version &#039; with ENT_QUOTES, and it should continue
> to do so - because the named version, &apos; was added in HTML5, but can
> still cause problems with legacy parsers; for example Android 4, and the
> one still in use by Microsoft Outlook (&amp;/&gt;/&lt; was in the
> original HTML spec, and &quot; was added in HTML2).
>
> (...)

I agree that, in addition to ENT_QUOTES, ENT_HTML401 (which encodes single quotes as &#039;) is a better default when encoding, but I also think that ENT_HTML5 (which recognises both &#039; and &apos;) is a better default when decoding. This applies not just when $flags is missing, but also when none of ENT_HTML401, ENT_HTML5 or ENT_XML1 appears explicitly in the bitmask, i.e.:

    htmlspecialchars($x, ENT_QUOTES);   // should be equivalent to ENT_QUOTES | ENT_HTML401
    html_entity_decode($x, ENT_QUOTES); // should be equivalent to ENT_QUOTES | ENT_HTML5

The difference between ENT_HTML401 and ENT_HTML5 and their practical effect (one is more compatible when encoding, the other when decoding) is probably too subtle for most people, assuming they even know that such flags exist. (In the codebase I look after, there is a comment somewhere that says “html_entity_decode does not decode &apos;”, followed by code that handles that specific entity manually.)

Claude
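
For illustration, a minimal sketch of the asymmetry described above, with the document-type flag spelled out explicitly so that the result does not depend on the built-in defaults. The helper names encode_html() and decode_html() are hypothetical and not part of PHP; they only pin the flag combinations suggested in this message.

```php
<?php
// Hypothetical helpers (not part of PHP): each one fixes the flag set
// explicitly instead of relying on the built-in default for $flags.

function encode_html(string $s): string
{
    // With ENT_HTML401, ENT_QUOTES emits the numeric &#039; for a single quote,
    // which older parsers understand.
    return htmlspecialchars($s, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

function decode_html(string $s): string
{
    // With ENT_HTML5, both the numeric &#039; and the named &apos; are decoded.
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$input = 'it&#039;s it&apos;s';

$encoded401 = encode_html("it's");                                            // numeric form
$encoded5   = htmlspecialchars("it's", ENT_QUOTES | ENT_HTML5, 'UTF-8');      // named form
$decoded5   = decode_html($input);                                            // both decoded
$decoded401 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');  // &apos; left as-is
```

Comparing $decoded401 with $decoded5 shows why a manual workaround for &apos; can end up in a codebase: under ENT_HTML401 the named entity passes through undecoded.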