Newsgroups: php.internals
Date: Thu, 7 Jan 2021 15:09:16 +0000
Subject: Re: [PHP-DEV] ENT_COMPAT for htmlentities and htmlspecialchars
From: craig@craigfrancis.co.uk (Craig Francis)
To: Claude Pache
Cc: PHP internals

On Thu, 7 Jan 2021 at 14:11, Claude Pache wrote:

> Hi,
>
> > On 26 Dec 2020, at 12:02, Craig Francis wrote:
> >
> > (...)
> > PHP uses the numeric version &#039; with ENT_QUOTES, and it should continue
> > to do so - because the named version, &apos; was added in HTML5, but can
> > still cause problems with legacy parsers; for example Android 4, and the
> > one still in use by Microsoft Outlook (&amp;/&gt;/&lt; was in the
> > original HTML spec, and &quot; was added in HTML2).
> >
> > (...)
>
> I agree that, in addition to ENT_QUOTES, ENT_HTML401 (which encodes
> quotes as &#039;) is a better default when encoding, but I also think that
> ENT_HTML5 (which recognises both &#039; and &apos;) is a better default
> when decoding.

That's a good point for decoding, if I saw:

    echo html_entity_decode('&#039; &apos;');

I would expect it to decode both.
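To make the difference concrete, here is a minimal sketch of how the two document-type flags treat single quotes, as far as I understand the current behaviour. The variable names are only illustrative, and the flags are passed explicitly so the result does not depend on whichever defaults are in effect:

```php
<?php
// Encoding: with ENT_QUOTES set, the document-type flag picks the entity
// used for a single quote.
$encoded401 = htmlspecialchars("'", ENT_QUOTES | ENT_HTML401);
$encoded5   = htmlspecialchars("'", ENT_QUOTES | ENT_HTML5);

// Decoding: &apos; is not an HTML 4.01 entity, so it is only recognised
// when ENT_HTML5 is used.
$input      = '&#039; &apos;';
$decoded401 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401);
$decoded5   = html_entity_decode($input, ENT_QUOTES | ENT_HTML5);

echo $encoded401, "\n"; // &#039;
echo $encoded5, "\n";   // &apos;
echo $decoded401, "\n"; // ' &apos;
echo $decoded5, "\n";   // ' '
```

So encoding with ENT_HTML401 keeps the output safe for legacy parsers, while decoding with ENT_HTML5 accepts both spellings of the quote.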

I'm tempted to update my PR to use ENT_HTML5 on `html_entity_decode` and
`htmlspecialchars_decode`, to avoid that issue.

But the tests show that it also changes the result for a few other
entities. I think those changes are fine, but I should check what others
think:

    &#x0C;    > DECODED, Form Feed
    &#x0D;    > NOT DECODED, Carriage Return
    &#xFDD0;  > NOT DECODED, Invalid character
    &#xFDEF;  > NOT DECODED, Invalid character
    &#xFFFE;  > NOT DECODED, Invalid character
    &#xFFFF;  > NOT DECODED, Invalid character
    &#x2FFFE; > NOT DECODED, Not a character
    &#x2FFFF; > NOT DECODED, Not a character

Craig