Newsgroups: php.internals
Xref: news.php.net php.internals:124376
From: jimw@trainedmonkey.com ("Jim Winstead")
To: "Dennis Snell", Internals
Date: Wed, 10 Jul 2024 15:42:20 -0700
Subject: Re: [PHP-DEV] Decoding HTML and the Ambiguous Ampersand
In-Reply-To: <80EA6CA9-E14E-4672-A88A-46EFE9E2F3F0@automattic.com>

On Tue, Jul 9, 2024, at 5:00 PM, Dennis Snell wrote:
> [ lots snipped ]
>
> We’re exploring pure-PHP solutions to these problems in WordPress in attempts to improve the reliability and safety of handling HTML. I’d love to hear your thoughts and know if anyone is willing to work with me to create an RFC or directly propose patches. We’ve created a step function which allows finding the next character reference and decoding it separately, enabling some novel features like highlighting the character references in source text.
>
> Should I propose an RFC for this?

The missing character references would probably be okay to add via a PR now.

Handling the ambiguous ampersand rules (either through a new flag or new functions) would need to go through the RFC process. My gut tells me that a new flag might be the easiest way to get it in, but I haven't looked at the problem very deeply.

Timing wise, most people are very focused on hitting the deadline for PHP 8.4's feature freeze, and this is almost certainly too late for that. I would take the time to prototype solutions, make sure the full scope of the problem is understood, and look to hit the ground running with an RFC and discussion in early September.

I'm also not sure what sort of connection may be found with the new HTML5 DOM features/implementation (https://wiki.php.net/rfc/domdocument_html5_parser).

Thanks.

Jim