Newsgroups: php.internals
Xref: news.php.net php.internals:121075
Date: Sat, 16 Sep 2023 18:04:54 -0500
To: "php internals"
Subject: Re: [PHP-DEV] Re: [RFC] [Discussion] DOM HTML5 parsing and serialization support
From: larry@garfieldtech.com ("Larry Garfield")

On Fri, Sep 15, 2023, at 6:17 PM, Niels Dossche wrote:
> On 9/2/23 21:41, Niels Dossche wrote:
>> Hello internals
>>
>> I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
>> https://wiki.php.net/rfc/domdocument_html5_parser
>>
>> Kind regards
>> Niels
>
>
> Hi internals
>
> I'd like to announce a change to the RFC. The new RFC version is 0.5.1,
> the old one was 0.4.0.
> The diff can be viewed via the revision history button on the right.
>
> I had a productive discussion with Tim and Arne about the class hierarchy.
> Here's a summary of the changes and the rationale.
>
> Until now, the RFC specified that DOM\HTML5Document extends DOMDocument.
> However, as we're introducing a new class anyway, we believe we should
> take the opportunity to improve the API.
> We have the following concerns:
> a) It's a bit of an awkward class hierarchy. *If* we hypothetically
> would want to get rid of DOMDocument in the far far future, we can't
> easily do that.
> b) API is messy. Some methods are useless for HTML5Document. E.g.:
> validate(), loadXML(), loadXMLFile(). They can be a source of confusion.
> c) The fact that you can pass HTML5Document to methods accepting
> DOMDocument may result in unexpected behaviour when the method expects
> a particular behaviour. It would be better if developers could "opt-in"
> to accepting both DOMDocument and HTML5Document in a method using a
> common base class.
> d) The properties set by DOMDocument's constructor are overridden by
> load methods, which is surprising. That's even mentioned as the second
> top comment on https://www.php.net/manual/en/domdocument.loadxml.php.
> Furthermore, the XML version argument of the constructor is even
> useless for HTML5 documents.
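
Point d can be seen with the existing DOMDocument API. The following is a minimal sketch, not code from the RFC; the variable names are illustrative, and it assumes the dom extension is loaded. The encoding passed to the constructor is replaced by whatever the loaded document declares:

```php
<?php
// Illustrative only: shows constructor arguments being replaced by a load call.
$doc = new DOMDocument('1.0', 'ISO-8859-1');
$before = $doc->encoding; // encoding requested through the constructor

// The XML declaration of the loaded document names a different encoding.
$doc->loadXML('<?xml version="1.0" encoding="UTF-8"?><root/>');
$after = $doc->encoding;  // encoding now reflects the loaded document

var_dump($before, $after);
```

The two values differ, so the constructor arguments only describe the document until the first load call.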
>
> So we propose the following changes to the RFC.
>
> We'll add a common abstract base class DOM\Document (name taken from
> the DOM spec & Javascript world).
> DOM\Document contains the properties and abstract methods common to
> both HTML and XML documents.
> Examples of what it includes/excludes:
> * includes: firstElementChild, lastElementChild, ...
> * excludes: xmlStandalone, xmlVersion, validate(), ...
> Then we'll have two subclasses: DOM\HTMLDocument (previously we called
> this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the
> name to be more resilient to version changes and match the DOM spec
> name.
> DOMDocument will also use DOM\Document as a base class to make it
> interchangeable with the new classes.
>
> The above would solve points a, b, and c.
> To solve point d, we can use "factory methods":
> This means HTMLDocument's constructor will be made private, and instead
> we'll have three static methods that create a new instance:
> - HTMLDocument::fromHTMLString(string $xml): HTMLDocument;

That should be string $html, yes?

> - HTMLDocument::fromHTMLFile(string $filename): HTMLDocument;
> - HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"):
> HTMLDocument;
>
>
> Or to put it in PHP code:
>
> ```
> namespace DOM {
>   // The base abstract document class
>   abstract class Document extends DOM\Node implements DOM\ParentNode {
>     /* all properties and methods that are common and sensible for both
>        XML & HTML documents */
>   }
>
>   class XMLDocument extends Document {
>     /* insert specific XML methods and properties (e.g. xmlVersion,
>        validate(), ...) here */
>
>     private function __construct() {}
>
>     public static function fromEmptyDocument(string $version = "1.0",
>                                              string $encoding = "UTF-8");
>     public static function fromFile(string $path);
>     public static function fromString(string $source);
>   }
>
>   class HTMLDocument extends Document {
>     /* insert specific HTML methods and properties here */
>
>     private function __construct() {}
>
>     public static function fromEmptyDocument(string $encoding = "UTF-8");
>     public static function fromFile(string $path);
>     public static function fromString(string $source);
>   }
> }
>
> class DOMDocument extends DOM\Document {
>   /* Keep methods, properties, and constructor the same as they are now */
> }
> ```
>
> We're only adding XMLDocument for completeness and API parity. It's a
> drop-in replacement for DOMDocument, and behaves the exact same.
> The difference is that the API is on par with HTMLDocument, and the
> construction is designed to be more misuse-resistant.
> DOMDocument will NOT change, and remains for the foreseeable future.
>
> We also have to change the $ownerDocument field in DOMNode to have type
> ?DOM\Document instead of ?DOMDocument.
> Problem is that this breaks BC (but only a minor break):
> https://3v4l.org/El7Ve.
> Overriding properties is kind of useless, but if someone does it, then
> the compiler will complain loudly during compilation and it should be
> easy to fix.
>
>
> Of course, these changes mean that the discussion period will run a
> bit longer than originally foreseen.
>
> Kind regards
> Niels

This all makes sense to me, and I like this as a way forward. Nice work!

--Larry Garfield
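
For reference, the private-constructor-plus-factory approach quoted above can be tried in plain userland PHP. This is only a sketch under stated assumptions: SketchDocument and its properties are hypothetical stand-ins, not the proposed DOM\HTMLDocument, and it needs PHP 8.1 or later for readonly promoted properties.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in class; NOT the DOM\HTMLDocument proposed in the RFC.
final class SketchDocument
{
    // Private constructor: instances can only come from the named factories,
    // so there is no constructor state for a later load call to override.
    private function __construct(
        public readonly string $encoding,
        public readonly string $source,
    ) {}

    public static function fromEmptyDocument(string $encoding = "UTF-8"): self
    {
        return new self($encoding, "");
    }

    public static function fromString(string $source): self
    {
        // Assumption for this sketch: string input is treated as UTF-8.
        return new self("UTF-8", $source);
    }
}

$empty  = SketchDocument::fromEmptyDocument("ISO-8859-1");
$parsed = SketchDocument::fromString("<p>Hello</p>");
```

Each factory name states where the document comes from, and calling `new SketchDocument(...)` from outside the class throws an Error.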