Hello internals
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parser
Kind regards
Niels
Am 02-Sep-2023 21:41:50 +0200 schrieb dossche.niels@gmail.com:
Hello internals
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parserKind regards
Niels--
To unsubscribe, visit: https://www.php.net/unsub.php
Hi Niels,
thank you for your proposal and your work on this.
This proposal introduces the DOM\HTML5Document class that extends the
DOMDocument class. The reason we introduce a new class instead of replacing
the methods of the existing class is to ensure full backwards compatibility.
Although I do not dislike your idea with a new class for HTML5 parsing I have one question. Why not make the decision, which parser to use, dependent from the doctype declaration at the start of the html document?
Best regards
Christian
Hey Christian
Thank you for going through my proposal.
Am 02-Sep-2023 21:41:50 +0200 schrieb dossche.niels@gmail.com:
Hello internals
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parserKind regards
Niels--
To unsubscribe, visit: https://www.php.net/unsub.php
Hi Niels,
thank you for your proposal and your work on this.
This proposal introduces the DOM\HTML5Document class that extends the
DOMDocument class. The reason we introduce a new class instead of replacing
the methods of the existing class is to ensure full backwards compatibility.Although I do not dislike your idea with a new class for HTML5 parsing I have one question. Why not make the decision, which parser to use, dependent from the doctype declaration at the start of the html document?
There's three major reasons to not depend solely on the doctype:
- Not all documents have a doctype. So in that case, do we have to pick the old parser or the HTML5 parser?
- People create DOM documents from scratch, without loading some code or file first.
In that case, we can't choose upfront if the document is an HTML5 document or a legacy document.
I believe that by using a class, and thus having a clear choice upfront, this minimizes surprises. - Having a separate class allows us to use the type system to restrict some users to only allow HTML5 documents.
Best regards
Christian
Kind regards
Niels
Thanks for the proposal Niels,
I’ve dealt with my own grief working through issues in DOMDocument and wanting it to work but finding it inadequate.
HTML5
This would be a great starting point; I would love it if we took the opportunity to fix named character reference decoding, as PHP has (to my knowledge) never respected (at least in HTML5) that they decode differently inside attributes as they do inside markup, considering rules such as the ambiguous ampersand and decode errors.
It’s also been frustrating that DOMDocument parses tags in RCDATA sections where they don’t exist, such as in TITLE or TEXTAREA elements, escapes certain types of invalid comments so that they appear rendered in the saved document, and misses basic semantic rules (e.g. creating a BUTTON element as a child of a BUTTON element instead of closing out the already-open BUTTON).
I’d like to share some what a few of us have been working on inside WordPress, which is to build a conformant streaming HTML5 parser:
- https://developer.wordpress.org/reference/classes/wp_html_tag_processor/
- https://make.wordpress.org/core/2023/08/19/progress-report-html-api/
It’s just food for thought right now because adding HTML5 support to DOMDocument would benefit everyone, but we decided we had common need in PHP to work with HTML not in a DOM, but in a streaming fashion, one with very little runtime overhead. My long-term plan has been to get a good grasp for the interface needs and thoroughly test it within the WordPress community and then propose its inclusion into PHP. It’s been incredibly handy so far, and on my laptop runs at around 20 MB/s, which is not great, but good enough for many needs. My naive C port runs on the same laptop at around 80 MB/s and I believe that we can likely triple or quadruple that speed again if any of us working on it knew how to take advantage of SIMD instrinsics.
It tries to accomplish a few goals:
- be fast enough
- interpret HTML as an HTML5-compliant browser will
- find specific locations within an HTML document and then read or modify them
- pass through any invalid HTML it encounters for the browser to resolve/fix unless modifying the part of the document containing those invalid constructions
I only bring up this different interface because once we started digging deep into DOMDocument we found that the problems with it were far from superficial; that there is a host of problems and a mismatched interface to our common needs. It has surprised me that PHP, the language of the web, has had such trouble handling HTML, the language of the web, and we wanted to completely resolve this issue once and for all within WordPress so we can clean up decades’ old problems with encoding, decoding, security, and sanitization.
Warmly,
Dennis Snell
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parserKind regards
Niels
Hi Dennis
Thanks for the proposal Niels,
I’ve dealt with my own grief working through issues in DOMDocument and wanting it to work but finding it inadequate.
HTML5
This would be a great starting point; I would love it if we took the opportunity to fix named character reference decoding, as PHP has (to my knowledge) never respected (at least in HTML5) that they decode differently inside attributes as they do inside markup, considering rules such as the ambiguous ampersand and decode errors.
It’s also been frustrating that DOMDocument parses tags in RCDATA sections where they don’t exist, such as in TITLE or TEXTAREA elements, escapes certain types of invalid comments so that they appear rendered in the saved document, and misses basic semantic rules (e.g. creating a BUTTON element as a child of a BUTTON element instead of closing out the already-open BUTTON).
With this proposal: a real HTML5 parser, these above mentioned problems will fortunately be a problem from the past :)
I’d like to share some what a few of us have been working on inside WordPress, which is to build a conformant streaming HTML5 parser:
- https://developer.wordpress.org/reference/classes/wp_html_tag_processor/ https://developer.wordpress.org/reference/classes/wp_html_tag_processor/
- https://make.wordpress.org/core/2023/08/19/progress-report-html-api/ https://make.wordpress.org/core/2023/08/19/progress-report-html-api/It’s just food for thought right now because adding HTML5 support to DOMDocument would benefit everyone, but we decided we had common need in PHP to work with HTML not in a DOM, but in a streaming fashion, one with very little runtime overhead. My long-term plan has been to get a good grasp for the interface needs and thoroughly test it within the WordPress community and then propose its inclusion into PHP. It’s been incredibly handy so far, and on my laptop runs at around 20 MB/s, which is not great, but good enough for many needs. My naive C port runs on the same laptop at around 80 MB/s and I believe that we can likely triple or quadruple that speed again if any of us working on it knew how to take advantage of SIMD instrinsics.
It tries to accomplish a few goals:
- be fast enough
- interpret HTML as an HTML5-compliant browser will
- find specific locations within an HTML document and then read or modify them
- pass through any invalid HTML it encounters for the browser to resolve/fix unless modifying the part of the document containing those invalid constructions
I've seen someone link this on Reddit today, it's a really nice project!
It reminds me of Cloudflare's lol-html, which is also a streaming parser used to modify and sanitize documents linearly.
I believe this could be a great addition, it solves a different problem that the ext/dom extension solves. So I think it would be a great complementary addition.
I only bring up this different interface because once we started digging deep into DOMDocument we found that the problems with it were far from superficial; that there is a host of problems and a mismatched interface to our common needs. It has surprised me that PHP, the language of the web, has had such trouble handling HTML, the language of the web, and we wanted to completely resolve this issue once and for all within WordPress so we can clean up decades’ old problems with encoding, decoding, security, and sanitization.
Yes, I was also quite surprised of the lacking support for modern web features, and also the problems with spec compliance.
I only recently got into maintaining ext/dom. So there's still a lot of work to do.
I had already started with adding more DOM APIs in the 8.3 release cycle and plan to continue that effort in 8.4.
Another major project I want to do for 8.4, besides HTML5 support, is fixing the spec compliance issues in an opt-in manner. This would help with security & sanitization problems (HTML5 should help with the encoding&decoding).
Warmly,
Dennis Snell
Kind regards
Niels
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parser https://wiki.php.net/rfc/domdocument_html5_parserKind regards
Niels
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parser
Thanks, Niels. This is much needed.
This proposal introduces the DOM\HTML5Document class that extends the DOMDocument class.
In light of the recent discussion regarding casing of acronyms in class
and method names 1, perhaps this proposal should consider
Dom\Html5Document
instead? I know this isn't consistent with the other
DOMDocument
classes, so maybe remaining consistent is better in this case.
I propose to use Lexbor.
Lexbor is licensed under the Apache-2.0 license 2. In general, I don't
think this will present any problems, as long as we adhere to the terms
of the license (i.e., include the NOTICE file, etc.), but I did want to
call attention to it in case bundling the source code of an Apache-2.0
project within a PHP-3.01 project is a cause for concern for anyone.
Cheers,
Ben
Hello internals
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parserKind regards
Niels
Hi internals
I'd like to announce a change to the RFC. The new RFC version is 0.5.1, the old one was 0.4.0.
The diff can be viewed via the revision history button on the right.
I had a productive discussion with Tim and Arne about the class hierarchy.
Here's a summary of the changes and the rationale.
Until now, the RFC specified that DOM\HTML5Document extends DOMDocument.
However, as we're introducing a new class anyway, we believe we should take the opportunity to improve the API.
We have the following concerns:
a) It's a bit of an awkward class hierarchy. If we hypothetically would want to get rid of DOMDocument in the far far future, we can't easily do that.
b) API is messy. Some methods are useless for HTML5Document. E.g.: validate(), loadXML(), loadXMLFile(). They can be a source of confusion.
c) The fact that you can pass HTML5Document to methods accepting DOMDocument may result in unexpected behaviour when the method expects a particular behaviour. It would be better if developers could "opt-in" to accepting both DOMDocument and HTML5Document in a method using a common base class.
d) The properties set by DOMDocument's constructor are overridden by load methods, which is surprising. That's even mentioned as the second top comment on https://www.php.net/manual/en/domdocument.loadxml.php. Furthermore, the XML version argument of the constructor is even useless for HTML5 documents.
So we propose the following changes to the RFC.
We'll add a common abstract base class DOM\Document (name taken from the DOM spec & Javascript world).
DOM\Document contains the properties and abstract methods common to both HTML and XML documents.
Examples of what it includes/excludes:
- includes: firstElementChild, lastElementChild, ...
- excludes: xmlStandalone, xmlVersion, validate(), ...
Then we'll have two subclasses: DOM\HTMLDocument (previously we called this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the name to be more resilient to version changes and match the DOM spec name.
DOMDocument will also use DOM\Document as a base class to make it interchangeable with the new classes.
The above would solve points a, b, and c.
To solve point d, we can use "factory methods":
This means HTMLDocument's constructor will be made private, and instead we'll have three static methods that create a new instance:
- HTMLDocument::fromHTMLString(string $xml): HTMLDocument;
- HTMLDocument::fromHTMLFile(string $filename): HTMLDocument;
- HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"): HTMLDocument;
Or to put it in PHP code:
namespace DOM {
// The base abstract document class
abstract class Document extends DOM\Node implements DOM\ParentNode {
/* all properties and methods that are common and sensible for both XML & HTML documents */
}
class XMLDocument extends Document {
/* insert specific XML methods and properties (e.g. xmlVersion, validate(), ...) here */
private function __construct() {}
public static function fromEmptyDocument(string $version = "1.0", string $encoding = "UTF-8");
public static function fromFile(string $path);
public static function fromString(string $source);
}
class HTMLDocument extends Document {
/* insert specific Html methods and properties here */
private function __construct() {}
public static function fromEmptyDocument(string $encoding = "UTF-8");
public static function fromFile(string $path);
public static function fromString(string $source);
}
}
class DOMDocument extends DOM\Document {
/* Keep methods, properties, and constructor the same as they are now */
}
We're only adding XMLDocument for completeness and API parity. It's a drop-in replacement for DOMDocument, and behaves the exact same.
The difference is that the API is on par with HTMLDocument, and the construction is designed to be more misuse-resistant.
DOMDocument will NOT change, and remains for the foreseeable future.
We also have to change the $ownerDocument field in DOMNode to have type ?DOM\Document instead of ?DOMDocument.
Problem is that this breaks BC (but only a minor break): https://3v4l.org/El7Ve.
Overriding properties is kind of useless, but if someone does it, then the compiler will complain loudly during compilation and it should be easy to fix.
Of course, these changes means that the discussion period will run a bit longer than originally foreseen.
Kind regards
Niels
Hello internals
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parserKind regards
NielsHi internals
I'd like to announce a change to the RFC. The new RFC version is 0.5.1,
the old one was 0.4.0.
The diff can be viewed via the revision history button on the right.I had a productive discussion with Tim and Arne about the class hierarchy.
Here's a summary of the changes and the rationale.Until now, the RFC specified that DOM\HTML5Document extends DOMDocument.
However, as we're introducing a new class anyway, we believe we should
take the opportunity to improve the API.
We have the following concerns:
a) It's a bit of an awkward class hierarchy. If we hypothetically
would want to get rid of DOMDocument in the far far future, we can't
easily do that.
b) API is messy. Some methods are useless for HTML5Document. E.g.:
validate(), loadXML(), loadXMLFile(). They can be a source of confusion.
c) The fact that you can pass HTML5Document to methods accepting
DOMDocument may result in unexpected behaviour when the method expects
a particular behaviour. It would be better if developers could "opt-in"
to accepting both DOMDocument and HTML5Document in a method using a
common base class.
d) The properties set by DOMDocument's constructor are overridden by
load methods, which is surprising. That's even mentioned as the second
top comment on https://www.php.net/manual/en/domdocument.loadxml.php.
Furthermore, the XML version argument of the constructor is even
useless for HTML5 documents.So we propose the following changes to the RFC.
We'll add a common abstract base class DOM\Document (name taken from
the DOM spec & Javascript world).
DOM\Document contains the properties and abstract methods common to
both HTML and XML documents.
Examples of what it includes/excludes:
- includes: firstElementChild, lastElementChild, ...
- excludes: xmlStandalone, xmlVersion, validate(), ...
Then we'll have two subclasses: DOM\HTMLDocument (previously we called
this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the
name to be more resilient to version changes and match the DOM spec
name.
DOMDocument will also use DOM\Document as a base class to make it
interchangeable with the new classes.The above would solve points a, b, and c.
To solve point d, we can use "factory methods":
This means HTMLDocument's constructor will be made private, and instead
we'll have three static methods that create a new instance:
- HTMLDocument::fromHTMLString(string $xml): HTMLDocument;
That should be string $html, yes?
- HTMLDocument::fromHTMLFile(string $filename): HTMLDocument;
- HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"):
HTMLDocument;Or to put it in PHP code:
namespace DOM { // The base abstract document class abstract class Document extends DOM\Node implements DOM\ParentNode { /* all properties and methods that are common and sensible for both XML & HTML documents */ } class XMLDocument extends Document { /* insert specific XML methods and properties (e.g. xmlVersion, validate(), ...) here */ private function __construct() {} public static function fromEmptyDocument(string $version = "1.0", string $encoding = "UTF-8"); public static function fromFile(string $path); public static function fromString(string $source); } class HTMLDocument extends Document { /* insert specific Html methods and properties here */ private function __construct() {} public static function fromEmptyDocument(string $encoding = "UTF-8"); public static function fromFile(string $path); public static function fromString(string $source); } } class DOMDocument extends DOM\Document { /* Keep methods, properties, and constructor the same as they are now */ }
We're only adding XMLDocument for completeness and API parity. It's a
drop-in replacement for DOMDocument, and behaves the exact same.
The difference is that the API is on par with HTMLDocument, and the
construction is designed to be more misuse-resistant.
DOMDocument will NOT change, and remains for the foreseeable future.We also have to change the $ownerDocument field in DOMNode to have type
?DOM\Document instead of ?DOMDocument.
Problem is that this breaks BC (but only a minor break):
https://3v4l.org/El7Ve.
Overriding properties is kind of useless, but if someone does it, then
the compiler will complain loudly during compilation and it should be
easy to fix.Of course, these changes means that the discussion period will run a
bit longer than originally foreseen.Kind regards
Niels
This all makes sense to me, and I like this as a way forward. Nice work!
--Larry Garfield
Hi Larry
Hello internals
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parserKind regards
NielsHi internals
I'd like to announce a change to the RFC. The new RFC version is 0.5.1,
the old one was 0.4.0.
The diff can be viewed via the revision history button on the right.I had a productive discussion with Tim and Arne about the class hierarchy.
Here's a summary of the changes and the rationale.Until now, the RFC specified that DOM\HTML5Document extends DOMDocument.
However, as we're introducing a new class anyway, we believe we should
take the opportunity to improve the API.
We have the following concerns:
a) It's a bit of an awkward class hierarchy. If we hypothetically
would want to get rid of DOMDocument in the far far future, we can't
easily do that.
b) API is messy. Some methods are useless for HTML5Document. E.g.:
validate(), loadXML(), loadXMLFile(). They can be a source of confusion.
c) The fact that you can pass HTML5Document to methods accepting
DOMDocument may result in unexpected behaviour when the method expects
a particular behaviour. It would be better if developers could "opt-in"
to accepting both DOMDocument and HTML5Document in a method using a
common base class.
d) The properties set by DOMDocument's constructor are overridden by
load methods, which is surprising. That's even mentioned as the second
top comment on https://www.php.net/manual/en/domdocument.loadxml.php.
Furthermore, the XML version argument of the constructor is even
useless for HTML5 documents.So we propose the following changes to the RFC.
We'll add a common abstract base class DOM\Document (name taken from
the DOM spec & Javascript world).
DOM\Document contains the properties and abstract methods common to
both HTML and XML documents.
Examples of what it includes/excludes:
- includes: firstElementChild, lastElementChild, ...
- excludes: xmlStandalone, xmlVersion, validate(), ...
Then we'll have two subclasses: DOM\HTMLDocument (previously we called
this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the
name to be more resilient to version changes and match the DOM spec
name.
DOMDocument will also use DOM\Document as a base class to make it
interchangeable with the new classes.The above would solve points a, b, and c.
To solve point d, we can use "factory methods":
This means HTMLDocument's constructor will be made private, and instead
we'll have three static methods that create a new instance:
- HTMLDocument::fromHTMLString(string $xml): HTMLDocument;
That should be string $html, yes?
Indeed, silly copy paste mistake on my part.
In fact, the bullet point list is a bit outdated: it uses an older proposed method name (i.e. it says fromHTMLString instead of fromString; the HTML part is unnecessary duplication as the method is in the HTMLDocument class). I forgot to update that in my email draft when the final name was decided.
Anyway, the code snippet from my email and the RFC text both have the right method name and parameter name.
- HTMLDocument::fromHTMLFile(string $filename): HTMLDocument;
- HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"):
HTMLDocument;Or to put it in PHP code:
namespace DOM { // The base abstract document class abstract class Document extends DOM\Node implements DOM\ParentNode { /* all properties and methods that are common and sensible for both XML & HTML documents */ } class XMLDocument extends Document { /* insert specific XML methods and properties (e.g. xmlVersion, validate(), ...) here */ private function __construct() {} public static function fromEmptyDocument(string $version = "1.0", string $encoding = "UTF-8"); public static function fromFile(string $path); public static function fromString(string $source); } class HTMLDocument extends Document { /* insert specific Html methods and properties here */ private function __construct() {} public static function fromEmptyDocument(string $encoding = "UTF-8"); public static function fromFile(string $path); public static function fromString(string $source); } } class DOMDocument extends DOM\Document { /* Keep methods, properties, and constructor the same as they are now */ }
We're only adding XMLDocument for completeness and API parity. It's a
drop-in replacement for DOMDocument, and behaves the exact same.
The difference is that the API is on par with HTMLDocument, and the
construction is designed to be more misuse-resistant.
DOMDocument will NOT change, and remains for the foreseeable future.We also have to change the $ownerDocument field in DOMNode to have type
?DOM\Document instead of ?DOMDocument.
Problem is that this breaks BC (but only a minor break):
https://3v4l.org/El7Ve.
Overriding properties is kind of useless, but if someone does it, then
the compiler will complain loudly during compilation and it should be
easy to fix.Of course, these changes means that the discussion period will run a
bit longer than originally foreseen.Kind regards
NielsThis all makes sense to me, and I like this as a way forward. Nice work!
--Larry Garfield
Glad to hear that :)
Cheers
Niels
We'll add a common abstract base class DOM\Document (name taken from the
DOM spec & Javascript world).
DOM\Document contains the properties and abstract methods common to both
HTML and XML documents.
Hi,
Yes looks a lot better.
Great work overall! And thank you for taking on this effort.
I would have a small suggestion: to make the abstract class an interface.
This will allow even more flexibility on how things can be build further,
suggesting composition over inheritance.
In user land we cannot have interfaces with properties (yet) but in php
internal interfaces we have example with interface UnitEnum that has name
property, extendes by BackedEnum that adds value property.
Thank you,
Alex
Hi Alexandru
We'll add a common abstract base class DOM\Document (name taken from the
DOM spec & Javascript world).
DOM\Document contains the properties and abstract methods common to both
HTML and XML documents.Hi,
Yes looks a lot better.
Great work overall! And thank you for taking on this effort.I would have a small suggestion: to make the abstract class an interface.
This will allow even more flexibility on how things can be build further,
suggesting composition over inheritance.
In user land we cannot have interfaces with properties (yet) but in php
internal interfaces we have example with interface UnitEnum that has name
property, extendes by BackedEnum that adds value property.
Right, we discussed the use of an interface internally too.
Indeed as you suggest, we chose an abstract class over an interface because of the property limitation.
Looking at UnitEnum & BackedEnum (https://github.com/php/php-src/blob/bae30682b896b26f177f83648bd58c77ba3480a8/Zend/zend_enum.stub.php) I don't see the properties defined on the interface. In practice, all enums get the properties of course, just not via the interface.
So as we cannot represent the properties on the interfaces (yet), we'll stick with the abstract class for now.
Thank you,
Alex
Kind regards
Niels
Hi Alexandru
We'll add a common abstract base class DOM\Document (name taken from the
DOM spec & Javascript world).
DOM\Document contains the properties and abstract methods common to both
HTML and XML documents.Hi,
Yes looks a lot better.
Great work overall! And thank you for taking on this effort.I would have a small suggestion: to make the abstract class an interface.
This will allow even more flexibility on how things can be build further,
suggesting composition over inheritance.
In user land we cannot have interfaces with properties (yet) but in php
internal interfaces we have example with interface UnitEnum that has name
property, extendes by BackedEnum that adds value property.Right, we discussed the use of an interface internally too.
Indeed as you suggest, we chose an abstract class over an interface because of the property limitation.
Looking at UnitEnum & BackedEnum (https://github.com/php/php-src/blob/bae30682b896b26f177f83648bd58c77ba3480a8/Zend/zend_enum.stub.php) I don't see the properties defined on the interface. In practice, all enums get the properties of course, just not via the interface.
So as we cannot represent the properties on the interfaces (yet), we'll stick with the abstract class for now.Thank you,
AlexKind regards
Niels--
To unsubscribe, visit: https://www.php.net/unsub.php
Hi Niels,
Can you expand on the reasoning for two of the decisions in your proposal? I'm not sure I really see the reason/benefit :
-
fromX() methods are on the individual classes, rather than the parent, which as I understand it, you're using as a poor-mans interface with properties. I'd have thought that at the very least the parent should declare those methods as abstract
-
Why "fromEmptyDocument()" rather than just allowing
new DOM\{XML,HTML}Document
via a public constructor?
The only mention of 'constructor' I see in the email or the RFC is the line below:
the properties set by DOMDocument's constructor are overridden by its load methods, which is surprising
But I don't really see how making the constructor private and forcing use of a static method changes this aspect, at all - it just introduces a different way to create an instance.
Otherwise, it's great to see some activity on DOM handling classes.
Cheers
Stephen
Hi Stephen
Hi Alexandru
We'll add a common abstract base class DOM\Document (name taken from the
DOM spec & Javascript world).
DOM\Document contains the properties and abstract methods common to both
HTML and XML documents.Hi,
Yes looks a lot better.
Great work overall! And thank you for taking on this effort.I would have a small suggestion: to make the abstract class an interface.
This will allow even more flexibility on how things can be build further,
suggesting composition over inheritance.
In user land we cannot have interfaces with properties (yet) but in php
internal interfaces we have example with interface UnitEnum that has name
property, extendes by BackedEnum that adds value property.Right, we discussed the use of an interface internally too.
Indeed as you suggest, we chose an abstract class over an interface because of the property limitation.
Looking at UnitEnum & BackedEnum (https://github.com/php/php-src/blob/bae30682b896b26f177f83648bd58c77ba3480a8/Zend/zend_enum.stub.php https://github.com/php/php-src/blob/bae30682b896b26f177f83648bd58c77ba3480a8/Zend/zend_enum.stub.php) I don't see the properties defined on the interface. In practice, all enums get the properties of course, just not via the interface.
So as we cannot represent the properties on the interfaces (yet), we'll stick with the abstract class for now.Thank you,
AlexKind regards
Niels--
To unsubscribe, visit: https://www.php.net/unsub.php https://www.php.net/unsub.php
Hi Niels,Can you expand on the reasoning for two of the decisions in your proposal? I'm not sure I really see the reason/benefit :
- fromX() methods are on the individual classes, rather than the parent, which as I understand it, you're using as a poor-mans interface with properties. I'd have thought that at the very least the parent should declare those methods as abstract
At least the fromEmptyDocument signature differs between HTMLDocument & XMLDocument, so it's not possible to declare that method on the parent.
It's possible to define the other two on the parent. However, I choose against that because:
- They have different behaviour if called on XML vs HTML, because they return their respective new instance. At least the other methods have the same behaviour between XML & HTML. Furthermore it would be weird to have 2 out of 3 factory methods on the parent, but the other one not.
- It makes some sense at least to call e.g. createComment() while not knowing the concrete type of the document (e.g. when supporting both XML & HTML documents), so having that method in the parent makes sense. For example:
function addCopyright(DOM\Document $doc) {
$doc->append($doc->createComment("(c) foo 2023"));
}
However, you're not going to call the factory methods on a variable that can be either XML or HTML document, as you wouldn't know a priori what you're getting back.
- Why "fromEmptyDocument()" rather than just allowing
new DOM\{XML,HTML}Document
via a public constructor?
While technically possible, I find it confusing to have both a normal constructor and factory methods.
When people see factory methods, they'll know - from experience - that every construction needs to go via factories.
Otherwise, people are going to call the constructor and search for a loadHTML/loadHTMLFile method like they do with DOMDocument.
The only mention of 'constructor' I see in the email or the RFC is the line below:
the properties set by DOMDocument's constructor are overridden by its load methods, which is surprising
But I don't really see how making the constructor private and forcing use of a static method changes this aspect, at all - it just introduces a different way to create an instance.
The problem with DOMDocument's approach is the following:
$dom = new DOMDocument(encoding: "bla");
$dom->loadXML(...); // Oops encoding gone!
With the new classes, you load the document at construction time (or no document at all).
Therefore, this confusion/API misuse is prevented by the API design.
You can see it as having 3 different constructors that disallow loading another document on the same instance afterwards.
Otherwise, it's great to see some activity on DOM handling classes.
Cheers
Stephen
Cheers
Niels
On 19 Sep 2023, at 01:00, Niels Dossche
Cheers
Niels--
To unsubscribe, visit: https://www.php.net/unsub.php
(Resending with history removed due to apparent size limit)
Hi Niels,
Obviously, a method with different signatures (fromEmptyDocument
) can't be in the parent, and I'll discuss that below, but having the other factory methods that do have the same signature in parent (even if only an abstract method, forcing concrete implementations to define it) means that it's possible to safely do meta-programming with these classes, where you load a string/file, using a classname from a variable. This use-case doesn't necessarily need to know which specific instance it's getting back, so long as it's an instance of the base class (or interface, if it had been one), but that information can still be presented, via the return type (either static
, or self
on the parent and the actual class name on each of the implementations).
Regarding the private constructor: I understand the issue with the old class being confusing - but your new class doesn't have that issue, because there are no "loadXXX" methods: as you said, if you're loading an existing document, you're forced to do it via the static factory methods. With that change alone, there's zero risk of confusion in allowing direct constructor calls, because once the object is instantiated there is no subsequent way to load a document and inadvertently change the encoding.
Having a regular constructor and then one or more factory methods for specific cases is already a concept in PHP (i.e. DateTime[Immutable]'s createFromXXX
as well as a constructor), and IMO needing to call a factory method to get an "empty" document seems out of place in PHP - it seems a bit like a Java-ism - using a factory, where one just isn't required.
Cheers
Stephen
Hi
Regarding the private constructor: I understand the issue with the old class being confusing - but your new class doesn't have that issue, because there are no "loadXXX" methods: as you said, if you're loading an existing document, you're forced to do it via the static factory methods. With that change alone, there's zero risk of confusion in allowing direct constructor calls, because once the object is instantiated there is no subsequent way to load a document and inadvertently change the encoding.
Having a regular constructor and then one or more factory methods for specific cases is already a concept in PHP (i.e. DateTime[Immutable]'s
createFromXXX
as well as a constructor), and IMO needing to call a factory method to get an "empty" document seems out of place in PHP - it seems a bit like a Java-ism - using a factory, where one just isn't required.
I was one of the persons who discussed this updated API with Niels in
private and looking up the discussion it was me who proposed making the
constructor private and just providing named constructors.
From the perspective of the user of the API, I like the symmetry
between all the named constructors:
Whenever I want to create a new document, I use one of the fromXyz()
methods. And when I use those, I get exactly what it says on the tin.
Making the regular constructor available for use would effectively make
whatever that constructor does a special case / the default case. This
makes sense for DateTimeImmutable, because the __construct() variant is
likely much more often used than the various createFromXyz() variants.
For the HtmlDocument I find it less obvious that creating an empty
document would be the default case compared to loading an existing
document from a file.
We should probably rename the named constructors to include the "create"
prefix for consistency with DTI though.
Best regards
Tim Düsterhus
Hi
Regarding the private constructor: I understand the issue with the old class being confusing - but your new class doesn't have that issue, because there are no "loadXXX" methods: as you said, if you're loading an existing document, you're forced to do it via the static factory methods. With that change alone, there's zero risk of confusion in allowing direct constructor calls, because once the object is instantiated there is no subsequent way to load a document and inadvertently change the encoding.
Having a regular constructor and then one or more factory methods for specific cases is already a concept in PHP (i.e. DateTime[Immutable]'screateFromXXX
as well as a constructor), and IMO needing to call a factory method to get an "empty" document seems out of place in PHP - it seems a bit like a Java-ism - using a factory, where one just isn't required.I was one of the persons who discussed this updated API with Niels in private and looking up the discussion it was me who proposed making the constructor private and just providing named constructors.
From the perspective of the user of the API, I like the symmetry between all the named constructors:
Whenever I want to create a new document, I use one of the fromXyz() methods. And when I use those, I get exactly what it says on the tin.
Making the regular constructor available for use would effectively make whatever that constructor does a special case / the default case. This makes sense for DateTimeImmutable, because the __construct() variant is likely much more often used than the various createFromXyz() variants. For the HtmlDocument I find it less obvious that creating an empty document would be the default case compared to loading an existing document from a file.
We should probably rename the named constructors to include the "create" prefix for consistency with DTI though.
Best regards
Tim Düsterhus
Hi Tim,
Thanks for providing context on where this idea came from.
The constructor + specialised factory methods pattern is not exactly new in PHP (e.g. it took about 3 minutes to find multiple Symphony classes doing this), and disabling the public constructor for purely cosmetic reasons sounds like a very weird, and ironic choice to me, when the stated goal is to make the API less surprising than the previous one.
Cheers
Stephen
Hi Stephen
Hi
Regarding the private constructor: I understand the issue with the old class being confusing - but your new class doesn't have that issue, because there are no "loadXXX" methods: as you said, if you're loading an existing document, you're forced to do it via the static factory methods. With that change alone, there's zero risk of confusion in allowing direct constructor calls, because once the object is instantiated there is no subsequent way to load a document and inadvertently change the encoding.
Having a regular constructor and then one or more factory methods for specific cases is already a concept in PHP (i.e. DateTime[Immutable]'screateFromXXX
as well as a constructor), and IMO needing to call a factory method to get an "empty" document seems out of place in PHP - it seems a bit like a Java-ism - using a factory, where one just isn't required.I was one of the persons who discussed this updated API with Niels in private and looking up the discussion it was me who proposed making the constructor private and just providing named constructors.
From the perspective of the user of the API, I like the symmetry between all the named constructors:
Whenever I want to create a new document, I use one of the fromXyz() methods. And when I use those, I get exactly what it says on the tin.
Making the regular constructor available for use would effectively make whatever that constructor does a special case / the default case. This makes sense for DateTimeImmutable, because the __construct() variant is likely much more often used than the various createFromXyz() variants. For the HtmlDocument I find it less obvious that creating an empty document would be the default case compared to loading an existing document from a file.
We should probably rename the named constructors to include the "create" prefix for consistency with DTI though.
Best regards
Tim DüsterhusHi Tim,
Thanks for providing context on where this idea came from.
The constructor + specialised factory methods pattern is not exactly new in PHP (e.g. it took about 3 minutes to find multiple Symphony classes doing this), and disabling the public constructor for purely cosmetic reasons sounds like a very weird, and ironic choice to me, when the stated goal is to make the API less surprising than the previous one.
Besides the points that have been mentioned by Tim and Larry, there's also the expectation of the programmer that migrates to the new classes to take into account.
They're used to calling the constructor on the old class and calling a load method afterwards.
As there is still a constructor, but no load method, this might be confusing for them.
So in my opinion, disabling it makes it less surprising than the previous one.
Also, just because Symfony does this doesn't mean it's automatically something we should follow.
Cheers
Stephen
Kind regards
Niels
Hi Stephen
Hi
Regarding the private constructor: I understand the issue with the old class being confusing - but your new class doesn't have that issue, because there are no "loadXXX" methods: as you said, if you're loading an existing document, you're forced to do it via the static factory methods. With that change alone, there's zero risk of confusion in allowing direct constructor calls, because once the object is instantiated there is no subsequent way to load a document and inadvertently change the encoding.
Having a regular constructor and then one or more factory methods for specific cases is already a concept in PHP (i.e. DateTime[Immutable]'screateFromXXX
as well as a constructor), and IMO needing to call a factory method to get an "empty" document seems out of place in PHP - it seems a bit like a Java-ism - using a factory, where one just isn't required.I was one of the persons who discussed this updated API with Niels in private and looking up the discussion it was me who proposed making the constructor private and just providing named constructors.
From the perspective of the user of the API, I like the symmetry between all the named constructors:
Whenever I want to create a new document, I use one of the fromXyz() methods. And when I use those, I get exactly what it says on the tin.
Making the regular constructor available for use would effectively make whatever that constructor does a special case / the default case. This makes sense for DateTimeImmutable, because the __construct() variant is likely much more often used than the various createFromXyz() variants. For the HtmlDocument I find it less obvious that creating an empty document would be the default case compared to loading an existing document from a file.
We should probably rename the named constructors to include the "create" prefix for consistency with DTI though.
Best regards
Tim DüsterhusHi Tim,
Thanks for providing context on where this idea came from.
The constructor + specialised factory methods pattern is not exactly new in PHP (e.g. it took about 3 minutes to find multiple Symphony classes doing this), and disabling the public constructor for purely cosmetic reasons sounds like a very weird, and ironic choice to me, when the stated goal is to make the API less surprising than the previous one.
Besides the points that have been mentioned by Tim and Larry, there's also the expectation of the programmer that migrates to the new classes to take into account.
They're used to calling the constructor on the old class and calling a load method afterwards.
As there is still a constructor, but no load method, this might be confusing for them.
So in my opinion, disabling it makes it less surprising than the previous one.
Also, just because Symfony does this doesn't mean it's automatically something we should follow.Cheers
Stephen
Kind regards
Niels--
To unsubscribe, visit: https://www.php.net/unsub.php
Hi Niels, Larry
To clarify my earlier message about having the factory methods on the faux-interface - when I said meta programming that was probably the wrong term to use, so I apologise for the confusion.
It's true that the developer needs to know the expected type of the string being parsed. It's not true that the code directly invoking the 'fromString' method necessarily needs to know - passing classnames as a string and calling static methods on the string is also an established concept in PHP, which works really well when the method being called is defined on a single top-level class or interface.
A real world example I had in mind, is a Request (e.g. a curl request, or some other http-fetching) class that can decode responses into native PHP objects, with options for the callee to define the object type that's wanted; with just the two provided classes there's already double the checks required to know if the specified class name is one the library knows how to use. If a hypothetical class to handle SVG, RSS, ODF (or any other specialised use of XML) class uses the faux-interface from this RFC as it's base, there's no way for the Request class to know that it can instantiate an instance from a string, without falling back to method_exists()
checks.
The only reason I mentioned the DateTime or Symphony classes at all, is to highlight that people have been using classes with a regular constructor, and one or more static factory methods, for years, and if there has been any confusion about it thus far, it's not something I've ever heard complaints about, nor seen efforts to address.
Calling arbitrarily named static methods "named constructors" doesn't really make a convincing argument when someone suddenly realises that the names should be something else halfway through the same conversation. That doesn't mean there's something wrong specifically with having static factory methods when they're needed, but that's my whole point: the default way to instantiate an object in PHP is through its constructor, and classes that don't provide a public constructor are much less common, usually with an obvious reason why (e.g. they're a singleton, or they're objects migrated from resource handles).
I've had enough of these conversations to realise that you've made a decision and you're sticking to it, regardless of what I might write (well except for changing the completely arbitrary names, I guess), so I won't continue trying to make my point to you.
Please at least do one thing though: make it clear in the RFC that the public constructor is removed because ... well I don't know, pick one of the reasons you've thrown up I guess. But right now the only reason even vaguely mentioned in the RFC is that it prevents the "changed encoding" issue, but it doesn't do that, the lack of a load method achieves that, and IMO the lack of a regular constructor is a significant enough change from most PHP classes, that people should be aware of that when reading and voting on your proposal.
Cheers
Stephen
Hi Stephen
Hi Stephen
Hi
Regarding the private constructor: I understand the issue with the old class being confusing - but your new class doesn't have that issue, because there are no "loadXXX" methods: as you said, if you're loading an existing document, you're forced to do it via the static factory methods. With that change alone, there's zero risk of confusion in allowing direct constructor calls, because once the object is instantiated there is no subsequent way to load a document and inadvertently change the encoding.
Having a regular constructor and then one or more factory methods for specific cases is already a concept in PHP (i.e. DateTime[Immutable]'screateFromXXX
as well as a constructor), and IMO needing to call a factory method to get an "empty" document seems out of place in PHP - it seems a bit like a Java-ism - using a factory, where one just isn't required.I was one of the persons who discussed this updated API with Niels in private and looking up the discussion it was me who proposed making the constructor private and just providing named constructors.
From the perspective of the user of the API, I like the symmetry between all the named constructors:
Whenever I want to create a new document, I use one of the fromXyz() methods. And when I use those, I get exactly what it says on the tin.
Making the regular constructor available for use would effectively make whatever that constructor does a special case / the default case. This makes sense for DateTimeImmutable, because the __construct() variant is likely much more often used than the various createFromXyz() variants. For the HtmlDocument I find it less obvious that creating an empty document would be the default case compared to loading an existing document from a file.
We should probably rename the named constructors to include the "create" prefix for consistency with DTI though.
Best regards
Tim DüsterhusHi Tim,
Thanks for providing context on where this idea came from.
The constructor + specialised factory methods pattern is not exactly new in PHP (e.g. it took about 3 minutes to find multiple Symphony classes doing this), and disabling the public constructor for purely cosmetic reasons sounds like a very weird, and ironic choice to me, when the stated goal is to make the API less surprising than the previous one.
Besides the points that have been mentioned by Tim and Larry, there's also the expectation of the programmer that migrates to the new classes to take into account.
They're used to calling the constructor on the old class and calling a load method afterwards.
As there is still a constructor, but no load method, this might be confusing for them.
So in my opinion, disabling it makes it less surprising than the previous one.
Also, just because Symfony does this doesn't mean it's automatically something we should follow.Cheers
Stephen
Kind regards
Niels--
To unsubscribe, visit: https://www.php.net/unsub.php https://www.php.net/unsub.php
Hi Niels, Larry
To clarify my earlier message about having the factory methods on the faux-interface - when I said meta programming that was probably the wrong term to use, so I apologise for the confusion.
It's true that the developer needs to know the expected type of the string being parsed. It's not true that the code directly invoking the 'fromString' method necessarily needs to know - passing classnames as a string and calling static methods on the string is also an established concept in PHP, which works really well when the method being called is defined on a single top-level class or interface.
A real world example I had in mind, is a Request (e.g. a curl request, or some other http-fetching) class that can decode responses into native PHP objects, with options for the callee to define the object type that's wanted; with just the two provided classes there's already double the checks required to know if the specified class name is one the library knows how to use. If a hypothetical class to handle SVG, RSS, ODF (or any other specialised use of XML) class uses the faux-interface from this RFC as it's base, there's no way for the Request class to know that it can instantiate an instance from a string, without falling back tomethod_exists()
checks.
You'd still have to know the signature, and what options it accepts and affect the parser to be able to use it in a meaningful way.
Anyway, you can still do this use-case without a method_exists check or something alike by using union types for example.
You'll likely still want to put some kind of limitations on, so limiting via type seems sensible.
The only reason I mentioned the DateTime or Symphony classes at all, is to highlight that people have been using classes with a regular constructor, and one or more static factory methods, for years, and if there has been any confusion about it thus far, it's not something I've ever heard complaints about, nor seen efforts to address.
Calling arbitrarily named static methods "named constructors" doesn't really make a convincing argument when someone suddenly realises that the names should be something else halfway through the same conversation. That doesn't mean there's something wrong specifically with having static factory methods when they're needed, but that's my whole point: the default way to instantiate an object in PHP is through its constructor, and classes that don't provide a public constructor are much less common, usually with an obvious reason why (e.g. they're a singleton, or they're objects migrated from resource handles).
I've had enough of these conversations to realise that you've made a decision and you're sticking to it, regardless of what I might write (well except for changing the completely arbitrary names, I guess), so I won't continue trying to make my point to you.
Please at least do one thing though: make it clear in the RFC that the public constructor is removed because ... well I don't know, pick one of the reasons you've thrown up I guess. But right now the only reason even vaguely mentioned in the RFC is that it prevents the "changed encoding" issue, but it doesn't do that, the lack of a load method achieves that, and IMO the lack of a regular constructor is a significant enough change from most PHP classes, that people should be aware of that when reading and voting on your proposal.
I will list the reasons more clearly in the RFC.
Cheers
Stephen
Kind regards
Niels
Hi
Regarding the private constructor: I understand the issue with the old class being confusing - but your new class doesn't have that issue, because there are no "loadXXX" methods: as you said, if you're loading an existing document, you're forced to do it via the static factory methods. With that change alone, there's zero risk of confusion in allowing direct constructor calls, because once the object is instantiated there is no subsequent way to load a document and inadvertently change the encoding.
Having a regular constructor and then one or more factory methods for specific cases is already a concept in PHP (i.e. DateTime[Immutable]'s
createFromXXX
as well as a constructor), and IMO needing to call a factory method to get an "empty" document seems out of place in PHP - it seems a bit like a Java-ism - using a factory, where one just isn't required.I was one of the persons who discussed this updated API with Niels in
private and looking up the discussion it was me who proposed making the
constructor private and just providing named constructors.From the perspective of the user of the API, I like the symmetry
between all the named constructors:Whenever I want to create a new document, I use one of the fromXyz()
methods. And when I use those, I get exactly what it says on the tin.Making the regular constructor available for use would effectively make
whatever that constructor does a special case / the default case. This
makes sense for DateTimeImmutable, because the __construct() variant is
likely much more often used than the various createFromXyz() variants.
For the HtmlDocument I find it less obvious that creating an empty
document would be the default case compared to loading an existing
document from a file.We should probably rename the named constructors to include the "create"
prefix for consistency with DTI though.Best regards
Tim Düsterhus
I agree with Tim here. If you have named constructors that are all "equally common," then it's non-obvious which one should be the "unnamed default." I know that fromX()
requires an X, fromY()
requires a Y, but what does __construct()
require? I cannot tell from the call site. If __construct() is just an alias of on of the fromX() methods, then what purpose does it serve other than confusion?
In addition, although it's probably not super relevant here, static factory methods are chainable in a way that new
is not. No extra parens are necessary. For classes I'm creating at hoc (rather than services that are always managed via DI), I often prefer a named constructor just for convenience.
I don't see it as a Java-ism at all. Java is (in)famous for excessive layers of factory classes, which is something different than what is going on here. (How fair that reputation is in 2023, I don't know.) This is just "named constructors," which has been a reasonably common pattern in PHP land for a long time.
--Larry Garfield
Hi Stephen
On 19 Sep 2023, at 01:00, Niels Dossche
Cheers
Niels--
To unsubscribe, visit: https://www.php.net/unsub.php
(Resending with history removed due to apparent size limit)
Hi Niels,
Obviously, a method with different signatures (
fromEmptyDocument
) can't be in the parent, and I'll discuss that below, but having the other factory methods that do have the same signature in parent (even if only an abstract method, forcing concrete implementations to define it) means that it's possible to safely do meta-programming with these classes, where you load a string/file, using a classname from a variable. This use-case doesn't necessarily need to know which specific instance it's getting back, so long as it's an instance of the base class (or interface, if it had been one), but that information can still be presented, via the return type (eitherstatic
, orself
on the parent and the actual class name on each of the implementations).
I don't really understand this part. What kind of meta-programming are you thinking of? Can you give an example?
Calling the create methods on the abstract type does not seem sensible, because the input data is not interchangeable between the two subclasses.
E.g. passing an XML input into HTMLDocument will not go well (and vice versa).
So it seems to me that you need to know the type to be able to give sensible input.
Regarding the private constructor: I understand the issue with the old class being confusing - but your new class doesn't have that issue, because there are no "loadXXX" methods: as you said, if you're loading an existing document, you're forced to do it via the static factory methods. With that change alone, there's zero risk of confusion in allowing direct constructor calls, because once the object is instantiated there is no subsequent way to load a document and inadvertently change the encoding.
Having a regular constructor and then one or more factory methods for specific cases is already a concept in PHP (i.e. DateTime[Immutable]'s
createFromXXX
as well as a constructor), and IMO needing to call a factory method to get an "empty" document seems out of place in PHP - it seems a bit like a Java-ism - using a factory, where one just isn't required.
I agree with what Tim and Larry have said here.
I do agree too that it should be called createFromString and createFromFile to be consistent with the DateTime classes.
And perhaps instead of createFromEmpty it should be createEmpty because you're not creating an instance "from" something here, you're creating it with "nothing".
Cheers
Stephen
Kind regards
Niels
Hi
[…]
Thank you for the fruitful discussion. This updated API is much better.
When reading the updated RFC yesterday, I initially wanted to complain
about the "The options argument" section still existing, but the
$options argument nowhere to be seen. But I see that you now added it.
One thing I now note that could also be improved, now that we make some
larger improvements to the API:
The $options argument still takes an integer and expects the developer
to use the bitwise-or operator to combine multiple constants to set the
appropriate flags. Despite this being a commonly used pattern in the
standard library, I think it provides for a bad developer experience and
makes it harder to extend the list of options for non-boolean options. I
would never design an API like this when creating a userland library.
I have two suggestions that might or might not be better:
-
Make the options explicit arguments with a default value and expect
the user to leverage named arguments to override specific arguments, e.g.HtmlDocument::fromString(
string $string,
bool $ignoreErrors = false, //LIBXML_NOERROR
bool $reducedMemoryUsage = false, //LIBXML_COMPACT
bool $impliedTags = true, //LIBXML_HTML_NOIMPLIED
);
This would provide for improved discoverability and the options checking
would be implicit and easily available in IDEs.
- Make the option parameter an array that takes an array of flags. This
could either be an associative array (like used with unserialize,
setcookie or password_hash) or a list of self-descriptive options. The
latter would allow for a clean(er) migration to tagged-unions if/when
they make it into PHP.
Example usage:
HtmlDocument::fromString(
'<html></html>',
[
LIBXML_NOERROR,
LIBXML_COMPACT,
LIBXML_HTML_NOIMPLIED,
],
);
With tagged-unions it could look like this in a future version, note the
HtmlVersion option that takes a parameter.
HtmlDocument::fromString(
'<html></html>',
[
Dom\LoadingOption::IgnoreError,
Dom\LoadingOption::ReducedMemoryUsage,
Dom\LoadingOption::NoImpliedTags,
Dom\LoadingOption::HtmlVersion(5),
],
);
Even if not yet reaping the full benefits of the "list of options", I
think it more naturally splits across multiple lines compared to
'bitwise-or'ing and it can also be more useful in stack traces for error
handlers that are able to handle arrays, because there won't be some
non-descript integer you have to decode manually.
What do you think?
Best regards
Tim Düsterhus
Hello internals
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parserKind regards
Niels
Hi internals
After the discussion here, I have made the following changes to the RFC.
The RFC's version is now 0.6.0, previously it was 0.5.3.
- Clarified the reasoning to use factory methods.
- Changed the factory method names to be consistent with DateTimeImmutable etc (i.e. fromString -> createFromString, fromFile -> createFromFile, createEmptyDocument -> createEmpty).
- Marked HTMLDocument & XMLDocument classes as final. This allows us to extend them in the future without BC breakage. This is also motivated by the fact that extending them would be problematic anyway (see https://www.php.net/manual/en/domdocument.registernodeclass for how to do it with DOMDocument and the problems it has).
- Clarified that we also need to change the $document property of DOMXPath from DOMDocument to DOM\Document, and the constructor too. You can read about this in the BC section.
Kind regards
Niels
Hi internals
Hello internals
I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization support".
https://wiki.php.net/rfc/domdocument_html5_parserKind regards
Niels
Some minor changes after a discussion with Tim:
- The old class names are now written with a leading backslash to emphasize they are in the global namespace. Hopefully this will resolve confusion around them.
- Fixed unclear wording, i.e. "type hint" -> "type declaration"
- The alias for DOMException will be DOM\DOMException. Because that's the official name per DOM spec, and otherwise importing it and using it would be confusing with the global namespace <php>Exception</php> class (see also https://github.com/php/php-src/pull/9071#issuecomment-1193162754 for precedent).
Kind regards
Niels