Hi,
After some discussions with various people at the PHP-Con, I decided it
was important that we (at least) have libxml integrated with PHP by
PHP5. When it comes to XML processing, expat is a legacy library and it
doesn't support nearly what is required for processing XML by today's
standards.
The current version of expat makes processing SOAP documents (for
example) very hard, because XML schema is not available.
On the other hand, libxml is a very robust XML processor, providing
support for:
- XML Schema
- DOM
- Validation (against a DTD or Schema)
- SAX
- XML Catalog
- Docbook and HTML Parsers
- XPath, XPointer, XInclude
- Base XML datatypes (XML Schema Part 2)
- FTP and HTTP transports (as well as an IO wrapper library like
Streams)
I've currently completed the first two steps of the integration. I've
removed the expat library from ext/xml and replaced it with libxml.
I've also ported the XML extension to use libxml as the underlying
processing engine.
The following incompatibilities exist:
- some XML_ERROR_* constants are irrelevant (they are still defined,
but they have no meaning for libxml).
- xml_error_get_string() just returns a blank string. This can be
changed in the next couple of days; I just need to implement error
strings on top of libxml error codes.
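For illustration only, the error-string layer described above could look roughly like the sketch below. It is written in Python against the standard library's expat binding rather than in the extension's C. The backend code numbers and the names LIBXML_TO_EXPAT and error_string() are hypothetical placeholders, not real libxml values or the ext/xml API.

```python
from xml.parsers import expat
from xml.parsers.expat import errors

# Hypothetical backend error numbers (placeholders, not real libxml codes),
# each mapped to the closest expat-style error code.
LIBXML_TO_EXPAT = {
    1001: errors.codes[errors.XML_ERROR_NO_MEMORY],
    1002: errors.codes[errors.XML_ERROR_SYNTAX],
    1003: errors.codes[errors.XML_ERROR_NO_ELEMENTS],
}


def error_string(backend_code):
    """Return a human-readable message for a backend error code."""
    expat_code = LIBXML_TO_EXPAT.get(backend_code)
    if expat_code is None:
        # No expat equivalent: fall back to a generic message instead of
        # returning a blank string.
        return "unknown XML error (backend code %d)" % backend_code
    return expat.ErrorString(expat_code)
```

Codes with no expat counterpart get a generic message here, so callers never see an empty string. Whether that is the right fallback is a design choice, not something the thread settles.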
Having this library bundled internally will allow people to develop
other extensions which use libxml features, and will allow for future
extensions (for example, a fully compliant DOM extension) to easily be
added, without requiring extra bundling.
Thoughts?
-Sterling
--
"The computer programmer is a creator of universes for which he
alone is responsible. Universes of virtually unlimited complexity
can be created in the form of computer programs."
- Joseph Weizenbaum
The following incompatibilities exist:
- some XML_ERROR_* constants are irrelevant (they are still defined,
but they have no meaning for libxml).
- xml_error_get_string() just returns a blank string. This can be
changed in the next couple of days; I just need to implement error
strings on top of libxml error codes.
Having this library bundled internally will allow people to develop
other extensions which use libxml features, and will allow for future
extensions (for example, a fully compliant DOM extension) to easily be
added, without requiring extra bundling.
Thoughts?
I take it from this that old apps written against expat using the
xml_set_element_handler() style will continue working? Like I said in New
York, I agree we need to move to libxml, but I worry about breaking
thousands of existing php-xml apps. If you have managed to make it
BC-safe, perfect!
-Rasmus
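For readers unfamiliar with the handler style in question: the application registers start and end callbacks, and the parser invokes them as it encounters elements. Below is a minimal sketch that uses Python's expat binding as a stand-in for the PHP API. collect_events() is a hypothetical helper, not part of either library.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text and return the start/end element events in order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Callbacks fire as the parser walks the document.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.Parse(xml_text, True)  # True marks the final chunk
    return events
```

Recording the event sequence like this is also one plausible way to check BC: run the same documents through both backends and compare the recorded events.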
At 15:32 27/04/2003, Sterling Hughes wrote:
Hi,
After some discussions with various people at the PHP-Con, I decided it
was important that we (at least) have libxml integrated with PHP by
PHP5. When it comes to XML processing, expat is a legacy library and it
doesn't support nearly what is required for processing XML by today's
standards.
I'm also somewhat worried about backward compatibility with
expat. Given the relatively wide scope of XML parsing, it'd be a small
miracle if libxml could replace expat without breaking anybody's scripts,
because of whatever quirks libxml and/or expat have. I suggest we keep
expat bundled as well; the price of bundling it is negligible.
Zeev
Hi,
After some discussions with various people at the PHP-Con, I decided it
was important that we (at least) have libxml integrated with PHP by
PHP5. When it comes to XML processing, expat is a legacy library and it
doesn't support nearly what is required for processing XML by today's
standards.
I'm still no big fan of bundling libxml2 with PHP, but if there's a
consensus that it should be bundled, so be it, provided someone takes on
the job of merging the constantly updated libxml2 sources into the PHP
tree. On the other hand, wouldn't it be enough if it were bundled in the
downloadable sources and not put into CVS?
On the other hand libxml is a very robust xml processor, providing
support for:
Just for your information: Rob Richards is currently working hard on the
new domxml extension, which follows the DOM standard much better than
today's domxml. Furthermore, it will also have better memory management.
I think it will soon be in a state where we can put it into CVS.
We were also talking about a new name for this new extension, as domxml
is quite misleading (as Sterling said, libxml2 can do a lot more than
just DOM processing). Any thoughts about that?
chregu
--
christian stocker | bitflux GmbH | schoeneggstrasse 5 | ch-8004 zurich
phone +41 1 240 56 70 | mobile +41 76 561 88 60 | fax +41 1 240 56 71
http://www.bitflux.ch | chregu@bitflux.ch | gnupg-keyid 0x5CE1DECB
Let me just clarify:
This is not directly related to the new extension.
The idea with this change is to allow people to develop different
extensions and APIs on top of PHP's XML support, and to give them a
robust library to do so, preferably one that requires minimal external
functionality.
Expat is a very weak library; libxml is a very strong library. My idea
is to replace it before PHP5, so that we have a default library that
people can develop extensions for (like the new "libxml2" extension,
etc.). As for bundling, the only other option I see is telling people
"you must install libxml2 on your system to install PHP." And while that
might make the purists on this list happy, it's not a practical solution.
-Sterling
--
"The three most dangerous things in the world are a programmer
with a soldering iron, a hardware type with a program patch and
a user with an idea."
- Unknown
After some discussions with various people at the PHP-Con, I decided it
was important that we (at least) have libxml integrated with PHP by
PHP5. When it comes to XML processing, expat is a legacy library and it
doesn't support nearly what is required for processing XML by today's
standards.
The current version of expat makes processing SOAP documents (for
example) very hard, because XML schema is not available.
I'd like to talk with you about what features will be implemented in
time for PHP5, as there are some specific features that would be good
for SOAP (e.g. pull parsing). I'm at the airport now, so later this week.
- FTP and HTTP transports (as well as an IO wrapper library like
Streams)
Can the wrapper be made to work with PHP streams? It would be nice,
for instance, to simply be able to pass php://stdin to the library and
have it handle input. A callback to handle protocol headers or
alternate encodings such as DIME would be necessary. Anyway, those are
some things I'd like to discuss.
I've currently completed the first two steps of the integration. I've
removed the expat library from ext/xml and replaced it with libxml.
I've also ported the XML extension to use libxml as the underlying
processing engine.
The following incompatibilities exist:
- some XML_ERROR_* constants are irrelevant (they are still defined,
but they have no meaning for libxml).
Are they mappable in any way, or simply library specific errors?
- xml_error_get_string() just returns a blank string. This can be
changed in the next couple of days; I just need to implement error
strings on top of libxml error codes.
Having this library bundled internally will allow people to develop
other extensions which use libxml features, and will allow for future
extensions (for example, a fully compliant DOM extension) to easily be
added, without requiring extra bundling.
Thoughts?
My concern is of course BC. I think a little bit of breakage is ok, but
by little I'm thinking < 1%. Otherwise, there needs to be some way to
load the expat xml extension. Perhaps the expat extension functions
could be renamed to expat_xml_, libxml would be libxml_, and a utility
function such as xml_use_library(XML_EXPAT) would map xml_* functions to
the expat library (or libxml). The default mapping would be to libxml.
Shane
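The library-switching idea proposed above can be sketched as a small dispatch table. This is only an illustration in Python: the backends are stand-in functions, and everything except the names xml_use_library and XML_EXPAT (taken from the proposal) is hypothetical.

```python
XML_EXPAT = "expat"
XML_LIBXML = "libxml"


# Stand-in backends: real ones would wrap the expat and libxml parsers.
def _expat_parse(data):
    return ("expat", data)


def _libxml_parse(data):
    return ("libxml", data)


_BACKENDS = {XML_EXPAT: _expat_parse, XML_LIBXML: _libxml_parse}
_active = XML_LIBXML  # default mapping, as proposed


def xml_use_library(name):
    """Point the generic xml_* entry points at the named backend."""
    global _active
    if name not in _BACKENDS:
        raise ValueError("unknown XML library: %r" % (name,))
    _active = name


def xml_parse(data):
    """Generic entry point; forwards to whichever backend is active."""
    return _BACKENDS[_active](data)
```

Scripts that never call xml_use_library() get libxml, and a script that hits an incompatibility can opt back into expat with one call.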
- FTP and HTTP transports (as well as an IO wrapper library like
Streams)
Can the wrapper be made to work with PHP streams? It would be nice,
for instance, to simply be able to pass php://stdin to the library and
have it handle input. A callback to handle protocol headers or
alternate encodings such as DIME would be necessary. Anyway, those are
some things I'd like to discuss.
See my code in pear/PECL/soap/php_xml.c that already ties the libxml I/O
layer into PHP streams.
(php_stream_xmlIO_XXX and the xmlRegisterInputCallbacks() call in
soap.c)
This allows opening any URL supported by streams; I don't think that
the library allows you to pass a handle that has already been opened,
but I could be wrong.
--Wez.
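As a rough picture of what tying a parser's input layer to a stream abstraction involves, here is a sketch in Python. It only loosely mirrors the match/open/read/close shape of libxml2's input callbacks. register_input_callbacks(), parse_url() and the mem:// scheme are all hypothetical, and trying the most recently registered handler first is a choice made for this sketch.

```python
import io
import xml.parsers.expat

# Hypothetical registry of (match, open) pairs; reading and closing are
# delegated to the file-like object that the open callback returns.
_handlers = []


def register_input_callbacks(match, open_):
    _handlers.append((match, open_))


def parse_url(url, chunk_size=4096):
    """Open url through the first matching handler and count its elements."""
    for match, open_ in reversed(_handlers):
        if match(url):
            stream = open_(url)
            break
    else:
        raise ValueError("no input handler for " + url)

    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    try:
        # Feed the parser incrementally, chunk by chunk.
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            parser.Parse(chunk, False)
        parser.Parse(b"", True)  # signal end of input
    finally:
        stream.close()
    return count


# Illustrative in-memory scheme standing in for a real stream wrapper.
_documents = {"mem://demo": b"<a><b/><c/></a>"}
register_input_callbacks(
    lambda url: url.startswith("mem://"),
    lambda url: io.BytesIO(_documents[url]),
)
```

Because the parser only ever sees read() calls on whatever the open callback returns, any URL scheme the stream layer understands becomes parseable without the parser knowing about it.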
Shane,
As for BC: again, I agree 100%. If it breaks BC in any way, other than a
few error codes/error messages changing, then we can fix it, or revert
back to expat. I've tested it with PEAR and pres2 (including my xml and
php slides), and everything seems to be in working order. However, once
I commit it, other people can start testing.
PHP5 is going to be unstable, things will break, but we will have a very
long QA period, and a lot of people testing their apps against it to
make sure that there are no BC breaks. Couple that with the fact that
libxml2 tries to be expat "compliant," I'm pretty confident we can
squash any and all BC breaks before a PHP5 release.
As for the other features, well, that's a discussion for a different
thread. My main concern in this one is just switching the underlying
library, so that we're shipping with a robust XML library bundled as
default. This will allow extension developers (hopefully) to work on
more XML-related technologies in extension space.
-Sterling
--
"Whether you think you can or think you can't -- you are right."
- Henry Ford
BC is important. But don't repeat the mistake of letting the
code ignore encoding="..." in an XML document. Bug #23292 :)
An XML parser is supposed to deal with that and produce a desired
output encoding. AFAIK libxml_2_ does the right thing.
While we're at it. How about libxml2's sibling libxslt? Could
that replace Sablotron?
-- Adam
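To make the encoding point concrete, here is a small sketch in Python using the standard library's expat binding. The parser is given raw bytes, honours the document's own encoding="..." declaration, and the caller then picks the output encoding. text_content() is a hypothetical helper for this illustration only.

```python
import xml.parsers.expat


def text_content(xml_bytes, output_encoding="utf-8"):
    """Parse raw bytes, honouring the document's encoding declaration,
    and return the character data re-encoded in output_encoding."""
    pieces = []
    parser = xml.parsers.expat.ParserCreate()  # no encoding override
    # Character data may arrive in several pieces, so collect and join.
    parser.CharacterDataHandler = pieces.append
    parser.Parse(xml_bytes, True)
    return "".join(pieces).encode(output_encoding)


# The byte 0xE9 is "e acute" in ISO-8859-1.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
```

With the declaration respected, text_content(latin1_doc) yields the UTF-8 bytes for the same text rather than passing the Latin-1 byte through untouched.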
--
Adam Dickmeiss mailto:adam@indexdata.dk http://www.indexdata.dk
Index Data T: +45 33410100 Mob.: 212 212 66
BC is important. But don't repeat the mistake of letting the
code ignore encoding="..." in an XML document. Bug #23292 :)
An XML parser is supposed to deal with that and produce a desired
output encoding. AFAIK libxml_2_ does the right thing.
While we're at it. How about libxml2's sibling libxslt? Could
that replace Sablotron?
libxslt is already integrated in today's domxml and will certainly also
be in the future "domxml". I don't see a reason to replace Sablotron;
they can both coexist.
chregu