Newsgroups: php.internals
Date: Sun, 4 May 2003 19:55:15 +0200 (CEST)
From: chregu@bitflux.ch (Christian Stocker)
To: Dmitri Dmitrienko
Cc: Sterling Hughes
Subject: Re: [PHP-DEV] Re: Bundling libxml2 and expat compatibility layer
In-Reply-To: <002401c31263$364d1730$3d01a8c0@vdmitri>
References: <002401c31263$364d1730$3d01a8c0@vdmitri>

Dmitri,

xmlParseFile builds a DOM tree out of your XML document, which is of course slower than the SAX parsing expat does. _But_ libxml2 can also parse your XML document SAX-style only, without building a DOM tree. Without having looked at Sterling's code, I assume that's what he did, and you should compare that code to expat and _not_ xmlParseFile. libxml2 is certainly not slow; it has a well-known reputation for being very fast at what it does.
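To make the SAX/DOM distinction concrete, here is a minimal sketch. It is an assumption-laden stand-in: it uses Python's standard library (xml.sax and xml.dom.minidom), not the libxml2 or expat C APIs discussed in this thread, and the sample document and function names are made up for illustration. The SAX version only reacts to events as they stream past; the DOM version first builds the whole tree in memory and then has to release it again.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, for illustration only.
XML = b"<feed><item id='1'>a</item><item id='2'>b</item></feed>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX style: handle events as they arrive; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.items = 0

    def startElement(self, name, attrs):
        # Called once per opening tag while the parser streams the input.
        if name == "item":
            self.items += 1


def count_items_sax(data):
    handler = ItemCounter()
    xml.sax.parseString(data, handler)
    return handler.items


def count_items_dom(data):
    # DOM style: the whole document becomes an in-memory node tree first.
    doc = minidom.parseString(data)
    try:
        return len(doc.getElementsByTagName("item"))
    finally:
        # Releasing the tree is a separate step with its own cost.
        doc.unlink()


print(count_items_sax(XML), count_items_dom(XML))
```

Both functions give the same answer, but only the DOM one pays for allocating and freeing a node per element. That is why a fair benchmark compares SAX against SAX, or tree building against tree building.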
And if you don't know the difference between SAX and DOM, please do a Google lookup before trolling here. Comparing SAX with DOM is comparing apples with oranges.

chregu

On Sun, 4 May 2003, Dmitri Dmitrienko wrote:

> Sterling,
>
> As I said before, I know what I am doing comparing those libraries.
> If I didn't, I would not have raised this discussion.
>
> Ok. If you think I compare apples and oranges, well. Suppose I do.
> But take a look at what's finally offered by domxml. It's a verified tree
> of nodes that resulted from parsing the original xml.
> Nothing more, nothing less.
> I think everybody clearly understands the advantages of having such trees.
> I'm actually not against it. It should be quite clear though.
>
> Now about the matter under discussion.
> It's performance. When I set proper callbacks for expat I get the same node
> tree 5 times faster.
> What does this "good" domxml parser spend MY time on?
> Answer is very simple. Have a look at parser.c shipped with libxml2.
> It's what I'd call geeze. It's written from scratch as if we were in the 19th
> century.
> Ok Sterling, if you think this approach is ok for all, why not switch back
> to the same parser for PHP?
> Let's introduce PHP 2.0 once again :))), geeze.
>
> What I'd love to see is a Flex-based lexer for xml that has proven its really
> good performance.
>
> Let other people say what they think about their needs in performance
> terms.
>
> IMHO it's too early to switch to libxml2. It's pretty slow when parsing xml.
>
> All the best,
> Dmitri.
>
> > Dmitri,
> >
> > Geeze. As Christian said, you're comparing apples and oranges. In
> > order to properly benchmark, compare the push parser interface with
> > expat's interface. Or, look on the web, there have been plenty of
> > benchmarks: libxml2 is the *fastest* XML parsing library available
> > (besides msxml, which is closed source).
> > In terms of SAX processing,
> > expat has a very, very slight advantage in some situations, but nothing
> > to speak of.
> >
> > -Sterling
> >
> > On Sun, 2003-05-04 at 07:36, Dmitri Dmitrienko wrote:
> > > Hi Christian,
> > >
> > > I compared _parsing_, only parsing. Does that make sense?
> > > Certainly, I expected some overhead for memory allocation when building
> > > the DOM tree.
> > > I believe this overhead should be reasonable. For example, less than 3-5
> > > times.
> > > But actually the overhead is much higher, incredibly higher.
> > >
> > > Could you explain what this time is spent on? Why is xmlParseFile() so
> > > slow?
> > >
> > > Also, it would be nice to hear your opinion on why xmlFreeDoc() is slow...
> > > It should only free allocated memory, nothing more. I expected 1-10ms
> > > for it while I actually got 133ms, quite comparable with the time for parsing.
> > >
> > > Also, why is xmlParseMemory() 3 times slower than xmlParseFile()??? It
> > > can't be explained easily, I guess.
> > >
> > > I think libxml2 is a really SLOW library, purely slow, and will not
> > > satisfy people who are concerned about performance.
> > >
> > > -Dmitri
> > >
> > >
> > > >I didn't look at Sterling's code yet, but you can't compare SAX parsing
> > > >of expat with DOM parsing of libxml2. Libxml2 however does support SAX
> > > >as well, and I assume (and hope) Sterling used only this for the ext/xml
> > > >replacement (making an in-memory DOM tree by default in ext/xml would
> > > >make a lot of people very unhappy ;) )
> > > >
> > > >Dmitri, what exactly did you compare?
> > > >
> > > >chregu
> > >
> > > On Sun, 4 May 2003, Dmitri Dmitrienko wrote:
> > >
> > > > Hi,
> > > >
> > > > Sterling, before doing such a weird thing as moving everybody to
> > > > libxml2, please compare the performance of what we have with expat and
> > > > what we'll get with libxml2.
> > > >
> > > > I tested them both with quite a big xml file, ~500kB.
> > > > Expat parsed the doc in
> > > > 19ms while libxml2 took 267ms.
> > > > It is 14 times slower. I understand that there is a big difference
> > > > between what expat does and what libxml2 does.
> > > > On the other hand, there are some 3rd party xmldom libraries that
> > > > parse an xml file to xmldom in ~110-130ms.
> > > > At least two times faster.
> > > >
> > > > It should also be noted that libxml2 spends an INCREDIBLY long time
> > > > freeing the parsed document: 133ms.
> > > > Moreover, when I tried to parse a pre-loaded document (xmlParseMemory),
> > > > it showed even worse results: 786ms.
> > > >
> > > > I believe it's too early to switch to this library. At least there are
> > > > some reasons to think more about it.
> > > >
> > > > Best regards,
> > > > Dmitri.
> > > >
> > > >
> > > > "Sterling Hughes" wrote in message
> > > > news:1051978274.11377.131.camel@hasele...
> > > > > Hi,
> > > > >
> > > > > Well, OK, I have libxml2 successfully bundled with PHP, and I've further
> > > > > gone ahead and created a C-level compatibility layer which maps expat
> > > > > <-> libxml2. I've also moved the detection logic for both expat and
> > > > > libxml into php5/bundle/libxml and php5/bundle/expat respectively. This
> > > > > way you can choose your backend at the configure line, and things will
> > > > > work transparently (by default, expat and libxml are compiled in, and
> > > > > the XML extension uses expat). I've also done the "namespace
> > > > > redefinition" heavy lifting - I'm not quite sure it works, but I have
> > > > > renamed most (from what I can tell, all) public symbols, like with
> > > > > expat. I'm sure this could be ironed out pretty easily if I made any
> > > > > mistakes.
> > > > >
> > > > > As far as I'm concerned, the important thing here is bundling libxml2.
> > > > > I think everyone who is implementing XML support around PHP will agree
> > > > > that expat just isn't meeting our needs, specifically:
> > > > >
> > > > > 1) The ability to easily access and modify XML documents from within a
> > > > > programmatic structure, à la DOM (this would also make it easy for me to
> > > > > implement my SimpleXML[1] extension).
> > > > >
> > > > > a) The ability to query an XML document via XPath
> > > > >
> > > > > 2) The ability to validate an XML document against either a DTD or an XML
> > > > > Schema (very important, especially for SOAP.)
> > > > >
> > > > > 3) Proper Unicode support
> > > > >
> > > > > 4) Support for XPointer and XLink
> > > > >
> > > > > 5) Support for DocBook and HTML parsing
> > > > >
> > > > > 6) Expat doesn't even fully support the same capabilities that libxml2
> > > > > does when it comes to SAX processing.
> > > > >
> > > > > However, when we bundle expat with PHP, and therefore make ext/xml an
> > > > > "always available" extension, we create the illusion that it is the
> > > > > "recommended" and "best" solution for XML parsing with PHP, when in fact
> > > > > it really isn't.
> > > > >
> > > > > Our needs as far as XML support goes are growing. Whether it be implementing
> > > > > technologies that exist on top of XML (SOAP, WSDL, RDF) or implementing
> > > > > extensions that make it easier to access XML (SimpleXML, DOM), expat
> > > > > makes it way too hard (for all intents and purposes, impossible) to
> > > > > implement these systems.
> > > > >
> > > > > Therefore, I'm suggesting that we bundle libxml2, while (for now)
> > > > > keeping expat in as well. This will cause *absolutely* no backwards
> > > > > compatibility changes, while at the same time it will allow you to use
> > > > > only libxml2 for XML processing (--without-bundle-expat), with 97% [2]
> > > > > backwards compatibility maintained.
> > > > >
> > > > > -Sterling
> > > > >
> > > > > [1] http://news.php.net/article.php?group=php.xml.dev&article=6
> > > > > [2] This is one of 54% of facts made up on the spot. Suffice it to say
> > > > > that the new extension is "mostly" backwards compatible, and the places
> > > > > where it breaks shouldn't have been relied upon anyway.
> > > > >
> > > > > --
> > > > > "Reductionists like to take things apart. The rest of us are
> > > > > just trying to get it together."
> > > > > - Larry Wall, Programming Perl, 3rd Edition
> > >
> > > --
> > > nam...christian stocker adr...pflanzschulstr. 31, ch-8004 zurich
> > > pho...+41 43 317 9984 www...http://blog.bitflux.ch
> > > mob...+41 76 561 8860 ema...chregu@phant.ch
> > > wor...+41 1 240 5670 gpg...0x5CE1DECB
> > --
> > Good judgement comes from experience, and experience comes from
> > bad judgement.
> > - Fred Brooks

-- 
nam...christian stocker adr...pflanzschulstr. 31, ch-8004 zurich
pho...+41 43 317 9984 www...http://blog.bitflux.ch
mob...+41 76 561 8860 ema...chregu@phant.ch
wor...+41 1 240 5670 gpg...0x5CE1DECB