From: macosx@green-ant.com ("Mark J. Hershenson")
To: Walt Boring
Date: Thu, 01 May 2003 16:45:58 -0700
Subject: Re: [PHP-DEV] xmldoc() takes ages
Newsgroups: php.internals
In-Reply-To: <1051831866.2944.10.camel@hemna.uh.nu>

> Howdy,
> I have a 12Meg xml file that I am trying to get PHP 4.3.1 to parse
> using libxml2 with xmldoc(), and it takes ~ 11 minutes to do this on a
> P4 1.8GHz box.
>
> The same operation in libxml2's python interface takes 6 seconds.
>
> here is my code. Any ideas why it takes 11 minutes? 12Megs of valid
> xml is large, but not 11 minutes worth of large.
>
> <?php
> //my includes here
> set_time_limit( 345600 );
>
> $xml = implode('', file("foo_3.xml"));
> debug_message("parsing xml file of len = ".strlen($xml));
>
> $doc = xmldoc( $xml );
> //it takes 11 minutes to get here!
> debug_message("GOT THE DOC");
> exit;
>
> ?>

Given the size, wouldn't it be better to use domxml_open_file() and
skip pulling all of that data into PHP memory first?
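
A minimal sketch of that suggestion, assuming PHP 4.x built with the
domxml extension. parse_xml_file() is only an illustrative helper name,
and plain echo stands in for debug_message(), which lives in the
includes that were not posted. Timing uses time(), so it is only good
to the nearest second.

```php
<?php
// Sketch only: assumes PHP 4.x with the domxml extension loaded.
// parse_xml_file() is a hypothetical helper, not part of PHP.
function parse_xml_file($path)
{
    if (!function_exists('domxml_open_file')) {
        return false; // domxml extension not available
    }
    if (!file_exists($path)) {
        return false; // nothing to parse
    }
    // libxml2 reads the file itself; no PHP string of the contents is built.
    return domxml_open_file($path);
}

set_time_limit(345600); // same limit as the quoted script

$start = time();
$doc = parse_xml_file("foo_3.xml"); // file name taken from the quoted script
$elapsed = time() - $start;

if ($doc) {
    echo "parsed in $elapsed seconds\n";
} else {
    echo "could not parse foo_3.xml (missing file or no domxml)\n";
}
```

Running this next to the original script on the same file should show
how much of the 11 minutes goes to the file()/implode() step as opposed
to the parser itself.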

From what I've seen personally and heard as well, that should
considerably streamline the flow of data:

  file -> XML parser

Instead of:

  file
  -> lots of zvals for the strings in PHP taking up much more than
     the original 12 MB of data
  -> imploding all those strings into a single variable
  -> passing the data by copying into a variable (no reference shown
     in example)
  -> XML parser

Then you don't have to bring a 12 megabyte datafile into PHP userland
by creating an array for each line and then imploding those lines.

I'd be interested to hear how much of a difference using
domxml_open_file() would make.

Hope it helps!

-- 
mjh