Newsgroups: php.internals
From: sterling@php.net (Sterling Hughes)
To: Rob Richards
Cc: Adam Maccabee Trachtenberg, internals@lists.php.net
Date: Tue, 13 Jan 2004 11:55:14 -0500
Subject: Re: [PHP-DEV] SimpleXML: Moving Forward
Message-ID: <20040113165514.GC23361@bumblebury.com>
In-Reply-To: <00fe01c3d9d1$6e8618a0$f7dea8c0@cyberware.local>

> From: Adam Maccabee Trachtenberg
>
> > 1) SimpleXML creates PHP data structures from XML documents. It only
> > handles XML elements, attributes, and text nodes. The syntax for
> > accessing the text node children of an element is akin to object
> > properties ($foo->bar); the syntax of accessing attributes is akin
> > to array elements ($foo['bar']).
>
> This goes back to my question: what is the goal of SimpleXML?
> Is it supposed to be an easy API for accessing any XML document, or
> only non-complex ones?
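
A minimal sketch of the access syntax described in point 1, written against the SimpleXML API that eventually shipped with PHP 5 and later (not the pre-release API debated in this thread). The document and variable names are made up for illustration:

```php
<?php
// Illustrative document: one element with a child element and an attribute.
$book = simplexml_load_string('<book lang="en"><title>SimpleXML</title></book>');

// Child elements read like object properties...
$title = (string) $book->title;

// ...and attributes read like array elements.
$lang = (string) $book['lang'];

echo $title, ' (', $lang, ")\n";
```

The casts to string matter: without them, both expressions yield SimpleXMLElement objects rather than plain strings.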
>
> Attributes are handled as associative arrays, so given an element with 2
> attributes with the same name, but in different namespaces, it won't work:
>

Attributes should be namespace-qualified within the array:
$node->foo['a:bar']. This would respect namespaces "registered" by
register_ns().

> XPath won't help here either, as xsearch returns an array of sxe objects
> with the attribute nodes (which causes some additional problems).
> It's fine if this has to be handled in DOM, but to me the question has
> never really been fully answered.
> See also the example under the XPath comments for elements containing
> mixed text and element nodes.
>
> > 4) XPath and validation functions will be available in SimpleXML, but
> > we will not try to code generic extensions that work with both
> > SimpleXML and DOM if for no other reason than this is not
> > guaranteed to be simple. (e.g. SimpleXML must remove from XPath
> > results nodes that aren't elements, attributes, and text nodes.)
>

I've decided (unless more people pipe up in support of removing
children() and attributes(); it's currently 2-3 against) to leave
children() and attributes(), but remove the other methods. Things like
schema validation and XPath queries will become procedural.

> Return types need to be standardized. attributes() or getAttributes()
> returns a name/value array, while the current xsearch returns an array
> of sxe objects for the attribute nodes (which, as stated before, is bad
> in the current state of SimpleXML).
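
For comparison, the SimpleXML that shipped in PHP 5 did not adopt the proposed $node->foo['a:bar'] / register_ns() syntax. Instead, attributes() takes a namespace URI, or a prefix when its second argument is true. A minimal sketch, assuming a made-up document with two same-named attributes in different namespaces (the `urn:a`/`urn:b` URIs are illustrative):

```php
<?php
// Two attributes named "bar", distinguished only by namespace.
$root = simplexml_load_string(
    '<root xmlns:a="urn:a" xmlns:b="urn:b"><foo a:bar="1" b:bar="2"/></root>'
);

// Select attributes by namespace URI...
$fromA = (string) $root->foo->attributes('urn:a')['bar'];
$fromB = (string) $root->foo->attributes('urn:b')['bar'];

// ...or by prefix, when the second argument is true.
$viaPrefix = (string) $root->foo->attributes('a', true)['bar'];

// Plain array access only sees attributes in no namespace, so neither
// prefixed "bar" matches here.
$plainExists = isset($root->foo['bar']);

var_dump($fromA, $fromB, $viaPrefix, $plainExists);
```

This keeps the two attributes apart without encoding the prefix into the array key.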
>

xsearch will become a procedure: simplexml_query($node, 'expression', $matches);

> Also, consider the following (an element contains a mix of text and
> element nodes):
>
> $foo = simplexml_load_string('abtestcd');
> $ns = $foo->xsearch('child::text()');
> foreach ($ns as $node) {
>     print "Node Value: ".$node."\n";
> }
>
> Output:
> Node Value: abcd
> Node Value: abcd
>
> One would expect:
> Node Value: ab
> Node Value: cd
>
> Is the output correct? Should something like this not be handled via
> SimpleXML, or is xsearch incorrect when it returns the parent of a text
> node?

Yep, this is the intended behaviour of SimpleXML. The
simplexml_save_string() function will allow you to get the entire node
contents (including tags). To process text children separately, use DOM;
it can interpret the same XPath query results in the manner you desire.

> Your initial point concerning what SimpleXML is was a good start, but it
> still doesn't define the boundaries of what it is meant to handle. When
> do you tell someone that what they are doing should not be done in
> SimpleXML? This is where I get lost with the API, as I don't really know
> its intended limitations.

Well, this is the purpose of finalizing the underlying API. The answer to
that question is simple: if it doesn't do what you want, then SimpleXML
is not what you want. This is part of the reason I want to finalize on no
methods. If you need methods, use DOM; the two are totally interoperable,
requiring zero document copies to work with both. You can process a DOM
object, then load it into SimpleXML for the final processing. Conversely,
you can take a SimpleXML object and load it into DOM for complex
processing.

I'm certainly stopping the API at children() and attributes(), regardless,
as anything else is just silly, and it seems people only felt strongly
about these two functions. (*) Schema validation and XPath searching will
become functions in SimpleXML space.
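
Both behaviours described above can be reproduced with the API that eventually shipped in PHP 5: casting an element to a string joins only its direct text children, while dom_import_simplexml() exposes the same node to DOM, where an XPath query returns each text node separately. A minimal sketch, assuming an illustrative mixed-content document of the same shape as the quoted example (the `<para>`/`<b>` names are made up):

```php
<?php
// Illustrative mixed-content element: text, a child element, more text.
$sxe = simplexml_load_string('<para>ab<b>test</b>cd</para>');

// SimpleXML view: the string cast concatenates the element's direct text
// children and skips the text inside <b>.
$joined = (string) $sxe;

// DOM view of the very same node; the underlying tree is shared, not copied.
$el = dom_import_simplexml($sxe);
$xpath = new DOMXPath($el->ownerDocument);

$parts = [];
foreach ($xpath->query('child::text()', $el) as $textNode) {
    $parts[] = $textNode->nodeValue;  // each text node on its own
}

// And back again: wrap the DOM node as SimpleXML for simple access.
$back = simplexml_import_dom($el);
$inner = (string) $back->b;

echo $joined, "\n", implode(', ', $parts), "\n", $inner, "\n";
```

The division of labour matches the reply: SimpleXML for the simple view, DOM when individual text nodes matter.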
-Sterling

(*) Btw, getChildren() is currently broken from a userspace perspective,
as it is mainly implemented for the SPL recursive iterator. This will have
to change; SimpleXML will not add userspace APIs for other extensions.
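
The SPL recursive-iterator use that getChildren() was built for is available in released PHP through SimpleXMLIterator, which implements RecursiveIterator and can be driven by RecursiveIteratorIterator. A minimal sketch with a made-up document:

```php
<?php
// Load the document as SimpleXMLIterator, which implements SPL's RecursiveIterator.
$tree = simplexml_load_string('<a><b><c/></b><d/></a>', 'SimpleXMLIterator');

// SELF_FIRST visits each element before its children; the root <a> itself
// is not yielded, only its descendants.
$it = new RecursiveIteratorIterator($tree, RecursiveIteratorIterator::SELF_FIRST);

$visited = [];
foreach ($it as $name => $node) {
    // Keys are element names; getDepth() is the nesting level below the root.
    $visited[] = $it->getDepth() . ':' . $name;
}

echo implode(' ', $visited), "\n";
```

Here the recursion goes through the iterator's hasChildren()/getChildren() pair, which is SPL's contract rather than a general-purpose userspace method.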