Newsgroups: php.internals
Date: Tue, 13 Jan 2004 10:40:19 -0500
From: adam@trachtenberg.com (Adam Trachtenberg)
To: Christian Schneider
Cc: Rob Richards, internals@lists.php.net
Subject: Re: [PHP-DEV] SimpleXML: Moving Forward

On Jan 13, 2004, at 9:21 AM, Christian Schneider wrote:

> But let's take a look on how I'd use it (xml formatted for
> readability):
> $foo = simplexml_load_string('
>
> ab
> foo2a
> cd
> foo2b
> ef
>
> foo3
> foo4
> foo3
>
> gh
> ');

Ugh. This is pretty much the limit of what I think is reasonable for
SimpleXML to handle. I think the API would be more consistent if the
document looked like:

>
> foo2a
> foo2b
>
> foo4
>
> ');

However, that may place too many restrictions on documents for
SimpleXML to be useful. Like I said before, I've never tried to use
SimpleXML with text nodes and elements sharing the same parent.
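Mixed content (text nodes and elements sharing one parent) is where the
DOM view earns its keep. A minimal sketch, assuming the SimpleXML and
DOM extensions as they later shipped in PHP 5; the toy document
<foo>ab<bar>x</bar>cd</foo> is hypothetical. Casting the SimpleXML
element to a string joins only its direct text, while
dom_import_simplexml() exposes the same tree as a DOMElement whose
childNodes keep text and elements apart, in document order.

```php
<?php
// Toy mixed-content document (hypothetical, for illustration only).
$xml = simplexml_load_string('<foo>ab<bar>x</bar>cd</foo>');

// SimpleXML view: a string cast joins only the element's direct text nodes.
$flat = (string) $xml;

// DOM view: walk the same tree node by node, keeping text and elements apart.
$parts = [];
foreach (dom_import_simplexml($xml)->childNodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE) {
        $parts[] = 'text:' . $node->nodeValue;
    } elseif ($node->nodeType === XML_ELEMENT_NODE) {
        $parts[] = 'elem:' . $node->nodeName;
    }
}

echo $flat, "\n";                 // abcd
echo implode(' ', $parts), "\n";  // text:ab elem:bar text:cd
```

The string cast loses where "ab" ends and "cd" begins; the DOM walk
keeps that boundary, at the cost of more verbose code.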
> foreach ($foo as $node) => foo2a foo2b foo3
> foreach ($foo->foo2 as $node) => foo2a foo2b
> foreach ($foo->foo3 as $node) => foo4
> foreach ((array)$foo->foo3 as $node) => foo4
> foreach ($foo->foo3->foo4 as $node) => nothing
> foreach ((array)$foo->foo3->foo4 as $node) => foo4
>
> What seems wrong here is that to output nodes where there can be 0 to
> multiple instances I have to do something like:
> if ($foo->$nodename)
> {
>     if (is_array($foo->$nodename))
>     {
>         foreach ($foo->$nodename as $node)
>             echo "$node\n";
>     }
>     else
>         echo "{$foo->$nodename}\n";
> }
> else
>     echo "No node $nodename found\n";
>
> $nodename = 'node1' => No node node1 found
> $nodename = 'node2' => foo2a foo2b
> $nodename = 'node3' => foo3

I raised this as an issue yesterday. Sterling said he'd look into
this. However, to tie this into my reply to Rob, I think there's some
expectation that the developer knows what she's getting and that the
cases where you have 0, 1, or many potential elements are few. (That
said, I just developed something where I do essentially this all over
the place and it sucks.)

Here are my thoughts on solutions:

1) Place all elements in an array (or nodeList) regardless of whether
there's 0, 1, or many. This is the DOM solution. This just leads to
annoying code where you need to do $foo->item(0) and $foo->firstChild.

However, I don't really see any way around this otherwise. Either it's
general or not. It can't be both. (Unless there's some magical type
that's both an array and a scalar.) I'm willing to put up with this
headache because the clunkiness here is outweighed by the niceness in
most cases.

2) If a document has an XML Schema (or RELAX NG schema), SimpleXML
could optionally inspect the schema to see how many times an element
may occur (minOccurs and maxOccurs in XML Schema; zeroOrMore and
oneOrMore in RELAX NG). If maxOccurs > 1 (or "unbounded"), then the
elements would be placed in an array even if there was only one
element in that particular instance.
This allows us to solve the problem by making the user specifically
tell us how they want SimpleXML to handle a document. It does add some
overhead, but simplicity is often more complex behind the scenes. This
has the benefit of using an existing XML technology to solve the
problem, but I don't know how expensive it'd be.

Again, my opinion is that arbitrary XML documents are best parsed
using DOM and well-defined ones are best parsed using SimpleXML.

>> Attributes are handled associative arrays, so given an element with 2
>> attributes with the same name, but in different namespaces, it wont
>> work:
>>
>
> Right now foo['bar'] will be an array('x', 'y') in that case. We're
> losing the namespaces here but get the values. Simple or broken? Not
> sure.

This case still makes me puke. :) Right now, SimpleXML always makes
you lose the namespaces unless you use XPath. I don't think it's too
much to ask that anyone who can handle XML Namespaces can also handle
XPath. I would rather guide people through XPath in these nasty cases
than make the general API handle them.

> As right now there is no easy (read non-xpath/xquery) way of getting
> the attributes hidden in the magic array of $foo I think
> getAttributes should be added too.

AFAIK, it's actually also impossible to find out the name of the
document element using SimpleXML, even using XPath. I ended up doing:

    $xml = simplexml_load_string($data);
    $type = dom_import_simplexml($xml)->tagName;

Without this feature, it's difficult to make SimpleXML work in cases
where a page could potentially be processing two different XML
documents, because you can't inspect the XML document to figure out
what type it is. :)

> No other functions though. Should these be methods?

I think so.
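Both lookups above (the name of the document element, and attributes
that share a local name across namespaces) can be sketched against
SimpleXML as it later shipped in PHP 5, where the search method is
xpath() rather than the xsearch() discussed in this thread. This is a
hedged sketch: getName() and registerXPathNamespace() are later
additions, not part of the API under discussion here, and the document
and the namespace URIs urn:a and urn:b are hypothetical placeholders.

```php
<?php
// Hypothetical document: two "id" attributes in different namespaces.
$xml = simplexml_load_string(
    '<doc xmlns:a="urn:a" xmlns:b="urn:b"><item a:id="x" b:id="y"/></doc>'
);

// Document element name: the DOM bridge used in the message above...
$viaDom = dom_import_simplexml($xml)->tagName;
// ...or SimpleXML's own getName(), added in a later PHP 5 release.
$rootName = $xml->getName();

// Same-named attributes in different namespaces: bind a prefix to each
// namespace URI, then address each attribute explicitly with XPath.
$xml->registerXPathNamespace('a', 'urn:a');
$xml->registerXPathNamespace('b', 'urn:b');
$aIds = $xml->xpath('/doc/item/@a:id');
$bIds = $xml->xpath('/doc/item/@b:id');

echo $viaDom, ' ', $rootName, "\n";                    // doc doc
echo (string) $aIds[0], ' ', (string) $bIds[0], "\n";  // x y
```

This keeps the general API namespace-agnostic and pushes the nasty
case into XPath, which is the trade-off argued for above.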
>
>> $foo = simplexml_load_string('abtestcd');
>> $ns = $foo->xsearch('child::text()');
>> foreach ($ns as $node)
>>     print "Node Value: ".$node."\n";
>
> I would actually expect abcd but only once:
> Node Value: abcd
>
> Concatenating all text parts _and_ returning them once for each part
> definitely seems wrong.

Aren't those two lines contradictory? :)

> +1 on getChildren/getAttributes (function or method)
> -1 on more functions
>
> I think it's quite usable this way and simple enough to use to earn
> the name SimpleXML.

I think this is where we're coming out. (Modulo the XPath and
Validation functions.)

-adam

--
adam trachtenberg
adam@trachtenberg.com