Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:7050
Return-Path:
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Delivered-To: mailing list internals@lists.php.net
Received: (qmail 36971 invoked by uid 1010); 13 Jan 2004 15:12:54 -0000
Delivered-To: ezmlm-scan-internals@lists.php.net
Delivered-To: ezmlm-internals@lists.php.net
Received: (qmail 36936 invoked from network); 13 Jan 2004 15:12:53 -0000
Received: from unknown (HELO nycsmtp3out.rdc-nyc.rr.com) (24.29.99.224) by pb1.pair.com with SMTP; 13 Jan 2004 15:12:53 -0000
Received: from [192.168.123.158] (66-108-187-246.nyc.rr.com [66.108.187.246]) by nycsmtp3out.rdc-nyc.rr.com (8.12.10/Road Runner SMTP Server 1.0) with ESMTP id i0DFCqOf029242; Tue, 13 Jan 2004 10:12:52 -0500 (EST)
In-Reply-To: <00fe01c3d9d1$6e8618a0$f7dea8c0@cyberware.local>
References: <00fe01c3d9d1$6e8618a0$f7dea8c0@cyberware.local>
Mime-Version: 1.0 (Apple Message framework v609)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Message-ID:
Content-Transfer-Encoding: 7bit
Cc:
Date: Tue, 13 Jan 2004 10:12:51 -0500
To: "Rob Richards"
X-Mailer: Apple Mail (2.609)
Subject: Re: [PHP-DEV] SimpleXML: Moving Forward
From: adam@trachtenberg.com (Adam Trachtenberg)

On Jan 13, 2004, at 7:33 AM, Rob Richards wrote:

> From: Adam Maccabee Trachtenberg
>
>> 1) SimpleXML creates PHP data structures from XML documents. It only
>> handles XML elements, attributes, and text nodes. The syntax for
>> accessing the text node children of an element is akin to object
>> properties ($foo->bar); the syntax of accessing attributes is akin
>> to array elements ($foo['bar']).
>
> This goes back to my question on what is the goal of SimpleXML?
> Is it supposed to be an easy api to be able to access any xml document
> or only not complex ones?

Here's where I see the benefit of SimpleXML. SimpleXML should be used when you know the schema of an XML document and want to extract specific pieces of data from it.
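As a rough illustration of that known-layout case, here is a sketch written against the SimpleXML API as it later shipped with PHP 5, using a hypothetical configuration file (the element and attribute names are made up for the example). Child elements are read with the property syntax from point 1, attributes with the array syntax:

```php
<?php
// Hypothetical configuration file whose layout the developer already knows.
$config = simplexml_load_string(
    '<config><database host="localhost"><name>app</name></database></config>'
);

// Child elements read like object properties...
$name = (string) $config->database->name;

// ...and attributes read like array elements.
$host = (string) $config->database['host'];

echo "$name@$host\n"; // prints "app@localhost"
```

Each access returns a SimpleXMLElement, so the explicit (string) casts turn the results into plain PHP strings.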
My favorite use-cases are RSS, REST, and configuration files. This doesn't mean there's necessarily a formal XML Schema or RELAX NG document, but that the developer is familiar enough with the layout of the XML document that she knows what she's looking for and can formulate code to access the information she wants. In most cases, this will be by directly accessing text nodes through $foo->bar->baz. More complex cases will be handled using XPath: /rss:foo[starts-with(dc:bar, '2004-01')]/rss:baz.

In my ideal world, you could use SimpleXML for all XML documents, regardless of complexity (read: namespaces, right?). However, if this led to an unnecessary amount of complexity, I would sacrifice this point. Also, since there's some assumption that the developer already knows the document's schema, there's no need for an overwhelming set of introspection functions, since that's where DOM excels.

To sum up: it would be helpful to see some *real world* XML documents that people want to parse using SimpleXML. We'd then try very hard to make sure SimpleXML was easy to use for those documents. It's easy to make up theoretical XML documents that are well-formed and pathologically nasty, but I'd much prefer to leave those to DOM.

> Attributes are handled as associative arrays, so given an element with 2
> attributes with the same name, but in different namespaces, it won't
> work:
>
>
> xpath won't help here either as xsearch returns an array of sxe objects
> with the attribute nodes (which causes some additional problems).
> It's fine if this would have to be handled in dom, but to me the
> question really has never been fully answered.
> See also example under the xpath comments for elements containing
> mixed text and element nodes.

Ugh. That's nasty. I would prefer not to handle this in SimpleXML. Have you ever really seen a case where someone did this?

>> When deciding the behavior of these functions (e.g.
>> Does getChildren() return just the direct descendants or all children
>> regardless of depth?), we'll define them to mimic XPath's behavior
>> (e.g. /child::node()). This reduces the potential for disagreement
>> over what is the "correct" way to do things. (I'm just looking for
>> a way to prevent protracted discussions over issues that have no
>> clear "right" answers and can never really be solved.)
>
> Should only be direct descendants. One should be able to navigate the
> entire tree (elements/attributes) in a standard way without having to
> use xpath. imho, this is one of the biggest reasons why the two
> functions should be implemented.

I agree here.

>> 4) XPath and validation functions will be available in SimpleXML, but
>> we will not try to code generic extensions that work with both
>> SimpleXML and DOM if for no other reason than this is not
>> guaranteed to be simple. (e.g. SimpleXML must remove from XPath
>> results nodes that aren't elements, attributes, and text nodes.)
>
> return types need to be standardized. attributes or getAttributes
> returns a name/value array, while the current xsearch will return an
> array of sxe objects of the attribute node (which, as stated before, is
> bad in the current state of simplexml).

I also agree here. This is one of the reasons I feel it's important to hash out these details now, so that all the functions work consistently. I would prefer to always return an array (or a SimpleXML_List object that's similar to DOM's NodeList) of SimpleXML objects from any querying function, whether it's getChildren(), getAttributes(), or xPathQuery(). I think this is most consistent. For example:

a a

Therefore, $xml->xPathQuery('/foo/bar') and $foo->getChildren() (and maybe $foo?) would be equivalent.
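For comparison, in SimpleXML as it eventually shipped with PHP 5 the querying methods are named xpath() and children() rather than the xPathQuery() and getChildren() proposed here. Assuming that released API, the equivalence can be sketched as follows; the two-element document is hypothetical, and both loops visit the same direct children of the root:

```php
<?php
// Hypothetical document: <foo> has two direct <bar> children.
$xml = simplexml_load_string('<foo><bar>a</bar><bar>b</bar></foo>');

// xpath() returns a plain PHP array of SimpleXMLElement objects.
$viaXPath = $xml->xpath('/foo/bar');

// children() iterates over direct children only (like XPath's child::
// axis), not over all descendants; copy them into an array.
$viaChildren = [];
foreach ($xml->children() as $child) {
    $viaChildren[] = $child;
}

// Both lists hold SimpleXMLElement objects for the same two nodes.
foreach ([$viaXPath, $viaChildren] as $list) {
    foreach ($list as $node) {
        echo $node->getName(), ' = ', (string) $node, "\n";
    }
}
```

One difference remains: children() hands back a SimpleXMLElement rather than a plain array, so code that needs array functions still has to copy the results out, as the loop above does.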
> Also, consider the following (an element contains a mix of text and
> element nodes):
> $foo = simplexml_load_string('abtestcd');
> $ns = $foo->xsearch('child::text()');
> foreach ($ns as $node) {
>     print "Node Value: ".$node."\n";
> }
>
> Output:
> Node Value: abcd
> Node Value: abcd
>
> One would expect:
> Node Value: ab
> Node Value: cd
>
> Is the output correct, should something like this not be handled via
> simpleXML, or is the xsearch incorrect when it returns the parent of a
> text node?

Honestly, I don't think anyone (read: I) ever considered that SimpleXML would be used in cases that mix text and element nodes. I never encountered this in my use-cases from above. Currently, I believe the SimpleXML document model assumes that an element contains (zero or more elements) or (one text node). So, if you take your XML example from above and do:

print $foo;

You get:

abcd

There's no way to access "ab" and "cd" as separate entities, so I would almost say the consistent answer is to concatenate the two text nodes from the XPath query and return just one text node, "abcd".

If you're looking for boundaries, I would tell you "Don't use SimpleXML for this because it may not act as you expect."

> Your initial point concerning what SimpleXML is was a good start, but
> it still doesn't define the boundaries of what it is meant to handle.
> When do you tell someone that what they are doing should not be done
> in SimpleXML? This is where I get lost with the API as I don't really
> know its intended limitations.

Does this bring us any closer to defining the boundaries? Would you like them shifted? :)

-adam

-- 
adam trachtenberg
adam@trachtenberg.com