Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:7056
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Date: Tue, 13 Jan 2004 11:55:14 -0500
To: Rob Richards <rrichards@ctindustries.net>
Cc: Adam Maccabee Trachtenberg <adam@trachtenberg.com>,
	internals@lists.php.net
Message-ID: <20040113165514.GC23361@bumblebury.com>
References: <Pine.LNX.4.58.0401130248410.21920@miranda.org> <00fe01c3d9d1$6e8618a0$f7dea8c0@cyberware.local>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <00fe01c3d9d1$6e8618a0$f7dea8c0@cyberware.local>
User-Agent: Mutt/1.5.4i
Subject: Re: [PHP-DEV] SimpleXML: Moving Forward
From: sterling@php.net (Sterling Hughes)

> From: Adam Maccabee Trachtenberg
> 
> > 1) SimpleXML creates PHP data structures from XML documents. It only
> >    handles XML elements, attributes, and text nodes. The syntax for
> >    accessing the text node children of an element is akin to object
> >    properties ($foo->bar); the syntax of accessing attributes is akin
> >    to array elements ($foo['bar']).
> 
> This goes back to my question: what is the goal of SimpleXML?
> Is it supposed to be an easy API for accessing any XML document, or
> only simple ones?
> Attributes are handled as associative arrays, so given an element with 2
> attributes with the same name, but in different namespaces, it won't work:
> <foo a:bar="x" b:bar="y">
>

Attributes should be namespace-qualified within the array:

$node->foo['a:bar']

This would respect namespaces "registered" by register_ns().
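
For comparison, here is a rough sketch of the two-namespace case handled
by a different route than the prefixed array key above: passing a
namespace URI to attributes().  It assumes a PHP 5 build where
attributes() accepts a namespace URI, and the URIs and variable names are
made up for the example.

```php
<?php
// Hypothetical namespace URIs, chosen only for this illustration.
$xml = '<foo xmlns:a="urn:example:a" xmlns:b="urn:example:b" a:bar="x" b:bar="y"/>';
$foo = simplexml_load_string($xml);

// attributes() with a namespace URI returns only the attributes
// that live in that namespace, so the two "bar" attributes no
// longer collide.
$a = $foo->attributes('urn:example:a');
$b = $foo->attributes('urn:example:b');

$aBar = (string) $a['bar'];
$bBar = (string) $b['bar'];

echo $aBar, "\n"; // x
echo $bBar, "\n"; // y
```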

> XPath won't help here either, as xsearch returns an array of sxe objects with
> the attribute nodes (which causes some additional problems).
> It's fine if this would have to be handled in DOM, but to me the question
> has never really been fully answered.
> See also example under the xpath comments for elements containing mixed text
> and element nodes.
> 
> > 4) XPath and validation functions will be available in SimpleXML, but
> >    we will not try to code generic extensions that work with both
> >    SimpleXML and DOM if for no other reason than this is not
> >    guaranteed to be simple. (e.g. SimpleXML must remove from XPath
> >    results nodes that aren't elements, attributes, and text nodes.)
> 

I've decided to leave children() and attributes(), but remove the other
methods (unless more people pipe up in support of removing children()
and attributes(); it's currently 2-3 against).  Things like schema
validation and XPath queries will become procedural.

> Return types need to be standardized. attributes or getAttributes returns
> a name/value array, while the current xsearch returns an array of sxe
> objects for the attribute nodes (which, as stated before, is bad in the
> current state of simplexml).
>

xsearch will become a procedure: simplexml_query($node, 'expression', $matches);
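
The exact signature and return value are not settled here, so the
following is only a guess at how such a procedure might behave, written
as a wrapper around the xpath() method found in released SimpleXML.  The
name sketch_simplexml_query and the integer return value are assumptions
for illustration, not a real function.

```php
<?php
// Hypothetical helper: NOT part of SimpleXML. It only mimics the
// proposed (node, expression, &matches) calling convention.
function sketch_simplexml_query(SimpleXMLElement $node, $expression, &$matches)
{
    $result = $node->xpath($expression);
    // xpath() does not return an array when the query fails,
    // so normalise that case to an empty match list.
    $matches = is_array($result) ? $result : array();
    return count($matches);
}

$foo = simplexml_load_string('<foo><bar>1</bar><bar>2</bar></foo>');
$count = sketch_simplexml_query($foo, 'bar', $matches);

foreach ($matches as $match) {
    print "Match: " . $match . "\n";
}
```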

> Also, consider the following (an element contains a mix of text and element
> nodes):
> $foo = simplexml_load_string('<foo>ab<foo2>test</foo2>cd</foo>');
> $ns = $foo->xsearch('child::text()');
> foreach ($ns as $node) {
>  print "Node Value: ".$node."\n";
> }
> 
> Output:
> Node Value: abcd
> Node Value: abcd
> 
> One would expect:
> Node Value: ab
> Node Value: cd
> 
> Is the output correct, should something like this not be handled via
> simpleXML, or is the xsearch incorrect when it returns the parent of a text
> node?
> 

Yep, this is the intended behaviour of simplexml.  The
simplexml_save_string() function will allow you to get the entire node
contents (including tags).  As for processing text children separately,
use DOM.  It can interpret the results of the same XPath query in the
manner you desire.
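
As a minimal sketch of the DOM route, the same document and the same
expression can be run through DOMXPath, where each text node comes back
as its own node.  The variable names are arbitrary.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<foo>ab<foo2>test</foo2>cd</foo>');

$xpath = new DOMXPath($doc);

// Evaluate the expression relative to the root <foo> element.
$texts = array();
foreach ($xpath->query('child::text()', $doc->documentElement) as $node) {
    $texts[] = $node->nodeValue;
    print "Node Value: " . $node->nodeValue . "\n";
}
```

Here the two text nodes are reported separately, as "ab" and "cd".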

> Your initial point concerning what SimpleXML is was a good start, but it
> still doesn't define the boundaries of what it is meant to handle. When do
> you tell someone that what they are doing should not be done in SimpleXML?
> This is where I get lost with the API as I don't really know its intended
> limitations.
>

Well, that is the purpose of finalizing the underlying API.  The answer
to that question is simple: if it doesn't do what you want, then
SimpleXML is not what you want.  This is part of the reason I want to
finalize on no methods.  If you need methods, use DOM.  The two are
totally interoperable and require zero document copies to work with both.
You can process a DOM object, then load it into simplexml for the final
processing.  Conversely, you can take a simplexml object and load it into
DOM for complex processing.
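
A short sketch of that round trip, assuming the conversion functions
dom_import_simplexml() and simplexml_import_dom() (this message does not
name them) and a made-up sample document:

```php
<?php
$sxe = simplexml_load_string('<foo><bar>one</bar></foo>');

// Get a DOMElement for the same underlying node; no copy is made.
$dom = dom_import_simplexml($sxe);

// Do the "complex" work through DOM...
$dom->appendChild($dom->ownerDocument->createElement('baz', 'two'));

// ...and the change is visible through the original simplexml object.
$viaSxe = (string) $sxe->baz;
echo $viaSxe, "\n"; // two

// Going the other way: wrap a DOM node for simple access.
$back = simplexml_import_dom($dom);
$viaBack = (string) $back->bar;
echo $viaBack, "\n"; // one
```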

I'm certainly stopping the API at children() and attributes(),
regardless, as anything else is just silly, and it seems that people
only felt strongly about these two functions. (*)
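
For reference, a small sketch of what those two methods give you when
iterated, using a made-up document and collecting the results into plain
arrays:

```php
<?php
$foo = simplexml_load_string('<foo id="7" lang="en"><bar>one</bar><baz>two</baz></foo>');

// children() iterates the child elements, keyed by element name.
$kids = array();
foreach ($foo->children() as $name => $child) {
    $kids[$name] = (string) $child;
}

// attributes() iterates the attributes, keyed by attribute name.
$attrs = array();
foreach ($foo->attributes() as $name => $value) {
    $attrs[$name] = (string) $value;
}

print_r($kids);
print_r($attrs);
```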

Schema validation and XPath searching will become functions in
SimpleXML space.

-Sterling

(*) Btw, getChildren() is currently broken from a userspace perspective,
as it is mainly implemented for the SPL recursive iterator.  This will
have to change; simplexml will not add userspace APIs for other
extensions.