From: rasmus@lerdorf.com (Rasmus Lerdorf)
To: Adam Maccabee Trachtenberg
CC: internals@lists.php.net
Newsgroups: php.internals
Subject: Re: [PHP-DEV] Simplexml and xml namespaces
Date: Fri, 19 Aug 2005 01:10:39 -0700
Message-ID: <430593FF.6020907@lerdorf.com>
References: <43054765.3000208@lerdorf.com>

Adam Maccabee Trachtenberg wrote:
> On Thu, 18 Aug 2005, Rasmus Lerdorf wrote:
>
>
>>But how does this really help? I don't see how it is possible to
>>distinguish the namespaced title vs. the non-namespaced ones. My
>>suggestion here would be that for namespaced nodes the namespace alias
>>(or perhaps the actual namespace?) becomes the key in the nodes array.
>
>
> XML Namespaces are a real PITA.
> I remember Rob, Sterling, and I went
> through a variety of iterations around this.
>
> The biggest problems is that prefixes are really not something you can
> rely on at all -- they are just a handy fiction -- the namespace name
> is really what the XML processor uses.
>
> If you're consuming a feed and the provider alters the namespace
> prefix, but binds it to the same namespace, then the document is
> considered identical. However, if you're relying on a specific prefix
> in your code (instead of the actual namespace), then your code is
> busted.
>
> Since people don't always have control over producing the XML
> documents process, it doesn't seem reasonable to force people not to
> let others change prefixes.
>
> Second, default namespaces also screw things up entirely, as you have
> no way to access . It's different from , so they
> shouldn't be lumped together, but there's no prefix you can use to
> access it. Now you have to have a way of registering prefixes, so you
> can access elements in default namespaces.
>
> FWIW, this exact problem is the #1 XSLT FAQ because people don't
> realize that elements in a default namespace aren't the same as
> non-namespaced elements.
>
> (Of course there is the issue of what happens when something switches
> from having a prefix to being in a default namespace -- again it is
> the identical document, but code is broken.)
>
> Last, you can get weird rebinding of namespace prefixes:
>
>
>
>
>
>
> These two s are different.
>
> Ultimately, for those reasons, if you want to reliably access a XML
> document using namespaces prefixes, you really need to register your
> own prefixes for every namespace used in the document and use those in
> your code, or things could potentially break even under a valid XML
> document.
>
> It was really those two issues that caused we (I think it was largely
> Rob) to suggest we end up using children() and attributes() with the
> namespace name instead of the prefix.
>
> I really do think it is the cleanest solution that doesn't break down
> when you reach the edge cases.

Yeah, I agree actually. My real beef is that simplexml and var_dump()
don't play nicely with each other. var_dump() ends up lumping the
namespaced elements in with the non-namespaced elements of the same
name, but when you iterate through things manually they are not lumped
together, and the only way to get at the namespaced elements is by
checking for them directly with the appropriate children() call.

I am fine with having to manually dereference the namespace and keeping
things completely separate. I'd just like it to be easier for people to
use var_dump() on a simplexml object and not have it confuse the heck
out of them by showing them arrays with 2 elements in them, when
iterating over them or calling count() on them only gives 1.

-Rasmus
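
For anyone following the thread, here is a minimal sketch of the point
Adam makes above: the prefix is incidental, the namespace name is what
identifies an element, and an element in a default namespace is not the
same as an un-namespaced one. It uses Python's xml.etree.ElementTree
rather than simplexml, purely for illustration. The three sample
documents and the titles() helper are invented for this example, and
the Dublin Core URI is just a convenient namespace to bind.

```python
import xml.etree.ElementTree as ET

# Namespace name used for the illustration (an assumption, not from the thread).
DC = "http://purl.org/dc/elements/1.1/"

# The same namespace bound to the prefix "dc" ...
doc_a = f'<item xmlns:dc="{DC}"><title>plain</title><dc:title>namespaced</dc:title></item>'
# ... rebound to a different prefix "x" ...
doc_b = f'<item xmlns:x="{DC}"><title>plain</title><x:title>namespaced</x:title></item>'
# ... and used as a default namespace, with no prefix at all.
doc_c = f'<item><title>plain</title><title xmlns="{DC}">namespaced</title></item>'


def titles(xml_text):
    """Return (un-namespaced titles, titles in the DC namespace)."""
    root = ET.fromstring(xml_text)
    # A bare tag name only matches elements that are in no namespace.
    plain = [e.text for e in root.findall("title")]
    # Namespaced elements are looked up by namespace name, never by prefix.
    namespaced = [e.text for e in root.findall(f"{{{DC}}}title")]
    return plain, namespaced


results = [titles(d) for d in (doc_a, doc_b, doc_c)]
for r in results:
    print(r)
```

All three documents give the same answer because the lookup is keyed on
the namespace name, which is the same idea as passing the namespace
name to children() and attributes() in simplexml.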