Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:104690 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 28959 invoked from network); 13 Mar 2019 04:38:51 -0000 Received: from unknown (HELO mail-it1-f193.google.com) (209.85.166.193) by pb1.pair.com with SMTP; 13 Mar 2019 04:38:51 -0000 Received: by mail-it1-f193.google.com with SMTP id k193so349011ita.3 for ; Tue, 12 Mar 2019 18:28:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=salesforce.com; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Wy2e4tTgZARlMaWq5vGKdEygvdmGPrf1HKFDwcgsg0U=; b=fHM3iEcFavcfJOLyn7CjImTmkFTa4p9ZoPnQmQScl6k/j8zvRsn9t7rICnZnm2qyJC PeiWEA5SCfJrg7+AQkRQZBtrYInpXEPZl16IgOdtcECX+bME1klHQhbzerIWmbK2OIz7 fCfeRRLe3BSQwxPGTiMmKcHhyLAlLtocFsx+0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Wy2e4tTgZARlMaWq5vGKdEygvdmGPrf1HKFDwcgsg0U=; b=MKBlZvGRD0wsnowSUDFNLxwE7jy2aVunvc6Il8/WIo+ERWtWvbaTR9Stj757ix5Bh+ 7iVbJfbW4SyqCJgFASGYGfjZ3epQzR7vDrHj9+JigmKNe4o3o+W3WSpdP7FTn6nkClnD ev3LxlPK5/3zXmylJm5Hd3AQQZkSl0p4m+yWgsstdU5urCUKoRQG9oEGHx61nM17VU3O TWHfFBUi0m0ciMDGW6pYbJceNtKMslyDqnmWd9pqFXTeY6lWYcKccYljorPWtojoYE7o wGZuN/Bgq/6MACFYZz242p5gqWJIHMnyVdMxmRXlOsEunsrUIyfnbE96v2XaEkDA3gMK +LGg== X-Gm-Message-State: APjAAAXab89UmuAyodw+kgdg+cGr9feTVRxIftTAvOrjF3KvGG3RFBLG XAZZ2KMM5zw070U6zl2urkB1KMqJSO4cXJYFEM0s1gQgwCDrdA== X-Google-Smtp-Source: APXvYqyutKNzbd76IvJOfgopY+dqhe02wPJG+tOI9BfymNyRHlyJprVno9wVPglxKrIxchFSqsIDtEICSu26KBsaNUY= X-Received: by 2002:a02:62cf:: with SMTP id d198mr9540039jac.64.1552440531868; Tue, 12 Mar 2019 18:28:51 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: Date: Wed, 13 Mar 2019 02:28:40 +0100 Message-ID: To: Benjamin Eberlei Cc: PHP Internals Content-Type: text/plain; charset="UTF-8" Subject: Re: [PHP-DEV] On fixing DOMNameSpaceNode and DOM NS API Inconsistency Problems From: dzuelke@salesforce.com (David Zuelke) I agree this should be fixed. It's pretty hilarious how this exact case (fiddling with the xmlns prefix) is the only comment for DOMElement::setAttributeNS: http://php.net/manual/en/domelement.setattributens.php (although unnecessary, as you don't have to "add namespaces" to a document, that automatically happens internally). My hypothesis as to why this special class even exists, and why it is all implemented this way: whoever wrote that code didn't know about this "hard-coded" part of the spec, or the spec wasn't even fully finished regarding that part at that time. For those who are curious, it's defined in Section 3 of the "Namespaces in XML 1.1 (Second Edition)" spec: https://www.w3.org/TR/2006/REC-xml-names11-20060816/#ns-decl However, pay particular attention to the following note (emphasis mine): "The prefix xmlns is used only to declare namespace bindings and is by definition bound to the namespace name http://www.w3.org/2000/xmlns/. It **must not be declared or undeclared**. Other prefixes must not be bound to this namespace name, and it must not be declared as the default namespace. Element names must not have the prefix xmlns." The DOM Level 2 specification section 1.1.8 ("XML Namespaces") contains another related definition: "Note: In the DOM, all namespace declaration attributes are by definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/". These are the attributes whose namespace prefix or qualified name is "xmlns". Although, at the time of writing, this is not part of the XML Namespaces specification [Namespaces], it is planned to be incorporated in a future revision." This means that both "xmlns:foo" and "xmlns" attributes are both in this "hard-coded" namespace. Both of these cases are already handled correctly by, I assume, the underlying libxml; if you set "xmlns:blah" or "xmlns" qualified name attributes in that namespace: $doc = new DOMDocument(); $root = new DOMElement("root"); $doc->appendChild($root); $root->setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:foo", "urn:bar"); $root->setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns", "urn:baz"); $root->setAttributeNS("urn:lol", "erp:foo", "whatup"); echo $doc->saveXML(); the namespace "xmlns" does not get declared, and PHP/libxml correctly produce: Honestly, I can't imagine there being much code around that even relies on any of the internal classes, at least not code that deserves to be broken, because it was written by clowns who to this day would claim that XML is terrible, when in fact they simply didn't understand how namespaces work, that prefixes are irrelevant, and that XML shouldn't be parsed with regular expressions ;) There is really no use case in the DOM that ever requires anyone to directly deal with "xmlns" attributes, as their creation (and removal) is something that just happens automatically; the only real case where you even have to worry about prefixes is with DOMElement::setAttributeNS(), where you can cause prefix collisions. But even that is trivial; internally, the mappings always get optimized, e.g. in the case of duplicate prefixes for the same namespace URI: $doc = new DOMDocument(); $root = new DOMElement("root"); $doc->appendChild($root); $root->setAttributeNS("urn:lol", "prefixone:foo", "whatup"); $root->setAttributeNS("urn:lol", "prefixtwo:bar", "whatup"); echo $doc->saveXML(); becomes, correctly: Anyway, I think an RFC and maybe some test cases would be good. I'll be happy to help dig out my old XML-fu and help :) David On Tue, Mar 12, 2019 at 11:54 PM Benjamin Eberlei wrote: > > Hi everyone, > > While looking for things to work on in php-src my friend Thomas pointed me > to a peculiar special case in ext/dom that leads to massive inconsistency > problems in the API. > > There is an undocumented class DOMNameSpaceNode that gets returned from > DOMElement::getAttributeNode(NS) if you select an attribute that represents > a namespace (xmlns). This special case is intentionally handled in the > code, contrary to Pythons DOM Extension which doesn't do this and contrary > to the DOM Specification, which does not have a special DOMNameSpaceNode. > Its all DOMAttr. > > Problematically DOMNameSpaceNode doesn't extend from DOMAttr, you cannot > pass this class to DOMElement::removeAttributeNode(DOMAttr $attr): > > Fatal error: Uncaught TypeError: Argument 1 passed to > DOMElement::removeAttributeNode() must be an instance of DOMAttr, instance > of DOMNameSpaceNode given > > Code example here: https://3v4l.org/jkC5s > > In addition the DOMNameSpaceNode renames all properties compared to > DOMAttr, clearly violating the interface documented here > http://php.net/manual/de/domelement.getattributenode.php > > Two potential fixes come to mind: > > 1. Have DOMNameSpaceNode extend DOMAttr - maybe this was the originally > intended behavior, becuase the properties are all named differently? > 2. Have DOMElement::removeAttributeNode accept also a DOMNameSpaceNode > (3.) Remove the non standard DOMNameSpaceNode class and use a DOMAttr > instead > > I think approach #1 is the right one from a BC perspective and also fixing > the DOM API to be more consistent with the standard. > > But I am also not opposed to #3 in the longer term, looking at Github > DOMNamespaceNode in the open source world is only used for one apc test, in > PHP Compat and when dumping it in Symfony Var Dumper. I imagine its more > deeply used in closed source code working deeply with XML. > > The second problem that this inconsistency creates is > DOMElement::removeAttributeNS($namespaceUri, $localName). When you want to > use it to remove a namespace attribute (xmlns). By default the xmlns > namespace has the uri http://www.w3.org/2000/xmlns/. Hence removing > xmlns:$name namespace would expected to be: > > DOMElement::removeAttributeNS("http://www.w3.org/2000/xmlns/", "foo") // > removes xmlns:foo > > But thats not how it works, there is special code implemented that requires > you to pick the URI from xmlns:foo value: > > > > DOMElement::removeAttributeNS("urn:foo", "foo"); > > But if your node has an attribute in this namespace with the same name, it > gets removed as well: > > > > Would incorrectly become this, removing 2 attributes: > > > > This is a bug that cannot be fixed in a BC way. I have no clue how to fix > this completly broken behavior in a good way. I propose to break BC here > and requiring "http://www.w3.org/2000/xmlns/" as namespaceURI from PHP 8.0 > to delete a namespace (xmlns) attribute. > > Should I put both issues into an RFC or are these rather bugs that can be > fixed without an RFC? > > greetings > Benjamin