Newsgroups: php.internals
Message-ID: <48C59D5C.4050507@chiaraquartet.net>
Date: Mon, 08 Sep 2008 16:47:08 -0500
To: internals@lists.php.net
Subject: namespace examples (solving name resolution order issues)
From: greg@chiaraquartet.net (Greg Beaver)

Hi,

This is a middle-length message with 4 parts.

Part 1: on-list behavior

Could we all please be more efficient?
I don't care whether we get along, but we do need to solve a problem, and endless rhetorical flourishes != patches.

Part 2: namespace examples

Let's examine a realistic code sample that uses a bunch of internal and user classes from Pyrus (the PEAR installer rewritten for PHP 5.3). A few notes:

1) this code in svn has not yet been namespaced because the implementation is not yet settled, so it is a good test case for how to namespace it.
2) the example is from PEAR2 code, and therefore will have the PEAR2 namespace. The autoloader can therefore be written to avoid trying to autoload anything outside the namespace, which reduces unnecessary directory grepping. This is possible with any prefixed class names (which is another good reason to prefix your names, whether with namespaces or underscored names).
3) there are actually 2 code samples, one illustrating many userspace classes and the other illustrating use of internal classes, which I will lump together as if they were 1 example.

Userspace classes:
    PEAR2_Pyrus_PackageFile_v2Iterator_File
    PEAR2_Pyrus_PackageFile_v2Iterator_FileAttribsFilter
    PEAR2_Pyrus_PackageFile_v2Iterator_FileContents

Internal classes:
    RecursiveIteratorIterator
    XMLReader
    DOMDocument

OK, here is the code chunk.

    [...]contents as $file) {
    //     echo $file->name;
    //     $file->installed_as = 'hi';
    // }
    return new PEAR2_Pyrus_PackageFile_v2Iterator_File(
        new PEAR2_Pyrus_PackageFile_v2Iterator_FileAttribsFilter(
            new PEAR2_Pyrus_PackageFile_v2Iterator_FileContents(
                $this->packageInfo['contents'], 'contents', $this)),
        RecursiveIteratorIterator::LEAVES_ONLY);
    }

    // next chunk
    $a = new DOMDocument();
    if ($isfile) {
        $a->load($file);
    }

    while ($this->reader->read()) {
        $depth = $this->reader->depth;
        if ($this->reader->nodeType == XMLReader::ELEMENT) {
            $tag = $this->reader->name;
            // snip
            continue;
        }
        if ($this->reader->nodeType == XMLReader::END_ELEMENT) {
            return $arr;
        }
        if ($this->reader->nodeType == XMLReader::TEXT ||
              $this->reader->nodeType == XMLReader::CDATA) {
            $arr = $this->mergeValue($arr, $this->reader->value);
        }
    ?>

Now, let's put this code into the PEAR2::Pyrus::PackageFile::v2Iterator namespace, so that the long "new" statement can be much shorter. Currently, in order for this code to be autoload-compatible, we need to use all of the classes:

    [...]

If even one of the above use statements is missing, the introduction of (for instance) an internal class named "File" would suddenly break the code. Worse, if method names happened to be the same, the code could silently call into unexpected code, which I would even classify as a potential security vulnerability: perfectly valid PHP code that never touches the file system in one PHP version could end up accessing it in a different PHP version, without any code change.

Now, if we use the name resolution I have suggested, the code can work with this single namespace line:

    <?php
    namespace PEAR2::Pyrus::PackageFile::v2Iterator;
    ?>

but has the performance slowdown of autoload being called for PEAR2::Pyrus::PackageFile::v2Iterator::DOMDocument, PEAR2::Pyrus::PackageFile::v2Iterator::XMLReader and PEAR2::Pyrus::PackageFile::v2Iterator::RecursiveIteratorIterator.
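To make that autoload cost concrete, here is a minimal sketch. It makes two assumptions that are not in the code above: it uses the namespace syntax PHP 5.3 eventually released ("\" as the separator) rather than the "::" separator used in this message, and it uses class_exists() to stand in for the lookup of an unqualified class name. The recording autoloader is hypothetical and only notes what it is asked for.

```php
<?php
// Assumption: released PHP 5.3+ syntax ("\" separator), not the "::"
// separator discussed in this message.
namespace PEAR2\Pyrus\PackageFile\v2Iterator;

// Hypothetical autoloader: loads nothing, only records requested names.
$requested = array();
spl_autoload_register(function ($class) use (&$requested) {
    $requested[] = $class;
});

// Stand-in for resolving the unqualified name RecursiveIteratorIterator
// inside this namespace: the namespaced name is looked up first, which
// sends it through the autoloader before the lookup fails.
$found = class_exists(__NAMESPACE__ . '\RecursiveIteratorIterator');

// A fully qualified name refers to the internal class directly, so the
// autoloader is never consulted for it.
$mode = \RecursiveIteratorIterator::LEAVES_ONLY;

print_r($requested);
var_dump($found);
```

The autoloader is consulted once, for the namespaced name, and never for the fully qualified reference. In a real autoloader that one call usually means at least one filesystem check, which is the cost being weighed here.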
The performance slowdown can be 100% removed (with identical functionality) via:

    <?php
    namespace PEAR2::Pyrus::PackageFile::v2Iterator;
    use ::DOMDocument;
    use ::XMLReader;
    use ::RecursiveIteratorIterator;
    ?>

Part 3: judgment of value

Current approach:
    advantages:
        1) internal classes resolve very fast
    disadvantages:
        1) potential unexpected name resolution to an internal class when a namespaced classname exists

New approach:
    advantages:
        1) code runs the same regardless of load order or autoload
    disadvantages:
        1) serious performance slowdown on each internal class name resolution

Part 4: solving the disadvantages of the new approach

1) each internal class can be imported with "use ::classname;" to remove this performance hit 100% of the time
2) to detect missed classnames, add a new error level, E_DEBUG, which is disabled by default and is used to note potentially buggy situations, i.e. "XMLReader internal class ambiguity, use ::XMLReader if you intend to use internal XMLReader class". lint could enable E_DEBUG, and this is easily implemented at parse time, since all internal classes will be loaded.
3) a simple script that detects internal class names in a script and adds use statements to the top of the script:

        [...]classes = array_flip($classes);
        unset($this->classes['NSParser::Parser']);
        if (@is_file($path)) {
            $path = file_get_contents($path);
        }
        $this->tokens = token_get_all($path);
        foreach ($this->tokens as &$token) {
            if (!is_array($token)) {
                $token = array(ord($token), $token);
            }
        }
        $ret = "[...]tokens[$this->i][0] == T_STRING) {
                if (isset($this->classes[$this->tokens[$this->i][1]])) {
                    $this->use[$this->tokens[$this->i][1]] = 1;
                }
            }
        } while (++$this->i < count($this->tokens));
        foreach ($this->use as $name => $unused) {
            $ret .= "use ::$name;\n";
        }
        $ret .= "?>" . $path;
        $this->ret = $ret;
    }

    function __toString()
    {
        return $this->ret;
    }
}
?>

The above script combined with a quick eyeball inspection would eliminate all performance issues with minimal effort.

Does this help clarify the issue?
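The scanning idea from point 3 can also be sketched as a standalone function. This is an illustration under stated assumptions, not the script above: findInternalClassUses() is a hypothetical name, ReflectionClass::isInternal() is used to skip user-defined classes, and the emitted lines use the released "use Name;" form instead of the "use ::Name;" form discussed here.

```php
<?php
// Hypothetical helper (not from the original script): return one "use"
// line for each internal class referenced by bare name in $source.
function findInternalClassUses($source)
{
    // Index internal class names for constant-time lookup.
    $internal = array();
    foreach (get_declared_classes() as $class) {
        $ref = new ReflectionClass($class);
        if ($ref->isInternal()) {
            $internal[$class] = true;
        }
    }

    // Collect every T_STRING token that exactly matches an internal class.
    $uses = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === T_STRING
            && isset($internal[$token[1]])) {
            $uses[$token[1]] = true;
        }
    }

    // Assumption: released syntax "use Name;" rather than "use ::Name;".
    $lines = array();
    foreach (array_keys($uses) as $name) {
        $lines[] = "use $name;";
    }
    return $lines;
}

$source = '<?php $it = new ArrayIterator(array()); $o = new stdClass(); $f = new File();';
print_r(findInternalClassUses($source));
```

Matching is by exact token text, as in the script above, so a user class that shares its name with an internal class would also be reported. That is why the quick eyeball inspection is still needed.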
I think it's a no-brainer: the performance hit is easy to solve and well worth taking on, because the current behavior is far too close to a potential security vulnerability for my comfort. Let's also remember that an optimization that requires a 1-time addition of 1 line to source files is probably something most developers can live with. It's a hell of a lot easier than optimizing a slow algorithm.

Thanks,
Greg