Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:40376
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: error (pb1.pair.com: domain chiaraquartet.net from 208.83.222.18 cause and error)
Message-ID: <48C59D5C.4050507@chiaraquartet.net>
Date: Mon, 08 Sep 2008 16:47:08 -0500
User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070807)
MIME-Version: 1.0
To:  internals@lists.php.net
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: namespace examples (solving name resolution order issues)
From: greg@chiaraquartet.net (Greg Beaver)

Hi,

This is a middle-length message with 4 parts.

Part 1:
on-list behavior

Could we all please be more efficient?  I don't care whether we get
along, but we do need to solve a problem, and endless rhetorical
flourishes != patches.

Part 2:
namespace examples.

Let's examine a realistic code sample that uses a bunch of internal and
user classes from Pyrus (PEAR installer rewritten for PHP 5.3).  A few
notes:

1) this code in svn has not yet been namespaced because the
implementation is not yet set, and so is a good test case for how to
namespace it.
2) the example is from PEAR2 code, and therefore will have PEAR2
namespace.  The autoloader can therefore be written to avoid trying to
autoload anything outside the namespace, which reduces unnecessary
directory grepping.  This is possible with any prefixed class names
(which is another good reason to prefix your names, whether with
namespaces or underscored names)
3) there are actually 2 code samples, one illustrating many userspace
classes, and the other illustrating use of internal classes, which I
will lump together as if they were 1 example.

Userspace classes:
PEAR2_Pyrus_PackageFile_v2Iterator_File
PEAR2_Pyrus_PackageFile_v2Iterator_FileAttribsFilter
PEAR2_Pyrus_PackageFile_v2Iterator_FileContents

Internal classes:
RecursiveIteratorIterator
XMLReader
DOMDocument

OK, here is the code chunk.

<?php
switch ($fake) {
            case 'contents' :
                // allows stuff like:
                // foreach ($pf->contents as $file) {
                //     echo $file->name;
                //     $file->installed_as = 'hi';
                // }
                return new PEAR2_Pyrus_PackageFile_v2Iterator_File(
                        new
PEAR2_Pyrus_PackageFile_v2Iterator_FileAttribsFilter(
                        new PEAR2_Pyrus_PackageFile_v2Iterator_FileContents(
                            $this->packageInfo['contents'], 'contents',
$this)),
                            RecursiveIteratorIterator::LEAVES_ONLY);
}
// next chunk
            $a = new DOMDocument();
            if ($isfile) {
                $a->load($file);
            }
        while ($this->reader->read()) {
            $depth = $this->reader->depth;
            if ($this->reader->nodeType == XMLReader::ELEMENT) {
                $tag = $this->reader->name;
// snip
                continue;
            }
            if ($this->reader->nodeType == XMLReader::END_ELEMENT) {
                return $arr;
            }
            if ($this->reader->nodeType == XMLReader::TEXT ||
                  $this->reader->nodeType == XMLReader::CDATA) {
                $arr = $this->mergeValue($arr, $this->reader->value);
            }
?>

Now, let's put this code into the PEAR2::Pyrus::PackageFile::v2Iterator
namespace, so that the long "new" statement can be much shorter.

Currently, in order for this code to be autoload-compatible, we need to
use all of the classes:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use PEAR2::Pyrus::PackageFile::v2Iterator::File;
use PEAR2::Pyrus::PackageFile::v2Iterator::FileAttribsFilter;
use PEAR2::Pyrus::PackageFile::v2Iterator::FileContents;
?>

Without even one of the above use statements, upon the introduction (for
instance) of an internal class named "File", the code would suddenly
stop working, or worse, if method names happened to be the same,
possibly perform dangerous calling into unexpected code, which I would
even classify as a potential security vulnerability (i.e. unexpected
execution of code could be used in perfectly valid PHP code to access
the file system with code that does not do this in a different PHP
version, without code change).

Now, if we use the name resolution I have suggested, the code can work
with this single namespace line:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
?>

but has the performance slowdown of autoload being called for
PEAR2::Pyrus::PackageFile::v2Iterator::DOMDocument,
PEAR2::Pyrus::PackageFile::v2Iterator::XMLReader and
PEAR2::Pyrus::PackageFile::v2Iterator::RecursiveIteratorIterator.  The
performance slowdown can be 100% removed (with identical functionality) via:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use ::XMLReader, ::DOMDocument, ::RecursiveIteratorIterator;
?>

Part 3:
judgment of value

Current approach:
advantages:
1) internal classes resolve very fast
disadvantages:
1) potential unexpected name resolution to internal class when
namespaced classname exists

New approach:
advantages:
1) code runs the same regardless of load order or autoload
disadvantages:
1) serious performance slowdown on each internal class name resolution

Part 4:
solving the disadvantages of the new approach

1) each internal class can be "use ::classname;" to remove this
performance hit 100% of the time
2) to detect missed classnames, add a new error level, E_DEBUG, which is
disabled by default and is used to note potentially buggy situations,
i.e. "XMLReader internal class ambiguity, use ::XMLReader if you intend
to use internal XMLReader class".  lint could enable E_DEBUG, and this
is easily implemented at parse time, since all internal classes will be
loaded.
3) a simple script that detects internal class names in a script and
adds use statements to the top of the script:

<?php
namespace NSParser;
class Parser
{
    protected $tokens;
    protected $i = -1;
    protected $classes = array();
    protected $use = array();
    protected $ret;

    function __construct($path, $namespace)
    {
        $classes = get_declared_classes();
        $classes = array_merge($classes, get_declared_interfaces());
        $this->classes = array_flip($classes);
        unset($this->classes['NSParser::Parser']);
        if (@is_file($path)) {
            $path = file_get_contents($path);
        }
        $this->tokens = token_get_all($path);
        foreach ($this->tokens as &$token) {
            if (!is_array($token)) {
                $token = array(ord($token), $token);
            }
        }
        $ret = "<?php\nnamespace " . $namespace . ";\n";
        do {
            if ($this->tokens[$this->i][0] == T_STRING) {
                if (isset($this->classes[$this->tokens[$this->i][1]])) {
                    $this->use[$this->tokens[$this->i][1]] = 1;
                }
            }
        } while (++$this->i < count($this->tokens));
        foreach ($this->use as $name => $unused) {
            $ret .= "use ::$name;\n";
        }
        $ret .= "?>" . $path;
        $this->ret = $ret;
    }

    function __toString()
    {
        return $this->ret;
    }
}
?>

The above script combined with a quick eyeball inspection would
eliminate all performance issues with minimal effort.

Does this help clarify the issue?  I think it's a no-brainer - the
performance hit is easy to solve, and well worth the risk, the current
behavior is far too close to a potential security vulnerability for my
comfort zone.

Let's also remember that an optimization that adds a 1-time addition of
1 line to source files is probably something most developers can live
with - it's a hell of a lot easier than optimizing a slow algorithm.

Thanks,
Greg