17 years ago by Greg Beaver — view source — reply

unread

Hi,

This is a middle-length message with 4 parts.

Part 1:
on-list behavior

Could we all please be more efficient? I don't care whether we get
along, but we do need to solve a problem, and endless rhetorical
flourishes != patches.

Part 2:
namespace examples.

Let's examine a realistic code sample that uses a bunch of internal and
user classes from Pyrus (PEAR installer rewritten for PHP 5.3). A few
notes:

this code in svn has not yet been namespaced because the
implementation is not yet set, and so is a good test case for how to
namespace it.
the example is from PEAR2 code, and therefore will have PEAR2
namespace. The autoloader can therefore be written to avoid trying to
autoload anything outside the namespace, which reduces unnecessary
directory grepping. This is possible with any prefixed class names
(which is another good reason to prefix your names, whether with
namespaces or underscored names)
there are actually 2 code samples, one illustrating many userspace
classes, and the other illustrating use of internal classes, which I
will lump together as if they were 1 example.

Userspace classes:
PEAR2_Pyrus_PackageFile_v2Iterator_File
PEAR2_Pyrus_PackageFile_v2Iterator_FileAttribsFilter
PEAR2_Pyrus_PackageFile_v2Iterator_FileContents

Internal classes:
RecursiveIteratorIterator
XMLReader
DOMDocument

OK, here is the code chunk.

<?php
switch ($fake) {
case 'contents' :
// allows stuff like:
// foreach ($pf->contents as $file) {
// echo $file->name;
// $file->installed_as = 'hi';
// }
return new PEAR2_Pyrus_PackageFile_v2Iterator_File(
new
PEAR2_Pyrus_PackageFile_v2Iterator_FileAttribsFilter(
new PEAR2_Pyrus_PackageFile_v2Iterator_FileContents(
$this->packageInfo['contents'], 'contents',
$this)),
RecursiveIteratorIterator::LEAVES_ONLY);
}
// next chunk
$a = new DOMDocument();
if ($isfile) {
$a->load($file);
}
while ($this->reader->read()) {
$depth = $this->reader->depth;
if ($this->reader->nodeType == XMLReader::ELEMENT) {
$tag = $this->reader->name;
// snip
continue;
}
if ($this->reader->nodeType == XMLReader::END_ELEMENT) {
return $arr;
}
if ($this->reader->nodeType == XMLReader::TEXT ||
$this->reader->nodeType == XMLReader::CDATA) {
$arr = $this->mergeValue($arr, $this->reader->value);
}
?>

Now, let's put this code into the PEAR2::Pyrus::PackageFile::v2Iterator
namespace, so that the long "new" statement can be much shorter.

Currently, in order for this code to be autoload-compatible, we need to
use all of the classes:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use PEAR2::Pyrus::PackageFile::v2Iterator::File;
use PEAR2::Pyrus::PackageFile::v2Iterator::FileAttribsFilter;
use PEAR2::Pyrus::PackageFile::v2Iterator::FileContents;
?>

Without even one of the above use statements, upon the introduction (for
instance) of an internal class named "File", the code would suddenly
stop working, or worse, if method names happened to be the same,
possibly perform dangerous calling into unexpected code, which I would
even classify as a potential security vulnerability (i.e. unexpected
execution of code could be used in perfectly valid PHP code to access
the file system with code that does not do this in a different PHP
version, without code change).

Now, if we use the name resolution I have suggested, the code can work
with this single namespace line:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
?>

but has the performance slowdown of autoload being called for
PEAR2::Pyrus::PackageFile::v2Iterator::DOMDocument,
PEAR2::Pyrus::PackageFile::v2Iterator::XMLReader and
PEAR2::Pyrus::PackageFile::v2Iterator::RecursiveIteratorIterator. The
performance slowdown can be 100% removed (with identical functionality) via:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use ::XMLReader, ::DOMDocument, ::RecursiveIteratorIterator;
?>

Part 3:
judgment of value

Current approach:
advantages:

internal classes resolve very fast
disadvantages:
potential unexpected name resolution to internal class when
namespaced classname exists

New approach:
advantages:

code runs the same regardless of load order or autoload
disadvantages:
serious performance slowdown on each internal class name resolution

Part 4:
solving the disadvantages of the new approach

each internal class can be "use ::classname;" to remove this
performance hit 100% of the time
to detect missed classnames, add a new error level, E_DEBUG, which is
disabled by default and is used to note potentially buggy situations,
i.e. "XMLReader internal class ambiguity, use ::XMLReader if you intend
to use internal XMLReader class". lint could enable E_DEBUG, and this
is easily implemented at parse time, since all internal classes will be
loaded.
a simple script that detects internal class names in a script and
adds use statements to the top of the script:

<?php
namespace NSParser;
class Parser
{
protected $tokens;
protected $i = -1;
protected $classes = array();
protected $use = array();
protected $ret;

function __construct($path, $namespace)
{
    $classes = `get_declared_classes()`;
    $classes = array_merge($classes, `get_declared_interfaces()`);
    $this->classes = array_flip($classes);
    unset($this->classes['NSParser::Parser']);
    if (@is_file($path)) {
        $path = file_get_contents($path);
    }
    $this->tokens = token_get_all($path);
    foreach ($this->tokens as &$token) {
        if (!is_array($token)) {
            $token = array(ord($token), $token);
        }
    }
    $ret = "&lt;?php\nnamespace " . $namespace . ";\n";
    do {
        if ($this->tokens[$this->i][0] == T_STRING) {
            if (isset($this->classes[$this->tokens[$this->i][1]])) {
                $this->use[$this->tokens[$this->i][1]] = 1;
            }
        }
    } while (++$this->i < count($this->tokens));
    foreach ($this->use as $name => $unused) {
        $ret .= "use ::$name;\n";
    }
    $ret .= "?>" . $path;
    $this->ret = $ret;
}

function __toString()
{
    return $this->ret;
}

}
?>

The above script combined with a quick eyeball inspection would
eliminate all performance issues with minimal effort.

Does this help clarify the issue? I think it's a no-brainer - the
performance hit is easy to solve, and well worth the risk, the current
behavior is far too close to a potential security vulnerability for my
comfort zone.

Let's also remember that an optimization that adds a 1-time addition of
1 line to source files is probably something most developers can live
with - it's a hell of a lot easier than optimizing a slow algorithm.

Thanks,
Greg

17 years ago by Chuck Hagenbuch — view source — reply

unread

Quoting Greg Beaver greg@chiaraquartet.net:

Part 3:
judgment of value

Current approach:
advantages:

internal classes resolve very fast
disadvantages:

potential unexpected name resolution to internal class when
namespaced classname exists

New approach:
advantages:

code runs the same regardless of load order or autoload
disadvantages:

serious performance slowdown on each internal class name resolution

In my opinion the unexpected name resolution to internal classes is a
serious flaw in the current namespace implementation, and will cause a
lot of confusion or (in my case) avoiding namespaces altogether. The
clincher for me is that the performance ramifications can be solved
with the exact same workaround that is currently necessary for all
user-defined classes (use ::Exception) AND - in PHP user-defined
classes are far more common and numerous than internal classes.

So I would be extremely happy to see this change go into PHP 5_3.
Thanks, Greg!

-chuck

17 years ago by Stanislav Malyshev — view source — reply

unread

Hi!

First of all, I'd like to thank Greg for moving discussion to the
constructive way, way of code examples and proposed solutions.

Currently, in order for this code to be autoload-compatible, we need to
use all of the classes:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use PEAR2::Pyrus::PackageFile::v2Iterator::File;
use PEAR2::Pyrus::PackageFile::v2Iterator::FileAttribsFilter;
use PEAR2::Pyrus::PackageFile::v2Iterator::FileContents;
?>

No, you don't. You just need to use namespace::File etc. where
appropriate. You may do what you did too, but you don't have to.

performance slowdown can be 100% removed (with identical functionality) via:

So you traded 3 uses for 3 uses. And on top of that - you also traded
imaginary case of somebody in the future adding internal class named
File (how probable is that? Quite near 0 - by now nobody would add
classes with names like that even in core, let alone in extensions) -
which is easily avoidable by using namespace:: in this particular case
and can be thus easily fixed if that improbably thing happens - for very
real case of having to write use for EVERY internal class, even if you
NEVER intended to override anything with anything but just using
namespaces for your own private stuff. You end up paying for imaginary
case which would never happen in reality with very real busywork - which
is unavoidable and would with 100% guarantee silently break your
application if you forget to do it even once.

Part 4:
solving the disadvantages of the new approach

each internal class can be "use ::classname;" to remove this
performance hit 100% of the time

Not "can", but "must". Effectively, each instance of the code that uses
internal class without :: or "use ::" is a bug by your approach, since
it immediately introduces performance-killing slowdown (remember - since
it's exhaustive search for file that doesn't exist, it may be dozens of
filesystem accesses without possibility of bytecode cache helping you).

The only logical option there would be to prohibit use of internal
classes without :: altogether - since there's no situation where it
isn't a bug. I'm not sure I want to accept that - are you?

to detect missed classnames, add a new error level, E_DEBUG, which is
disabled by default and is used to note potentially buggy situations,
i.e. "XMLReader internal class ambiguity, use ::XMLReader if you intend
to use internal XMLReader class". lint could enable E_DEBUG, and this
is easily implemented at parse time, since all internal classes will be
loaded.

That's not a solution, since it supposes you have 100% coverage unit
tests for all your code. I.e. you might add it, and if we eventually fix
error reporting so that muted errors would be fast it even might not
hurt you, but it wouldn't make life any better for real application
where you don't have 100% coverage for every possible code path.
BTW, adding new error level for every new problem doesn't seem smart for
me - IMHO we have enough levels as it is already.
Having a tool checking it may help, but I don't think it is smart to put
such a landmine into the language and expect the users to always use
proper tools to avoid it. I think it would be irresponsible.

a simple script that detects internal class names in a script and
adds use statements to the top of the script:

I would hate to have all my code to be peppered with tons of use
statements each time I use internal class... but maybe it's just my
opinion. If you allow resolving to internal class only with either "use
::Exception" or ::Exception, this proposal is non-contradictory, even
though I like it less than the current state of affairs.

Also note that if we accept that we must deal with functions - either
have rule for them saying you can use internal function name without ::
unlike class name (as it is now) - thus having functions and classes
resolve differently - or have to always use :: for internal functions
too (which IMHO in PHP case is as close to hell as you can get being a
programming language). Same for constants. So for this proposal to be
complete there should be answer for that too.

I'd like to know how many people would like that better than what we
have now - i.e. having unqualified name without use to always resolve to
namespace::name (all the rest follows from here automatically).

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

17 years ago by Greg Beaver — view source — reply

unread

Hi,

I've got what I hope is the solution to the problem that erases the
performance issue, and a patch to implement it, read on for details.

Stanislav Malyshev wrote:

Currently, in order for this code to be autoload-compatible, we need to
use all of the classes:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use PEAR2::Pyrus::PackageFile::v2Iterator::File;
use PEAR2::Pyrus::PackageFile::v2Iterator::FileAttribsFilter;
use PEAR2::Pyrus::PackageFile::v2Iterator::FileContents;
?>

No, you don't. You just need to use namespace::File etc. where
appropriate. You may do what you did too, but you don't have to.
Thanks for this pointer, I hadn't realized one could use namespace:: in
a use statement. For those who aren't clear on what Stas is saying,
here is the better example:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use namespace::File, namespace::FileAttribsFilter, namespace::FileContents;
?>

It is worth noting that the actual code I pulled from Pyrus would be in
the PEAR2::Pyrus::PackageFile namespace, so the imports would then
become namespace::v2Iterator::***.

performance slowdown can be 100% removed (with identical
functionality) via:

So you traded 3 uses for 3 uses. And on top of that - you also traded
imaginary case of somebody in the future adding internal class named
File (how probable is that? Quite near 0 - by now nobody would add
classes with names like that even in core, let alone in extensions) -
which is easily avoidable by using namespace:: in this particular case
and can be thus easily fixed if that improbably thing happens - for
very real case of having to write use for EVERY internal class, even
if you NEVER intended to override anything with anything but just
using namespaces for your own private stuff. You end up paying for
imaginary case which would never happen in reality with very real
busywork - which is unavoidable and would with 100% guarantee silently
break your application if you forget to do it even once.
Another quick note: if namespaces become robust enough, PHP core could
introduce any classname we wish without fear of breaking userspace code

which is one of the primary goals of namespacing, is it not?

As for the performance nightmare scenario, I decided to whip up a little
benchmark:

<?php
namespace foo;
function __autoload($class)
{
if ($class == 'foo::XMLReader') {
class XMLReader {
function __construct() {echo CLASS,"\n";}
}
return;
}
// add performance intensive work to demonstrate performance issue
for($i=1;$i<=1000;stat(FILE),$i++);
}
spl_autoload_register(NAMESPACE . '::__autoload');
for ($i=1;$i < 100;$i++) {
$a = new Exception('hi');
echo get_class($a),"\n";
}
$b = new XMLReader;
?>

output: 100x "Exception" followed by "foo::XMLReader"

This little script demonstrates an autoload that is about 500x-800x
slower than most autoloads, as in most cases, there is a 1=>1
class->filename relationship, and thus only a couple of stat calls are
performed per autoload. However, this does demonstrate that in an
extreme environment, the difference between having a "use ::Exception;"
at the top and the above code is 466x. Yes, that's right, 466 times
slower. On my machine, that's a difference of instantaneous versus 3
seconds to execute.

Now for the good news. The reason it is so slow is that the autoload is
called for every single invocation of the class name. With the patch at
http://pear.php.net/~greg/resolution_fix.patch.txt this performance is
improved to only 5.2x slower than its counterpart, even with the
exceptionally ridiculous autoload. On my machine, there is no
perceptible difference in performance to the naked eye. It's a HUGE
performance gain over the pre-optimized patch. Note that the way I
calculated the performance was by running callgrind against the script,
and then doing simple division of the total number of "events" that
callgrind reported. Thus, the fastest (with use ::Exception) was 15
million events, the slowest (pre-optimization) was 7.3 billion events,
and the optimized was 82 million events, reflecting the additional 65
million events added by the heavy __autoload and the light hash table
lookups.

It is also important to note that this slowdown would only occur in
situations where an unqualified classname is used multiple times in the
same file, which most commonly happens with static class methods or
class constants. The iterator constants are a good example, as are
XMLReader and many other commonly used internal classes. For projects
that only use 2-3 internal class names, this patch won't help, but at
the same time, the performance issue is muted by having only a few
classname references as well.

The patch also implements the new classname resolution and all
existing namespace tests pass. The example code above should become
ns_070.phpt. The only caveat - I ran out of time, this patch is for
PHP_5_3, and will need the u porting to HEAD for unicode.

Now that the performance hit is minimal, can we agree to get along on
this proposed change Stas? ;) It transforms the critical nature of the
problem to a truly minor performance hit on internal class usage without
use or ::.

Greg

17 years ago by Stanislav Malyshev — view source — reply

unread

Hi!

a use statement. For those who aren't clear on what Stas is saying,
here is the better example:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use namespace::File, namespace::FileAttribsFilter, namespace::FileContents;
?>

No, actually that's not what I meant. You can do this too, but I meant
just use namespace::File. There's no law about using :: in the code,
really. Just write namespace::File, it's both short and clear.

Now for the good news. The reason it is so slow is that the autoload is
called for every single invocation of the class name. With the patch at
http://pear.php.net/~greg/resolution_fix.patch.txt this performance is

What this patch does?

Now that the performance hit is minimal, can we agree to get along on
this proposed change Stas? ;) It transforms the critical nature of the
problem to a truly minor performance hit on internal class usage without
use or ::.

Please read my response to your previous email, I have described there
issues with your proposal and how to make it non-contradictory and what
else needs to be done.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

17 years ago by Greg Beaver — view source — reply

unread

Stanislav Malyshev wrote:

Hi!

a use statement. For those who aren't clear on what Stas is saying,
here is the better example:

<?php
namespace PEAR2::Pyrus::PackageFile::v2Iterator;
use namespace::File, namespace::FileAttribsFilter,
namespace::FileContents;
?>

No, actually that's not what I meant. You can do this too, but I meant
just use namespace::File. There's no law about using :: in the code,
really. Just write namespace::File, it's both short and clear.

Now for the good news. The reason it is so slow is that the autoload is
called for every single invocation of the class name. With the patch at
http://pear.php.net/~greg/resolution_fix.patch.txt this performance is

What this patch does?

Now that the performance hit is minimal, can we agree to get along on
this proposed change Stas? ;) It transforms the critical nature of the
problem to a truly minor performance hit on internal class usage without
use or ::.

Please read my response to your previous email, I have described there
issues with your proposal and how to make it non-contradictory and
what else needs to be done.

Stas,

I did read your message and posted a response, it is you who needs to
read my reply. "What this patch does?" is patent evidence that you did
not. Please try again.

Greg

17 years ago by Gregory Beaver — view source — reply

unread

Stas and I continued our discussion off list, and decided to pop it back
on the list, so here is 1 message, another to follow.

Greg

===
Stanislav Malyshev wrote:

Hi!

...the message you replied to?

I must be missing something here. In my last reply, I raised a number
of points, including having unqualified names resolve only to
namespace and relation of this to functions and constants. I do not
see these points addressed or discussed anywhere in your email. Some
less important points - like tradeoffs you need to take for working
with internal classes (second reply part in my email) - weren't
mentioned either.
Hi Stas,

In your last reply, you asserted that the only solution was to always
resolve unqualified names to namespace::name. The patch I sent actually
shows that this is unnecessary.

Also, I do not see the explanation how this patch is different from
the last one and how the difference in the speed is achieved - you
just claim it has gone from 466x to 5.2x but I did not find any
explanation about what is the change allowed you to do this. I would
like to understand what this change is and to do that from reading a
patch is kind of hard and I can't be sure what I understood was
actually your intent. Could you explain it?
You could also contact me on IM or IRC if you prefer.
Basically, the patch does 2 things

resolution order is changed to:

foo.php:
$a = new foo();
a) does namespace::foo class exist?
b) try to autoload namespace::foo
c) try to locate internal class ::foo

if internal class ::foo is found, then cache this information so that
if another reference to the foo class occurs in foo.php, resolution
short circuits to:
a) use internal class ::foo

This second point is why the slowdown went from ridiculous to slight.

Since your primary complaint about the resolution order is performance,
and the patch removes the performance problem, this makes your
suggestion of never checking internal classes unnecessary. That was the
point of my previous email with the patch. Clear now?

Greg

17 years ago by Gregory Beaver — view source — reply

unread

Hi,

This is the 2nd of two exchanges that occurred off-list that Stas and I
would like to put back on-list.

Greg

Stanislav Malyshev wrote:

Hi!

Basically, the patch does 2 things

resolution order is changed to:

foo.php:
$a = new foo();
a) does namespace::foo class exist?
b) try to autoload namespace::foo
c) try to locate internal class ::foo

if internal class ::foo is found, then cache this information
so that
if another reference to the foo class occurs in foo.php, resolution
short circuits to:
a) use internal class ::foo

This second point is why the slowdown went from ridiculous to slight.

OK, now I understand better what it does, thanks. Do I understand
right that the caching is in runtime, when the class access opcode is
executed (since you couldn't autoload in compile time)? If so, how
long the cache entry is kept there? I.e. if I have execution path that
goes to file A, then B, then A again, then C, then B, etc. and then
unroll the stack back - do I store all results for second entrance
into A? Is it per-op-array (since we don't have per-file data
structure AFAIK)? What happens if include path (or autoloader)
changes, so that the class name that could not be loaded before can be
loaded now - e.g. application adds plugin definitions, etc. - will it
be ensured that autoloading is never attempted again for the same
class? What happens if somebody loads that namespace::foo class
manually (with require) - will the old resolution hold or the new one
apply?

Also I note that even if the cache stores all resolutions for all
internal classes times all namespaces, it will still require at least
one exhaustive autoload search per internal class per op-array - or
per file, if you somehow find a way to store cache per-file in
runtime, and these can not be eliminated with bytecode caching, unlike
all other filesystem accesses.
Hi,

The caching is at runtime. Basically, it's in executor_globals, and so
is not linked to any opcode array (I couldn't figure out any other
globally accessible storage location). It works like an associative
array. If file X.php loads XMLReader in namespace namespacename, it
creates this entry:

array(
'X.php\0namespacename::XMLReader' => {class entry for XMLReader}
);

So that the next time we reach X.php and the XMLReader class, it sees
this entry, and does not attempt autoload. As for your questions about
include_path/autoload, I had not added this to the patch, but the easy
solution is to clean the hash on include_path change, or autoload
change. I assume this code would break with the patch:

sneakyreader.php:
<?php
namespace foo;
class XMLReader {}
?>

main.php:
<?php
namespace foo;
$a = new XMLReader; // loads ::XMLReader
include 'sneakyreader.php';
$a = new XMLReader; // still loads ::XMLReader
?>

Whereas with the current behavior, it would instead instantiate
foo::XMLReader on the second access. However, this is definitely a wtf
situation - the exact same looking line of code would load a different
class name entirely depending on what was included.

The cache does not eliminate the first autoload per-file.

I'm sure there are other issues to work out, but this should give a
better idea of what I am trying to do.

As for your question of whether to simply error out if foo::XMLReader
doesn't exist (in the main.php example above), this also would be
acceptable for classes, but not for functions, as the ratio of
user-defined to internal functions in any source code example is 1:many,
many. However, we don't autoload functions or constants, so load ordr
is an absolute, much as it was for classes in PHP 4 before autoload was
introduced.

It is not an easy problem to solve, but we do need to solve it prior to
the next release. Should we take this back on-list?

Greg

17 years ago by Stanislav Malyshev — view source — reply

unread

Hi!

The caching is at runtime. Basically, it's in executor_globals, and so
is not linked to any opcode array (I couldn't figure out any other
globally accessible storage location). It works like an associative
array. If file X.php loads XMLReader in namespace namespacename, it
creates this entry:

array(
'X.php\0namespacename::XMLReader' => {class entry for XMLReader}
);

I though a bit about that patch, and I see following problems with it:

If you do not use autoloading, you still can get different
resolution, e.g.:

<?php
namespace foo;
$a = new XMLReader;

resolves to ::XMLReader and cached for this file. But suppose some other
file, somewhere, loaded foo::XMLReader prior to that. Then the same code
here would resolve now to foo::XMLReader as before, right?
Isn't it the same scenario you complained about? If so this patch does
not solve this for 100% of cases, unless I misunderstand how the patch
works.

While the caching indeed helps per-file, typical OO application
contains dozens of files, and each file may use dozens of internal
classes (my install has 100+ internal classes). Meaning, there is still
potential for hundreds of non-cacheable exhaustive autoload searches.
This is not a good scenario.
--
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

17 years ago by Gregory Beaver — view source — reply

unread

Stanislav Malyshev wrote:

Hi!

The caching is at runtime. Basically, it's in executor_globals, and so
is not linked to any opcode array (I couldn't figure out any other
globally accessible storage location). It works like an associative
array. If file X.php loads XMLReader in namespace namespacename, it
creates this entry:

array(
'X.php\0namespacename::XMLReader' => {class entry for XMLReader}
);

I though a bit about that patch, and I see following problems with it:

If you do not use autoloading, you still can get different
resolution, e.g.:

<?php
namespace foo;
$a = new XMLReader;

resolves to ::XMLReader and cached for this file. But suppose some other
file, somewhere, loaded foo::XMLReader prior to that. Then the same code
here would resolve now to foo::XMLReader as before, right?
Isn't it the same scenario you complained about? If so this patch does
not solve this for 100% of cases, unless I misunderstand how the patch
works.

While the caching indeed helps per-file, typical OO application
contains dozens of files, and each file may use dozens of internal
classes (my install has 100+ internal classes). Meaning, there is still
potential for hundreds of non-cacheable exhaustive autoload searches.
This is not a good scenario.

Your first point is correct, and the second is interesting, but does not
quite match reality. I whipped up a script to count the number of
internal vs. user classes in a project, it's at

http://pear.php.net/~greg/detect_classes.phps

I ran it on PEAR2, which is a pre-namespace project at the moment, and
came up with these stats:

Internal classes: 239
User classes: 768
in 171 files

I ran the same script over PEAR, and got:

Internal classes: 110
User classes: 2027
in 231 files

Now, you may argue that these are atypical OO applications, so I also
ran the script over Zend Framework 1.5, with these results:

Internal classes: 950
User classes: 16167
in 1821 files

In other words, there is simply no comparison. Userspace class usage
outnumbers internal class usage by an order of magnitude in typical OO
PHP code.

PEAR2 uses an average of 2 internal classes per file, and 7 userspace
classes. PEAR uses an average of 0.5 internal classes per file, and 8
userspace classes per file. Zend Framework uses an average of 0.5
internal classes per file and ~9 userspace classes per file. Thus, if I
had to choose between defaulting to current namespace (as you suggested)
and to internal classes, the choice is easy. I would choose current
namespace because there would be far fewer use statements, and the
behavior is deterministic.

The script is at:

http://pear.php.net/~greg/detect_classes.phps

run with "php detect_classes.php /path/to/files"

Oh, and the script is one of the few cases where there are 3 internal
classes used and only 1 userspace class :).

Thanks,
Greg

17 years ago by Stanislav Malyshev — view source — reply

unread

Hi!

In other words, there is simply no comparison. Userspace class usage
outnumbers internal class usage by an order of magnitude in typical OO
PHP code.

I didn't claim userspace class usage outnumbers internal class usage or
otherwise. What I claimed is that since we have a lot of files
(framework has 1821, as you very helpfully pointed out) and those use
950 instances of internals classes, application using Framework has big
chance to "score" hundreds of uncacheable autoloads. And that is not
counting user code, which would add to the problem.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

17 years ago by Gregory Beaver — view source — reply

unread

Stanislav Malyshev wrote:

Hi!

In other words, there is simply no comparison. Userspace class usage
outnumbers internal class usage by an order of magnitude in typical OO
PHP code.

I didn't claim userspace class usage outnumbers internal class usage
or otherwise. What I claimed is that since we have a lot of files
(framework has 1821, as you very helpfully pointed out) and those use
950 instances of internals classes, application using Framework has
big chance to "score" hundreds of uncacheable autoloads. And that is
not counting user code, which would add to the problem.
Right, and since you pointed out this limitation and then found another
example that has the same problem that cannot be solved (non-autoload
with load order determining what classname is instantiated), that
invalidates any possible approach. I was trying to say that this leaves
us with only 1 choice, which is whether to use the name resolution you
proposed or stick with the current one. Currently we have:

check current namespace
check internal
autoload if possible
fail

I remind all of the name resolution for classes that you proposed:

check current namespace
autoload if possible
fail

Since my little script proved that the ratio of userspace vs. internal
classes heavily favors userspace classes, this load order makes sense.
However, I want to point out that it doesn't take a script to prove that
the ratio of userspace functions vs. internal functions is exactly the
opposite (far more internal functions vs. userspace functions), so I
propose we have separate name resolution for classes and functions

classes:

check current namespace
autoload if possible
fail

functions:

check current namespace
check internal
fail

Load order would matter for functions, but the risk is worth the benefit.

I believe the initial implementation of namespaces had this order for
classes, so Dmitry's first patch would have the necessary changes.

Greg

17 years ago by Stanislav Malyshev — view source — reply

unread

Hi!

check current namespace

autoload if possible

fail

functions:

check current namespace

check internal

fail

Load order would matter for functions, but the risk is worth the benefit.

I believe the initial implementation of namespaces had this order for
classes, so Dmitry's first patch would have the necessary changes.

That's exactly the mode that I described as second non-contradictory
mode. I would like to know how many people here feel it is better than
what we have now. I know many people right now may not be reading the
list due to the Zend Conference so we'd have to wait some time to get a
good feedback on that.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

17 years ago by Andrew Mason — view source — reply

unread

I for one would prefer the

<<
classes:

check current namespace
autoload if possible
fail

functions:

check current namespace
check internal
fail

I think this a good approach. I don't know how biased I am as I use
the autoloader quite alot but it seems reasonable to me.

Hi!

check current namespace

autoload if possible

fail

functions:

check current namespace

check internal

fail

Load order would matter for functions, but the risk is worth the benefit.

I believe the initial implementation of namespaces had this order for
classes, so Dmitry's first patch would have the necessary changes.

That's exactly the mode that I described as second non-contradictory mode. I
would like to know how many people here feel it is better than what we have
now. I know many people right now may not be reading the list due to the
Zend Conference so we'd have to wait some time to get a good feedback on
that.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

17 years ago by Roman Borschel — view source — reply

unread

Hi,

I for one would prefer the

<<
classes:

check current namespace

autoload if possible

fail

functions:

check current namespace

check internal

fail

I think this a good approach. I don't know how biased I am as I use
the autoloader quite alot but it seems reasonable to me.

I share that view.

Regards

Roman

17 years ago by Christian Seiler — view source — reply

unread

Hi,

<<
classes:

check current namespace

autoload if possible

fail

functions:

check current namespace

check internal

fail

I think this a good approach. I don't know how biased I am as I use
the autoloader quite alot but it seems reasonable to me.

I share that view.

I share this view, too.

Just for a small rationale why I strongly prefer this to "current /
internal / autoload" (the behaviour in CVS): Not only because of the
already stated fact that user classes are much more common than internal
classes: One of the reasons namespaces were introduced was to avoid
another DateTime debacle (anyone remember 5.1? ;-)) - thus that the
introduction of new internal classes in later PHP versions would not
lead to problems with existing code. If there is any chance that an
internal class may be used instead of a user class, this will break
this concept and thus the usefulness of namespaces would be reduced by
quite a bit. The argument, that use namespace::ClassName is possible is
bogus in my eyes since I wouldn't expect to have to import classes of
the same namespace - as far as I know no other language does that and
it is counter-intuitive. Importing classes from other namespaces (even
the global one) however is something I would expect.

(I probably repeated some arguments already presented but this is to sum
up my stance on this topic.)

Regards,
Christian

17 years ago by Marcus Boerger — view source — reply

unread

Hello Gregory,

Sunday, September 14, 2008, 7:40:01 AM, you wrote:

Stanislav Malyshev wrote:

Hi!

In other words, there is simply no comparison. Userspace class usage
outnumbers internal class usage by an order of magnitude in typical OO
PHP code.

I didn't claim userspace class usage outnumbers internal class usage
or otherwise. What I claimed is that since we have a lot of files
(framework has 1821, as you very helpfully pointed out) and those use
950 instances of internals classes, application using Framework has
big chance to "score" hundreds of uncacheable autoloads. And that is
not counting user code, which would add to the problem.
Right, and since you pointed out this limitation and then found another
example that has the same problem that cannot be solved (non-autoload
with load order determining what classname is instantiated), that
invalidates any possible approach. I was trying to say that this leaves
us with only 1 choice, which is whether to use the name resolution you
proposed or stick with the current one. Currently we have:

check current namespace

check internal

autoload if possible

fail

I remind all of the name resolution for classes that you proposed:

check current namespace

autoload if possible

fail

Since my little script proved that the ratio of userspace vs. internal
classes heavily favors userspace classes, this load order makes sense.
However, I want to point out that it doesn't take a script to prove that
the ratio of userspace functions vs. internal functions is exactly the
opposite (far more internal functions vs. userspace functions), so I
propose we have separate name resolution for classes and functions

classes:

check current namespace

autoload if possible

fail

If we no longer check in ternal then:
a) Functions and classes have a different load behavior - not just
support through autoload (which could in some form be added for functions
later.
b) we could very easy add nested namespaces as then there is no more
conflict in that.
c) is forcing people to always write the long way for internal classes what
we want? I doubt it because practically it means we chaqnge the name of all
internal classes in the presence of namespaces and the current load order
was deigned to prevent that iirc.

my conclusion is that i prefer the state of the art as it is and well i am
not yet convinced we need functions in namespaces. But it appears there are
many friends and I am not against it. So I propose we do it as compatible
as possible. And that means they should follow the same name lookup
pattern. And if that means a tiny bit of slowdown so be it. That is better
then just another inconsistency.

marcus

functions:

check current namespace

check internal

fail

Load order would matter for functions, but the risk is worth the benefit.

I believe the initial implementation of namespaces had this order for
classes, so Dmitry's first patch would have the necessary changes.

Greg

Best regards,
Marcus

namespace examples (solving name resolution order issues)

I'd like to know how many people would like that better than what we have now - i.e. having unqualified name without use to always resolve to namespace::name (all the rest follows from here automatically).

Please read my response to your previous email, I have described there issues with your proposal and how to make it non-contradictory and what else needs to be done.

Greg

I'd like to know how many people would like that better than what we
have now - i.e. having unqualified name without use to always resolve to
namespace::name (all the rest follows from here automatically).

Please read my response to your previous email, I have described there
issues with your proposal and how to make it non-contradictory and what
else needs to be done.