Hi!
It's been a few days since I first wanted to send this email to internals, but real life has been a bit chaotic, so I apologize if it comes off as if I didn't research the archives enough. I skimmed the Module conversation from 10 months ago and the one that recently resurfaced, and after thinking deeply about Rowan's and Larry's comments I wanted to throw this idea into the pit.
Let's preface the conversation with two facts: 1) a module system for PHP has been discussed for several years, and 2) if there were an easy and perfect solution it would have been implemented long ago. With that in mind, I think there are two major "camps": those who would support something new, similar to Node's ESM vs. CommonJS split, and those who wouldn't. Having dealt with this mess on the Node.js side, I'm still on the side that would support it, because even though it has been 10 years' worth of "mess", it has greatly empowered progress. But I think PHP is too conservative to indulge that camp, so I'm going to focus on Rowan's and Larry's position of "we need something that builds on top of namespaces, not replaces them".
If we consider how GitHub, Composer and Docker Hub work, we can pin down a very important aspect of "namespaces": {entity}/{project}. The entity may be either an individual or an organization, but the concept is mostly the same. Although it can be argued that PHP has nothing to do with that, I think it could be a "good-enough" foundation considering the complexity of the subject. Here is what we could do:
<?php declare(strict_types=1);

namespace Acme\ProjectOne
{
    public class Foo {}    // same as class Foo {}
    private class Bar {}   // only visible inside Acme\ProjectOne
    protected class Baz {} // visible inside Acme
}

namespace Acme\ProjectTwo
{
    new \Acme\ProjectOne\Foo; // works as always
    new \Acme\ProjectOne\Bar; // Fatal error: Uncaught Error: Cannot instantiate private class \Acme\ProjectOne\Bar from \Acme\ProjectTwo
    new \Acme\ProjectOne\Baz; // works
}

namespace Corp\Corp
{
    new \Acme\ProjectOne\Foo; // works as always
    new \Acme\ProjectOne\Bar; // Fatal error: Uncaught Error: Cannot instantiate private class \Acme\ProjectOne\Bar from \Corp\Corp
    new \Acme\ProjectOne\Baz; // Fatal error: Uncaught Error: Cannot instantiate protected class \Acme\ProjectOne\Baz from \Corp\Corp
}

function (\Acme\ProjectOne\Foo $foo) {} // works as always
function (\Acme\ProjectOne\Bar $bar) {} // open question: allow or disallow it?
function (\Acme\ProjectOne\Baz $baz) {} // open question: allow or disallow it?
This would allow public, private and protected classes in a way that I believe would be useful for the large ecosystem that surrounds Composer. From my extremely limited understanding of the engine, I think the easy/natural first step would be to still allow private/protected classes to be received (as type declarations) outside their namespace, because a type declaration does not trigger autoloading.
However, an important question is whether this is enough groundwork to lead to the optimizations that have been discussed whenever the topic of modules is brought up. For instance, if type-hinting outside the module were disallowed, could that make it easier to pack and optimize an entire module, if we could instruct PHP how to load all symbols of a namespace at once? I don't know.
As I write this down I don't know whether it is related or whether it's something that only makes sense inside my head, but I see the above proposal paired with a potential amendment to PSR-4 (and Composer) to encourage the community to pack small, related symbols into a single file with an opt-in approach:
composer.json:
    // ...
    "autoload": {
        "psr-4-with-module": {
            "App\\": "app/"
        }
    },
    // ...
<?php declare(strict_types=1);

// app/Foo/Bar.php
namespace App\Foo;

class Bar {}

// app/Foo.module.php
namespace App\Foo;

enum Baz {}
enum Qux {}

new \App\Foo\Bar;     // loads app/Foo/Bar.php
\App\Foo\Baz::option; // app/Foo/Baz.php does not exist, so app/Foo.module.php is tried before giving up
\App\Foo\Qux::option; // app/Foo.module.php has already been loaded and Qux is already registered
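To make the opt-in idea concrete, here is a minimal userland sketch of how such a fallback loader might behave. The "psr-4-with-module" rule is hypothetical (it does not exist in Composer today); the App\ prefix, app/ directory and the *.module.php naming are just the examples from above.

<?php declare(strict_types=1);

// Sketch of the hypothetical "psr-4-with-module" fallback behaviour:
// first try the usual PSR-4 file, then fall back to a namespace-level
// <Namespace>.module.php file that may define several small symbols.
spl_autoload_register(function (string $class): void {
    $prefix  = 'App\\';
    $baseDir = __DIR__ . '/app/';

    if (!str_starts_with($class, $prefix)) {
        return;
    }

    $relative = substr($class, strlen($prefix)); // e.g. "Foo\Baz"
    $psr4File = $baseDir . str_replace('\\', '/', $relative) . '.php';

    if (is_file($psr4File)) { // e.g. app/Foo/Baz.php
        require $psr4File;
        return;
    }

    // Fall back to the namespace-level "module" file, e.g. app/Foo.module.php
    $namespace  = substr($relative, 0, (int) strrpos($relative, '\\'));
    $moduleFile = $baseDir . str_replace('\\', '/', $namespace) . '.module.php';

    if ($namespace !== '' && is_file($moduleFile)) {
        require_once $moduleFile;
    }
});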
Thoughts?
--
Marco Deleu
Hi, thank you for starting this thread.
I have a similar idea that has been sitting on my mind and that I want to put in writing, so I will share it here.
First of all, I want to acknowledge the problems with big projects: big monoliths that can have only one composer.json yet need many libraries, which eventually clash and make upgrades harder than they need to be.
Can we develop the modules concept without linking ourselves to namespaces
or any other new entity?
I think it might be possible, if every symbol that is defined is attached
to a module when it is loaded.
And we can use something like:
module(string $name, callable $closure);
The $closure is defined in the current module, but is executed in the new
module.
Everything else follows this logic: functions and methods are executed in the module where they were defined and with which they are associated. Any new symbols defined inherit the current module.
When a module is defined by another (parent) module, a relation is created between them: the parent module gets access to the child module's symbols, but not the other way around.
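As a purely illustrative sketch of how that could look in code: module() is the hypothetical function proposed above, stubbed out here only so the snippet runs; in the actual proposal the engine would do the module bookkeeping.

<?php declare(strict_types=1);

// Stub for the hypothetical module() API described above. A real
// implementation would live in the engine and tag every symbol defined
// inside $closure as belonging to the module $name.
function module(string $name, callable $closure): void
{
    $closure(); // the engine would switch the "current module" to $name here
}

// The defining (parent) module creates a child module.
module('acme/http-client', function (): void {
    // Symbols defined here would belong to the acme/http-client module.
    class Client
    {
        public function get(string $url): string
        {
            return "GET {$url}";
        }
    }
});

// Per the proposal, the parent module can see the child module's symbols,
// while an unrelated module could not.
var_dump((new Client())->get('https://example.org'));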
There is no need to change anything in the libraries' code, only in Composer and autoloading. That is the key point for easy adoption.
On the Composer side, each installed package would be loaded into its own module.
What is yet to be figured out:
- Can we define internal modules, where the child module's symbols are visible only in the parent (defining) module and not at any level up?
- Once a module is defined, can the parent-child relation be redefined/changed/removed?
I apologize if this changes the solution discussion, but I didn't want to start a new thread on the same topic.
--
Alex
Hi!
This would allow public, private and protected classes in a way that I
believe to be useful for the large ecosystem that surrounds Composer. From
my extremely limited understanding of the engine, I think the easy/natural
step would be to allow private/protected classes to be received outside
its namespace because a type declaration does not trigger autoload.
This has been discussed before - making namespaces something beyond what they are: a convenience string replace. That is, given
namespace foo;
use otherns\boo;
function bar () {}
bar();
boo();
The engine quietly rewrites as
function foo\bar () {}
foo\bar();
otherns\boo();
The engine doesn't store any notion of namespace directly, it's just a
string replace on the symbol table. And if I recall correctly this is also
the reason that \ is the namespace operator - the original proposal was for
namespaces to have the :: operator, same as static classes, but late in the
implementation it was found that there were disambiguation problems and the
choice was made to use the \ operator rather than solve those problems. I
assume the problems were judged to be intractable, but I can't say for sure.
Also, a namespace block can appear anywhere, so if a stubborn programmer really wanted access to a private or protected symbol, they could put a namespace block with the same namespace into their own code.
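For illustration, a minimal example of the loophole being described (the Acme\ProjectOne name is borrowed from the proposal above):

<?php declare(strict_types=1);

// Library code: imagine Internal were marked "private" to Acme\ProjectOne.
namespace Acme\ProjectOne {
    class Internal {}
}

// Consumer code, possibly in a completely different file: nothing stops it
// from re-entering the same namespace, so a visibility check based purely on
// "which namespace am I in?" would be trivially bypassed.
namespace Acme\ProjectOne {
    $obj = new Internal();
    var_dump($obj instanceof Internal); // bool(true)
}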
This also doesn't isolate a module with shared dependencies from changes to those dependencies. That makes Composer libraries impossible to use effectively in the absence of a mechanism for coordinating those conflicts from the core. And worse, the WP core team doesn't want to provide one. And yes, it really isn't the PHP dev team's direct responsibility to step in and directly fix that problem. Indirectly, perhaps? Only if it is a benefit to everyone who uses the language.
Neither PHP nor JavaScript deals with directories or other file collections. We have to look to Java and its JAR files, or to Go modules, for examples of this. And no, the PSR standard of mapping namespaces to directories has nothing to do with how the engine itself sees packages, namespaces, or the world. In truth it sees none of these, because they don't exist in the runtime in any meaningful way that I'm aware of. Phar, perhaps? I know little of that system beyond the fact that it is the manner in which Composer and a few other libraries are distributed.
I sense, and I could be wrong here, that there is no appetite for moving beyond the one-file-at-a-time model. So the system I proposed last go-round was still a one-file solution.
Another idea I've seen kicked around: file-based privacy. In this scheme a file can declare itself private (public is the assumed, backward-compatible default) and then mark exceptions as public. But for this to be truly useful I fear some file-structure awareness would be needed, and again, at the moment, PHP doesn't do that.
A very long time ago I proposed a require-into-namespace mechanism. The engine by default attaches all incoming namespaces to the root (\). I suggested (because I naively thought require was a function, not a statement, and was ignorant of the difference at the time) that a "target namespace" could be the second argument of require and, if provided, all the symbols established by that file would become attached to that namespace. This could only work if the autoloader correctly dispatched symbols to the correct namespaces, and I don't have a clue how it could do that.
The need of the plugin community is code that just plugs into the app and doesn't need to care about what the other applications are doing. This is currently possible only if the plugin provides all of its own code; if it does use Composer libraries, it resorts to monkey-typing: changing the names of all the symbols to something prefixed with the plugin name to avoid collisions. This approach is NOT optimal, but it is the only one available at present.
However, an important question is whether this is enough groundwork that
could lead to optimizations that have been discussed when the topic of
module is brought up. For instance, if type-hint outside the module is
disallowed, could that make it easier to pack and optimize an entire module
if we could instruct PHP how to load all symbols of a namespace all at
once? I don't know.
Doing that would likely involve giving the engine some notion of directory, again as Java does with JAR files, and as PHP might do in PHAR files, but I know embarrassingly little about them. Could the existing PHAR structure be used in some way as a starting point for this? I just don't know, not without research.
If we consider how GitHub, Composer and Docker Hub works, we can pin a very important aspect of "namespaces": {entity}/{project}. Entity may either be an individual or an organization, but the concept is mostly the same. Although it can be argued that PHP has nothing to do with that, I think that could be a "good-enough" foundation considering the complexity of the subject.
While a two-level namespace root for a project is common, it's far from universal. Picking two examples from the first page of "popular packages" on packagist.org, Guzzle's root namespace is one level ("GuzzleHttp") and the Symfony Console component's root namespace is three levels ("Symfony\Component\Console").
So I think any module or visibility tied to namespaces would need a way to declare that prefix explicitly, not have the language assume it's a particular length.
If we just want namespace visibility, we could use Scala's approach, where the private modifier itself can be qualified:
private[\Symfony\Component\Console] class Foo { ... }
private[\Symfony] class Bar { ... }
If we want modules to have more existence - module-wide declares, optimisation, etc - then we need some way of declaring "this namespace prefix is a module" - a "module" keyword, or "declare_module" function, or something. Those are the lines that Larry and Arnaud were exploring a while ago - see https://github.com/Crell/php-rfcs/blob/master/modules/spec-brainstorm.md and https://github.com/arnaud-lb/php-src/pull/10
What Michael Morris is talking about is really a completely different concept - it's more like "containers", in the sense of Docker, Kubernetes, etc, where different sections of code can be isolated, and declare classes with conflicting fully-qualified names. I don't think it's what most applications and libraries would want "modules" to be; it's probably best thought of as a completely separate feature.
--
Rowan Tommins
[IMSoP]
On Wed, May 14, 2025 at 4:08 AM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
What Michael Morris is talking about is really a completely different
concept - it's more like "containers", in the sense of Docker, Kubernetes,
etc, where different sections of code can be isolated, and declare classes
with conflicting fully-qualified names. I don't think it's what most
applications and libraries would want "modules" to be; it's probably best
thought of as a completely separate feature.
Well, it's what Go calls "modules". It's confusing because I was being
truthful, not snarky, when I said "Ask 10 programmers for the definition of
module and expect 12 answers." I'm self trained, so I expect to get my
terms wrong from time to time. But I know enough to identify problems and
needs and I've tried to be clear on that.
I'm currently reading up on Phar and seeing exactly how suited it would be as a foundation for a module system. I've also been reading up on how Go approaches things, but Go has package management baked into the compiler - PHP outsources this to userland. I'm going to guess that's largely because of a lack of staff - PHP has no large backers (leeches like Facebook that use it heavily and could back it, yes, but are not backers) and Go has Google.
Hi Michael,
Since it appears that nested classes probably won't pass by tomorrow (and thus there is no need to even touch short-syntax classes), I was going to focus on modules next. As I mentioned in that thread, the two are very closely related on a technical level -- it would have taken only 2-3 lines of changes to turn it into namespaces-as-modules and another 10 to turn it into proper modules (minus syntax support). However, I would implement it very differently knowing what I know today and with this as a goal (vs. nested classes). I have zero idea why people voted "no", and the reasons people did express didn't entirely make sense to me either. So, I suspect it was just down to a poorly worded RFC and/or a misunderstanding of how it worked. I'll have to revisit it later.
Sorry for the vent; that's not what this thread is about.
Modules. First of all, I'd be more than happy to help with the implementation if you're up for some collaboration. Personally, here are the requirements I was going into it with:
- Impossible name collisions. If you want to name something Foo\Bar in your module and I want to name something Foo\Bar in mine; we should be free to do so. Implementing this is straightforward.
- Simple. I don't want to rely on a package manager to create modules or even use them. I really like the simplicity of "require_once" from time to time, and I don't want to see that go away. I have some ideas here, like
require_module my-module.php
- Easy to optimize. A module should be compiled as a complete unit so opcache (or the engine itself) can make full use of the context. There are a lot of optimizations left on the table right now because the engine dynamically compiles one file at a time.
I haven't really even considered syntax too much; but I personally don't want anything new -- or at least, too "out there." I want it to feel like a natural extension to the language rather than something bolted on.
I suspect there will need to be at least two new user-land elements to this:
- a "module loader" that operates similar to the unified class loader Gina proposed.
- importing modules and aliasing them (as needed).
Would you be interested in collaborating further?
— Rob
Well, it's what Go calls "modules". It's confusing because I was being
truthful, not snarky, when I said "Ask 10 programmers for the definition of
module and expect 12 answers." I'm self trained, so I expect to get my
terms wrong from time to time. But I know enough to identify problems and
needs and I've tried to be clear on that.
I don't know much about Go, but at a glance it uses a similar model to JavaScript and Python where classes don't have a universal name, the names are always local. That's not a different kind of module, it's a fundamentally different language design.
If you want to use two different versions of Guzzle in the same application, the first problem you need to solve has nothing to do with require, or autoloading, or Phar files. The first problem you need to solve is that you now have two classes called \GuzzleHttp\Client, and that breaks a bunch of really fundamental assumptions.
For example:
- plugin1 uses Guzzle v5, runs "$client1 = new \GuzzleHttp\Client", and returns it to the main application
- The main application passes $client1 to plugin2
- plugin2 uses Guzzle v4
plugin2 runs "$client2 = new \GuzzleHttp\Client"
$client1 and $client2 are instances of different classes, with the same name! How does "instanceof" behave? What about "get_class"? What if you serialize and unserialize?
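For a concrete feel of the problem, here is a runnable simulation; since PHP cannot actually declare two classes named \GuzzleHttp\Client, the two "versions" are emulated with distinct prefixed names (the Plugin1\ and Plugin2\ prefixes are invented for this illustration), which is effectively what userland prefixers produce:

<?php declare(strict_types=1);

namespace Plugin1\GuzzleHttp {
    class Client {} // stands in for the "Guzzle v5" client
}

namespace Plugin2\GuzzleHttp {
    class Client {} // stands in for the "Guzzle v4" client
}

namespace {
    $client1 = new \Plugin1\GuzzleHttp\Client();
    $client2 = new \Plugin2\GuzzleHttp\Client();

    // The two objects are NOT interchangeable: code type-hinted against one
    // "Client" rejects the other, and get_class() exposes different names.
    var_dump($client1 instanceof \Plugin2\GuzzleHttp\Client); // bool(false)
    var_dump(get_class($client1), get_class($client2));       // two different names
}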
I think if you changed the language enough that those questions didn't matter, it would be a language fork on the scale of Python 2 to 3, or even Perl 5 to Raku (originally called "Perl 6"). Every single application and library would have to be rewritten to use the new concept of what a class is. And most of them would get absolutely no benefit, because they want to reference the same version of a class everywhere in the application.
That's why I think "containers" are the more useful comparison - you need some way to put not just plugin1 itself, but all the third-party code it calls, into some kind of sandbox, as though it was running in a separate process. If you can control what classes can go into and out of that sandbox, then in any piece of code, you don't end up with conflicting meanings for the same name - just as a Linux container can't open a network port directly on the host.
Regards,
Rowan Tommins
[IMSoP]
On Wed, May 14, 2025 at 10:57 AM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
I don't know much about Go, but at a glance it uses a similar model to
JavaScript and Python where classes don't have a universal name, the
names are always local. That's not a different kind of module, it's a
fundamentally different language design.
That said, they call them modules. I'm not going to argue with them, what
do I know?
If you want to use two different versions of Guzzle in the same
application, the first problem you need to solve has nothing to do with
require, or autoloading, or Phar files. The first problem you need to solve
is that you now have two classes called \GuzzleHttp\Client, and that breaks
a bunch of really fundamental assumptions.
Your fundamental assumption is that the different versions are loaded onto the same symbol. Given the problems you outline yourself, why do that?
You see, PHP doesn't have a mechanism for changing symbols at compile time. Everything loads onto the root. Does it have to be that way? Or could a file compile onto a namespace, effectively prefixing its symbols with that namespace?
For example:
- plugin1 uses Guzzle v5, runs "$client1 = new \GuzzleHttp\Client", and returns it to the main application
- The main application passes $client1 to plugin2
- plugin2 uses Guzzle v4
plugin2 runs "$client2 = new \GuzzleHttp\Client"
If plugin 2 wants to use version 4 for whatever reason, why can't it load it into \Plugin2\GuzzleHttpClient instead of onto the root?
This is what userland monkey-typers like Strauss do. It works, but there
are issues with this solution outlined elsewhere.
That's why I think "containers" are the more useful comparison - you need
some way to put not just plugin1 itself, but all the third-party code it
calls, into some kind of sandbox, as though it was running in a separate
process. If you can control what classes can go into and out of that
sandbox, then in any piece of code, you don't end up with conflicting
meanings for the same name - just as a Linux container can't open a network
port directly on the host.
Container, module, block, package, plugin, domain, division, fraction,
lump, branch, sliver, splinter, constituent or whatever the hell else you
call it, I don't care. What I need is a way to manage package version
conflicts which arise in the real world when plugins get abandoned or when
coordinating having everyone change dependencies at the same time isn't
feasible.
Container, module, block, package, plugin, domain, division, fraction,
lump, branch, sliver, splinter, constituent or whatever the hell else you
call it, I don't care.
I know you think I'm just being pedantic about names, but
what I was trying to get across was the distinction between different features that we could have both of, because they're solving separate problems.
It's basically about where the dividing line is. If you want this hierarchy of dependencies:
      +-- Plugin1 -- AcmeSDK v2 -- Guzzle v5
App --+
      +-- Plugin2 -- AcmeSDK v1 -- Guzzle v4
The requirement is not to hide Guzzle from Plugin1 - maybe it needs to create an object from Guzzle and pass it into AcmeSDK.
Instead, the requirement is for Plugin1 to hide both AcmeSDK and Guzzle from Plugin2. You don't want 7 different "things" (whatever you want to call them) in that diagram, you want 3 (App, Plugin1-and-recursive-dependencies, Plugin2-and-recursive-dependencies).
The Linux container analogy is something like this:
       +-- container { WordPress -- PHP -- Apache }
Host --+
       +-- container { MediaWiki -- PHP -- Apache }
The goal of containers isn't to hide WordPress from Apache or vice versa, it's to hide the two copies of Apache and PHP from each other. There are plenty of things hidden inside Apache (the equivalent of "private classes") but that's a completely separate concept.
I wasn't saying the feature had to be called "containers", just that the analogy might be useful.
Rowan Tommins
[IMSoP]
Well, it's what Go calls "modules". It's confusing because I was being
truthful, not snarky, when I said "Ask 10 programmers for the definition of
module and expect 12 answers." I'm self trained, so I expect to get my
terms wrong from time to time. But I know enough to identify problems and
needs and I've tried to be clear on that.
I don't know much about Go, but at a glance it uses a similar model to JavaScript and Python where classes don't have a universal name, the names are always local. That's not a different kind of module, it's a fundamentally different language design.
Go has some weird scoping, for sure. Everything is done by convention instead of syntax. In other words, if you want to export a symbol, you capitalize it; otherwise, it is lower-cased and thus private to the module. Then each directory is a module, and even in the same project, you cannot access another lower-cased symbol from another directory -- er, module.
It is strange, and I don't think it translates to PHP. PHP is generally explicit via syntax over convention.
If you want to use two different versions of Guzzle in the same application, the first problem you need to solve has nothing to do with require, or autoloading, or Phar files. The first problem you need to solve is that you now have two classes called \GuzzleHttp\Client, and that breaks a bunch of really fundamental assumptions.
As written, that simply isn't possible in PHP because there is only one class allowed with a given name. Names of classes are global. I don't think this has to be the case, though. Different languages take different approaches to this. For example, JavaScript allows each module to "close over" its dependencies so each module can import its own version of dependencies. Originally, there wasn't even any deduplication, so you'd have 500 copies of left-pad or whatever. Then there is Go, which doesn't allow you to have multiple versions of modules. You get exactly one version, which is similar to how PHP currently works with composer by default. However, with some massaging, you can "prefix" your imports so you get only your own version. I believe many WordPress plugins do this, so each plugin can use their own version of things.
I'm fairly certain we can do a similar thing so that each module gets its own unique 'namespace' in the class table such that two modules can define the same classes. So ModuleA and ModuleB can have Foo\Bar without conflicting with one another. From the user's perspective, we can probably hide that technical detail from them but allow aliasing:
use module ModuleA; // import ModuleA's namespace into our current namespace for this file
use module ModuleB as Baz; // import ModuleB's namespace into our current namespace for this file, but with a prefix
Foo\Bar; // ModuleA\Foo\Bar
Baz\Foo\Bar; // ModuleB\Foo\Bar
I'm just spitballing syntax here, and I'm not suggesting it actually work like this, but I just want to illustrate that I think there are reasonable ways to allow modules to have conflicting names.
For example:
- plugin1 uses Guzzle v5, runs "$client1 = new \GuzzleHttp\Client", and returns it to the main application
- The main application passes $client1 to plugin2
- plugin2 uses Guzzle v4
plugin2 runs "$client2 = new \GuzzleHttp\Client"$client1 and $client2 are instances of different classes, with the same name! How does "instanceof" behave? What about "get_class"? What if you serialize and unserialize?
I'm of the opinion that the "names" of the module classes should be distinct, so that humans (and deserializers) know it is from a module. Something like [ModuleA]\Foo\Bar.
I think if you changed the language enough that those questions didn't matter, it would be a language fork on the scale of Python 2 to 3, or even Perl 5 to Raku (originally called "Perl 6"). Every single application and library would have to be rewritten to use the new concept of what a class is. And most of them would get absolutely no benefit, because they want to reference the same version of a class everywhere in the application.
I suspect the hard part will be defining the module in the first place, i.e., the "package.json" or "go.mod" or whatever it gets called. As Composer isn't part of the PHP project, I don't want to take it for granted, but I also don't want to rely on it. That means each module may have to define its own "loader" or somehow define which PHP files encompass the module. As I mentioned earlier, PHP doesn't usually operate by convention, though the community tends to force it to anyway (PSR-4 autoloading comes to mind immediately); so we'd need something that is explicit but automatable, so the community can implement conventions.
That's going to be the hard part.
That's why I think "containers" are the more useful comparison - you need some way to put not just plugin1 itself, but all the third-party code it calls, into some kind of sandbox, as though it was running in a separate process. If you can control what classes can go into and out of that sandbox, then in any piece of code, you don't end up with conflicting meanings for the same name - just as a Linux container can't open a network port directly on the host.
Exactly.
Regards,
Rowan Tommins
[IMSoP]
— Rob
As written, that simply isn't possible in PHP because there is only one class allowed with a given name. Names of classes are global. I don't think this has to be the case, though. Different languages take different approaches to this. For example, JavaScript allows each module to "close over" its dependencies so each module can import its own version of dependencies.
I would say that JavaScript doesn't just allow this, as an added feature, it requires it, as a fundamental design decision:
- In JavaScript, Python, etc, when you declare a function or class, you are creating an anonymous object, and assigning it to a local variable. Code reuse requires you to pass that object around.
- In PHP, Java, C#, etc, when you declare a function or class, you are adding a permanent named item to a global list. Code reuse is about knowing the global names of things.
It's worth noting that JavaScript didn't need to add any features to make NPM, Bower, etc work; everything they do is based on the fact that declarations are objects which can be passed around at will.
That's why I don't think "JavaScript can do it" is relevant, because the way JavaScript does it is impossible in PHP. We're much better off looking at how PHP works, and what problems we're actually trying to solve.
And that in turn is why I was reaching for Linux containers as an alternative analogy, to think about the problem without jumping to the wrong solution.
Rowan Tommins
[IMSoP]
Hey Rowan,
When working on nested classes, I did spend quite a bit of time tinkering with alternative implementations. One of those implementations was having the ability for classes to have their own class tables (both literally and emulated via name mangling) which would have allowed for classes to have private classes that could share names with external classes. This turned out to be an utter disaster of an idea for many reasons. Namely, PHP doesn't really have any native support for shadowing names. Sure, there is aliasing via use statements, but that only works for classes outside the current namespace. As long as we can guarantee that a module acts as a special namespace (under the hood), the only potential for collisions will be in the module itself.
All that is to say that I don't think comparing PHP to JavaScript is appropriate when considering modules. JavaScript doesn't have types, so I can pass you an EpicStringV2 when you're expecting an EpicStringV1, and as long as my EpicStringV2 has the right prototypical behavior and data, it will work just fine. PHP is typed, and fairly strongly typed. There is effectively no way to have multiple versions of the same type running around a codebase and pass type checks. Changing this would be effectively impossible and probably unsound from a type-theory perspective.
— Rob
Haha, I just reread your email and realized we're basically saying the same thing (I think?).
— Rob
The Problem: Interoperability.
That's really it. Scenario:
Alice provides whatchamacallit A that depends on another whatchamacallit D to work.
Bob provides whatchamacallit B that also depends on D.
Charles is using A and B.
D gets updated with a new API that is incompatible with its prior version.
Alice publishes an update which includes a security fix.
Bob has retired.
Charles, who can't program, can't update to Alice's latest code. His site eventually gets pwned.
That's the problem. Packages with dependencies are not interoperable at this time. They must be self-contained. This is why WordPress doesn't support Composer at all.
Drupal, Laravel et al. bypass this problem by forcing all their whatchamacallits to stay on the same version. This has limited their market penetration compared to WordPress because, despite being significantly superior codebases in all respects, they aren't user-friendly to someone who doesn't code at all.
The Solution (10,000 ft overview)
Composer could be made to allow interoperable packages, but it will need support at the language level to do so. Specifically, it needs to know who wants what. It can then make decisions based on that information.
Composer's primary link to the language is the autoload closure it provides. That closure currently takes one argument - the fully qualified name of the symbol to be loaded, which is currently almost always a class, since for various reasons function autoloading isn't a thing. Could it not take a second argument to modify its behavior? The current behavior is to flat-out require the file if it is found, in accordance with whichever schema is in use. Perhaps we don't want that anymore - perhaps we want to return the file path to use instead. This would allow the engine to make decisions about how exactly to include the file, including the possibility of monkey-typing it as can be done in userland, though when done in userland this effectively generates a new package.
(5,000 ft overview)
Suppose we have a whatchamacallit that declares its namespace as a new root independent of \. If a file inclusion happens in this namespace, this namespace is prepended to everything in the included file. So if I do a file include in the \MyPlugin namespace and that file declares its namespace as Twig, it will become \MyPlugin\Twig.
That works, but direct file includes are no longer the PHP norm; autoloading is. So we need to tell the autoloader that we want a file path returned - do NOT require the file yourself in your namespace. This could be as simple as a boolean flag of true sent to the autoloader. BUT it isn't - the autoloader (usually Composer) needs to know the identity of the requestor, because, by configuration in the package JSON (the details of which are wildly out of scope), it might change which file path it returns. When the engine gets the path, it does the include and the prepending business on the fly that Strauss and similar packages already do in userland.
(2,500 ft overview)
The above I think would more or less work, but it would lead to massive code duplication, as whatchamacallits A and B now have their own D's at \A\D and \B\D (assuming namespaces match whatchamacallit names).
Here's what I think would prevent that:
A asks the autoloader for D. The autoloader returns a file path and the engine mounts it at \D.
B asks for D. The autoloader returns a different file path, so the engine mounts it at \B\D and rewrites the D file with the new namespace the same way Strauss would have done.
This works except for the problem of who had the older version, A or B, and in what order A and B are going to ask - because depending on the application's architecture this order is not guaranteed.
To solve this, the autoloader can tell the engine it is safe to mount the file on the root using an array return of [path, true], and to mount it on the whatchamacallit's namespace if [path, false]. So:
A asks for D. The autoloader returns [path, false]. The engine maps it to \A\D and monkey-types D as needed.
B asks for D. The autoloader returns [path, true]. The engine maps it to \D.
Non-whatchamacallit code at namespace C asks for D. It will get the same version B is using, and the autoloader shouldn't be queried unless C makes this ask before B does.
When C asks, the autoloader gets (RequestedSymbol, null), so it can either do the require itself or return a string; either will work (and it has to be this way for backwards compatibility).
When B asks, the autoloader gets (RequestedSymbol, 'B') and it should return [path, true].
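As a rough userland sketch of that brainstormed contract (none of this is an existing PHP or Composer API; the loader here is an ordinary closure called by hand, purely to show the [path, mountOnRoot] return shape and the optional "who is asking" argument):

<?php declare(strict_types=1);

// Hypothetical loader contract from the brainstorm above:
//   input:  fully qualified symbol, plus the requesting whatchamacallit (or null)
//   output: null (not handled), a plain path (legacy behaviour),
//           or [path, bool $mountOnRoot]
$loader = function (string $symbol, ?string $requestor): string|array|null {
    if ($symbol !== 'D\\SomeClass') {
        return null;
    }

    return match ($requestor) {
        null    => __DIR__ . '/vendor/d/src/SomeClass.php',              // legacy: plain path
        'B'     => [__DIR__ . '/vendor/d-v2/src/SomeClass.php', true],   // mount on \D
        default => [__DIR__ . '/vendor/d-v1/src/SomeClass.php', false],  // mount on \A\D etc.
    };
};

// What the engine would conceptually do with each answer:
var_dump($loader('D\\SomeClass', 'A'));  // [".../d-v1/...", false] -> prefix with \A
var_dump($loader('D\\SomeClass', 'B'));  // [".../d-v2/...", true]  -> load at \D
var_dump($loader('D\\SomeClass', null)); // plain string, backwards-compatible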
I hope the above is followable. It's more of a morning brainstorm than a
spec.
The Problem: Interoperability.
That's really it. Scenario
Alice provides whatchamacallit A that depends on other whatchamacallit D to
work.
Bob provides whatchamacallit B that also depends on D.
Charles is using A and B.
D gets updated with a new incompatible API to its prior version.
Alice publishes an update which includes a security fix.
Bob retired.
Charles, who can't program, can't update to Alice's latest code. His site
eventually gets pwned.
Let me correct something here. The whole reason I was bringing in the distinction between "module" and "container" is that B and C are one kind of thing, but D is a different kind of thing.
D is something like Guzzle. There is zero motivation for Guzzle to be rewritten in a way that forces its dependencies to be isolated. It depends on packages like "psr/http-client" whose entire purpose is to define interfaces that multiple packages agree on.
A, meanwhile, isn't a thing at all; it's just any old PHP code - in your example, the whole spaghetti of WordPress core.
B and C are the only "whatchamacallits" - they are, in your example, WordPress plugins. They are the thing you want a boundary around, the black box you want conflicting names to be hidden by.
Suppose we have a whatchamacallit that declares its namespace as a new root
independent of / . If a file inclusion happens in this namespace, this
namespace prepends everything in the included file. So if I do a file
include in the \MyPlugin namespace and that file declares its namespace as
Twig, it will become \MyPlugin\Twig.
What does it mean, exactly, for a file inclusion to "happen in a namespace"? Bear in mind, most of the files we want to load, whether explicitly or via an autoloader, are not requests from A (the WordPress plugin) directly to D (Guzzle); they are references between files inside D, or in further dependencies that A has no idea about at all.
What PHP needs to track, somehow, is that a whole bunch of code is "inside" something, or "coloured by" something, in a way that is completely recursive.
That works, but direct file include is no longer the PHP norm though.
Autoloading is. So we need to tell the Autoloader that we want a file path
returned - do NOT require the file yourself in your namespace.
This for me is a non-starter: the existing packages which you want to make use of have little or no motivation to adapt to this new system.
Again, think about Linux containers: applications don't get a message saying "you're running in a container, please use different file I/O conventions"; they think they are accessing the root filesystem, and the host silently rewrites the access to be somewhere in the middle of a larger tree.
I think the way it would need to work would be some global state inside the compiler, so that regardless of how the code ended up being loaded, an extra transform was included in the compilation pipeline to attempt to rewrite all definitions, and all references to those definitions.
(I say "attempt", because even with all this built into the compiler, PHP's highly dynamic nature means there would be code patterns that the rewriter would not see; the whole thing would come with a bunch of caveats.)
The above I think would more or less work, but it would lead to massive
code duplication as Whatchamacallit A and B now have their own D's at \A\D
and \B\D (assuming namespaces match whatchamacallit names).
I don't think this is a problem that can or should be solved.
Imagine A and B both use the same version of E, but different versions of D; and E references D. If we try to de-duplicate, we load one copy of E, but when called from A it needs to reference \A\D and when called from B it needs to reference \B\D. Clearly, that's not going to work, so we're forced to define a separate \A\E and \B\E.
Note that this is completely different from any de-duplication of files on disk that a package manager might perform. It's a bit like the same C source file being compiled into two different object files with different #defines in effect.
I'm still not convinced that all this complexity actually leaves you better off than building a Composer plugin that automatically applies the rewriting to a whole directory at source code level.
Rowan Tommins
[IMSoP]
On Tue, May 20, 2025 at 6:18 PM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
The Problem: Interoperability.
That's really it. Scenario
Alice provides whatchamacallit A that depends on other whatchamacallit D
to
work.
Bob provides whatchamacallit B that also depends on D.
Charles is using A and B.
D gets updated with a new incompatible API to its prior version.
Alice publishes an update which includes a security fix.
Bob retired.
Charles, who can't program, can't update to Alice's latest code. His site
eventually gets pwned.
Let me correct something here. The whole reason I was bringing in the distinction between "module" and "container" is that B and C are one kind of thing, but D is a different kind of thing.
D is something like Guzzle. There is zero motivation for Guzzle to be rewritten in a way that forces its dependencies to be isolated. It depends on packages like "psr/http-client" whose entire purpose is to define interfaces that multiple packages agree on.
A, meanwhile, isn't a thing at all; it's just any old PHP code
I'll stop you there. You are deliberately misrepresenting what I wrote and
even a cursory glance at it makes that clear. You are not trying to be
constructive in any way, you're trolling.
I'll stop you there. You are deliberately misrepresenting what I wrote and
even a cursory glance at it makes that clear. You are not trying to be
constructive in any way, you're trolling.
I'm sorry you got that impression. I can assure you that I am not trolling, and my email was an entirely genuine attempt to engage with the problem that you were trying to describe.
My understanding of the example is that there are two WordPress plugins, which want independent sets of Composer dependencies. There might be 20 different Composer packages used by each plugin, but those packages don't need any special relationship with each other, they just need a special relationship with the WordPress plugin.
So if we can come up with a solution where only the WordPress plugins need to be changed, and you can use whatever dependencies you want without waiting for them to be changed to a new way of working, is that not a good thing?
I've tried several times to explain why I think Linux containers are a good analogy; I'm not sure if you didn't understand, or just didn't agree, so I don't know what else I can say.
Rowan Tommins
[IMSoP]
My understanding of the example is that there are two WordPress plugins, which want independent sets of Composer dependencies. There might be 20 different Composer packages used by each plugin, but those packages don't need any special relationship with each other, they just need a special relationship with the WordPress plugin.
Looking closely, I see I did make one honest mistake: in your example, the WordPress plugins are A and B, not B and C. So my sentence should have read "A and B are one kind of thing, but D is a different kind of thing".
Rowan Tommins
[IMSoP]
On Wed, May 21, 2025 at 8:27 AM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
So if we can come up with a solution where only the WordPress plugins need
to be changed, and you can use whatever dependencies you want without
waiting for them to be changed to a new way of working, is that not a good
thing?
Yes, and that's all I think is needed here. I could modify plugin code if I needed to. The majority of developers capable of doing this have left WordPress in disgust - I did too, but I needed the work. Since I have to work with it, I'd like to make it more sane. One major step in that direction is the Timber library, which bridges WP to Twig and gets rid of that god-damned loop architecture they think is the bee's knees but which in reality is an antipattern and an untestable spaghetti nightmare.
I've tried several times to explain why I think Linux containers are a
good analogy; I'm not sure if you didn't understand, or just didn't agree,
so I don't know what else I can say.
I have no disagreement with that, but it's an implementation detail. I'm
not there yet - I'm just trying to describe what I think is needed from
outside the engine.
Looking closely, I see I did make one honest mistake: in your example, the
WordPress plugins are A and B, not B and C. So my sentence should have read
"A and B are one kind of thing, but D is a different kind of thing".
That's what set me off the most, and I overreacted. To you and to the list at large, I apologize. I'm just frustrated - I feel like a five-year-old trying to explain a problem to a physicist.
Anyone familiar with C++'s friend keyword? It's not a direct replacement for modules, but it solves similar problems - allowing trusted classes or functions to access private/protected members without making them public.
Friend has been brought up before - I believe it was in at least one RFC - and voted down. That doesn't mean the issue can't be revisited, but look into the archive and see if my memory is right and, if so, why it was voted down. IIRC it's tied to the fact that PHP doesn't have a notion of namespace-level visibility. Classes and functions outside of classes must be public in the current architecture.
I've tried several times to explain why I think Linux containers are a good analogy; I'm not sure if you didn't understand, or just didn't agree, so I don't know what else I can say.
I have no disagreement with that, but it's an implementation detail.
I'm not there yet - I'm just trying to describe what I think is needed
from outside the engine.
I think this is where we're not seeing eye to eye, and why we're getting
frustrated with each other, because I see it as far more fundamental
than details you have already gone into, like how autoloading will work.
Perhaps a more realistic example will help, and also avoid the confusion over "A, B, and D" from earlier.
Imagine a WordPress plugin, AlicesCalendar, which uses the Composer
packages monolog/monolog and google/apiclient. The google/apiclient
package also requires monolog/monolog.
Another WordPress plugin, BobsDocs, also uses both monolog/monolog and
google/apiclient, but using different versions.
Inside those different places, there are lines of code like this:
$logger = new \Monolog\Logger('alices-calendar'); // in AlicesCalendar
$logger = new \Monolog\Logger('bobs-docs'); // in BobsDocs
$logger = new \Monolog\Logger('google-api-php-client'); // in google/apiclient
We need to rewrite those lines so that they all refer to the correct
version of Monolog\Logger.
If every package/module/whatever rewrites the classes inside every other
package/module/whatever, we might start with this:
$logger = new \AlicesCalendar\Monolog\Logger('alices-calendar'); // in AlicesCalendar
$logger = new \BobsDocs\Monolog\Logger('bobs-docs'); // in BobsDocs
$logger = new \GoogleApiClient\Monolog\Logger('google-api-php-client'); // in google/apiclient
That only works if we somehow know that AlicesCalendar and BobsDocs use
the same google/apiclient; if not, we need four copies:
$logger = new \AlicesCalendar\Monolog\Logger('alices-calendar'); // in AlicesCalendar
$logger = new \AlicesCalendar\GoogleApiClient\Monolog\Logger('google-api-php-client'); // in google/apiclient when called from AlicesCalendar
$logger = new \BobsDocs\Monolog\Logger('bobs-docs'); // in BobsDocs
$logger = new \BobsDocs\GoogleApiClient\Monolog\Logger('google-api-php-client'); // in google/apiclient when called from BobsDocs
All of these are separate classes, which can't be used interchangeably,
and the names get longer and longer to isolate dependencies inside
dependencies.
But we don't actually need the Monolog\Logger used by AlicesCalendar to
be a different version from the one used by google/api-client. In fact,
it would be useful if they were the same, so we could pass around the
objects interchangeably inside the plugin code.
So what we want is some way of saying that AlicesCalendar and BobsDocs
are special; they want to isolate code in a way that normal
modules/packages/whatever don't. Then we can have 2 copies of
Monolog\Logger, not 3 or 4:
$logger = new \AlicesCalendar\Monolog\Logger('alices-calendar'); // in AlicesCalendar
$logger = new \AlicesCalendar\Monolog\Logger('google-api-php-client'); // in google/apiclient when called from AlicesCalendar
$logger = new \BobsDocs\Monolog\Logger('bobs-docs'); // in BobsDocs
$logger = new \BobsDocs\Monolog\Logger('google-api-php-client'); // in google/apiclient when called from BobsDocs
In this case, PHP doesn't need to know monolog/monolog even exists. It
just puts either "AlicesCalendar" or "BobsDocs" on any class name it sees.
Before we can even think about how we'd implement the rewriting (or
shadowing, or whatever) we need some requirements of what we want to
rewrite. By suggesting an image of "containers" or "sandboxes" rather
than "packages" or "modules", I was trying to define the requirement
that "AlicesCalendar and BobsDocs are special, in a way that
monolog/monolog and google/apiclient are not".
--
Rowan Tommins
[IMSoP]
On Thu, May 22, 2025 at 4:29 PM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
This is worlds better, and I think I can work with this.
First, let's revisit how autoloading works, if for no other reason than to
test whether I understand what's going on correctly. When PHP encounters a
symbol it doesn't recognize, it triggers the autoload process. Autoloaders
are callables registered with the engine using spl_autoload_register, and
PHP queries them one at a time (I don't remember the order offhand). The
autoloader function runs and PHP retests to see if it can resolve the
symbol. If it can, code execution continues. If it can't, the next
autoloader is run, and if none are left a fatal error is thrown. Autoload
callbacks get one argument - the fully qualified class name. They are
expected to return void.
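For illustration, a minimal autoloader following that flow might look like this (the class name and path mapping below are made up; real Composer autoloaders are more involved):
<?php
// Register a callback; the engine calls it with the fully qualified class
// name whenever it meets a class, interface or trait it doesn't know yet.
spl_autoload_register(function (string $class): void {
    // Made-up PSR-4-style mapping: Acme\Foo => src/Acme/Foo.php
    $file = __DIR__ . '/src/' . str_replace('\\', '/', $class) . '.php';
    if (is_file($file)) {
        require $file; // defining the symbol is all the callback has to do
    }
    // If the class still isn't defined, PHP tries the next registered
    // autoloader; if none succeeds, a "class not found" Error is thrown.
});
new \Acme\Foo(); // triggers the callback with "Acme\Foo"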
I believe it would be best to leave the wild and woolly world of package
management alone and just give the engine the ability to let code in one
area use different code even though it has the same label, at least on
the surface. I think this is possible if the engine handles the symbol
assignment in a different way from the existing include statements. The
cleanest way to do that would be to have the autoloader return the file
path to require and, optionally, what namespace to prefix onto all
namespace declarations in the file.
In summary, let the package manager resolve packages and give it better
tools towards that end.
Returning to your example and closing question, how do we know that
AlicesCalendar and BobsDocs are special? Let the package manager tell us
with this hook:
spl_package_register(array $packages): void // $packages is a list of package namespace strings
To use Composer, the user has to run require 'vendor/autoload.php';
near the beginning of their application.
So inside that file a package aware version of composer can call this to
tell the engine what the package namespaces are - in your example
['AlicesCalendar', 'BobsDocs']. (Aside, if spl_package_register is called
multiple times the arrays are merged).
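So a package-aware bootstrap might look roughly like this (spl_package_register is the hypothetical hook proposed above; the rest mirrors a normal Composer setup):
<?php
// Hypothetical: tell the engine which namespaces mark "package" code.
spl_package_register(['AlicesCalendar', 'BobsDocs']);
// The usual autoloader registration still happens alongside it.
spl_autoload_register(function (string $class): void {
    // ... normal class-map / PSR-4 lookup here ...
});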
Now PHP executes the application and enters the code of AlicesCalendar, which
will be largely unchanged:
namespace AlicesCalendar;
$logger = new Monolog\Logger('alices-calendar');
$api = new Google\ApiClient();
But thanks to the spl_package_register hook the engine knows that when it
sees a namespace that starts with or matches any string in the packages
array, the code is part of a package. This will cause it to send the
autoload closure a second argument with that package namespace so that it
can determine what to send back.
So next it sees the Monolog\Logger symbol.
Does AlicesCalendar\Monolog\Logger exist? No, so we invoke the autoloader
callback with arguments ('AlicesCalendar\Monolog\Logger',
'AlicesCalendar'). The autoloader checks its rules (way, way out of scope
here) and determines that AlicesCalendar is using the latest
Monolog\Logger. So it responds with
['file/path/to/latest/Monolog/Logger.php', ''], telling the engine what
code to require and that there is no prefix for the namespaces appearing in
that file ( "" should also work). The engine aliases
AlicesCalendar\Monolog\Logger to \Monolog\Logger so it doesn't have to
pester the autoloader again for this symbol.
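Putting those pieces together, the callback might look something like this (the two-argument signature and the array return value are part of the proposal, not existing behaviour, and the paths are invented):
<?php
spl_autoload_register(function (string $class, ?string $package = null): ?array {
    // $package is the proposed second argument: 'AlicesCalendar', 'BobsDocs', or null
    if ($package === 'AlicesCalendar' && $class === 'AlicesCalendar\\Monolog\\Logger') {
        // [file to require, prefix to apply to namespace declarations in that file]
        return ['file/path/to/latest/Monolog/Logger.php', ''];
    }
    if ($package === 'BobsDocs' && $class === 'BobsDocs\\Monolog\\Logger') {
        return ['file/path/to/older/Monolog/Logger.php', 'v1'];
    }
    return null; // let the next autoloader have a go
});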
The Google\ApiClient goes through the same process. As a result:
namespace AlicesCalendar;
$logger = new Monolog\Logger('alices-calendar');
$api = new Google\ApiClient();
echo $logger::class; // \Monolog\Logger
echo $api::class; // \Google\ApiClient
Now for the complexity - we reach BobsDocs
namespace BobsDocs;
$logger = new Monolog\Logger('bobs-docs');
$api = new Google\ApiClient();
BobsDocs needs an older version of Monolog and is configured appropriately
in its composer.json file, so when the engine calls the autoloader with
('BobsDocs\Monolog\Logger', 'BobsDocs') the autoloader returns
['file/path/to/older/Monolog/Logger.php', 'v1']. v1 is prefixed to the
namespace declarations in Monolog\Logger and the file is included. The
engine aliases BobsDocs\Monolog\Logger to \v1\Monolog\Logger.
Keep in mind - namespace prefix is a decision left to the package manager.
I'm sure a PSR will be made to establish best practice, but that's out of
scope here.
The Google\ApiClient of BobsDocs is, again, up to the autoloader. Assuming it
too is different (since it's using an older Monolog) we'd get something
like this.
namespace BobsDocs;
$logger = new Monolog\Logger('bobs-docs');
$api = new Google\ApiClient();
echo $logger::class; // \v1\Monolog\Logger
echo $api::class; // \v1\Google\ApiClient
Now later in the code if we make a new \Monolog\Logger the autoloader won't
be invoked - the symbol was written when AlicesCalendar caused it to be
created indirectly.
This approach keeps package resolution out of the engine entirely, which I
think is consistent with PHP's setup. We'd just be improving the tools the
package manager / autoloader can leverage. Older code would still work
since the new autoloader behavior is opt-in.
Hi Michael,
I'm going to skip over all the details about the autoloader for now, because I think they're going deep into implementation details, and I want to focus on the same top-level design as my previous email.
Bobs docs needs an older version of Monolog and is configured appropriately
in its composer.json file, so ... v1 is prefixed to the
namespace declarations in Monolog\Logger and the file is included. The
engine aliases BobsDocs\Monolog\Logger to \v1\Monolog\Logger.
If I'm following correctly, you suggest that we would end up with class names like this:
\v1\Monolog\Logger
\v2\Monolog\Logger
\v5\Google\Client
\v7\Google\Client
It feels like there's a lot of complexity in the package manager here - it's got to keep track of which versions of each package are installed, what they depend on, and decide what prefixes need to be used where. You also suggest that one version of each package is left with no prefix, which adds even more complexity.
The Googl\ApiClient of BobDocs is again, up to the autoloader. Assuming it
too is different (since it's using an older Monolog)
The biggest problem comes when this assumption doesn't hold. I actually chose these particular packages to illustrate this problem, then left it out of my previous message. It happens that the latest version of google/apiclient supports both monolog/monolog 2.9 and 3.0, so it's possible to have:
- AlicesCalendar wants to use google/apiclient 2.18 and monolog/monolog 2.9
- BobsDocs wants to use google/apiclient 2.18 and monolog/monolog 3.0
If the package manager is adding prefixes to individual package versions, we will have one class called \v2_18\Google\Client containing our familiar "new Logger" line. AlicesCalendar will expect that line to create a \v2_9\Monolog\Logger, but BobsDocs will expect it to create a \v3_0\Monolog\Logger. We can't please both of them without creating an extra copy of Google\Client with a different prefix.
So the version of an individual package isn't enough to decide the prefix, we need to know which set of packages it belongs to.
My suggestion uses a much simpler rule to define the prefix: if it's loaded "inside" AlicesCalendar, add the prefix "\AlicesCalendar". All the classes that are "inside" are completely sandboxed from the classes "outside", without needing any interaction with a package manager.
As far as I know, this is how existing userland solutions work, and I haven't yet spotted a reason why it needs to be any more complex than that.
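Roughly, such a userland tool rewrites every definition and reference at build time, something like this (the chosen prefix below is just an example):
// Shipped by Composer into the plugin's vendor directory:
namespace Monolog;
class Logger { /* ... */ }
// After a build-time prefixing tool has run over that directory:
namespace AlicesCalendarVendor\Monolog;
class Logger { /* ... */ }
// ...and every reference inside the plugin is rewritten to match:
$logger = new \AlicesCalendarVendor\Monolog\Logger('alices-calendar');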
Regards,
Rowan Tommins
[IMSoP]
My only concern is how this would be handled in the class tables. Right now, \AlicesCalendar\Monolog\Logger and \BobsDocs\Monolog\Logger would be considered entirely different types -- as in, not compatible. So if AlicesCalendar returns a type that BobsDocs expects, they won't be able to talk to each other.
So, this means we'd need a couple of different types of dependencies:
- "direct dependencies" that work in a containerized way
- "parent dependencies" that expect a parent to provide the dependency so it can interoperate between packages
I assume that it will be up to a dependency resolver (either Composer or something else) to figure out which direct dependencies to "hoist" up and provide a compatible version between the two packages.
That then raises the question of whether this complication is needed at all. I can understand why having a 'containerized' package system is useful (in the case of WordPress or plugins in general), but I'm wondering if it is actually needed?
If we look at npm and yarn and how they handle this in the JavaScript space, they basically install compatible packages when possible, and only 'contain' them when it would introduce an incompatibility.
I have some ideas here, but I need some time to think on it; but I also want to point out the problem to see if anyone else has any ideas.
— Rob
My only concern is how this would be handled in the class tables. Right now, \AlicesCalendar\Monolog\Logger and \BobsDocs\Monolog\Logger would be considered entirely different types -- as in, not compatible. So if AlicesCalendar returns a type that BobsDocs expects, they won't be able to talk to each other.
Once again, I'd like to use the Linux Container analogy: a process in one container never communicates directly with a process in another container. The process "thinks" it's running as normal, but is actually isolated inside a sandbox. The container then defines the inputs and outputs it wants to open between that sandbox and the host, and something running on the host can wire those up as necessary.
I assume that it will be up to a dependency resolver (either Composer or something else) to figure out which direct dependencies to "hoist" up and provide a compatible version between the two packages.
I see this as the responsibility of each "container": if AlicesCalendar wants to use an un-sandboxed version of a PSR interface or a framework component, it declares that to the "host" (e.g. WordPress core). The PHP engine then knows to leave that interface name without a prefix. Any other class - whether it's written by Alice or installed by Composer - exists inside the sandbox, and gets a prefix.
Importantly, all of this should happen on the PHP symbol level (classes, interfaces, functions); the sandboxing mechanism doesn't need to know about package managers - just as Docker, Kubernetes, etc, don't know about APT / Yum / whatever Alpine calls it.
Rowan Tommins
[IMSoP]
Yes, that aligns with what I was thinking too, for the most part.
Here are my thoughts, but first some vocabulary:
- direct dependency: a package that is used by the current package
- exported dependency: a direct dependency that can be used outside the current package
- peer dependency: an indirect dependency on another package that isn’t required to function but may offer additional functionality if installed. I have no idea how this would be defined or used yet.
- package: the physical artefact containing one or more modules.
Thinking back on several of my implementation explorations of nested classes, it should be possible to identify if a dependency/class is able to be used outside the package during compilation. So, I don’t think we need the user to explicitly state an exported dependency. In other words (and making up some syntax):
use module AlicesCalendar;
use module BobsDocs;
// AlicesCalendar needs to be "exposed" outside the package
public function doSomething(\AlicesCalendar\Week $week) { /* do stuff */ }
// BobsDocs remains an internal-only dependency
private function otherSomething(\BobsDocs\Doc $doc) { /* do stuff */ }
When compiling, we will see a public function "exposing" a direct dependency. Thus we would know that the current module would need to export the direct dependency on AlicesCalendar. This would prevent the developer from having to keep track of all the dependencies that need to be exported. From a developer’s point of view, they would use it like normal.
So, you could imagine a generated package manifest might look something like this:
{
"rootNamespace": "OfficeSuite"
"dependencies": {
"AlicesCalendar": "^1.0.0",
"BobsDocs": "^0.1.0"
},
"exports": {
"dependencies": ["AlicesCalendar"],
"functions": ["OfficeSuite\doSomething"],
"classes": []
}
}
A package manager (IDE, or even a human) could then surmise what packages they need to be shared between modules. It would also know that BobsDocs is entirely private to the module, so it doesn’t need to be compatible with any other module using BobsDocs.
Anyway, this is a bit into the weeds, but I want to point out what is possible or not, based on my experience working on nested classes. In other words, I’m 99% sure we can infer exported dependencies without requiring a human to manually do the work; what that actually looks like in practice is still very much in the air. So, please don’t take the above as an actual proposal, but as inspiration.
— Rob
Here are my thoughts, but first some vocabulary:
- direct dependency: a package that is used by the current package
- exported dependency: a direct dependency that can be used outside the current package
- peer dependency: an indirect dependency on another package that isn’t required to function but may offer additional functionality if installed. I have no idea how this would be defined or used yet.
- package: the physical artefact containing one or more modules.
It seems my repeated suggestions of "container" or "sandbox" as the key
concept aren't being picked up. I'm still not sure if people disagree
with my reasoning, or just don't understand the distinction I'm trying
to make.
A key point I want to reiterate is that I think 99% of PHP applications
should not be using the feature we're talking about here. Most of the
time, having a package manager resolve a set of constraints to a set of
versions that are all mutually compatible is emphatically a good thing.
I also think that the "elephpant in the room" here is that any
implementation of this sandboxing is going to be imperfect, because it
just doesn't fit with PHP's nature. There are too many cases like
return ['\SomeNs\SomeClass', 'someMethod'];
where the compiler won't
know that a class name is being used, so won't know to
rewrite/alias/whatever to the sandboxed version.
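Similar cases that a compile-time rewrite can't see (generic illustrations, not from any real library):
<?php
$class = '\SomeNs\Some' . 'Class';        // name assembled at runtime
$object = new $class();                    // dynamic instantiation
call_user_func([$class, 'someMethod']);    // callable built from strings
$r = new ReflectionClass('SomeNs\SomeClass'); // name hidden inside a plain string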
It may actually be that anything we design on this list is doomed to be
worse than existing userland implementations, because userland
tools can make specific assumptions about things like the WordPress core
which we could never build into the engine.
Thinking back on several of my implementation explorations of nested
classes, it should be possible to identify if a dependency/class is
able to be used outside the package during compilation.
I think there are many types of "dependency" where you can't automate
the decision, e.g.
namespace MyPlugin;
class MyLogger implements \Psr\Log\LoggerInterface {
public function logWithBellsOn(\DingDong\BellInterface $bell,
string $message): \Duolog\Message {
return new \ExtraLoud\Message(new \Helpful\BellAdapter($bell),
$message);
}
}
Does that refer to a global \Psr\Log\LoggerInterface, or a sandboxed
\MyPlugin\Psr\Log\LoggerInterface? Is \ExtraLoud\Message a dependency which
should be imported/exported, or a hidden implementation detail? And so
on, for every class/interface mentioned.
Only the code's author can say which was intended in each case.
So, you could imagine a generated package manifest might look something like this:
{
"rootNamespace": "OfficeSuite"
"dependencies": {
"AlicesCalendar": "^1.0.0",
"BobsDocs": "^0.1.0"
},
"exports": {
"dependencies": ["AlicesCalendar"],
"functions": ["OfficeSuite\doSomething"],
"classes": []
}
}
Just to recap, the example wasn't of an office suite. AlicesCalendar and
BobsDocs were supposed to be two unrelated WordPress plugins, which both
happened to use Google's SDK. The only application that would know about
both of them is WordPress, and it wouldn't need to export anything.
The way I'm picturing it is more like this:
// The bootstrap file in AlicesCalendar will register a directory of files as a "Container"
// Within this Container, class definitions and references are rewritten to use a unique prefix
// The Container also has a separate autoloader stack
Container\register(
    // This prefix is added to all names to make them unique
    // e.g. \Monolog\Logger might become \__Container\AlicesCalendar\Monolog\Logger
    prefix: 'AlicesCalendar',
    // Code in any file in these directories is considered to be "inside" the container
    directories: [
        '/var/www/wordpress/wp-plugins/AlicesCalendar/src',
        // This directory is probably populated by Composer, but the Container logic doesn't care about that
        '/var/www/wordpress/wp-plugins/AlicesCalendar/vendor',
    ],
    // Classes that should be "imported" from outside the Container
    // Classes matching these patterns will not be auto-prefixed
    // If they are not defined before use, the "host" (e.g. WordPress core) autoload stack will be called
    import: [
        // Classes inside the Container can implement the shared definition of LoggerInterface
        '\Psr\Log\LoggerInterface',
        // Classes inside the Container can make use of this namespace of classes defined outside the Container
        '\WordPress\PluginTools\*'
    ],
    // Classes that should be "exported" from inside the Container
    // These will use the autoload stack inside the Container, but will not be auto-prefixed
    export: [
        '\AlicesCalendar\PluginDefinition',
        '\AlicesCalendar\Hooks\*'
    ],
);
// A completely unmodified Composer autoloader is loaded
// Because it's inside the Container, everything it registers will be on a separate stack
require_once '/var/www/wordpress/wp-plugins/AlicesCalendar/vendor/autoload.php';
// The plugin is registered to the application, which doesn't need to know about the Container setup
wp_register_plugin('\AlicesCalendar\PluginDefinition');
// In a completely separate file, BobsDocs does all the same setup
// It lists its own imports and exports, and uses its own unique prefix
// Any relationship between the two plugins happens in the WordPress Core code as usual
The guiding principle is that the code inside the container should need
as little modification as possible to be compatible, so that all the
code on packagist.org immediately becomes available to whatever plugin
wants it.
--
Rowan Tommins
[IMSoP]
I assume that it will be up to a dependency resolver (either Composer or something else) to figure out which direct dependencies to "hoist" up and provide a compatible version between the two packages.
I see this as the responsibility of each "container": if AlicesCalendar wants to use an un-sandboxed version of a PSR interface or a framework component, it declares that to the "host" (e.g. WordPress core). The PHP engine then knows to leave that interface name without a prefix. Any other class - whether it's written by Alice or installed by Composer - exists inside the sandbox, and gets a prefix.
Importantly, all of this should happen on the PHP symbol level (classes, interfaces, functions); the sandboxing mechanism doesn't need to know about package managers - just as Docker, Kubernetes, etc, don't know about APT / Yum / whatever Alpine calls it.
Rowan Tommins
[IMSoP]
This is where I'm not clear. Why wouldn't it need a concept of package/module/thing?
Even if we develop some way such that in Foo.php, loading the class \Beep\Boop\Narf pulls from /beep/boop/v1/Narf.php and loading it from Bar.php pulls the same class from /beep/boop/v2/Narf.php, and does something or other to keep the symbols separate... Narf itself is going to load \Beep\Boop\Poink at some point. So which one does it get? Or rather, there's now two Narfs. How do they know that the v1 version of Narf should get the v1 version of Poink and the v2 version should get the v2 version.
If there was a package/module/cluster/thing concept, then it would be easy enough (at least in concept) to extend whatever translation logic exists to the rest of that package/module/cluster/thing. Without that, however, I don't know how that transitive class usage would be addressed.
--Larry Garfield
Even if we develop some way such that in Foo.php, loading the class \Beep\Boop\Narf pulls from /beep/boop/v1/Narf.php and loading it from Bar.php pulls the same class from /beep/boop/v2/Narf.php, and does something or other to keep the symbols separate... Narf itself is going to load \Beep\Boop\Poink at some point. So which one does it get? Or rather, there's now two Narfs. How do they know that the v1 version of Narf should get the v1 version of Poink and the v2 version should get the v2 version.
The prefixing, in my mind, has nothing to do with versions. There is no
"v1" and "v2" directory, there are just two completely separate "vendor"
directories, with the same layout we have right now.
So it goes like this:
- Some code in wp-plugins/AlicesCalendar/vendor/Beep/Boop/Narf.php mentions a class called \Beep\Boop\Poink
- The Container mechanism has rewritten this to __Container\AlicesCalendar\Beep\Boop\Poink, but that isn't defined yet
- The isolated autoloader stack (loaded from wp-plugins/AlicesCalendar/vendor/autoload.php) is asked for the original name, \Beep\Boop\Poink
- It includes the file wp-plugins/AlicesCalendar/vendor/Beep/Boop/Poink.php which contains the definition of \Beep\Boop\Poink
- The Container mechanism rewrites the class to __Container\AlicesCalendar\Beep\Boop\Poink and carries on
When code in wp-plugins/BobsDocs/vendor/Beep/Boop/Narf.php mentions
\Beep\Boop\Poink, the same thing happens, but with a completely separate
sandbox: the rewritten class name is
__Container\BobsDocs\Beep\Boop\Poink, and the autoloader was loaded
from wp-plugins/BobsDocs/vendor/autoload.php
Unless explicitly specified as an "import" or "export", any reference to
any class name is prefixed in the same way, and loaded with the isolated
autoloader stack. To the host application, and any other plugins, the
code inside the "wp-plugins/AlicesCalendar/vendor" and
"wp-plugins/BobsDocs/vendor" directories is entirely hidden.
--
Rowan Tommins
[IMSoP]
Hey all,
It took me a while, but I'm finally caught up with this thread, and would like to give my 2 cents.
Even if we develop some way such that in Foo.php, loading the class \Beep\Boop\Narf pulls from /beep/boop/v1/Narf.php and loading it from Bar.php pulls the same class from /beep/boop/v2/Narf.php, and does something or other to keep the symbols separate... Narf itself is going to load \Beep\Boop\Poink at some point. So which one does it get? Or rather, there's now two Narfs. How do they know that the v1 version of Narf should get the v1 version of Poink and the v2 version should get the v2 version.
The prefixing, in my mind, has nothing to do with versions. There is no "v1" and "v2" directory, there are just two completely separate "vendor" directories, with the same layout we have right now.
So it goes like this:
- Some code in wp-plugins/AlicesCalendar/vendor/Beep/Boop/Narf.php mentions a class called \Beep\Boop\Poink
- The Container mechanism has rewritten this to __Container\AlicesCalendar\Beep\Boop\Poink, but that isn't defined yet
- The isolated autoloader stack (loaded from wp-plugins/AlicesCalendar/vendor/autoload.php) is asked for the original name, \Beep\Boop\Poink
- It includes the file wp-plugins/AlicesCalendar/vendor/Beep/Boop/Poink.php which contains the definition of \Beep\Boop\Poink
- The Container mechanism rewrites the class to __Container\AlicesCalendar\Beep\Boop\Poink and carries on
When code in wp-plugins/BobsDocs/vendor/Beep/Boop/Narf.php mentions \Beep\Boop\Poink, the same thing happens, but with a completely separate sandbox: the rewritten class name is __Container\BobsDocs\Beep\Boop\Poink, and the autoloader was loaded from wp-plugins/BobsDocs/vendor/autoload.php
In this thread I see a lot of talking about Composer and autoloaders. But in essence those are just tools we use to include files into our current PHP process, so for the sake of simplicity (and compatibility), let's disregard all of that for a moment. Instead, please bear with me while we do a little gedankenexperiment...
First, imagine one gigantic PHP file, huge.php, that contains all the PHP code that is included from all the libraries you need during a single PHP process lifecycle. That is in the crudest essence how PHP's include system currently works: files get included, those files declare symbols within the scope of the current process. If you were to copy-paste all the code you need (disregarding the declare() statements) in one huge PHP file, you essentially get the same result.
So in our thought experiment we'll be doing just that. The only rule is that we copy all the code verbatim (again, disregarding the declare() statements), because that's how PHP includes also work.
<?php
// ...
namespace Acme;
class Foo {}
namespace Acme;
class Bar
{
public readonly Foo $foo;
public function __construct()
{
// For some reason, there is a string reference here. Don't ask.
$fooClass = '\Acme\Foo';
$this->foo = new $fooClass();
}
}
namespace Spam;
use Acme\Bar;
class Ham extends Bar {}
namespace Spam;
use Acme\Bar;
class Bacon extends Bar {}
// ...
Now, the problem here is that if we copy-paste two different versions of the same class with the same FQN into our huge.php file, they will try to declare the same symbols, which will cause a conflict. Let's say our Ham depends on one version of Acme\Bar and our Bacon depends on another version of Acme\Bar:
<?php
// ...
namespace Acme;
class Foo {}
namespace Acme;
class Bar {
public readonly Foo $foo;
public function __construct()
{
// For some reason, there is a string reference here. Don't ask.
$fooClass = '\Acme\Foo';
$this->foo = new $fooClass();
}
}
namespace Spam;
use Acme\Bar;
class Ham extends Bar {}
namespace Acme;
class Foo {} // Fatal error: Cannot declare class Foo, because the name is already in use
namespace Acme;
class Bar { // Fatal error: Cannot declare class Bar, because the name is already in use
public readonly Foo $foo;
public function __construct()
{
// For some reason, there is a string reference here. Don't ask.
$fooClass = '\Acme\Foo';
$this->foo = new $fooClass();
}
}
namespace Spam;
use Acme\Bar;
class Bacon extends Bar {}
// ...
So how do we solve this in a way that we can copy-paste the code from both versions of Acme\Foo, verbatim into huge.php?
Well, one way is to break the single rule we have created: modify the code. What if we just let the engine quietly rewrite the code? Well, then we quickly run into an issue. Any non-symbol references to classes are hard to detect and rewrite, so this would break:
<?php
// ...
namespace Acme;
class Foo {}
namespace Acme;
class Bar {
public readonly Foo $foo;
public function __construct()
{
// For some reason, there is a string reference here. Don't ask.
$fooClass = '\Acme\Foo';
$this->foo = new $fooClass();
}
}
namespace Spam;
use Acme\Bar;
class Ham extends Bar {}
namespace Spam\Bacon\Acme; // Quietly rewritten
class Foo {}
namespace Spam\Bacon\Acme; // Quietly rewritten
class Bar {
public readonly Foo $foo;
public function __construct()
{
// For some reason, there is a string reference here. Don't ask.
$fooClass = '\Acme\Foo'; // <== Whoops, missed this one!!!
$this->foo = new $fooClass();
}
}
namespace Spam;
use Spam\Bacon\Acme\Bar; // Quietly rewritten
class Bacon extends Bar {}
// ...
So let's just follow our rule for now. Now how do we include Foo and Bar twice? Well, let's try Rowan's approach of "containerizing." Let's take a very naive approach to what that syntax might look like. We simply copy-paste the code for our second version of Acme\Bar into the scope of a container. For the moment let's assume that the container works like a "UnionFS" of sorts, where symbols declared inside the container override any symbols that may already exist outside the container:
<?php
// ...
namespace Acme;
class Foo {}
namespace Acme;
class Bar {
public readonly Foo $foo;
public function __construct()
{
// For some reason, there is a string reference here. Don't ask.
$fooClass = '\Acme\Foo';
$this->foo = new $fooClass();
}
}
namespace Spam;
use Acme\Bar;
class Ham extends Bar {}
container Bacon_Acme {
namespace Acme;
class Foo {}
namespace Acme;
class Bar {
public readonly Foo $foo;
public function __construct()
{
// For some reason, there is a string reference here. Don't ask.
$fooClass = '\Acme\Foo';
$this->foo = new $fooClass();
}
}
}
namespace Spam;
use Bacon_Acme\\Acme\Bar;
class Bacon extends Bar {}
// ...
That seems like it could work. As you can see, I've decided to use a double backslash (\\) to separate container and namespace in this example. You may wonder how this would look in the real world, where not all code is copy-pasted into a single huge.php. Of course, part of this depends on the autoloader implementation, but let's start with a first step of abstraction by replacing the copy-pasted code with includes:
// ...
require_once '../vendor/acme/acme/Foo.php';
require_once '../vendor/acme/acme/Bar.php';
namespace Spam;
use Acme\Bar;
class Ham extends Bar {}
container Bacon_Acme {
require_once '../lib/acme/acme/Foo.php';
require_once '../lib/acme/acme/Bar.php';
}
namespace Spam;
use Bacon_Acme\\Acme\Bar;
class Bacon extends Bar {}
Now what if we want the autoloader to be able to resolve this? Well, once a class symbol is resolved with a container prefix, it would have to also perform all its includes inside the scope of that container.
function autoload($class_name) {
// Do autoloading shizzle.
}
spl_autoload_register('autoload');
namespace Spam;
use Bacon_Acme\\Acme\Bar;
class Bacon extends Bar {}
// Meanwhile, behind the scenes, in the autoloader:
container Bacon_Acme {
autoload(Acme\Bar::class);
}
Now this mail is already quite long enough, so I'm gonna wrap it up here and do some more brainstorming. But I hope I may have inspired some of you. All in all, I think my approach might actually work, although I haven't even considered what the implementation would even look like.
Again, the main point I want to make is to just disregard composer and the autoloader for now; those are just really fancy wrappers around import statements. Whatever solution we end up with would have to work independently of Composer and/or the autoloader.
Alwin
I’m starting to think that maybe modules might be a bad idea; or at least, class/module visibility.
As an anecdote, I was looking to extract a protobuf encoding library from a larger codebase and create a separate library for Larry’s Serde library. During the extraction I realized that many of the classes and functions I was relying on actually used @internal classes/functions. If “module” visibility were a thing… would my implementation have been possible?
In other words, if visibility comes with modules; there really needs to be some kind of escape hatch. Some way to say, “I know what I’m doing, so get out of my way.”
In Java, you can do this by creating a class in the target package. I’m not sure about other languages; but it’s something to think about.
— Rob
I’m starting to think that maybe modules might be a bad idea; or at least, class/module visibility.
As an anecdote, I was looking to extract a protobuf encoding library from a larger codebase and create a separate library for Larry’s Serde library. During the extraction I realized that many of the classes and functions I was relying on actually used @internal classes/functions. If “module” visibility were a thing… would my implementation have been possible?
In other words, if visibility comes with modules; there really needs to be some kind of escape hatch. Some way to say, “I know what I’m doing, so get out of my way.”
Isn't this exactly the same as any other "access control" feature?
We have private/protected methods and properties, final methods and classes, readonly properties; other languages also have sealed classes, module and/or file private, etc. All of these are ways for the author of the code to express how they intend it to be used, and to protect users against accidentally violating assumptions the code is relying on.
They are of course not security measures, as they can be bypassed via Reflection or just editing the source code.
If you're using someone else's code in ways they didn't intend, that's up to you, but you may need to make changes to do so, i.e. fork it rather than relying on the distributed version.
In your example, the author clearly marked that those classes were internal implementation details; if you use them directly and later update the library, you risk your code breaking either completely or subtly. If you copy them into your own codebase, you are free to remove the "@internal" annotations, or future "module private" declarations, and make whatever other changes are needed to suit your use case.
Regards,
Rowan Tommins
[IMSoP]
I’m starting to think that maybe modules might be a bad idea; or at least, class/module visibility.
As an anecdote, I was looking to extract a protobuf encoding library from a larger codebase and create a separate library for Larry’s Serde library. During the extraction I realized that many of the classes and functions I was relying on actually used @internal classes/functions. If “module” visibility were a thing… would my implementation have been possible?
In other words, if visibility comes with modules; there really needs to be some kind of escape hatch. Some way to say, “I know what I’m doing, so get out of my way.”
Isn't this exactly the same as any other "access control" feature?
We have private/protected methods and properties, final methods and classes, readonly properties; other languages also have sealed classes, module and/or file private, etc. All of these are ways for the author of the code to express how they intend it to be used, and to protect users against accidentally violating assumptions the code is relying on.
They are of course not security measures, as they can be bypassed via Reflection or just editing the source code.
If you're using someone else's code in ways they didn't intend, that's up to you, but you may need to make changes to do so, i.e. fork it rather than relying on the distributed version.
If the goal is to hint consumers of a library about the (lack of) guarantees regarding a method or its signature, then perhaps an #[\Internal] attribute makes sense.
namespace Acme\Foo;
class Foo
{
#[\Internal('Acme')]
public function bar() { /* ... */ }
}
In the example above, I imagine calling or extending the Foo::bar() method from somewhere outside the Acme namespace would trigger an E_USER_WARNING or E_USER_NOTICE. The warning/notice could then be suppressed when explicitly overriding an #[\Internal] method with #[\Override].
Alwin
In the example above, I imagine calling or extending the Foo::bar() method from somewhere outside the Acme namespace would trigger an E_USER_WARNING or E_USER_NOTICE. The warning/notice could then be suppressed when explicitly overriding an #[\Internal] method with #[\Override].
I don't see any reason for the message to be any quieter or easier to override than calling a private method.
Indeed, one use of an internal/module-private feature would be when the author wants to split up a class that has a large number of private methods: the new classes need to be able to talk to each other, so the existing "private" keyword is no longer appropriate, but the intended surface to users of the library has not changed. The user is making exactly the same decision by ignoring the "internal" flag as they are if they use reflection or code-rewriting to ignore/remove the "private" flag.
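To illustrate that splitting scenario concretely (the module-private marker in the comment is purely hypothetical syntax, and the classes are invented):
<?php
namespace Acme\Csv;
// Originally this logic was a private method of CsvReader. After splitting it
// out into its own class, "private" can no longer hide it, so some
// module/package-level visibility would be wanted to keep it out of the
// library's intended public surface.
/* module-private */ class LineParser
{
    public function parse(string $line): array
    {
        return str_getcsv($line);
    }
}
class CsvReader
{
    public function read(string $file): array
    {
        $parser = new LineParser(); // fine from inside the same library
        return array_map([$parser, 'parse'], file($file));
    }
}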
Rowan Tommins
[IMSoP]
In the example above, I imagine calling or extending the Foo::bar() method from somewhere outside the Acme namespace would trigger an E_USER_WARNING or E_USER_NOTICE. The warning/notice could then be suppressed when explicitly overriding an #[\Internal] method with #[\Override].
I don't see any reason for the message to be any quieter or easier to override than calling a private method.
If there were a dedicated internal modifier keyword, sure. However, if one simply wants to use an attribute to advertise clearly to other developers that they should expect the code to break at any time, I feel a compile-time critical error is a bit too strict.
Indeed, one use of an internal/module-private feature would be when the author wants to split up a class that has a large number of private methods: the new classes need to be able to talk to each other, so the existing "private" keyword is no longer appropriate, but the intended surface to users of the library has not changed. The user is making exactly the same decision by ignoring the "internal" flag as they are if they use reflection or code-rewriting to ignore/remove the "private" flag.
I understand your point, but playing devil's advocate for a minute here: I'd argue that if a class reaches the point where it has too many private methods, something has clearly gone wrong in the architecture or abstraction of that class. It probably means part of the functionality needs to be hoisted out to a separate class with its own self-contained, stable interface.
One could also argue that if the author really wishes to break private methods up across multiple classes, they could also use reflection (or preferably: a method call wrapped in a closure bound to the target class) to access their own private methods.
My experience with module-level visibility out in the wild (mostly in Java) has mostly been with libraries where the authors apparently couldn't be bothered to commit to a stable interface for more specific implementations of certain template-pattern adapters and the like – the Android source code is full of this.
[rant]
I specifically remember dealing with a DB adapter interface and accompanying abstract class, along with implementations for several DBMSs. There was an implementation for the specific DBMS I wanted to use, however, per the interface, it was completely tied into a separate pagination construction which led to a leaky abstraction, and was badly optimized for my use case. I just needed to override one or two methods to make it work for me, but the platform visibility on those methods meant I had to copy-paste both the abstract class and the adapter class to fix the functionality, giving me more code to maintain in the process.
I think this is a good example where package/module-level visibility makes developers complacent, and not care about offering a library that is easy to extend or encapsulate.
[/rant]
Alwin
Ok, the conversation is getting sidetracked, but I think some progress is
being made.
I started this latest iteration last year with a thread about introducing
something similar to the ES module system of JavaScript to PHP. What
attracts me to this particular model is that it should already be familiar
to the vast majority of PHP users. Prior to ES modules browsers had no
natural module import mechanic. Prior to ES modules all symbols were
attached to the window. You can see this if you serve this index.html from a
server and open it (note that opening the file locally will result in the JS
being blocked by modern browser security).
<!DOCTYPE html>
<html>
<head>
<script>
var a = 1234
</script>
</head>
<body>
<script>
console.log(a)
console.log(window.a)
</script>
</body>
</html>
The above spits 1234 into the console twice. Second example - let's put a
module in.
<!DOCTYPE html>
<html>
<head>
<script>
var a = 1234
</script>
<script type="module">
const a = 5678
var b = 9123
</script>
</head>
<body>
<script>
console.log(a)
console.log(window.a)
console.log(b)
</script>
</body>
</html>
This outputs 1234 twice and an error is raised about b being undefined.
I bring the above up to demonstrate that this is the desired behavior of what
I originally called a PHP module, and have been bullied over and taken to task
for not understanding the meaning of "module". Rowan seems to be more
comfortable characterizing this as containers. If everyone is happy with
that term I really don't care - I just want a way to isolate a code block
so that whatever happens in there stays in there unless I explicitly export
it out, and the only way I see things inside that scope is if I bring them in.
The other thing that was done with ES modules is that the syntax was
tightened. JavaScripters cannot dictate what browser a user chooses, so
the bad decisions of the early days of JS never really went away until ES
modules came along, which enforce strict mode by default. PHP has no such
strict mode - it has a strict types mode, but that isn't the same thing.
There are multiple behaviors in PHP that can't go away because of backwards
compatibility problems, and one of those might indeed be how namespaces are
handled. In PHP a namespace is just a compile-time shortcut for resolving
symbol names: the namespace is prefixed to the start of every symbol within
it. Unlike Java or C#, PHP has no concept of namespace visibility. At the
end of the day it's a shortcut, and its implementation happens entirely at
compile time.
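A trivial example of what I mean (nothing new here, just current behaviour):
<?php
namespace Acme;
class Foo {}
new Foo();             // the compiler resolves this to \Acme\Foo
echo Foo::class;       // prints "Acme\Foo"
echo \Acme\Foo::class; // same class, written fully qualified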
Previously in the discussion Alwin Garside made a long but insightful post
on namespaces and their workings that I've been thinking on and trying to
digest for the last several days. What I've arrived at is that the
discussions about Composer and autoloaders are indeed a red herring. At the
end of the day, PHP's include statements are a means to separate the PHP
process into multiple files. In his email he explored some of the rewriting
that could be done, and Rowan and I have also explored this in the form of
namespace pathing and aliasing.
We've gotten away from the original focus of containing this code and how
that would work. So once again this moron is going to take a stab at it.
Container modules are created with require_module('file/path'). All code
that executes as a result of this call is isolated to its container. That
includes the results of any require or include calls made by the module
file itself or any file it requires.
Since the module file is cordoned off to its own container from the rest of
the application whatever namespaces it uses are irrelevant to outside code.
Any symbols created in the module will not be established in the script
that made the require_module() call. Since it is coming into being with a
new require mechanism it could be subjected to more efficient parsing rules
if that is desired, but that's a massive can of worms for later discussion.
One departure will be necessary, though - the module will need to return
something to the PHP code that called it. The simplest way to go about this
is to just require that it have a return. So...
$myModule = require_module('file/path');
or perhaps
const myModule = require_module('file/path');
The module probably should return a static class or class instance, but it
could return a closure. In JavaScript the dynamic import() statement
returns a module object that is most similar to PHP's static classes, with
each export being a member or method of the module object.
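To make the shape of this concrete, a purely hypothetical sketch (require_module does not exist, and the file layout and class names are invented):
<?php
// calendar-module.php - everything declared while this runs stays in the container
namespace AlicesCalendar;
require __DIR__ . '/vendor/autoload.php'; // the module brings its own autoloader
class Calendar
{
    public function render(): string
    {
        return '<table>...</table>';
    }
}
// the module's only "export" is whatever it returns
return new Calendar();

<?php
// host application - none of the module's symbols leak into this scope
$calendar = require_module(__DIR__ . '/wp-plugins/AlicesCalendar/calendar-module.php');
echo $calendar->render();          // works via the returned object
// new \AlicesCalendar\Calendar(); // would fail: the symbol never left the container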
Circling back to a question I know will be asked - what about autoloaders?
To which I answer, what about them? If the module wants to use an
autoloader it has to require one just as the initial php file that required
it had to have done at some point. The container module is for all intents
and purposes its own php process that returns some interface to allow it to
talk to the process that spawned it.
Will this work? I think yes. Will it be efficient? Hell no. Can it be
optimized somehow? I don't know.
This could work! I have a couple of critiques, but they aren’t negative:
I think I like it. It might be worth pointing out that JavaScript "hoists" the imports to file-level during compilation — even if you have the import statement buried deep in a function call. Or, at least it used to. I haven’t kept track of the language that well in the last 10 years, so I wouldn’t be surprised if it changed; or didn’t. I don’t think this is something we need to worry about too much here.
It’s also worth pointing out that when PHP compiles a file, every file has either an explicit or implicit return. https://www.php.net/manual/en/function.include.php#:~:text=Handling%20Returns%3A,from%20included%20files.
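For anyone who hasn't used it, that mechanism already works today; a quick
illustration (file names invented):

// config.php
<?php
return ['driver' => 'mysql', 'host' => 'localhost'];

// bootstrap.php
<?php
$config = require 'config.php';    // $config is the returned array
$other  = require 'no_return.php'; // a file with no return statement evaluates to int(1)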
So, in other words, what is it about require_module that is different from require or include? Personally, I would then change PHP from "compile file" mode when parsing the file to "compile module" mode. From a totally naive point-of-view, this would cause PHP to:
- if we already have a module from that file; return the module instead of compiling it again.
- swap out symbol tables to the module’s symbol table.
- start compiling the given file.
- concatenate all files as included/required.
- compile the resulting huge file.
- switch back to the calling symbol table (which may be another module).
- return the module.
For a v1, I wouldn’t allow autoloading from inside a module — or any autoloaded code automatically isn’t considered to be part of the module (it would be the responsibility of the main program to handle autoloading). This is probably something that needs to be solved, but I think it would need a whole new approach to autoloading which should be out of scope for the module RFC (IMHO).
In other words, you can simply include/require a module to load the entire module into your current symbol table; or use require_module to "contain" it.
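Roughly, the contrast I have in mind - with the caveat that require_module is entirely hypothetical:

// colors.php
<?php
namespace Paint;
class Red {}

// Script A - a plain require merges the declarations into the shared symbol table:
require 'colors.php';
var_dump(class_exists(\Paint\Red::class, false)); // bool(true)

// Script B - the hypothetical require_module keeps them contained; the caller
// only receives whatever the module file chose to return:
$module = require_module('colors.php');
var_dump(class_exists(\Paint\Red::class, false)); // would be bool(false) under this idea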
As for what should a module return? I like your idea of just returning an object or closure.
— Rob
I just had another thought; sorry about the back-to-back emails. This wouldn’t preclude something like composer (or something else) from being used to handle dependencies, it would just mean that the package manager might export a "Modules" class + constants — we could also write a composer plugin that does just this:
require_once 'vendor/autoload.php';
$module = require_module Vendor\Module::MyModule;
where Vendor\Module is a generated and autoloaded class containing consts to the path of the exported module.
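Something like this, perhaps - the class name and paths are just made up for illustration:

<?php
// Generated by the hypothetical composer plugin into vendor/.
namespace Vendor;

final class Module
{
    // One constant per module an installed package exports.
    public const MyModule  = __DIR__ . '/acme/my-module/module.php';
    public const Templates = __DIR__ . '/acme/templates/module.php';
}

// Usage, matching the syntax sketched above:
// $module = require_module Vendor\Module::MyModule;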
— Rob
This could work! I have a couple of critiques, but they aren’t negative:
I think I like it. It might be worth pointing out that JavaScript "hoists"
the imports to file-level during compilation — even if you have the import
statement buried deep in a function call. Or, at least it used to. I
haven’t kept track of the language that well in the last 10 years, so I
wouldn’t be surprised if it changed; or didn’t. I don’t think this is
something we need to worry about too much here.
As I pointed out in detail to Rob off list, JavaScript has two import
mechanisms that are subtly different from each other. Those interested can
read here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import
It’s also worth pointing out that when PHP compiles a file, every file has
either an explicit or implicit return.
https://www.php.net/manual/en/function.include.php#:~:text=Handling%20Returns%3A,from%20included%20files
.
True, but it's a rarely used mechanism and a return statement isn't
required - hence an implicit return is possible.
So, in other words, what is it about require_module that is different from
require or include? Personally, I would then change PHP from "compile
file" mode when parsing the file to "compile module" mode. From a totally
naive point-of-view, this would cause PHP to:
- if we already have a module from that file; return the module instead of compiling it again.
- swap out symbol tables to the module’s symbol table.
- start compiling the given file.
- concatenate all files as included/required.
- compile the resulting huge file.
- switch back to the calling symbol table (which may be another module).
- return the module.
For a v1, I wouldn’t allow autoloading from inside a module — or any
autoloaded code automatically isn’t considered to be part of the module (it
would be the responsibility of the main program to handle autoloading).
This is probably something that needs to be solved, but I think it would
need a whole new approach to autoloading which should be out of scope for
the module RFC (IMHO).
In other words, you can simply include/require a module to load the entire
module into your current symbol table; or use require_module to "contain"
it.
Yes, that is certainly possible but comes at an opportunity cost from an
engine design standpoint. This is an exceedingly rare opportunity to go
back and fix mistakes that have dogged the language for some time and that
simply can't be fixed without creating large backwards compatibility
problems. For instance, say for the sake of example that PHP files could be
compiled 10 times faster if the parser could assume the whole file was code
and there aren't going to be any <?php ?> tags or Heredoc or Nowdoc blocks.
It might be worth it then to have modules not allow such, and if an author
had a block of code that they really wanted this templating behavior to
apply to they could still issue a require. Maybe it's time to dig up some
downvoted RFCs that got killed for these reasons.
As for what should a module return? I like your idea of just returning an
object or closure.
I'm leaning towards something akin to a static class. If modules have
export keywords (note that if they have their own parser they can also have
new keywords without disrupting the existing PHP ecosystem) then compiling
one would produce a static class with members and methods matching what was
exported.
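Roughly what I have in mind, with the heavy caveat that the export keyword
and the shape of the resulting object are both hypothetical:

// math.module.php - parsed by the module-only parser, so new keywords are fine.
export function add(int $a, int $b): int { return $a + $b; }
export const PI = 3.14159;
function internalHelper(): void {} // not exported, invisible to the caller

// caller.php
$math = require_module('math.module.php');
echo $math::add(2, 3); // 5 - each export surfaces as a static member
echo $math::PI;        // 3.14159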
I just had another thought; sorry about the back-to-back emails. This
wouldn’t preclude something like composer (or something else) from being
used to handle dependencies, it would just mean that the package manager
might export a "Modules" class + constants — we could also write a composer
plugin that does just this:
require_once 'vendor/autoload.php';
$module = require_module Vendor\Module::MyModule;
where Vendor\Module is a generated and autoloaded class containing consts
to the path of the exported module.
That leads into some thoughts I have on loading modules in general.
require_module is the simplest expression. That said, 'use module' might
also be appropriate.
require_once 'vendor/autoload.php';
use module Vendor/Module as MyModule;
Something like that?
JavaScript has a distinction between global scope import, which is almost
analogous to our namespaces and use statement, and dynamic scope import,
which is almost analogous to our require statements. Don't know if it's
useful to us to draw such a distinction.
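If we did mirror that split, it might look roughly like this - both forms
are hypothetical:

// Static form, resolved when the file is compiled (analogous to JS `import ... from`):
use module Vendor/Logger as Logger;

// Dynamic form, an ordinary expression evaluated at runtime (analogous to JS import()):
$logger = require_module('vendor/logger/module.php');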
I think there's a key assumption here still that is at the root of much of the disagreement in this thread.
Given that code from multiple files is clustered together into a "thing"
and Given we can use that "thing" to define a boundary for:
- name resolution (what Michael is after);
- visibility (what I am after);
- more efficient optimizations (what Arnaud showed is possible);
- various other things
Then the key question is: Who defines that boundary?
Is it the code author that defines that boundary? Or is it the code consumer?
Similarly, is it the code author or consumer that has to Do Work(tm) in order to leverage the desired capability? Or both?
This is an abstract question that I think needs to be resolved before we go any further. There are certainly ways to do it with either party in control of the boundary, but I suspect many of them will be mutually-exclusive, so deciding which tradeoffs we want and what future features we're OK with blocking is highly important.
My own take:
The boundary must be definable by the author. The author knows the code better than the consumer. The odds of the author botching the boundary and making subtle bugs is orders of magnitude less than the consumer of the code botching the boundary. (Eg, if a class is declared module-private, but it's the consumer that defines what module it is in, then access to that class is completely out of the control of the author and it's really easy for some code to break.) Potentially we could allow the consumer to decide how they want to leverage that boundary (either by just using the code as is normally now, or wrapping it into a name resolution container), but the boundary itself needs to be author-defined, not consumer defined, or things will break.
I realize that makes it less useful for the goal of "support old and unmaintained WordPress plugins that haven't been updated in 3 years" (as it will be about 15 years before WP plugins that have bothered to make modules/containers/boundaries get abandoned), but my priority is the consistency and reliability of the language more so than supporting negligent maintainers.
One possible idea: Starting from the proposal Arnaud and I made earlier (see earlier posts), have a Module.php file rather than module.ini, which defines a class that specifies the files to include/exclude etc. Then in addition to the "just use as it is" usage pattern we described, the consumer could also run something like:
$container = require_modules(['foo/Module.php', 'bar/Module.php'], containerize: true);
Which would give back an object/class/thing through which all the code that was just loaded is accessed, creating a separate loading space that is built along the boundaries established by the module/package authors already. (Note: This still relies on all of those packages being modularized by their authors, but again, I think that is a requirement.)
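To illustrate, a Module.php along those lines might be as small as this - the class shape here is purely invented:

<?php
// foo/Module.php
namespace Foo;

final class Module
{
    // Directories whose files make up this module.
    public const DIRECTORIES = [__DIR__ . '/src'];

    // Anything under those directories that should be left out.
    public const EXCLUDE = [__DIR__ . '/src/legacy'];
}

// Loaded either normally, or via something like
// require_modules(['foo/Module.php', ...], containerize: true) as above.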
--Larry Garfield
My take on this is that both need to exist as separate features.
The author should define the boundary for things like visibility and
optimisation. It should be possible to take an existing Composer
package, add some metadata to it in some way, and have the engine make
some useful assumptions. The consumer of that package should not need to
care about the difference, except in edge cases like trying to define
additional classes with the same prefix as a third-party package.
On the other hand, the consumer should define the boundary for isolating
name resolution. It should be possible to take any folder of PHP files,
with no metadata about package management, and load it in a context
where duplicate class names are allowed. The author of the package
shouldn't need to make any changes, except in edge cases like highly
dynamic code which needs additional hints to work correctly when sandboxed.
The first feature is an add-on to the extremely successful ecosystem
based on the assumption that packages will inter-operate based on agreed
names. The second feature is a bridge between that ecosystem and the
very different world of plugin-based web applications, which want to
manage multiple pieces of code which are not designed to co-operate, and
run them side by side.
JS has come up a few times as a comparison, because there the two
features overlap heavily. That's because the language has always started
from the opposite assumption to PHP: declarations in JS are local by
default, and can only be referenced outside of the current function
scope if explicitly passed outwards in some way. In PHP - and Java, C#,
and many others - the opposite is true, and declarations have a global
name by default; keywords like "private" and "internal" are then used to
indicate that although code can name something, it's not allowed to use it.
Both have their strengths and weaknesses, but at this point every JS
module going back 15+ years (CommonJS was founded in 2009, to
standardise existing practices) is based on the "interact by export"
model; and every PHP package going back 25+ years (PEAR founded in 1999;
Composer in 2011) is based on the "interact by name" model.
--
Rowan Tommins
[IMSoP]
Were we to do that, then the consumer container-loading needs to take any potential module-definition into account. Eg, if one class from a module is pulled into a container, all of them must be.
Though I'm still not clear how transitive dependencies get handled either way. Crell/Serde depends on Crell/AttributeUtils, which depends on Crell/fp. If someone wants to containerize, say, the ObjectImporter from Serde, will that necessarily mean containerizing Serde, AttributeUtils, and fp? If so, how will it know to include all of those, but not include Crell/Config (which uses Serde)?
And while I know we keep trying to get away from talking about Composer and autoloading, for any of this to work, Composer would need to be modified to allow downloading multiple versions of a package at the same time and keeping them separate on disk. I do not know what that could look like.
--Larry Garfield
You wouldn't containerize "something from a library", any more than you containerize "part of Nginx". You create a container, and put a bunch of stuff in it that doesn't know it's running in a container. A Linux container doesn't know that Nginx requires a bunch of shared libraries and daemons, it just creates a file system and a process tree, and lets you do what you like with them.
Let's say I'm writing a WordPress plugin. It's just a bunch of files on disk, some of which I've written, some of which I've obtained from open source projects. Maybe there's a giant file with lots of classes in, a vendor directory I've generated using Composer, some Phar files, and some fancy modules with metadata files. Maybe I distribute an install script that fetches updated versions of all those things; maybe I just stick the whole thing in a tar file and host it on my website.
I want to have WordPress load all that code as part of my plugin, and not complain that somewhere in there I've called a class Monolog\Logger, and that name is already used.
I don't need WordPress, or PHP, to know whether that class is "really" a version of Monolog, or how it ended up in the folder. And I don't need twenty different containers for all the different things in the folder. I just need to put that whole folder into a single container, to separate it from someone else's plugin.
The container somehow creates a new namespace root, like a Linux container creates a new file system root. The code inside uses require, autoloading, module definitions, etc etc, but can't escape the container.
Then in some way, I define what's allowed to cross the boundary between the main application and this container - e.g. what parts of the WordPress API need to be visible inside the container, and what parts of the container need to be called back from WordPress.
And that, if it's possible at all, is the plugin use case sorted. No changes to Composer, no need to rewrite every single PHP package ever written. Probably some caveats where dynamic code can accidentally escape the container. Completely separate from the kind of "module" you and Arnaud were experimenting with.
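Purely as a thought experiment, the host side of that might read something
like the following - none of these classes or methods exist today, and
add_action()/apply_filters() are just stand-ins for the WordPress API
surface being exposed:

<?php
// Hypothetical API sketch, host (WordPress) side.
$container = new NamespaceContainer('/var/www/wordpress/wp-plugins/foo/');

// Decide what from the host is visible inside the container:
$container->expose('add_action');
$container->expose('apply_filters');

// Everything the plugin requires/declares stays behind the boundary,
// even if it defines its own Monolog\Logger.
$plugin = $container->load('foo/plugin.php');

// Only what the plugin explicitly handed back crosses out:
$plugin->register();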
Rowan Tommins
[IMSoP]
Well, now you're talking about something with a totally separate compile step, which is not what Michael seemed to be describing at all. But it seems like that would be necessary. At which point, we're basically talking about "load this Phar file into a custom internalized namespace", which, from my limited knowledge of Phar, seems like the most logical way to do it. That also sidesteps all the loading and linking shenanigans.
Doing it that way, as a Phar-loading-wrapper, is probably the most likely to actually be viable. I'm still not sure I'd support it, but that seems the only viable option so far proposed.
--Larry Garfield
There's definitely some crossed wires somewhere. I deliberately left the
mechanics vague in that last message, and certainly didn't mention any
specific compiler steps. I'm a bit lost which part you think is "not
what Michael seemed to be describing".
Picking completely at random, a file in Monolog has these lines in:
namespace Monolog\Handler;
...
use Monolog\Utils;
...
class StreamHandler extends AbstractProcessingHandler {
...
$this->url = Utils::canonicalizePath($stream);
My understanding is that our goal is to allow two slightly different
copies of that file to be included at the same time. As far as I know,
there have been two descriptions of how that would work:
a) Before or during compilation, every reference is automatically
prefixed, so that the class is declared as
"__SomeMagicPrefix\Monolog\Handler\StreamHandler", and the reference to
"\Monolog\Utils" is replaced by a reference to
"__SomeMagicPrefix\Monolog\Utils". There are existing userland
implementations that take this approach.
b) While the class is being compiled, PHP swaps out the entire symbol
table, so that the class is still called
"\Monolog\Handler\StreamHandler", and the reference to "\Monolog\Utils"
is to the class of that name in the current symbol table. In a different
symbol table, both names refer to separately compiled classes.
The "new namespace root" in my last message is either (a) the special
prefix, or (b) the actual root of the new symbol table. In either case,
you need to decide which classes to declare under that root; either
recursively tracking what requires what, or just where on disk the file
was loaded from.
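For approach (a), the rewritten file would end up looking roughly like this
(the prefix is the illustrative one from above; userland tools such as
PHP-Scoper already do something very similar):

// Monolog/Handler/StreamHandler.php, after prefixing:
namespace __SomeMagicPrefix\Monolog\Handler;

use __SomeMagicPrefix\Monolog\Utils;

class StreamHandler extends AbstractProcessingHandler
{
    // The imported name now resolves to __SomeMagicPrefix\Monolog\Utils,
    // so a prefixed copy of Monolog can coexist with an unprefixed one:
    // $this->url = Utils::canonicalizePath($stream);
}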
Even if we're willing to require the authors of Monolog to rewrite their
library for the convenience of WordPress plugin authors, I don't see how
we can get away from every class in PHP being fundamentally identified
by name, and the compiler needing to manage those names somehow.
We can imagine a parallel universe where PHP declarations worked like JS
or Python:
import * from Monolog\Handler;
...
$Utils = import Monolog\Utils;
...
$StreamHandler = class extends $AbstractProcessingHandler {
...
$this->url = $Utils::canonicalizePath($stream);
But at that point, we're just inventing a new programming language.
At which point, we're basically talking about "load this Phar file into a custom internalized namespace", which, from my limited knowledge of Phar, seems like the most logical way to do it. That also sidesteps all the loading and linking shenanigans.
I don't think Phar files would particularly help. As far as I know,
they're just a file system wrapper; you still have to include/require
the individual files inside the archive, and they're still compiled in
exactly the same way.
Whether we want to isolate "any definition you find in the directory
/var/www/wordpress/wp-plugins/foo/" or "any definition you find in the
Phar archive phar:///var/www/wordpress/wp-plugins/foo.phar", the tricky
part is how to do the actual isolating.
--
Rowan Tommins
[IMSoP]
This is what I was getting at. As I understand what Michael's examples have described, it allows pulling a different version of one or more files into some kind of container/special namespace/thingiewhatsit, at runtime.
I fundamentally do not believe pulling arbitrary files into such a structure is wise, possible, or will achieve anything resembling the desired result, because basically no application or library is single-file anymore. If you try to side-load an interface out of Monolog, but not a class that implements that interface, now what? I don't even know what to expect to happen, other than "it will not work, at all."
The only way I can see for this to work is to do as you described: Yoink everything in Monolog, Monolog's dependencies, and their dependencies, etc. into a container-ish thing that gets accessed in a different way than normal. That doesn't force Monolog to change; it just means that the plugin/module/library using Monolog has to do the leg-work to set up that package in some way before getting to runtime. It can't just say "grab these 15 files but not these 10 and pull them into a container," since it doesn't know which of those 15 files depend on those 10 files, or vice versa, or what other 20 files one of those 15 depends on that's in a totally different package.
That's like trying to containerize pdo.so and xml.so, but not PHP-FPM or pdo_mysql.so. Technically Docker will let you, there's just no way it can lead to a working system.
So if it would require globbing together a long chain of dependencies into a thing that you can reliably side-load at once, and expect that set of packages/versions to work together, then something that builds on Phar seems like the natural way to do that. Essentially using Phar files as the "container image."
You're right, that may not be how Phar works today. But building that behavior on top of Phar seems vastly easier, more stable, and more likely to lead to something that vaguely resembles a working system than allowing arbitrary code to arbitrarily containerize arbitrary files at runtime and expecting it to go well.
--Larry Garfield
I don't think anybody, in any of the examples on this thread, has ever suggested listing individual files to be loaded into the container/module/whatever.
The suggestions I can think of have been:
- track things recursively, e.g. if A.php is in the container, and loads B.php, put B.php in the container
- choose based on directory, e.g. whenever the file path being loaded begins "/var/www/plugins/foo", put it in the container (this seems by far the simplest to me)
- choose based on being in a Phar archive, i.e. whenever the file path being loaded begins "phar:/var/www/plugins/foo.phar:" (this seems entirely equivalent to the previous point to me)
Perhaps what you're picturing is that the compiler needs to know up front what classes do and don't exist, so want to create some kind of index? That's not how I picture it. If code in the container references a class that doesn't exist, it should call any autoloaders registered inside the container, and if they fail to define the class, it should error as normal.
There needs to be some way to "import" and "export" symbols to communicate between the container and its host application, but I think for those it is safe to list individual items, because you're not trying to pull their dependencies, just point to the right piece of code.
Rowan Tommins
[IMSoP]
On Mon, Jun 2, 2025 at 10:40 PM Larry Garfield larry@garfieldtech.com
wrote:
This is what I was getting at. As I understand what Michael's examples
have described, it allows pulling a different version of one or more files
into some kind of container/special namespace/thingiewhatsit, at runtime.
At some point that could be a fair assessment of what I was saying. I'm
coming around to Rowan's container view though, enough to start thinking
of this as container modules. I don't want to get in the weeds of how the
files for a container module get set up by whatever package manager is
chosen as that's a massive problem to solve in its own right. For now I
would like to focus on this idea of having a container that can do whatever
it needs to do without affecting the code that started it in any way.
Avoiding the enormous code/file duplication that will result from this is a
separate, later problem and admittedly might not be solvable. But having a
container mechanism, even if it isn't optimized, would be healthier than
having plugins that carry their own Strauss monkey-typed copies of the
libraries they need even if those are several minor versions behind (which
should be compatible if the author obeys semantic versioning).
The Problem: Interoperability.
That's really it.
I think this is why Rowan keeps telling you to call or compare this with
"Containers" and not modules. When I opened this thread, my interest was in
bundling multiple files all at once so that the PHP engine can make
assumptions and optimizations about it and expand namespace to also allow
class visibility. To me, and I believe to a vast majority of PHP users,
interoperability is not a problem. We don't need, and, depending on how we
position it, don't want multiple versions of the same package in a single
application.
--
Marco Deleu
On Tue, May 20, 2025 at 11:08 AM Michael Morris tendoaki@gmail.com wrote:
The Problem: Interoperability.
That's really it.
I think this is why Rowan keeps telling you to call or compare this with
"Containers" and not modules.
Which is why I switched to calling them a "whachamacallit" to set the issue
aside and focus on concepts, not terms. He insisted on trolling despite
that.
When I opened this thread, my interest was in bundling multiple files all
at once so that the PHP engine can make assumptions and optimizations about
it and expand namespace to also allow class visibility. To me, and I
believe to a vast majority of PHP users, interoperability is not a problem.
We don't need, and, depending on how we position it, don't want multiple
versions of the same package on a single application.
Not all of us have the pleasure of living in ivory towers like you do. In
the real world code isn't perfect and problems need to be solved.
No one wants multiple versions of the same package in a single application.
How stupid do you think I am?
We don't always get what we want - the need for multiple package versions
does arise in the real world, which is why other languages such as golang
and JavaScript allow it.
Hey all,
Anyone familiar with C++'s friend keyword? It’s not a direct replacement
for modules, but it solves similar problems — allowing trusted classes or
functions to access private/protected members without making them public.
The idea: allow one class to explicitly grant access to another class or
function. Useful for tightly coupled code that still wants to maintain
encapsulation. Since friend would be a new keyword, it’s safe to add
(currently a parse error).
Examples:
`
class Engine {
    private string $status = 'off';

    friend class Car;
    friend function debugEngine;
}

class Car {
    public function start(Engine $e) {
        $e->status = 'on'; // allowed
    }
}

function debugEngine(Engine $e) {
    echo $e->status; // also allowed
}
`
This avoids reflection, awkward internal APIs, or overly permissive
visibility. Could be useful in frameworks, testing tools, or any place
where selective trust is helpful.
Thoughts?
Hammed