[PRE-RFC] Runtime modules

1 day ago by drealecs@gmail.com

Hi internals,

I would like to ask for early feedback on an idea I have been
exploring recently: runtime modules.
I put my initial idea in writing here:
https://news-web.php.net/php.internals/127343

I have recently been playing around with the idea locally.
It seems technically doable, but it touches enough parts of the engine
that I would like to check whether this direction makes sense before
writing a proper RFC.

The problem has been discussed many times over the past few years. I
am now looking at it from the angle of real-life, package-level usage
with Composer, and at how to achieve package symbol isolation in a way
that would minimally impact the ecosystem.

The rough idea is to have request-lifetime runtime modules.
A runtime module would be a named internal unit with its own userland
class/function/constant tables.
Code running in a module would define symbols into that module, and
symbol identity for module-owned code would effectively become
(module, symbol_name).
The root context would keep the existing behavior, and conceptually we
can view it as a root/default module.
Root context userland symbols would not automatically be visible to
runtime modules.
A module would see its own symbols, PHP internal/builtin symbols, and
symbols from its direct dependencies. This maps fairly naturally to a
Composer style model where a package depends on PHP and on other
packages, but not implicitly on application/root symbols.
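To make the lookup rule concrete, here is a minimal userland sketch that models it with plain PHP arrays. It is not engine code: ModuleTable, define(), addDependency() and resolve() are hypothetical names, the acme/* packages are made up, and checking the module's own table first, then built-ins, then direct dependencies is an assumed order.

```php
<?php
// Userland model only: a real implementation would keep these tables inside
// the engine. ModuleTable and its methods are hypothetical names.
final class ModuleTable
{
    /** @var array<string, array<string, string>> module => [lowercased name => symbol id] */
    private array $symbols = [];
    /** @var array<string, list<string>> module => its direct dependencies */
    private array $deps = [];

    /** @param list<string> $builtins lowercased names of internal symbols */
    public function __construct(private array $builtins) {}

    public function define(string $module, string $name): void
    {
        // Identity is (module, name): two modules may both define the same name.
        $this->symbols[$module][strtolower($name)] = $module . '::' . $name;
    }

    public function addDependency(string $module, string $dependency): void
    {
        $this->deps[$module][] = $dependency;
    }

    public function resolve(string $module, string $name): ?string
    {
        $key = strtolower($name);
        // 1. Symbols owned by the module itself.
        if (isset($this->symbols[$module][$key])) {
            return $this->symbols[$module][$key];
        }
        // 2. Internal/builtin symbols are visible from every module.
        if (in_array($key, $this->builtins, true)) {
            return 'builtin::' . $name;
        }
        // 3. Direct dependencies only: nothing transitive, nothing from root.
        foreach ($this->deps[$module] ?? [] as $dependency) {
            if (isset($this->symbols[$dependency][$key])) {
                return $this->symbols[$dependency][$key];
            }
        }
        return null;
    }
}

$table = new ModuleTable(['arrayobject']);
$table->define('root', 'App\Logger');
$table->define('acme/streams', 'Acme\Streams\Stream');
$table->define('acme/transport', 'Acme\Transport\Request');
$table->define('acme/client', 'Acme\Client\Client');
$table->addDependency('acme/client', 'acme/transport');
$table->addDependency('acme/transport', 'acme/streams');
```

In this model acme/client cannot see Acme\Streams\Stream even though acme/transport depends on it, and it cannot see the root's App\Logger at all.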

To make it clearer in Composer terms, I see it this way:

- each package would be defined in its own module
- module dependencies would map directly from Composer package dependencies

One possible userland API shape I have been using to experiment with is:

module_add_dependency(string $module): void
module_run(string $module, Closure $closure): mixed

module_add_dependency() declares a module dependency for the current module.
module_run() executes the passed closure in the specified module context.
Every execution context carries the module it was defined in, and any
new symbols defined while that code runs belong to the same module. The
exception is module_run(), which overrides the closure's module before
running it.
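The ownership rule can be modelled in userland as a sketch. ModuleRuntime, define(), run() and bindToCurrent() are hypothetical stand-ins for module_run() and for the engine's per-execution tracking; in particular, wrapping closures so that they return to their defining module is just one assumed way to express "attached to the module they were defined for".

```php
<?php
// Userland model of the proposed rule, not engine code. ModuleRuntime, run(),
// define() and bindToCurrent() are hypothetical stand-ins for module_run() and
// the engine's per-execution module tracking.
final class ModuleRuntime
{
    private string $current = 'root';
    /** @var array<string, list<string>> module => symbols declared into it */
    public array $owned = [];

    public function current(): string
    {
        return $this->current;
    }

    // A declaration lands in the module of the code that is running.
    public function define(string $name): void
    {
        $this->owned[$this->current][] = $name;
    }

    // Like module_run(): override the module, run the closure, then restore it.
    public function run(string $module, Closure $closure): mixed
    {
        $previous = $this->current;
        $this->current = $module;
        try {
            return $closure();
        } finally {
            $this->current = $previous; // restored even if the closure throws
        }
    }

    // A closure keeps the module it was defined in, wherever it is called from.
    public function bindToCurrent(Closure $fn): Closure
    {
        $module = $this->current;
        return fn (mixed ...$args): mixed => $this->run($module, fn () => $fn(...$args));
    }
}

$runtime = new ModuleRuntime();
$runtime->define('App\Kernel');
$callback = $runtime->run('acme/lib', function () use ($runtime): Closure {
    $runtime->define('Acme\Lib\Client');
    return $runtime->bindToCurrent(fn () => $runtime->current());
});
```

Calling $callback later from root code still executes in acme/lib, which is the property the paragraph above describes.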

The exact API is not the main point at this stage, but I aim to keep it minimal.
I am more interested in whether the model itself is reasonable.

Technically, the engine would need to track module ownership for
compiled code and symbols, keep per-module symbol tables and direct
dependency lists, and make lookup, type resolution, autoload, and
include_once behavior module aware.
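For include_once, one possible reading of "module aware" is that the already-included set is keyed by module as well as by resolved path, so the same vendor file can be loaded once into each module. The sketch below assumes exactly that; ModuleIncludeRegistry and shouldInclude() are hypothetical, and the path is made up.

```php
<?php
// Hypothetical model of module-aware include_once: the "already included" set
// is keyed by (module, resolved path) rather than by path alone. This keying
// is an assumption for illustration.
final class ModuleIncludeRegistry
{
    /** @var array<string, array<string, true>> module => [resolved path => true] */
    private array $included = [];

    // Returns true when the file should be compiled and run for this module.
    public function shouldInclude(string $module, string $resolvedPath): bool
    {
        if (isset($this->included[$module][$resolvedPath])) {
            return false;
        }
        $this->included[$module][$resolvedPath] = true;
        return true;
    }
}

$registry = new ModuleIncludeRegistry();
$file = '/app/vendor/acme/lib/src/functions.php';
```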

Also, due to the dynamic nature of PHP, module code may be handed objects
whose class is not visible from that module, but I do not see this as a
blocker.

The design I have been considering also rejects visible shadowing:
unrelated modules may define the same symbol name, but adding a
dependency or declaring a later symbol would fail if it makes two
different symbols with the same name visible from the same context.
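A userland sketch of the no-shadowing check, assuming the engine would validate every new dependency and every new declaration against what each affected module can see. VisibilityGraph and its methods are hypothetical; rolling the change back and throwing a LogicException is an assumed failure mode.

```php
<?php
// Userland model of the "no visible shadowing" rule. VisibilityGraph is a
// hypothetical name; the engine would run an equivalent check whenever a
// dependency is added or a symbol is declared.
final class VisibilityGraph
{
    /** @var array<string, array<string, true>> module => [lowercased name => true] */
    private array $owns = [];
    /** @var array<string, array<string, true>> module => [direct dependency => true] */
    private array $deps = [];

    public function declare(string $module, string $name): void
    {
        $this->apply(function () use ($module, $name): void {
            $this->owns[$module][strtolower($name)] = true;
        });
    }

    public function addDependency(string $module, string $dependency): void
    {
        $this->apply(function () use ($module, $dependency): void {
            $this->deps[$module][$dependency] = true;
        });
    }

    // Apply a change, then undo it if any module would now see two owners of one name.
    private function apply(Closure $change): void
    {
        [$owns, $deps] = [$this->owns, $this->deps];
        $change();
        $conflict = $this->findConflict();
        if ($conflict !== null) {
            [$this->owns, $this->deps] = [$owns, $deps];
            throw new LogicException($conflict);
        }
    }

    private function findConflict(): ?string
    {
        $modules = array_unique([...array_keys($this->owns), ...array_keys($this->deps)]);
        foreach ($modules as $module) {
            $ownerOf = [];
            // Visible from $module: its own symbols plus those of its direct dependencies.
            foreach ([$module, ...array_keys($this->deps[$module] ?? [])] as $owner) {
                foreach (array_keys($this->owns[$owner] ?? []) as $name) {
                    if (isset($ownerOf[$name]) && $ownerOf[$name] !== $owner) {
                        return "$name is visible from $module via both {$ownerOf[$name]} and $owner";
                    }
                    $ownerOf[$name] = $owner;
                }
            }
        }
        return null;
    }
}

$graph = new VisibilityGraph();
$graph->declare('acme/a', 'Acme\Client');
$graph->declare('acme/b', 'Acme\Client');   // unrelated modules: allowed
$graph->addDependency('app', 'acme/a');      // app now sees acme/a's Acme\Client
```

After this setup, making app depend on acme/b, or declaring Acme\Client in app itself, would both be rejected.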

I would like feedback on this package oriented runtime module model,
especially whether you see any major technical blockers or design
flaws.

I aim to turn this into a complete RFC within the next 6
months, but it might take longer. I am far from experienced with
engine internals, and I will most probably need guidance and help
with the implementation.
It is not something I want to rush, as this could be an important
addition to the language and we need to get it right.

Thank you,
Alex

3 hours ago by Rowan Tommins [IMSoP]

> The rough idea is to have request-lifetime runtime modules.
> A runtime module would be a named internal unit with its own userland
> class/function/constant tables.
> Code running in a module would define symbols into that module, and
> symbol identity for module-owned code would effectively become
> (module, symbol_name).
> The root context would keep the existing behavior, and conceptually we
> can view it as a root/default module.
> Root context userland symbols would not automatically be visible to
> runtime modules.

Hi Alex,

If I understand right, this is what I have suggested in previous threads
could be called "containers". The reason I prefer that name is that it
frames expectations of who needs to make changes: the person
distributing a piece of code, or the person consuming it.

In most contexts, terms like "module", "package", "library", etc., refer
to ways to distribute a piece of code: structuring code, adding
metadata, etc., so that the code can be combined with others to produce
larger pieces of functionality. A "container", in the sense of Docker,
Podman, Kubernetes, etc., is a way to consume other people's code;
taking a complete configured application and isolating it without
modifying each internal part.

The example I've been using is a WordPress plugin which wants to use a
specific version of Guzzle, without colliding with other plugins. To do
that, it needs to isolate not just Guzzle itself, but a tree of at least
a dozen other packages which Guzzle depends on. If every one of those
packages needs to be altered in some way, as implied by the term
"module", the chance of success is low.

On the other hand, if the WordPress plugin can create a "container"
where all of those packages run unchanged, then the feature would
immediately give access to thousands of existing packages.

> One possible userland API shape I have been using to experiment with is:
> module_add_dependency(string $module): void
> module_run(string $module, Closure $closure): mixed
> module_add_dependency() declares a module dependency for the current module.
>
> module_run() executes the passed closure in the specified module context.

Given the above, I'm not sure what "module_add_dependency" would do;
what is the difference between "depending on" something, and "running"
that thing?

I also don't think using a string as an identifier is useful or
necessary; avoiding reliance on global names is the whole point of the
exercise, after all.

Instead, how about this?

class ExecutionContainer {
    public function run(callable $code): mixed;
}

Creating a new container initialises a new symbol table, autoloader
stack, etc, and gives you an object referring to them. Calling that
object's run() method then executes some code in the context of that
container, and returns its result.
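A rough userland model of this ExecutionContainer, to show the shape rather than the engine work. ContainerModel, registerAutoloader(), defineClass() and findClass() are hypothetical, the class table and autoloader stack are plain arrays, and passing the container to each loader is an assumption.

```php
<?php
// Userland model of the suggested ExecutionContainer. No such class exists in
// PHP today; the class table and autoloader stack are plain arrays standing in
// for engine state, and the two-argument loader signature is an assumption.
final class ContainerModel
{
    private static ?self $root = null;
    private static ?self $active = null;

    /** @var array<string, string> lowercased class name => definition */
    private array $classes = [];
    /** @var list<callable(string, self): void> */
    private array $autoloaders = [];

    // The container whose code is currently running; the root one by default.
    public static function current(): self
    {
        return self::$active ?? (self::$root ??= new self());
    }

    public function run(callable $code): mixed
    {
        $previous = self::$active;
        self::$active = $this;
        try {
            return $code();
        } finally {
            self::$active = $previous;
        }
    }

    public function registerAutoloader(callable $loader): void
    {
        $this->autoloaders[] = $loader;
    }

    public function defineClass(string $name, string $definition): void
    {
        $this->classes[strtolower($name)] = $definition;
    }

    // Lookup never leaves this container: its own table, then its own autoloaders.
    public function findClass(string $name): ?string
    {
        $key = strtolower($name);
        foreach ($this->autoloaders as $loader) {
            if (isset($this->classes[$key])) {
                break;
            }
            $loader($name, $this);
        }
        return $this->classes[$key] ?? null;
    }
}

$plugin = new ContainerModel();
$plugin->registerAutoloader(function (string $name, ContainerModel $container): void {
    if ($name === 'GuzzleHttp\Client') {
        $container->defineClass($name, 'plugin copy of GuzzleHttp\Client');
    }
});
```

Code run through $plugin->run() finds the plugin's copy of the class, while the root container's table stays untouched.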

> Also, due to the dynamic nature of PHP, objects can be passed to
> module code that might not have their class known, but I do not see
> this as a blocker.

I think this is actually the biggest challenge: what happens when
objects are passed between containers?

To use the previous example: as an initialisation step, the WordPress
plugin might want to set up an API client inside its container; later,
it might want to make use of that API client, plus an object passed to
it by a WordPress hook.

The containers are inside the same thread, so in principle there's no
problem referencing object handles which are "owned by" a different
container. But what is the type of those objects? How do they respond to
get_class(), instanceof, etc?
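One way to frame the question is to assume class identity becomes the pair (container, name). This mirrors the JVM, where a runtime class is identified by its name together with its defining class loader. Under that assumption, the hypothetical model below (ClassRef, ObjectRef, model_instanceof(), model_get_class()) shows the consequence: get_class() returns the same string for two different classes, while instanceof tells them apart.

```php
<?php
// Hypothetical model: a class is identified by (container, name). ClassRef,
// ObjectRef, model_instanceof() and model_get_class() are illustration-only
// names; whether PHP should adopt this identity rule is the open question.
final class ClassRef
{
    public function __construct(
        public readonly string $container,
        public readonly string $name,
        public readonly ?ClassRef $parent = null,
    ) {}

    public function sameAs(ClassRef $other): bool
    {
        return $this->container === $other->container
            && strcasecmp($this->name, $other->name) === 0;
    }
}

final class ObjectRef
{
    public function __construct(public readonly ClassRef $class) {}
}

// instanceof: walk the parent chain, comparing full (container, name) identities.
function model_instanceof(ObjectRef $object, ClassRef $type): bool
{
    for ($class = $object->class; $class !== null; $class = $class->parent) {
        if ($class->sameAs($type)) {
            return true;
        }
    }
    return false;
}

// get_class() can only return the bare name, which no longer identifies the class.
function model_get_class(ObjectRef $object): string
{
    return $object->class->name;
}

$rootRequest = new ClassRef('root', 'GuzzleHttp\Psr7\Request');
$pluginRequest = new ClassRef('plugin', 'GuzzleHttp\Psr7\Request');
$fromHook = new ObjectRef($rootRequest);   // object created by a WordPress hook
```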

Perhaps there are things we can learn from other languages like Java's
"isolated ClassLoader" which I mentioned on another thread?
https://www.javathinking.com/blog/what-is-an-isolated-classloader-in-java/

> The design I have been considering also rejects visible shadowing:
> unrelated modules may define the same symbol name, but adding a
> dependency or declaring a later symbol would fail if it makes two
> different symbols with the same name visible from the same context.

I think this would mean in practice that every container should start
with an empty symbol table (or rather, one with only built-in symbols).
If a container starts with all currently-loaded symbols, it would no
longer have any control over name collisions, so would be useless.

You could perhaps have a way to "import" and "export" specific symbols,
so that e.g. "Psr\Log\LoggerInterface" refers to the same thing in two
different containers; but I think this would need to be explicit, so that
the container always runs consistently.
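A sketch of what explicit import/export at the container boundary might look like, assuming an import shares the exporter's symbol identity rather than loading a second copy. SymbolContainer, export() and import() are hypothetical names.

```php
<?php
// Hypothetical model of explicit import/export at the container boundary.
// SymbolContainer and its methods are illustration-only names.
final class SymbolContainer
{
    /** @var array<string, string> lowercased name => identity of the symbol */
    private array $table = [];
    /** @var array<string, true> lowercased names other containers may import */
    private array $exports = [];

    public function __construct(public readonly string $id) {}

    public function define(string $name): void
    {
        $this->table[strtolower($name)] = $this->id . '::' . $name;
    }

    public function export(string $name): void
    {
        $this->exports[strtolower($name)] = true;
    }

    // Explicit, name-by-name sharing: the importer gets the same identity, not a copy.
    public function import(SymbolContainer $from, string $name): void
    {
        $key = strtolower($name);
        if (!isset($from->exports[$key], $from->table[$key])) {
            throw new LogicException("$name is not exported by {$from->id}");
        }
        if (isset($this->table[$key])) {
            throw new LogicException("$name is already defined in {$this->id}");
        }
        $this->table[$key] = $from->table[$key];
    }

    public function resolve(string $name): ?string
    {
        return $this->table[strtolower($name)] ?? null;
    }
}

$host = new SymbolContainer('root');
$host->define('Psr\Log\LoggerInterface');
$host->export('Psr\Log\LoggerInterface');
$host->define('App\Kernel');               // defined but not exported

$plugin = new SymbolContainer('plugin');
$plugin->import($host, 'Psr\Log\LoggerInterface');
```

Because everything the plugin sees is either built-in, its own, or imported by name, it behaves the same no matter what else the host has loaded.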

> I would like feedback on this package oriented runtime module model,
> especially whether you see any major technical blockers or design
> flaws.

I think this would be a powerful feature, but one that the vast majority
of PHP applications won't use. So the key to success will be minimising
the impact in performance and engine complexity.

Regards,

--
Rowan Tommins
[IMSoP]

2 hours ago by Alexander Egorov

Hi, Alex and Rowan!

> I think this is actually the biggest challenge: what happens when objects are passed between containers?
>
> To use the previous example: as an initialisation step, the WordPress plugin might want to set up an API client inside its container; later, it might want to make use of that API client, plus an object passed to it by a WordPress hook.
>
> The containers are inside the same thread, so in principle there's no problem referencing object handles which are "owned by" a different container. But what is the type of those objects? How do they respond to get_class(), instanceof, etc?
>
> Perhaps there are things we can learn from other languages like Java's "isolated ClassLoader" which I mentioned on another thread? https://www.javathinking.com/blog/what-is-an-isolated-classloader-in-java/

I actually really like the idea of some sort of "containers" in general,
because in my practice it is quite a common problem: we have a more or
less big monolithic app and eventually run into conflicting
dependencies.

As for passing objects between containers, it does not seem like an
unsolvable problem. Even though our thoughts are very abstract for now,
generally speaking there could be some matching of objects in
inter-container usage. Like, if "my" container is now using
\My\Namespace\SomeObject from my dependencies (let it be, say,
myvendor/mylib: 1.0), when receiving such an object from another
container, it could be known what exact dependency it is from. And if it
is also myvendor/mylib: 1.0 - there should be no problems. If versions
don't match, there could be some explicit inter-dependency mapping defined
by the user, or an explicit runtime error if no mapping was provided.
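The matching idea could be sketched like this, assuming each class carries the package and version it was loaded from. Origin and BoundaryMapper are hypothetical, and comparing package, version and class name as one string key is a simplification.

```php
<?php
// Hypothetical sketch: an object crossing into another container either
// matches exactly, goes through a user-supplied mapping, or fails with an
// explicit error. Origin and BoundaryMapper are illustrative names.
final class Origin
{
    public function __construct(
        public readonly string $class,
        public readonly string $package,
        public readonly string $version,
    ) {}

    public function key(): string
    {
        return "{$this->package}:{$this->version}:{$this->class}";
    }
}

final class BoundaryMapper
{
    /** @var array<string, Closure(object): object> */
    private array $mappings = [];

    public function map(Origin $from, Origin $to, Closure $convert): void
    {
        $this->mappings[$from->key() . ' -> ' . $to->key()] = $convert;
    }

    public function accept(object $value, Origin $from, Origin $expected): object
    {
        if ($from->key() === $expected->key()) {
            return $value; // same package and version: pass the object through as-is
        }
        $convert = $this->mappings[$from->key() . ' -> ' . $expected->key()] ?? null;
        if ($convert === null) {
            throw new RuntimeException("No mapping from {$from->key()} to {$expected->key()}");
        }
        return $convert($value);
    }
}

$v10 = new Origin('My\Namespace\SomeObject', 'myvendor/mylib', '1.0');
$v11 = new Origin('My\Namespace\SomeObject', 'myvendor/mylib', '1.1');
$mapper = new BoundaryMapper();
```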

Such an inter-container mapping mechanism would require careful design,
of course, but technically it does not look unsolvable. Generally
speaking, we could have the mentioned \My\Namespace\SomeObject loaded
both from v1.0 and, say, v1.1, so we could map one to the other on the
calling side. The catch is that PHP itself is not aware of Composer and
its structure. But PHP always knows exactly which file a specific class
was loaded from, so we could use that as a distinguisher. I bet
distinguishing by filename would be a very bad design, though. But if
PHP allowed loaded classes to be "tagged" somehow, it would be handy
for both containers AND Composer.

Like, when "starting" a new container, the only thing we need to provide is
a class-loader, which would presumably "tag" the loaded classes in a known
manner. In the case of Composer, it could tag them by version. In our code,
we could import all needed classes with some advanced syntax that specifies
the "tag". As an example, off the top of my head:
use \My\Namespace\SomeObject tagged "..." [as SomeObjectV1], explicitly
telling which version we need. And we could have both versions imported
this way (and map one to another accordingly). The autoloader would receive
this "tag" (if provided) and load the corresponding class. The power of
containers would come from carefully designed defaults for these class
versions when no version was explicitly stated (the container-based
autoloader already knows where it is rooted).
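A sketch of a tag-aware loader along these lines. TaggedLoader, its two-argument load() and the default-tag fallback are assumptions; PHP's actual spl_autoload_register() callbacks receive only the class name, so something like this would need new engine support.

```php
<?php
// Hypothetical tag-aware class loader. The (class, ?tag) signature and the
// "fall back to the container's default tag" rule are assumptions.
final class TaggedLoader
{
    /** @var array<string, array<string, string>> lowercased class => [tag => file] */
    private array $sources;
    /** @var array<string, string> "class#tag" => file already loaded */
    private array $loaded = [];

    public function __construct(array $sources, private string $defaultTag)
    {
        $this->sources = array_change_key_case($sources, CASE_LOWER);
    }

    // $tag would come from a `use ... tagged "..."` import; null means "use the default".
    public function load(string $class, ?string $tag = null): string
    {
        $tag ??= $this->defaultTag;
        $key = strtolower($class) . '#' . $tag;
        if (!isset($this->loaded[$key])) {
            // Different tags get separate entries, so two versions can coexist.
            $this->loaded[$key] = $this->sources[strtolower($class)][$tag]
                ?? throw new RuntimeException("No source for $class tagged $tag");
        }
        return $this->loaded[$key];
    }
}

$loader = new TaggedLoader([
    'My\Namespace\SomeObject' => [
        '1.0' => 'vendor/mylib-1.0/src/SomeObject.php',
        '1.1' => 'vendor/mylib-1.1/src/SomeObject.php',
    ],
], '1.1');
```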

These are still very rough thoughts, but with deeper thinking, I believe
they could help evolve what Alex and you are suggesting.

Regards,
Alexander Egorov.

1 hour ago by Rowan Tommins [IMSoP]

> As for passing objects between containers, it does not seem like an
> unsolvable problem. Even though our thoughts are very abstract for now,
> generally speaking there could be some matching of objects in
> inter-container usage. Like, if "my" container is now using
> \My\Namespace\SomeObject from my dependencies (let it be, say,
> myvendor/mylib: 1.0), when receiving such an object from another
> container, it could be known what exact dependency it is from. And if it
> is also myvendor/mylib: 1.0 - there should be no problems. If versions
> don't match, there could be some explicit inter-dependency mapping defined
> by the user, or an explicit runtime error if no mapping was provided.

Just knowing the version / source of an individual class is not enough.
At the very least, you need a fingerprint that also includes the
versions of the classes it inherits from, the interfaces it implements,
and the traits it uses.
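A sketch of such a fingerprint, assuming each class's name and version are known, hashing them together with the fingerprints of its parent, interfaces and traits. ClassInfo and class_fingerprint() are hypothetical, and sha256 is an arbitrary choice. Even this says nothing about the classes a method creates and returns.

```php
<?php
// Sketch of a structural fingerprint: a class's own name and version plus,
// recursively, the fingerprints of its parent, interfaces and traits.
// ClassInfo is a hypothetical record; the engine would derive these from its
// own class entries.
final class ClassInfo
{
    /**
     * @param list<ClassInfo> $interfaces
     * @param list<ClassInfo> $traits
     */
    public function __construct(
        public readonly string $name,
        public readonly string $version,
        public readonly ?ClassInfo $parent = null,
        public readonly array $interfaces = [],
        public readonly array $traits = [],
    ) {}
}

function class_fingerprint(ClassInfo $class): string
{
    $parts = [
        strtolower($class->name),
        $class->version,
        $class->parent !== null ? class_fingerprint($class->parent) : '-',
        'interfaces:' . implode(',', array_map('class_fingerprint', $class->interfaces)),
        'traits:' . implode(',', array_map('class_fingerprint', $class->traits)),
    ];
    return hash('sha256', implode('|', $parts));
}

$loggerV1 = new ClassInfo('Psr\Log\LoggerInterface', '1.0');
$loggerV3 = new ClassInfo('Psr\Log\LoggerInterface', '3.0');
$a = new ClassInfo('Acme\FileLogger', '2.0', interfaces: [$loggerV1]);
$b = new ClassInfo('Acme\FileLogger', '2.0', interfaces: [$loggerV3]);
```

Here the same version of Acme\FileLogger gets two different fingerprints because it implements different versions of the interface.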

And then what about other relationships, like the classes they create
and return? For example:

class WidgetFactory {
    public function makeWidget(): Widget { ... }
}

$factoryA = new WidgetFactory();
$factoryB = $someContainer->run(fn() => new WidgetFactory);

$widgetA = $factoryA->makeWidget();
$widgetB = $factoryB->makeWidget();

If the container contains the same version of WidgetFactory, but a
different version of Widget, what objects do I end up with?

> Like, when "starting" a new container, the only thing we need to provide is
> a class-loader, which would presumably "tag" the loaded classes in a known
> manner. In the case of composer, it could tag them by version. In our code,
> we could import all needed classes with some advanced syntax with
> specifying the "tag". As an example from the top of my head:
> use \My\Namespace\SomeObject tagged "..." [as SomeObjectV1], explicitly
> telling which version we need. And we could have both versions imported
> this way (and map one to another, accordingly).

This feels like it's moving back towards the "module" idea: changes
which have to propagate deep into existing code before you can use
them.

To me, the guiding principle of containers needs to be that the
configuration all exists at the boundary of the container. I'm
thinking of the EXPOSE and VOLUME keywords in a Dockerfile, for example.

--
Rowan Tommins
[IMSoP]