Again, another "modules" proposal, but in more steps.
(sorry, this is a very long post).
===
TL;DR: the final goal (in a long time, possibly) is to be able to do
something like this:
    <?php declare(module=1);
    // some_file.php

    export class SomeClass {
        public function someMessage() {
            myInternalFunction();
        }
    }

    function myInternalFunction() {
        echo "Hello world!";
    }

    <?php
    // index.php

    import SomeClass from 'some_file.php';

    $o = new SomeClass();
    $o->someMessage(); // Works.
    myInternalFunction(); // Fatal error: undefined function "myInternalFunction".
===
Trivia:
Recently, we had proposals for "Friend classes" and "Pure PHP files".
These suggestions aren't new at all, but they demonstrate some wish in
the PHP ecosystem to make certain changes for PHP to be closer to other
programming languages.
On my side, I've reviewed some of the discussions regarding "modules".
They are quite messy because there are lots of different views and
opinions, but I think, maybe too optimistically, that there might be a
way to reach "PHP modules" without a huge change to the entire engine.
I think we can implement actual "modules" in two steps:
- Implement definition files first, so they can be handled in the
  compilation step and have no runtime effect (see later).
- Implement "modules" with an "import" keyword, where a "module" is
  only a definition file that can "export" some of its defined
  structures and/or "import" structures from other modules, with a
  completely enclosed and standalone scope/context.
===
First step: definition files.
One of the proposals for "modules" implied files with a different PHP
extension, to make them easily distinguishable from other files, and the
recent "pure files" suggestion follows the same idea: removing the
"<?php" tag so that PHP doesn't need it anymore.
However, these discussions had certain conclusions that I agree with:
- Making the extensions different will profoundly change how PHP
  includes files. Extensions like ".inc", ".php5" or alike were
  discouraged for specific reasons, and nowadays, most PHP handlers
  (apache, nginx, caddy, and possibly others) default to ".php"
  extension files, so adding a new extension means changing both the
  engine and the whole ecosystem. The overall conclusion is that
  changing the extension gives no benefit and only brings disadvantages.
- "pure" files, as in "no open/close tags", bring no real value, because
  having "<?php" is similar to having a shebang line in many other file
  types, and even PHP files themselves can contain a shebang line, so
  it's already a nice indicator, and ALL tooling around PHP code needs
  it to distinguish PHP code from "non-PHP" whatever-they-are characters.
On my side, when I first read the "pure" proposal, I was thinking mostly
about "pure" as in "has no side-effect".
Which brings another idea: what if include-ing a PHP file actually had
zero side-effect, apart from its compilation process?
That's where I'm coming from with this idea: definition files.
Notes:
- I will often refer to "built-in" in here, and "built-in" means
  "built in PHP or one of the enabled extensions", which implies
  "accessible at compile-time".
- When I say "global scope", I also imply "global namespace scope"
  for every namespace defined in the file.
The rules:
- A PHP "definition" file is a file that has ZERO global state,
  calls, or mutable statements.
- It is declared with a "declare(def=1);" statement at the top of it.
- It can only contain declarations: (include|require)(_once)?, const,
  function, namespace, class, return, interface, trait, use, enum, etc.
- It can include/require a file, as long as the path given to this
  statement is a string literal (or a built-in constant).
- Since the file must not contain statements, the global scope of the
  file must not refer to any variable, and must not define variables
  either. Even superglobals.
- if/else/elseif, switch or match statements can also be allowed, only
  if they respect the previous points. This way, you can still define
  functions/constants/classes depending on PHP versions (see next
  points about constants). I'm not sure about iterators (for, while,
  do...while or foreach), because I see no proper use-case, but they
  can still be allowed if they imply no global call statements, though
  it seems very unlikely anyway.
- try/catch/finally are useless if the global scope has no calls, so
  they can be safely forbidden.
- Statements like break and continue are not allowed either, because
  they implicitly expect a "parent context".
- "new" is allowed for built-in classes only, and can only receive
  literals (because it cannot refer to variables or user-defined
  constants).
- exit/die are also forbidden in the global scope, and should be
  replaced with exceptions.
- "throw" is allowed, but can only throw built-in exceptions.
  User-created exceptions are not allowed.
- Global scope can never contain the closing tag "?>", as a safety
  against potential "echo" calls. It /might/ use it as only last
  characters of said file, but IMO it's much easier to handle the "no
  closing tag" case than "possible as last statement of the file". The
  problem is that having a closing tag at the end can mess with IDEs
  that ensure a line feed at the end of every file (file that will
  therefore have an echo "\n"; statement in it...)
- The file's global scope can refer to constants defined internally by
  PHP or its extensions. This means every constant that isn't in the
  "user" key when calling get_defined_constants(true) in PHP, as
  well as magic constants like __DIR__. Only constants that are
  always available at compile-time will be checked. This way, it can't
  accidentally trigger an "Undefined constant" error for userland,
  but it can trigger one if a native extension isn't enabled or
  doesn't have said constant, which can also be detected at compile-time.
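To make the "built-in constant" rule a bit more concrete, here is a minimal userland sketch of the check, assuming "built-in" means "listed by get_defined_constants(true) under any category other than "user"". The helper name isBuiltInConstant() and the constant EXAMPLE_USER_CONSTANT are hypothetical and only exist for this illustration; the real check would live in the compiler, not in PHP code.

```php
<?php
// Hypothetical helper: not part of PHP, nor of the patch discussed here.
// Returns true when $name is defined by PHP itself or by an enabled
// extension, i.e. it is found in any category except "user".
function isBuiltInConstant(string $name): bool
{
    foreach (get_defined_constants(true) as $category => $constants) {
        if ($category === 'user') {
            continue; // userland constants are never "built-in"
        }
        if (array_key_exists($name, $constants)) {
            return true;
        }
    }

    return false;
}

// A userland constant, defined at runtime (hypothetical name).
define('EXAMPLE_USER_CONSTANT', 42);

var_dump(isBuiltInConstant('PHP_VERSION_ID'));        // bool(true)
var_dump(isBuiltInConstant('EXAMPLE_USER_CONSTANT')); // bool(false)
```

Magic constants such as __DIR__ are resolved by the compiler and are not returned by get_defined_constants(), so they would need separate handling.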
This concept brings more advantages than previous proposals:
- Your file is still normal PHP, compatible as usual.
- Can still be interpreted by all IDEs that support PHP code
- Doesn't need a different file extension
- The fact that it's a definition file is explicitly visible at the
  beginning of the file when you open it.
- It can still be included/required by any other file without that
  file knowing that it actually includes a "definition file", and is
  therefore fully compatible with all current autoload setups,
  including Composer.
- Still allows things that frameworks do for conditional
  function/class declaration (if it relies on the PHP version
  constant, for example). Will not be able to use version_compare()
  though, but there are workarounds.
- Potential compile-time built-in constant optimization (if not
  already done by the compiler, I didn't search for this yet).
- Everything that is not global/namespace-scope (functions, classes,
  etc.) can still contain whatever code it needs, and theoretically
  it can even contain the PHP closing tag, since it's compiled as an
  "ECHO" statement.
- All potential errors when including/requiring such a file will be
  compile-time errors, therefore if the file is "correct", compiling
  it definitely means that it has its place in the opcache for a very
  long time, as no runtime can alter its global context.
- Having no actual call statements in the global/namespaced scope
  ensures no "echo", and overall it has absolutely zero runtime impact
  other than compile-time errors, since there cannot be notice/warning
  errors that might also pollute the current buffer. (I might have
  forgotten what else can throw a notice/warning, but feel free to
  correct me if I did).
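As an illustration of the version-dependent declarations mentioned in the list above, here is a sketch of one possible workaround for the lack of version_compare(): comparing the built-in PHP_VERSION_ID constant. It is written to run on current PHP, so the proposed "declare(def=1);" line only appears as a comment, and the function name example_contains() is hypothetical.

```php
<?php
// declare(def=1); // proposed syntax, not accepted by current PHP versions

// The condition only reads a built-in constant: no function call and no
// variable is involved in the global scope.
if (PHP_VERSION_ID >= 80000) {
    // PHP 8.0+ ships str_contains(), so simply delegate to it.
    function example_contains(string $haystack, string $needle): bool
    {
        return str_contains($haystack, $needle);
    }
} else {
    // Fallback for older versions, based on strpos().
    function example_contains(string $haystack, string $needle): bool
    {
        return $needle === '' || strpos($haystack, $needle) !== false;
    }
}
```

Calls inside the function bodies are fine under the rules described earlier, since only the global scope is restricted.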
There are only a few drawbacks to this (from what I've thought about
so far):
- All definition files will have to begin with
    <?php declare(def=1);
  (fair enough IMO, since some static analysers are already capable of
  adding "declare(strict_types=1)" automatically...)
- Potential tiny compile-time performance drop and/or extra memory
  consumption, because all global statements would have to be checked
  and analysed. And maybe a bit more if constants are also validated.
For end-users, a "definition" file has one single advantage: loading it
has no runtime impact. The only runtime impact comes when its defined
structures are actually used. This is a guarantee of trust that can
benefit all frameworks and libraries.
But this advantage paves the way to "modules" in a very interesting
manner: all modules must be definition files in the first place.
===
Second step: PHP modules.
TL;DR: loading a "module" is similar to an (include|require)_once,
but at the compiler level instead of the engine/runtime level.
The concept of "module" in my mind in PHP is the following:
A PHP Module is a normal PHP file that is, at first, a definition
file, but instead starts with declare(module=1);. This declaration
automatically implies declare(def=1);, and the wrong combo
declare(module=1,def=0); can throw a compile-time error.
On the Module side:
- It has access to new keywords: "import" and "export", as well as
  "import ... as ..." and "export ... as ...".
- Module names are useless, since the module is the file itself.
- A module can "export" whatever is declared in said file: constants,
  functions, classes, enums, interfaces..., as long as it's only a
  declaration and not an actual call.
- The "export" keyword must implicitly be in the global namespace,
  even if it is written inside another namespace.
  This implies that these variants have to be considered strictly
  equivalent:
    <?php declare(module=1);
    namespace My\Namespace;
    export class MyClass {};

  is the exact same as the following:

    <?php declare(module=1);
    namespace My\Namespace {
        export class MyClass {};
    }

    <?php declare(module=1);
    namespace My\Namespace {
        class MyClass {};
    }
    export MyClass;

- Composer can add a new package type named "php-module" that accepts
  only one single PHP file as input; that file must be a module.
- A module can import other modules.
- Conditional, or encapsulated "import" can also be resolved at
  compile-time, and since they can only contain compiler-accessible
  statements (string literals or built-in constants), the behavior
  will be similar to an "include" statement at runtime on a definition
  file anyway.
- Unused imports can be detected at compile-time and throw a notice
  message.
- (my opinion, so definitely optional) Two exports must never have the
  same name. By this, I mean that you could use
    import someFunction, SomeClass, SOME_CONSTANT from 'file.php';
  freely without having to specify what kind of structure you are
  trying to import. This is a matter of personal taste, to push users
  not to reuse the same names and to avoid confusion in general. I see
  no proper use-case to allow constants and classes to have the same
  name, for example, but some people might dislike this.
  On the engine side it will still properly define the expected
  structures from the module, and in userland, errors will be
  thrown if said structure is used improperly anyway. To me,
  considering how big this feature is, this is just a way to
  "opinionate for better naming" :) (and it avoids ugly things like
    import const SOME_CONST as SOME_CONST_ALIAS, class SomeClass as SomeClassAlias, function someFunction as someFunctionAlias from 'file.php';
  right?)
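The optional "two exports must never have the same name" rule could be checked with something as small as the sketch below. The function name findDuplicateExportNames() and the shape of its input (export kind => list of names) are assumptions made for this example only, and the comparison is deliberately a plain case-sensitive one; a real implementation would have to decide how to treat PHP's case-insensitive class and function names.

```php
<?php
// Hypothetical helper, for illustration only.
/**
 * @param array<string, list<string>> $exports export kind => exported names
 * @return list<string> names that are exported more than once
 */
function findDuplicateExportNames(array $exports): array
{
    $seen = [];
    $duplicates = [];

    foreach ($exports as $kind => $names) {
        foreach ($names as $name) {
            if (isset($seen[$name])) {
                // Same name already exported, possibly as another kind.
                $duplicates[$name] = true;
            }
            $seen[$name] = $kind;
        }
    }

    return array_keys($duplicates);
}

$duplicates = findDuplicateExportNames([
    'class'    => ['Request', 'Status'],
    'const'    => ['Status'],
    'function' => ['handle'],
]);

var_dump($duplicates === ['Status']); // bool(true)
```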
And on the userland-side:
- The "import" and "import ... as ..." keywords also become
  accessible to ANY other PHP file, whether it is a definition, a
  module, or a regular PHP file.
- Just like in definition files, the "import" keyword can only refer
  to string literals and/or built-in constants. This restriction
  only applies to the new "import" keyword, and not to the rest of
  the file.
- The "import" keyword will explicitly make all imported structures
  accessible in the current file, just as if the module was loaded
  with "include", but only "export"-ed structures are accessible.
- Imports can be placed anywhere in the file, and will be resolved at
  compile-time, since their only drawback is "adding more structures
  in memory".
- An import can define an alias that will only be accessible to the
  importer-file, like
    import SomeClass as MyAlias from 'file.php';
Internally, there are other interesting things that happen:
- All definitions from inside a module will have to be prefixed in the
  symbols table with a hash corresponding to the current module file's
  hash. It can be similar to how anonymous classes are registered
  internally. The goal is to make them inaccessible (as much as
  possible) from the global scope.
- All calls to the module file's internal definitions will use this
  hash prefix to refer to said structures.
- When using "import", it will do 3 things:
  o Analyse "import"-ed statements, to retrieve only the structures
    that are asked by the end-user
  o Load the file (from file, or from opcache, if not already in memory)
  o Create the module's definitions (if not already in memory) with
    internal hashes as previously described, and tree-shake unused
    structures as of the list of "import"-ed ones. Can be done at
    importer-compile-time too.
  This way, it would behave similarly to (require|include)_once but
  at a more granular level: with only the structures that were
  imported by the current file. Any subsequent call to "import" for
  the same file will do the same thing, and since these files have no
  runtime impact and only contain definitions, it should have close to
  no loading impact. Subsequent imports with the same structures will
  load only the ones that are already in memory (because they can be
  referenced with the hash-prefix), and if a "new structure" is found
  that has not been loaded already in the global space, it will create
  it in the global scope at runtime. This makes sure that module files
  with 100 exports will not load /all/ structures in memory when the
  file is "import"-ed.
- Modules have no impact on autoload, since they don't function the
  same way.
- Since internal functions/constants/etc. are hash-prefixed, they will
  never conflict with other internal structures. This means that a
  module could define the str_contains function, if it wanted. And
  it could even reuse the native function by using
    use function str_contains as base_str_contains;
  This would also work for object-oriented structures (class,
  interface, etc.) as well as constants.
- We can create a ReflectionModule class, whose constructor
  accepts a file name, and throws an exception if the file is not a
  module. This class would expose the list of exported structures from
  said module. Maybe it can also contain the processed hash/prefix and
  the internal structures too, but having these available kinda
  defeats the purpose of having internal structures in the first
  place... But the exported structures would be ReflectionClass,
  ReflectionConstant, etc, with the "module" flag explained in the
  next point.
- Other Reflection classes will contain an internal "defined in
  module" flag, as well as a nullable string corresponding to the path
  to the module file if the structure is effectively defined in a
  module.
- FQCNs will always resolve to the "global public name", and never to
  the internal hash-prefixed name.
- A new global function can be used:
    spl_register_module($prefix, $filePath);
  This way, we could definitely imagine a flexible prefix to resolve
  to a module path for "import" statements. It allows
  Composer-package-compatible syntax like
    import Request from '@symfony/http-foundation/Request.php';
  being registered with something like
    spl_register_module('@symfony/http-foundation', __DIR__.'/vendor/symfony/http-foundation/');
- The spl_register_module function can refer to structures directly,
  if needed:
    spl_register_module('@symfony/http-foundation/Request', __DIR__.'/vendor/symfony/http-foundation/Request.php');
  This allows for no-extension imports like
    import Request from '@symfony/http-foundation/Request';
  (but this is just for the fancy looks)
  The function itself will register the input as a prefix or as a
  module path based on whether the specified path is a directory or a
  file, checked at runtime.
- Multiple paths can be used for the same prefix, as long as they are
  all directories.
- If a module is registered as a file instead of a directory, there
  can be only one.
- (yeah, I know, this concept looks like a reinvention of
  include_path, but hey, it's modules now!)
- Autoload-like features can be used for projects using Composer:
  o The composer.json file can contain new fields: "modules" and
    "modules_dev".
  o These fields would contain a key=>value list of prefix=>file_path
    items, that Composer will register through the aforementioned
    new spl_register_module() function.
  o Composer will (sorry folks) need a way to make sure two PHP
    packages don't contain the same module prefixes resolving to two
    paths, regardless of them being directories or files. Maybe
    Packagist (sorry again) will need this too. This is important to
    avoid vendor name squatting in modules.
- These module-autoload rules would not change anything in existing
  autoload, but they would mostly be here to map a PHP package with
  its exposed API.
- This also makes sure that any PHP package can say "All API exposed
  as a module is covered by BC policy, all the rest is not". Easier
  for maintainers to keep their internal stuff, and a bit easier to
  make the Open/Closed principle available at a package-level instead
  of a class-level.
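The registration rules for spl_register_module() described above (a prefix may map to several directories, an exact name may map to only one file, directory vs. file decided at runtime) can be simulated in userland. The sketch below is only that, a simulation: spl_register_module() does not exist in PHP, and the ModuleRegistry class, its register()/resolve() methods and the "first matching prefix wins" lookup order are all assumptions made for this example.

```php
<?php
// Userland simulation only; every name in here is hypothetical.
final class ModuleRegistry
{
    /** @var array<string, list<string>> prefix => registered directories */
    private array $directories = [];

    /** @var array<string, string> exact module name => module file */
    private array $files = [];

    public function register(string $prefix, string $path): void
    {
        if (is_dir($path)) {
            if (isset($this->files[$prefix])) {
                throw new LogicException("\"$prefix\" is already registered as a file.");
            }
            // Several directories may share the same prefix.
            $this->directories[$prefix][] = rtrim($path, '/\\');
            return;
        }

        if (is_file($path)) {
            // A name registered as a file can only be registered once.
            if (isset($this->files[$prefix]) || isset($this->directories[$prefix])) {
                throw new LogicException("\"$prefix\" is already registered.");
            }
            $this->files[$prefix] = $path;
            return;
        }

        throw new InvalidArgumentException("\"$path\" is neither a directory nor a file.");
    }

    /** Returns the module file for an import specifier, or null if none matches. */
    public function resolve(string $specifier): ?string
    {
        // Exact file registrations are checked first.
        if (isset($this->files[$specifier])) {
            return $this->files[$specifier];
        }

        foreach ($this->directories as $prefix => $dirs) {
            // The specifier must start with "<prefix>/".
            if (strncmp($specifier, $prefix . '/', strlen($prefix) + 1) !== 0) {
                continue;
            }
            $relative = substr($specifier, strlen($prefix) + 1);
            foreach ($dirs as $dir) {
                $candidate = $dir . '/' . $relative;
                if (is_file($candidate)) {
                    return $candidate;
                }
            }
        }

        return null;
    }
}
```

With such a registry, register('@symfony/http-foundation', __DIR__.'/vendor/symfony/http-foundation/') followed by resolve('@symfony/http-foundation/Request.php') would return the path to Request.php inside that directory, provided the file exists.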
===
I already worked on the first step to "PHP definition files" and made a
PR of it:
https://github.com/php/php-src/compare/master...Pierstoval:php-src:defs
With my tiny knowledge of PHP internals, I required the help of Cursor
for that. I added a lot of .phpt test files to ensure the basics
are covered, then built the project & ran all the tests on my LMDE 7
(Debian) machine multiple times with different configs (embed, fpm,
debug, etc.). Everything works so far.
Apart from the tests, the PR seems quite light, but it obviously needs
thorough review (or rewrite...) before even being converted into an RFC.
It just had the advantage of being fully ready and thoroughly tested
(hopefully I didn't forget anything) in less than a day...
Note: I did NOT use any LLM to write this message, it was only used
for some bits of code in the above PR, nothing more.
If you have read everything down to this line, thank you very much! It's
the fruit of quite some work!
Now over to you folks, it's yours to take and discuss :)
--
Alex "Pierstoval" Rock
Polydisciplinary professional web development and training