Optional pre-compiler for PHP8?

5 years ago by Mike Schinkel — view source

Hello all:

While reading the [RFC] Union Types v2 thread and comments from Dmitry[1], and especially Benjamin[2] who suggested "building a static analysis tool which could prove that certain type checks would never fail, and prime OpCache" it occurred to me that a PHP pre-compiler could potentially be used to resolve numerous issues the community has been debating.

But first, let me define what I envision for a pre-compiler:

A command-line tool that could take a PHP file and/or application and generate pre-compiled files with an extension of .phpc or similar.
Think of these .phpc files as being implemented similarly to .phar files, but actually compiled to an OpCache binary form.
Pre-compiled files would be deployed alongside .php files, or optionally(?) standalone without PHP files.
Libraries and (WordPress) plugins could deliver pre-compiled files too, alongside their .php source files
Command line switches could allow for:
Compiling with or without (selected) deprecations
Selected constants defined on the command line
Packaging code on a one-to-one per PHP file, as one file per namespace, one file per app, etc.
The pre-compilation process would be able to:
Type-check everything that has type-hints
Do type checking that is too expensive to do at runtime
Pre-compiling:
Could help eliminate the complexity of auto-loading and opening many files, at least for pre-compiled code.
Would be an option for type checking and improved performance, but not be required.

If the PHP community were to embrace the idea of an optional pre-compiler then we could see the following benefits:

1. Full type checking capability without any concerns for runtime performance issues related to type checking.
2. Ability to significantly improve performance over time, possibly even more than a JIT model.
3. Potential to support optimized real types, as in Hack, where code needs to be highly performant.
4. Ability to deprecate features for pre-compiled code while still supporting them when not precompiled.

While benefits #1 to #3 are highly valuable, consider benefit #4. If we had such a pre-compiler then the concern for BC for pre-compiled code could become moot, as the deprecations would not affect any existing code that is not pre-compiled.

This could potentially give us the best of both worlds?

Further, those most interested in deprecations and moving to enterprisey language features certainly use a CI/CD build process, so it should be no problem at all for them to incorporate a pre-compile step.

Lastly, having such an optional process — with its primary promoted benefit being performance — could be a great incentive for those running less strict and backwards-compatible PHP code to refactor their source code to gain greater performance. This contrasts with deprecating features and breaking BC just "because it is a better way to program." Give them a carrot rather than use a stick.

So for those who know PHP's internal core code:

Is there any reason this is not technically viable?

And for everyone:

What do you think of this as a potential future for PHP?

-Mike

[1] https://news-web.php.net/php.internals/107699
[2] https://news-web.php.net/php.internals/107702

5 years ago by Mark Randall — view source

> And for everyone:
>
> What do you think of this as a potential future for PHP?

I had received the impression that a lot of the problems for performance
optimizations relate to how PHP can shift things around at runtime,
where identical code at the run-site means something completely
different in practice because the same class name or function has been
included from one file, rather than another.

I imagine if PHP had full knowledge of all its state, that might provide
an avenue for additional optimizations.
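As a minimal sketch of the problem described above (not from the thread; the `Logger` class and the `APP_ENV` switch are hypothetical), the call at the bottom is textually identical in both cases, yet which definition it reaches is only decided at run-time:

```php
<?php
// Hypothetical runtime switch; in real code this could be a config value
// or simply which file happened to be included first.
$useNullLogger = (getenv('APP_ENV') === 'test');

if ($useNullLogger) {
    // Definition A: discards the message.
    final class Logger {
        public static function write(string $msg): int { return 0; }
    }
} else {
    // Definition B: pretends to write and reports the byte count.
    final class Logger {
        public static function write(string $msg): int { return strlen($msg); }
    }
}

// Identical code at the call site, different meaning in practice.
$bytes = Logger::write('hello');
```

A compiler that only sees the last line cannot know which body of `Logger::write()` it is optimizing against unless it also knows how the program got there.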

5 years ago by Rowan Tommins — view source

While reading the [RFC] Union Types v2 thread and comments from Dmitry[1], and especially Benjamin[2] who suggested "building a static analysis tool which could prove that certain type checks would never fail, and prime OpCache" it occurred to me that a PHP pre-compiler could potentially be used to resolve numerous issues the community has been debating.

I chose the phrase "static analysis tool" deliberately, because I wanted
to think about the minimum requirements for such a tool, rather than its
long-term possibilities. The basic requirements are fairly straight-forward:

- a static analyser that can infer types in a PHP program; we know
  that's possible from a number of third-party tools, although they do
  rely on docblock comments for things the language doesn't (yet) let you
  define
- the ability to generate OpCodes for some code and store it to disk;
  this is more or less what OpCache does if enabled for CLI mode

However, combining those usefully may not be that easy.

The first problem is that OpCache is designed to work one file at a
time, because a program can load any combination of files at run-time.
Static analysers, on the other hand, need to process a whole directory
at a time, so that calls can be matched to definitions; multiple
definitions of the same function or class tend to cause problems, even
though only one is loaded at run-time. So we'd probably need some
built-in definition of a "package", which could be analysed and compiled
as one unit, and didn't rely on any run-time loading.

The second problem is that, as I understand it, type checks aren't
actually separate OpCodes, so eliminating them from the compiled program
may not be that easy. There are some cases where you can just eliminate
the type check from a definition, e.g.:

class A {
    private int $x = 1;
    private function foo(int $x) { }
    public function bar() {
        $this->foo($this->x);
    }
}

Since we know that function foo is only ever called with the correctly
typed argument, we can compile it as though it had no type declaration.
However, in the seemingly obvious case Benjamin gave, the optimisation
isn't so easy:

function x(): int {}
function y(int $foo) {}
y(x());

We can't eliminate the type check for all calls to x(), or for all calls
to y(), but we want to eliminate the duplicate check for that particular
line. So the OpCodes need to represent that somehow. I've no idea how
easy or hard that would be.
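To make the case concrete, here is a runnable variant of that snippet (a sketch only; the function bodies and the extra failing call are added for illustration and are not part of Benjamin's example). It shows why the parameter check on y() cannot simply be removed from the definition:

```php
<?php
// Conceptually, the return value is checked against the declared int type.
function x(): int
{
    return 21;
}

// The argument is checked against int when the function is entered.
function y(int $foo): int
{
    return $foo * 2;
}

// The call site from the discussion: the value is declared int on the way
// out of x() and checked again on the way into y(), although the second
// check can never fail on this particular line.
$doubled = y(x());

// Other callers may pass anything, so the check in y() has to stay in
// the definition itself.
$rejected = false;
try {
    y('not a number');
} catch (TypeError $e) {
    $rejected = true;
}
```

Only the first call is redundant, so the saving has to be expressed per call site rather than per function.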

In order to extend this to a full compiler, we need at least one more
thing: a stable compilation target. What I mean by that is that if I
distribute a package in binary form, it needs to run on a reasonably
large range of PHP versions and installations. My understanding is that
the OpCodes in the Zend VM are not designed to be stable across
versions, so you can't just ship today's OpCache output like you would a
Java class file or .net assembly. Again, I don't know how much effort it
would be to make the VM work as such a stable target.

Ability to deprecate features for pre-compiled code while still supporting them when not precompiled.

Unlike P++, Editions, or Strict Mode, this would undeniably define that
the deprecated features were "the wrong way". If the engine had to
support the feature anyway, I'm not sure what the advantage would be of
tying it to "compiled vs non-compiled", rather than opting in via a
declare() statement or package config.

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]

5 years ago by Mike Schinkel — view source

Thank you for your comments.

I chose the phrase "static analysis tool" deliberately, because I wanted to think about the minimum requirements for such a tool, rather than its long-term possibilities.

Your points are all well-considered.

To be clear, I wasn't stating the idea as an alternative to your idea, I was only stating that your comments inspired me to have the idea of a pre-compiler.

IOW, I saw no reason both could not be done, one sooner and the other later.

However, combining those usefully may not be that easy.

Also for clarity, I was not assuming existing OpCache would be 100% unmodified, I was talking about benefits that a pre-compiler could have and was less focused on ensuring it could slot into an existing OpCache implementation as-is.

IOW, if it is worth doing it might be worth extending how the OpCache works.

So we'd probably need some built-in definition of a "package", which could be analysed and compiled as one unit, and didn't rely on any run-time loading.

That idea of a "package" came up during a debate on this list at least once, a few months ago, and I think it makes a lot of sense. And what I proposed effectively implies that namespaces would be treated like packages from the perspective of the compiler.

But then again a new package concept might be needed in addition to namespaces, I am not certain either way.

Unlike P++, Editions, or Strict Mode, this would undeniably define that the deprecated features were "the wrong way".

I am not sure I can agree that it would define them as the "wrong way."

The way I would see it is there would be a "strict way" and an "unstrict way." If you prefer the simplicity of low strictness and do not need more/better performance or the benefits of type-safety that are needed for building large applications, then the "right way" would still be the "unstrict way."

And the non-strict features would not be "deprecated" per-se, they would instead be disallowed for the strict (compiled) way, but still allowed for the unstrict (interpreted) way.

If the engine had to support the feature anyway,

I think we are talking two engines; one for compiling and another for interpreting. They could probably share a lot of code, but I would think it would still need to be two different engines.

I'm not sure what the advantage would be of tying it to "compiled vs non-compiled", rather than opting in via a declare() statement or package config.

The advantage would be two-fold:

Backward compatibility
Allowing PHP to continue to meet the needs of new/less-skilled programmers and/or people who want a more productive language for smaller projects that do not need or want all the enterprisey type-safe features.

Frankly it is this advantage which is the primary reason I thought to send a message to the list. The chance to have the benefit of strictness and high performance for more advanced PHP developers while still having full BC for existing code and for beginner developers seemed highly compelling to me.

-Mike

5 years ago by Benjamin Morel — view source

>> So we'd probably need some built-in definition of a "package", which
>> could be analysed and compiled as one unit, and didn't rely on any run-time
>> loading.
>
> That idea of a "package" came up during a debate on this list at least
> once, a few months ago, and I think it makes a lot of sense. And what I
> proposed effectively implies that namespaces would be treated like packages
> from the perspective of the compiler.

Putting aside the idea of distributing pre-compiled PHP scripts, if we're
only debating the precompilation as, notably, a means to reduce the cost of
type checks, I wouldn't mind if the precompilation occurred only if
preloading is in use, i.e. if most class definitions are known on server
startup, which is when the compilation / optimization passes could occur.
No preloading = no such optimizations, I could personally live with that.

No need for a package definition, IMO.

— Benjamin

5 years ago by Andreas Hennings — view source

>>> So we'd probably need some built-in definition of a "package", which
>>> could be analysed and compiled as one unit, and didn't rely on any run-time
>>> loading.
>>
>> That idea of a "package" came up during a debate on this list at least
>> once, a few months ago, and I think it makes a lot of sense. And what I
>> proposed effectively implies that namespaces would be treated like packages
>> from the perspective of the compiler.
>
> Putting aside the idea of distributing pre-compiled PHP scripts, if we're
> only debating the precompilation as, notably, a means to reduce the cost of
> type checks, I wouldn't mind if the precompilation occurred only if
> preloading is in use, i.e. if most class definitions are known on server
> startup, which is when the compilation / optimization passes could occur.
> No preloading = no such optimizations, I could personally live with that.
>
> No need for a package definition, IMO.

This would break as soon as we have two versions of a class, and a
runtime choice which of them to use.
(see also Mark Randall's comment)

What about this, instead:

Instead of a cli command, lazily "compile" in the opcache. So more
or less what we are already doing, I guess.
Possibility to store/cache multiple versions of a file, depending on
other files it depends on.

Somehow like this, per file:

Compile a low-level version of the file, or load it from a cache,
with cache id = file path.
Recursively process all the files and classes (autoload) this file
depends on.
Generate a hash from the dependencies.
Compile the final version of the file, or load it from a cache,
with cache id = file path + dependencies hash.
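
The cache-id part of these steps can be sketched as follows. This is only an illustration under stated assumptions: "dependencies" is taken to mean a list of file paths, and the function names (`dependencyHash`, `baseCacheId`, `finalCacheId`) are hypothetical, not anything OpCache provides.

```php
<?php
/**
 * Hash the path and contents of every dependency, in a stable order,
 * so that any change to a dependency changes the resulting hash.
 */
function dependencyHash(array $dependencyFiles): string
{
    sort($dependencyFiles);
    $ctx = hash_init('sha256');
    foreach ($dependencyFiles as $path) {
        hash_update($ctx, $path . "\0" . hash_file('sha256', $path) . "\0");
    }
    return hash_final($ctx);
}

/** Cache id for the low-level version: the file path alone. */
function baseCacheId(string $file): string
{
    return $file;
}

/** Cache id for the final version: file path plus dependencies hash. */
function finalCacheId(string $file, array $dependencyFiles): string
{
    return $file . '#' . dependencyHash($dependencyFiles);
}
```

With this shape, one file can have several final versions cached side by side, one per distinct set of dependency contents.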

Perhaps this could even be further optimized with some "guessing":
Assume everything is as it was the last time, until we hit a conflict.

This is probably more complicated than I am describing it here.
I kept the term "dependencies" intentionally vague, because I am not
sure what exactly we would need to look at.

Perhaps we would store not just multiple versions of each file, but of
each global symbol (class, function).

One "base version" for each distinct definition of a symbol in a
distinct file.
One "specific version" per combination of versions of other symbols
this depends on.

One problem I see is that some of the dependees may be unknown at the
time a file is included.
E.g. a function might call a static method from a class that has not
yet been included, triggering the autoloader.
Since the autoloader can be anything, we have no way to predict which
file will be included, and thus which version of the static method to
typecheck against.

Even if we previously scanned the entire project directory, and found
only one class with the given static method, the autoloader might
instead include a file outside the project directory, or define the
class with eval() or stream wrappers, or dump a generated file in
/tmp.
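
A small sketch of that worst case (hypothetical names throughout: the `Config` class, its `port()` method and the `$useLegacy` switch are made up for illustration). The class never exists in any file on disk, and even its return type is only decided when the autoloader fires:

```php
<?php
// Hypothetical runtime switch that a static scan of the project cannot see.
$useLegacy = false;

spl_autoload_register(function (string $class) use (&$useLegacy): void {
    if ($class !== 'Config') {
        return;
    }
    // The declared return type depends on run-time state.
    $returnType = $useLegacy ? 'string' : 'int';
    $value      = $useLegacy ? "'8080'" : '8080';

    // Define the class with eval(), so no source file ever contains it.
    eval(
        'final class Config {'
        . ' public static function port(): ' . $returnType
        . ' { return ' . $value . '; }'
        . ' }'
    );
});

// This call triggers the autoloader; only now is the signature known.
$port = Config::port();
```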

This would mean we would have to run a non-deterministic model until
all dependees are included.
So perhaps this idea is a dead end :)

-- Andreas

5 years ago by Benjamin Morel — view source

This would break as soon as we have two versions of a class, and a
runtime choice which of them to use.
(see also Mark Randall's comment)

That's why I'm suggesting to only make these optimizations when preloading
(https://wiki.php.net/rfc/preload) is in use, which means that you know
the class definitions ahead of time, and you cannot have 2 runtime
definitions of a given class.

No preloading = no optimizations.
Full preloading (whole codebase) = maximum optimizations.
Partial preloading = the compiler should still be able to optimize *some* of
the code involving only the preloaded classes.

We already have, since PHP 7.4, a mechanism to know static class
definitions on startup, so why not build further optimizations on top of it?
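
For readers who have not used preloading, a preload script might look roughly like the sketch below. It assumes the script is named in php.ini via the `opcache.preload` directive and uses `opcache_compile_file()`, both of which come with the PHP 7.4 preloading feature; the helper names and the `src` directory are hypothetical.

```php
<?php
/** Collect every .php file below a directory, in a stable order. */
function collectPhpFiles(string $dir): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

/** Compile each file into OpCache without executing it; returns the count. */
function preloadDirectory(string $dir): int
{
    $count = 0;
    foreach (collectPhpFiles($dir) as $path) {
        if (function_exists('opcache_compile_file') && opcache_compile_file($path)) {
            $count++;
        }
    }
    return $count;
}

// In a real preload script (assumed layout):
// preloadDirectory(__DIR__ . '/src');
```

Everything compiled this way is known before the first request, which is the fixed set of definitions the optimizations discussed here would rely on.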

⁠— Benjamin

5 years ago by Mark Randall — view source

Allowing PHP to continue to meet the needs of new/less-skilled programmers and/or people who want a more productive language for smaller projects that do not need or want all the enterprisey type-safe features.

This concept of type safety being an enterprise feature needs to die.

Types are a way of preventing your program from getting into states that
you don't expect it to be in, so you don't have to worry about handling
them in the first place.

Scalar type declarations and strict types would have saved me so much
time when I started trying to learn PHP.

Here's a video I stumbled upon recently that helps explain why types
help make coding easier, by reducing the number of possible states an
application can be in:

https://youtu.be/q1Yi-WM7XqQ?t=656

--
Mark Randall

5 years ago by Rowan Tommins — view source

>> So we'd probably need some built-in definition of a "package", which could
>> be analysed and compiled as one unit, and didn't rely on any run-time
>> loading.
>
> That idea of a "package" came up during a debate on this list at least
> once, a few months ago, and I think it makes a lot of sense. And what I
> proposed effectively implies that namespaces would be treated like packages
> from the perspective of the compiler.
>
> But then again a new package concept might be needed in addition to
> namespaces, I am not certain either way.

Current tools tend to actually work on a directory level, because you don't
actually know what namespaces are involved until after you've loaded it,
and a file can include code for two completely separate namespaces. My
thinking was that a package would pre-define the full list of files that
define it, with no auto-loader, and no conditional definitions evaluated at
run-time. As Benjamin points out, this is closely related to preloading.
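
As a rough sketch of what such a package could look like (purely hypothetical: the thread does not define a manifest format, and `loadPackage()` and the file names are made up), the package is nothing more than an explicit, ordered list of files, loaded without consulting any autoloader:

```php
<?php
/**
 * Load every file named in a (hypothetical) package manifest, in order.
 * The manifest is the complete definition of the package: no autoloader,
 * no conditional definitions decided at run-time.
 */
function loadPackage(array $manifest, string $baseDir): array
{
    $loaded = [];
    foreach ($manifest['files'] as $relativePath) {
        $path = $baseDir . '/' . $relativePath;
        if (!is_file($path)) {
            throw new RuntimeException("Package file missing: $path");
        }
        require_once $path;
        $loaded[] = $path;
    }
    return $loaded;
}

// Hypothetical manifest shape, for illustration only.
$manifest = [
    'name'  => 'acme/geometry',
    'files' => ['src/Point.php', 'src/Line.php'],
];
```

Because the list is fixed, an analyser could read the same manifest and see exactly the definitions the engine will see.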

>> Unlike P++, Editions, or Strict Mode, this would undeniably define that
>> the deprecated features were "the wrong way".
>
> I am not sure I can agree that it would define them as the "wrong way."
>
> The way I would see it is there would be a "strict way" and an "unstrict
> way." If you prefer the simplicity of low strictness and do not need
> more/better performance or the benefits of type-safety that are needed for
> building large applications, then the "right way" would still be the
> "unstrict way."

And what if you want simplicity and performance? Most of the things
people want to make strict about the language don't make it faster, so if
we limited "pre-compiled mode" to be strict, we'd be making a deliberate
choice to group objectively good things (fast vs slow) with subjective
preferences (strict vs simple). That pretty clearly marks strict mode as
"the better way".

>> If the engine had to support the feature anyway,
>
> I think we are talking two engines; one for compiling and another for
> interpreting. They could probably share a lot of code, but I would think
> it would still need to be two different engines.

That sounds like the worst kind of fork: two different engines, running two
different dialects of the language. At that point, you might as well just
switch to Hack.

Note that this was exactly what "P++" was intended to avoid - the two
dialects would exist in the same engine, and get the same performance and
security enhancements.

>> I'm not sure what the advantage would be of tying it to "compiled vs
>> non-compiled", rather than opting in via a declare() statement or package
>> config.
>
> The advantage would be two-fold:
>
> Backward compatibility
>
> Allowing PHP to continue to meet the needs of new/less-skilled
> programmers and/or people who want a more productive language for smaller
> projects that do not need or want all the enterprisey type-safe features.

Both of these are reasons to have some sort of "strict mode", but not for
tying it to some other feature.

Regards,

Rowan Tommins
[IMSoP]

5 years ago by Mike Schinkel — view source

Current tools tend to actually work on a directory level, because you don't
actually know what namespaces are involved until after you've loaded it,
and a file can include code for two completely separate namespaces. My
thinking was that a package would pre-define the full list of files that
define it, with no auto-loader, and no conditional definitions evaluated at
run-time. As Benjamin points out, this is closely related to preloading.

I would rather have a tool that did not require specifying the files. I personally would be fine with one that used a directory as the demarcator, even if that meant it would not work when you put your namespace in another directory.

And what if you want simplicity and performance? Most of the things
people want to make strict about the language don't make it faster, so if
we limited "pre-compiled mode" to be strict, we'd be making a deliberate
choice to group objectively good things (fast vs slow) with subjective
preferences (strict vs simple). That pretty clearly marks strict mode as
"the better way".

At the risk of being too flippant, I defer to the wisdom of that great philosopher Mick Jagger and say you can't always get what you want...

But seriously, at some point tradeoffs have to be made to see any forward progress. What we have not found before is a good tradeoff between strictness and BC. Maybe this is it? After all, while not all strict things are about performance, many things that enable performance are strict.

That sounds like the worst kind of fork: two different engines, running two
different dialects of the language. At that point, you might as well just
switch to Hack.

That feels like an over-reaction. Hack has purposely diverged from PHP and requires a different runtime than PHP.

The idea I was proposing is that the PHP runtime be one but operate in two different modes (one mode per "engine"), and the goal of the two modes would be to stay more similar than different, but allow one of them to have BC breaks.

Note that this was exactly what "P++" was intended to avoid - the two
dialects would exist in the same engine, and get the same performance and
security enhancements.

It could also be one engine, it just seemed like that coupling would be more problematic than separating them.

That said, I'm not skilled enough in PHP internals to implement it (yet?) so I can only speak to it at a high level.

>> The advantage would be two-fold:
>>
>> Backward compatibility
>>
>> Allowing PHP to continue to meet the needs of new/less-skilled
>> programmers and/or people who want a more productive language for smaller
>> projects that do not need or want all the enterprisey type-safe features.
>
> Both of these are reasons to have some sort of "strict mode", but not for
> tying it to some other feature.

I don't understand your reply, but maybe it is moot considering the rest of the dialog?

What we have today is a rock vs a hard-place, and no one wants to give even a millimeter.

So, if this is not a viable solution in your mind to break the logjam between BC and the desire for strictness-in-all-the-things, do you have an alternate, better proposal?

-Mike

5 years ago by Rowan Tommins — view source

>> Note that this was exactly what "P++" was intended to avoid - the two
>> dialects would exist in the same engine, and get the same performance and
>> security enhancements.
>
> It could also be one engine, it just seemed like that coupling
> would be more problematic than separating them.
>

I think the problem is that as soon as you have two engines targeting
different feature sets, it will be hard to persuade people to spend
equal attention on both. If all the new features end up being added to
one engine, the other one is going to increasingly feel like "legacy
mode", rather than "equal but different".

>> Both of these are reasons to have some sort of "strict mode", but not for
>> tying it to some other feature.
>
> I don't understand your reply, but maybe it is moot considering
> the rest of the dialog?
>
> What we have today is a rock vs a hard-place, and no one wants to
> give even a millimeter.
>
> So, if this is not a viable solution in your mind to break the
> logjam between BC and the desire for strictness-in-all-the-things,
> do you have an alternate, better proposal?
>

The idea of an "extra strict" and/or "less backwards compatible" mode
has been mentioned on the list several times, but you're the first to
suggest making it mandatory when using an otherwise unrelated
performance feature.

It would be much better to keep it separate, and opt into it via a
declare() statement, or a package configuration, or a file extension.
There have been proposals for a single flag, lots of separate flags, a
complete "P++" dialect, or bundles of settings ("Editions").

Whatever the approach, a key goal in my mind should be to maximise the
compatibility between the two, and share as much implementation as
possible. Both/all modes should get the same performance improvements,
except where the actual features are necessarily slower or faster.

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]

5 years ago by Mike Schinkel — view source

I think the problem is that as soon as you have two engines targeting different feature sets, it will be hard to persuade people to spend equal attention on both. If all the new features end up being added to one engine, the other one is going to increasingly feel like "legacy mode", rather than "equal but different".

That is a fair point.

It would be much better to keep it separate, and opt into it via a declare() statement, or a package configuration, or a file extension. There have been proposals for a single flag, lots of separate flags, a complete "P++" dialect, or bundles of settings ("Editions").

Correct me if I am wrong, but all of those have been objected to, strenuously, by at least several people on the list.

What will it take to finally get enough consensus to move forward?

Both/all modes should get the same performance improvements, except where the actual features are necessarily slower or faster.

Fine. But a pre-compiler still could have merit.

One of the things I would like to see from a pre-compiler is getting rid of the need to deal with an autoloader, and hence being able to store multiple related classes in the same file.

Primarily I would like this while doing R&D on a project idea prior to fully understanding what the object hierarchy needs to be. That, of course, would conflict with the non-pre-compiled code by its very nature.

-Mike

5 years ago by Dik Takken — view source

a static analyser that can infer types in a PHP program; we know
that's possible from a number of third-party tools, although they do
rely on docblock comments for things the language doesn't (yet) let you
define

Opcache already performs type inference. It does not make use of
information in comments. It only looks at the code, yielding type
information that is accurate and can be used for optimization.

Here is an interesting read on the subject:

https://depositonce.tu-berlin.de/bitstream/11303/7919/3/popov_etal_2017.pdf

The first problem is that OpCache is designed to work one file at a
time, because a program can load any combination of files at run-time.
Static analysers, on the other hand, need to process a whole directory
at a time, so that calls can be matched to definitions; multiple
definitions of the same function or class tend to cause problems, even
though only one is loaded at run-time. So we'd probably need some
built-in definition of a "package", which could be analysed and compiled
as one unit, and didn't rely on any run-time loading.

This problem could possibly be solved by using preloading. The
definition of a package would then be: the set of files that the
application will load during startup. Preloading could give opcache
access to the full application, allowing it to optimize more effectively.
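
As a rough sketch of that idea, assuming PHP 7.4 or later with the `opcache.preload` ini setting pointing at a script like the one below: the script enumerates the files that make up the "package" and compiles them up front. The `src` directory and the helper name `collect_php_files()` are assumptions made for illustration only.

```php
<?php
// preload.php: referenced from php.ini via opcache.preload=/path/to/preload.php
// (the path and the "src" directory are placeholders).

/**
 * Recursively collect all *.php files below $dir, sorted for a stable order.
 *
 * @return string[]
 */
function collect_php_files(string $dir): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Only compile when opcache is loaded and the package directory exists.
$packageDir = __DIR__ . '/src';
if (function_exists('opcache_compile_file') && is_dir($packageDir)) {
    foreach (collect_php_files($packageDir) as $path) {
        opcache_compile_file($path);
    }
}
```

Everything compiled this way is known to opcache before the first request, which is what would let it reason across file boundaries within that set.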

The second problem is that, as I understand it, type checks aren't
actually separate OpCodes, so eliminating them from the compiled program
may not be that easy. There are some cases where you can just eliminate
the type check from a definition, e.g.:

This is partially correct. Some type checks are separate opcodes, some
are not. Type checking opcodes are actually removed by opcache when its
static analysis can prove that the type check will always pass. It has
some limitations but the functionality is all there.
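
To illustrate the kind of check that can and cannot be proven, here is a small sketch. Whether opcache actually drops the check in a given case depends on the PHP version and optimizer settings, so the comments describe what type inference could in principle conclude, not guaranteed behaviour. The function names are made up for the example.

```php
<?php

// The returned expression is an int literal, so inference can show that the
// declared return type always holds; a check here is provably redundant.
function answer(): int
{
    return 42;
}

// json_decode() can return almost any type, so the return type check has to
// stay: this function may legitimately throw a TypeError at run-time.
function parse_count(string $json): int
{
    return json_decode($json);
}
```

Calling `parse_count('[1]')` raises a `TypeError` because the decoded value is an array, which is exactly the case static analysis cannot rule out.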

Regards,
Dik Takken

5 years ago by Rowan Tommins — view source

Hi Dik,

Opcache already performs type inference. [...]
Here is an interesting read on the subject:

https://depositonce.tu-berlin.de/bitstream/11303/7919/3/popov_etal_2017.pdf

Thanks for the link, and the insight into how much OpCache can already do.

I guess preloading gets us pretty close to the tool I was imagining -
OpCache could make assumptions that cross file boundaries, within the
preloaded set, and could spend longer optimizing during the preloading
phase than might be expected on a simple cache miss.

I think it will be interesting to see how tools adopt that feature, and
whether eventually we'll see autoloader functions as just a fallback
mechanism, with most packages being enumerated in advance as large
preloaded blocks.

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]

5 years ago by Andreas Hennings — view source

Hi Dik,

Opcache already performs type inference. [...]
Here is an interesting read on the subject:

https://depositonce.tu-berlin.de/bitstream/11303/7919/3/popov_etal_2017.pdf

Thanks for the link, and the insight into how much OpCache can already do.

I guess preloading gets us pretty close to the tool I was imagining -
OpCache could make assumptions that cross file boundaries, within the
preloaded set, and could spend longer optimizing during the preloading
phase than might be expected on a simple cache miss.

I think it will be interesting to see how tools adopt that feature, and
whether eventually we'll see autoloader functions as just a fallback
mechanism, with most packages being enumerated in advance as large
preloaded blocks.

What if we had a "native" autoload layer?
The native autoloader could be made to fire before userland autoloaders.
It could be based on a mapping like PSR-4, or simply a classmap.
The mappings could be defined at "compile time", or frozen early in a request.

This would make it possible to predict where each class is located at
"compile time" or at opcache time, so that all the type checks can be done.
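
A userland approximation of such a frozen mapping might look like the following. This is only a sketch: the `App\` prefix, the `src` directory, and the function name `resolve_class_file()` are hypothetical, and a real native layer would live in the engine rather than behind `spl_autoload_register()`.

```php
<?php

/**
 * Resolve a class name to a file path using PSR-4 style prefix rules,
 * without touching the filesystem, so the result is predictable up front.
 *
 * @param array<string, string> $prefixes namespace prefix => base directory
 */
function resolve_class_file(string $class, array $prefixes): ?string
{
    foreach ($prefixes as $prefix => $baseDir) {
        if (strncmp($class, $prefix, strlen($prefix)) === 0) {
            $relative = substr($class, strlen($prefix));
            return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
        }
    }
    return null; // unknown class: leave it to later autoloaders
}

// Hypothetical mapping, fixed before any class is loaded.
$prefixes = ['App\\' => __DIR__ . '/src'];

// Prepend so this loader runs before other userland autoloaders.
spl_autoload_register(
    function (string $class) use ($prefixes): void {
        $file = resolve_class_file($class, $prefixes);
        if ($file !== null && is_file($file)) {
            require $file;
        }
    },
    true, // throw if registration fails
    true  // prepend to the autoload queue
);
```

Because `resolve_class_file()` is a pure function of the class name and the mapping, an analysis tool could call it ahead of time to find the definition of every referenced class.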

An alternative would be to allow userland autoloaders to be registered
with a hash, with the promise that as long as the hash is the same,
classes remain where they are.
Or allow userland to specify "class locators" instead of autoloaders,
which could also be registered with a prediction hash.

So, the overarching idea here is to make autoloading predictable at
compile time or opcache time, which would not require an artificial
"package" concept.

As in my previous proposal, the opcache would have to store different
versions of each file, for different combinations of autoload
prediction hashes.
This would allow e.g. different applications to share some of their
PHP files without spoiling the opcache.
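
To make the hash idea a bit more concrete, here is a minimal sketch in userland PHP. No such registration API exists today; `prediction_hash()` and `cache_key()` are hypothetical names, and the `#` separator in the key is an arbitrary choice.

```php
<?php

/**
 * Derive a stable hash from a class map (class name => file path).
 * Sorting by key makes the hash independent of insertion order.
 *
 * @param array<string, string> $classMap
 */
function prediction_hash(array $classMap): string
{
    ksort($classMap);
    return hash('sha256', serialize($classMap));
}

/**
 * Build a cache key for one file under one autoload prediction, so the same
 * file can be cached separately for applications with different class maps.
 */
function cache_key(string $file, string $predictionHash): string
{
    return $file . '#' . $predictionHash;
}
```

Two applications that share a file but map classes differently would get different keys for it, so neither would invalidate the other's cached version.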

-- Andreas
