Introducing compile time code execution to PHP preloading

5 years ago by Robert Hickman

With PHP having recently introduced preloading, I have been thinking
about the possibility of adding a system whereby arbitrary PHP code
can run during this step. Essentially, this would serve the same
function as 'compile time execution' in many programming languages. It
should be noted that my thoughts below are mostly inspired by the
in-development language JAI, demos of which are included at the end of
this email.

While PHP is an interpreted language, code is first parsed which
generates an AST, and this AST is then used to generate bytecode that
is stored in opcache. With preloading, the generation of this bytecode
is done only once on server startup. Compile time code would run
during this stage as a 'shim' between parsing and bytecode generation,
allowing arbitrary modifications to the AST.

I can think of numerous examples of ways this could be advantageous.
For one, frameworks often want to store configuration data in a
database or some other external source, and accessing it every request
is needless overhead, given that data tends to never change in
production. So you could do something like the following which runs
once during preload, and caches the constant in opcache.

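static_run {
    // Runs once at compile time; the connection details are placeholders.
    $link = mysqli_connect("127.0.0.1", "my_user", "my_password", "my_db");
    $res = mysqli_query($link, 'select * from sometable');

    $array = [];
    while ($row = mysqli_fetch_assoc($res)) {
        $array[] = $row;
    }

    // define() takes the constant name and its value as two arguments.
    define('CONST_ARRAY', $array);
}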

Here, static_run is a new keyword that allows a block of code to be
evaluated at compile time.

I foresee this being able to do far more than simply define constants
though. In my opinion, it should be able to allow arbitrary
modifications to the AST, and arbitrary programmatic code generation.
For example, static code could register a callback which receives the
AST of a file during import:

static_run {
    on_file_load(function ($file_ast) {
        // Do something with the AST of the file

        return $file_ast;
    });
}

As noted above, I can think of numerous things that this could do, and
as a flexible and far reaching facility, I am sure many more things
are possible that I have not considered. To give a few examples:

Choose a database interface once instead of during every request.
Check that the types defined in an ORM actually match the database.
Inverting the above, programmatically generate types from a database table.
Compile templating languages like Twig into PHP statically,
eliminating runtime overhead.
Convert syntactically pretty code into a more optimised form.
Statically generate efficient code for mapping URLs to handler functions.
Validate the usage of callback systems such as WordPress 'shortcodes'.
Arbitrary code validation, such as to implement corporate
programming standards.

==== Why not a preprocessor?

While things like this can be implemented as a preprocessor, I can see
considerable advantages of implementation as a native feature of the
language itself. A big one is that it would be aware of the semantics
of the language like namespaces, and scope, which is a big downside of
rudimentary preprocessors like the one in C/C++. Implementing it into
the language runtime also eliminates the need for a build step, and
means that everyone using the language has access to the same tools.

I also think that given that these data structures already exist
during compilation to bytecode, why not just give programmers access
to them?

This concept is not that unusual: Python, for example, allows Python
code to modify the AST of files as they are being loaded. However,
directly modifying the AST won't be very user friendly. Due to this,
syntax could be created which allows the more common operations to be
done more easily. Rust has a macro system that is based on this kind
of idea, and JAI has recently introduced something comparable. While
it should be obvious from the above, I am not talking about macros in
the C sense. These should be 'hygienic macros'.

==== How it runs

On the web, compile time code would be run during preloading. When
running PHP code at the CLI, compile time code could just be run every
time, before run time code. Caching the opcodes in a file, and
automatically detecting changes and recompiling as Python does, could
be a worthwhile optimisation.

==== Inspirations

The general idea with this was inspired by the in-development
programming language JAI, which has full compile time execution.
Literally, the entire programming language can be run at compile time
with very few restrictions. See the following two videos for a
demonstration of what it can do:

https://www.youtube.com/watch?v=UTqZNujQOlA
https://www.youtube.com/watch?v=59lKAlb6cRg&list=PLmV5I2fxaiCKfxMBrNsU1kgKJXD3PkyxO&index=20&t=0s

There is also a programming language called 'Zig' that is based on
similar ideas to JAI, and also has compile time execution. Unlike JAI,
it has been released and is available to try today. My suggested
syntax for static_run was inspired by Zig.

5 years ago by Larry Garfield

While I'd love to be able to leverage preloading to do "compile-like stuff", I have a lot of concerns with it. Most notably, not all code will be run in a preload context. Language features that only sometimes work scare me greatly.

Doing one-time optimizations in preload that make the code faster, that's great. Preload optimizations that make the code behave differently, that's extremely dangerous. It also makes development much harder. In part because it means you have to consider two kinds of users (preloaded and not), but also because the most important "not preloaded" user is yourself, during development. "I changed one character and now I have to restart my webserver to see if it did anything" is a bad place for PHP to be.

So while I'm very open to engine additions to do known-quantity preload-only optimizations (e.g., could generics be implemented in a way that is moderately performant normally, but full speed in preload, with the same behavior?), I am highly skeptical about allowing arbitrary preload/compile time behavior as it makes development harder and bifurcates the ecosystem.

To your specific examples, many are already possible today. Code generation in a pre-execute build step is increasingly common; the Symfony ecosystem does a ton of it, I've implemented a compiled version of a PSR-14 Event Dispatcher, etc.
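
As a rough illustration of the build-step approach described above (a minimal sketch, not how Symfony or any particular framework actually does it), a script can dump computed data to a PHP file with var_export(), and the runtime can then load that file like any other, letting opcache cache its compiled form. The helper names dump_config() and load_config(), the sample rows, and the output path are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build step: write $data to $path as a PHP file that returns the array.
 * Hypothetical helper, not part of any framework.
 */
function dump_config(array $data, string $path): void
{
    $code = "<?php\n\nreturn " . var_export($data, true) . ";\n";

    // Write to a temporary file first, then rename it into place,
    // so a reader never sees a partially written file.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    if (file_put_contents($tmp, $code) === false) {
        throw new RuntimeException("Cannot write $tmp");
    }
    rename($tmp, $path);
}

/** Runtime: load the generated file and return the array it contains. */
function load_config(string $path): array
{
    return require $path;
}

// Sample rows standing in for data that would otherwise be
// fetched from a database on every request.
$rows = [
    ['id' => 1, 'name' => 'site_title', 'value' => 'Example'],
    ['id' => 2, 'name' => 'items_per_page', 'value' => '20'],
];

$path = sys_get_temp_dir() . '/generated_config.php';
dump_config($rows, $path);
$config = load_config($path);
```

The trade-off is the one discussed in this thread: the generation step needs a writable location and access to the data source at build time.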

The ones we cannot do today are those that require DB or other service access; that's frequently not available in a PaaS environment, where you're doing your build before you connect a container to an environment with services. Once you are in a live environment you want a read-only file system, for security and auditability. Code generation at that point is then impossible. Moving that code gen to a preloader wouldn't help with that.

I appreciate the intent here, but in practice I'd much rather we limit preload optimization to things the engine can do for us, and reliably know that it can do so without changing behavior. There's actually quite a lot that could be done there, if we were able to give the engine the information to do so. For example, there's a ton of optimizations that can be done that rely on working with pure functions only, but the engine today cannot know if a function is pure. (Or I don't think it's able to figure it out for itself, anyway.) I'd be fully in favor of ways that we could indicate to the engine "this is safe to do more computer-science-y optimizations on, do your thing", and then implementing those optimizations in the engine rather than in user space.

--Larry Garfield

5 years ago by Robert Hickman

Once you are in a live environment you want a read-only file system, for security and auditability. Code generation at that point is then impossible. Moving that code gen to a preloader wouldn't help with that.

My proposal has no impact on the file system. It would modify the AST,
and thus would be entirely in RAM, as opcache is also in RAM. No source
files are involved at all.

5 years ago by Mike Schinkel

Most notably, not all code will be run in a preload context.

Can you give some concrete examples here?

Language features that only sometimes work scare me greatly.

Do you have some examples of language features, from PHP or another language, that only work sometimes and that are known to be problematic, and why they are problematic?

Doing one-time optimizations in preload that make the code faster, that's great.

Though I think this proposal may need to be fine-tuned, I can envision that many frameworks and CMSes written in PHP could improve performance, robustness and user experience using preloading.

One of the most useful ways would be to run code that ensures the framework/CMS APIs are being used correctly. If this code is included today in frameworks and CMSes, it must run for every page load (on the web), when it could be run once when the opcodes are generated. This could potentially improve performance significantly, depending on how much checking is implemented.

It could also improve performance of building data-driven structures at runtime. I know that in the past I have had data driven structures that were definitely very time-consuming on each page load. The WordPress admin does tons of it.

Preload optimizations that make the code behave differently, that's extremely dangerous.

Can you give some concrete examples where you fear this could happen?

"I changed one character and now I have to restart my webserver to see if it did anything" is a bad place for PHP to be.

As I envision it preloaded code of this nature would not be handled on server reboot, but when the files have had their time stamps updated. If I am not mistaken, PHP already does this (but I could be mistaken as I don't have expertise in PHP OpCodes.)

Whatever the case I think this could easily be handled with a simple API call to flush preloaded code which for debugging could be one of the first things a developer would call in their codebase.

I am highly skeptical about allowing arbitrary preload/compile time behavior as it makes development harder and bifurcates the ecosystem.

Given the copious performance and robustness benefits that preloading could provide, I would think we should try to identify specific concrete concerns rather than allow unidentified concerns to block a potentially great improvement to the language.

So what specific concrete concerns can we identify?

To your specific examples, many are already possible today. Code generation in a pre-execute build step is increasingly common; the Symfony ecosystem does a ton of it, I've implemented a compiled version of a PSR-14 Event Dispatcher, etc.

Am I understanding correctly that requires a build process, and not something that a PHP developer can depend upon having available on any hosted PHP server?

Code generation at that point is then impossible. Moving that code gen to a preloader wouldn't help with that.

As Robert stated, he is not proposing any code generation.

His preloading concept would modify classes by manipulating the AST, which, IMO, would require an additional API. I do think that is probably orthogonal to the idea of preloading code, although I think it would also have great benefit, and that preloading is probably a prerequisite for it.

I appreciate the intent here, but in practice I'd much rather we limit preload optimization to things the engine can do for us, and reliably know that it can do so without changing behavior.

Limiting in that manner would effectively eliminate the possibility of serendipity that can occur when userland developers are empowered, vs. only those who can get sufficient agreement to add features to PHP. IOW, only a tiny subset of problems could be solved if we limit it that way, vs. the number of problems developers could solve for themselves and offer to the open-source community if userland developers are given more control over preloading.

For example, there's a ton of optimizations that can be done that rely on working with pure functions only, but the engine today cannot know if a function is pure. (Or I don't think it's able to figure it out for itself, anyway.) I'd be fully in favor of ways that we could indicate to the engine "this is safe to do more computer-science-y optimizations on, do your thing", and then implementing those optimizations in the engine rather than in user space.

There are more innovations that can occur in computer science than just those that depend on pure functions. Why must we limit ourselves to only consider problems that can be solved with pure functions?

There are a lot of details we would need to work through to have a viable proposal for userland preloading (vs. preloading a sysadmin can control), but I assert we'd be better off optimistically exploring the concept instead of prematurely stifling exploration.

-Mike

5 years ago by Robert Hickman

I would say that my proposal is more about compile-time
metaprogramming, and thus would not actually depend on preloading. It
could also be run during page requests and would be cached by opcache
in the same way. However, running it in that way could make the initial
request, before the opcodes are cached, much slower. That is why
combining it with preloading would be advantageous.

5 years ago by Larry Garfield

Most notably, not all code will be run in a preload context.

Can you give some concrete examples here?

Language features that only sometimes work scare me greatly.

Do you have some examples of language features, from PHP or another
language, that only work sometimes and that are known to be
problematic, and why they are problematic?

To use the example from the OP:

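static_run {
    // Runs once at compile time; the connection details are placeholders.
    $link = mysqli_connect("127.0.0.1", "my_user", "my_password", "my_db");
    $res = mysqli_query($link, 'select * from sometable');

    $array = [];
    while ($row = mysqli_fetch_assoc($res)) {
        $array[] = $row;
    }

    // define() takes the constant name and its value as two arguments.
    define('CONST_ARRAY', $array);
}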

I can see the use of that, sure. Now, what happens when the code is not preloaded? Does that block not get run, and thus CONST_ARRAY is not defined? Does it run on all requests if not preloaded? How does that interact with a file that gets read multiple times?

What happens if the code does more than set a constant? Can it define new functions? What happens to those functions in a non-preload situation?

To use the other example:

static_run {
    on_file_load(function ($file_ast) {
        // Do something with the AST of the file

        return $file_ast;
    });
}

AST manipulation from user-space opens up a lot of possibilities for optimization. However, it's also a huge foot-gun. When you start messing with the AST I can't imagine it's hard to end up introducing subtle behavioral changes without intending to. Or, maybe you are intending to. So then what happens if the code runs in a context when that doesn't happen? Does the AST then get re-manipulated on every request instead? What's the performance impact of that? Net negative?

I don't have answers to these questions. It's possible that we could come up with a set of answers that would address the core issue, but I am skeptical.

My core point here is that I am fully in favor of leveraging preloading to improve performance, BUT ensuring that there is zero behavioral difference between preloaded and non-preloaded code, only performance differences, is paramount, and IMO is more important than any flexibility, power, or performance benefits it could offer. We should consider exposing that to user space only if we can be pretty damned sure that it's not going to introduce weird-and-subtle behavioral bugs that end up making preloaded and non-preloaded code behave differently.

As an example, preloading seems like a great place to do something like tail recursion flattening. That's a logically safe thing to do, as long as the call is properly tail-recursive, and would make writing tail-recursive algorithms more practical. (They're often easier to read and maintain but performance makes them less practical.)

However! Doing so means the preloaded version doesn't have an issue with blowing out the stack. The non-preload version does. That means the non-preload version has a built in limit on how long of a list it can operate on (100 by default, minus however many stack calls have already been made) while the preloaded version doesn't. That can have ugly implications if you're running code that was working in preload in a non-preload context, and suddenly your 105 element array is causing a fatal error when it didn't before.
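
To make the transformation concrete, here is a minimal sketch of what "flattening" a tail-recursive function means. The function names are made up for illustration, and the rewrite is done by hand here; as far as I know the engine does not perform it automatically today.

```php
<?php
declare(strict_types=1);

/**
 * Tail-recursive sum: the recursive call is the last thing the
 * function does, so nothing is left to compute after it returns.
 * Each element still costs one stack frame.
 */
function sum_recursive(array $items, int $acc = 0): int
{
    if ($items === []) {
        return $acc;
    }
    $head = array_shift($items);
    return sum_recursive($items, $acc + $head);
}

/**
 * The same algorithm after flattening: the tail call becomes a loop
 * that updates the accumulator in place, so stack depth stays constant.
 */
function sum_flattened(array $items, int $acc = 0): int
{
    while ($items !== []) {
        $acc += array_shift($items);
    }
    return $acc;
}

$list = range(1, 50);
$a = sum_recursive($list);
$b = sum_flattened($list);
```

Both versions return the same value for this input; the difference is only in how deep the call stack grows, which is exactly the kind of behavioral gap described above once the input gets large.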

That's the sort of subtlety that, frankly, I am a lot more confident in Engine developers remembering to think about than user-land developers. Myself included. Not because they're less capable developers but because 99% of the time PHP doesn't force you to think about such questions, so most developers won't think to think about them. And 99% of the time that's a good thing. This is the other 1%. :-)

What I very much want to avoid, for as long as possible at least, is "this library only works if preloaded" type situations. That's how we end up with a division in the language; not just between people who own their own servers and those that don't, but it ties the hands of admins and framework authors in deciding what to preload. What a "good" preload strategy is depends on context, and we've only had a month or two experience with it to even know what to recommend to people.

And that's in addition to the development challenges of developing such code in the first place:

"I changed one character and now I have to restart my webserver to see if it did anything" is a bad place for PHP to be.

As I envision it preloaded code of this nature would not be handled on
server reboot, but when the files have had their time stamps updated.
If I am not mistaken, PHP already does this (but I could be mistaken as
I don't have expertise in PHP OpCodes.)

The opcache does that, yes. The preloader, however, is a one-shot deal and requires restarting FPM to have it re-run.
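
For reference, a preload script is an ordinary PHP file named by the opcache.preload ini setting and executed once at server startup. The following is a minimal sketch of such a script, assuming a hypothetical helper preload_directory() and a project layout of my own invention; opcache_compile_file() and opcache_get_status() are the real opcache functions, and the guard simply skips compilation when opcache is not active (for example on the CLI).

```php
<?php
declare(strict_types=1);

/**
 * Find every .php file under $dir and, if opcache is active, compile
 * each one into the cache without executing it.
 * Returns the sorted list of files that were found.
 * Hypothetical helper for illustration only.
 */
function preload_directory(string $dir): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);

    // Only attempt compilation when opcache is loaded and enabled.
    $status = function_exists('opcache_get_status') ? opcache_get_status(false) : false;
    $active = is_array($status) && !empty($status['opcache_enabled']);

    if ($active) {
        foreach ($files as $path) {
            opcache_compile_file($path);
        }
    }

    return $files;
}

// In a real preload script this would be called with the project's
// source directory, e.g. preload_directory(__DIR__ . '/src');
```

Because this script runs only at startup, any change to the preloaded files is not picked up until the server process is restarted.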

Thinking about it, I suspect there would be far more benefit in practice not from allowing AST manipulation but being able to "Checkpoint" a running script; that is, allow it to not just pre-load code (which we can do now in 7.4) but set up variables that are already initialized from one request to the next. I'm thinking here of things like bootstrapping a dependency injection container, declaring closed functions, and other semi-global stuff that right now makes a PHP application's bootstrap process more expensive than most other languages. (In the area of milliseconds, sure, but still slower.) Allowing that sort of execution to happen once and get persisted would reduce the need to do all the precompiling and such that many frameworks do today, at the cost of a great deal of complexity.

That may be as much of a pipedream to do safely, I don't know, but in practice that seems like a more promising direction for userland developers to leverage themselves.

--Larry Garfield

5 years ago by Mike Schinkel

Most notably, not all code will be run in a preload context.

Can you give some concrete examples here?

Language features that only sometimes work scare me greatly.

Do you have some examples of language features, from PHP or another
language, that only work sometimes and that are known to be
problematic, and why they are problematic?

To use the example from the OP:
<snip>
I can see the use of that, sure. Now, what happens when the code is not preloaded? Does that block not get run, and thus CONST_ARRAY is not defined? Does it run on all requests if not preloaded? How does that interact with a file that gets read multiple times?
<snip>

Thanks so much for going into such detail. It really helped me understand your concerns.

I have been planning to propose an alternative to static_run because it did not seem to me to be an ideal solution. My proposal would not have included modifying the AST per se, even though I think that could be beneficial. Instead it would just be focused on userland preloading (vs. system admin preloading).
IOW, a preload feature that does not require restarting FPM but could instead be controlled by the userland developer.

But before I introduce that I want to better understand why you think having things that can only run during preload and having things that can only run during runtime would be a bad thing. To me it seems like a benefit. Preload vs. runtime are two different contexts. Having different behavior for different contexts only makes sense IMO. Look at CLI vs. FPM contexts:

https://stackoverflow.com/a/25653068/102699

I get your concern about recursion, but if that kind of issue is really a concern I don't see why we could not artificially limit recursion on preload to a configurable amount, with 100 being the default?

Again, if we can I hope we can focus on concrete problems this would cause that we might be able to solve vs limiting to only abstract concerns.

Separately I think it would make sense to provide a set of APIs to do the most commonly needed things that people might want to modify the AST for.

One example that I could envision that would make sense for preload but probably not for runtime is an extension of the Reflection API that would allow code to define classes, interfaces, traits, functions, etc. Here is something I mocked up that I called "Projection" to illustrate that concept:

https://gist.github.com/mikeschinkel/e07eb14a34ce83a96198744e18b0c961

I do completely see your points about the AST and most PHP developers.

The unfortunate problem with PHP is there is no way to use 3rd party code to extend PHP w/o being able to reconfigure the server by adding extensions. It would be nice if we could come up with a way for 3rd party developers writing in C or another language — people who are more likely to test their code before distributing it — to extend PHP such as by generating OpCode files that a userland developer could load.

Or maybe we implement the ability to load WebAssembly on the server? If that were possible I could write PHP extensions in Go. :-)

I think these discussions should be on a different thread though. I think we have 3 different topics here:

1. Userland preloading
2. Projection API and related
3. Userland loadable extensions

-Mike

5 years ago by Robert Hickman — view source

Hi folks, I think that we are getting a little confused here due to using
the term 'preloading' for different things. As I have noted previously, my
initial proposal of compile time execution would not depend on PHP's
preloading feature, but could work with opcache in the usual sense, or
even without opcache, probably with a notable performance impact.

The proposal is really to split the execution flow into two stages, one
that happens once during compilation, and one that happens n times during
subsequent requests. This actually has no dependency on preloading and is
orthogonal to it.

I don't see any real issue with allowing AST manipulation and compile time
metaprogramming myself, as it would just be making it easier to do things
that people are already doing with preprocessors of various kinds.

I tend towards the attitude Jonathan Blow has with JAI: giving tools to
experienced developers, and not creating restrictions for novices, who will
always be able to find ways of breaking something. Metaprogramming could be
a foot gun, yet it could also be really useful if someone knows what they
are doing.

I have been talking about ideas with Mike off-list, and we both think that
if a low-level API were exposed to allow this, most of the functionality
could be implemented in PHP code. Making the API low level would
probably put off beginners from using it directly, and they could use
higher level APIs providing more restricted functionality.

Python has the ability to modify its own AST in Python code, and I have
never personally seen any horror stories of beginners getting into trouble
with it. The API is low level and requires a lot of internal knowledge to
use.

With regard to not being able to use custom PHP extensions without being
able to edit the server configuration: in my opinion the best approach is
to get away from running the language in a 'traditional' virtual hosting
setup. I have no desire to write code that would need to run in such a
limiting environment.

I apologise for any spelling errors, as I am dyslexic, don't have access to
my desktop right now, and the spell checker on Android is awful.

5 years ago by Larry Garfield — view source

Thanks so much for going into such detail. It really helped me
understand your concerns.

I have been planning to propose an alternative to static_run because it
did not seem to me to be an ideal solution. And my proposal would not
have included modifying the AST per se, even though I think that could
be beneficial. Instead it would just be focused on userland preloading
(vs. system admin preloading.)
IOW a preload feature that does not require restarting FPM but could
instead be controlled by the userland developer.

I defer to Dmitry on whether such a thing is reasonable. My engine knowledge is far too paltry to comment.

But before I introduce that I want to better understand why you think
having things that can only run during preload and having things that
can only run during runtime would be a bad thing. To me it seems like
a benefit. Preload vs. runtime are two different contexts. Having
different behavior for different contexts only makes sense IMO. Look
at CLI vs. FPM contexts:

https://stackoverflow.com/a/25653068/102699

That's slightly different, as code will still execute in both contexts. And most of those differences are related to how PHP interacts with the outside world, in two different definitions of "outside world".

To illustrate my point, though... Well, that explains why my last attempt to use STDOUT and STDERR in a non-CLI context failed without explanation! TIL...
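
The STDOUT/STDERR difference mentioned here can be checked directly. As a minimal sketch, assuming the PHP manual's statement that the STDIN, STDOUT and STDERR constants are only predefined under the CLI SAPI: code that may also run under FPM can fall back to opening php://stderr itself. The helper name stderr_stream() is made up for illustration.

```php
<?php
// Hypothetical helper: returns a writable stderr stream in any SAPI.
function stderr_stream()
{
    // STDERR is only predefined by the CLI SAPI.
    if (defined('STDERR')) {
        return STDERR;
    }
    // Elsewhere (FPM, mod_php, ...) open the stream explicitly.
    return fopen('php://stderr', 'w');
}

// PHP_SAPI reports which context the code is running in, e.g. "cli" or "fpm-fcgi".
fwrite(stderr_stream(), 'Running under SAPI: ' . PHP_SAPI . PHP_EOL);
```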

I get your concern about recursion, but if that kind of issue is really
a concern I don't see why we could not artificially limit recursion on
preload to a configurable amount, with 100 being the default?

It's not recursion itself that's an issue. It's that the behavior of the code changes between preload and non-preload contexts in subtle, non-obvious ways.

I don't mean a recursive function that runs during preload. I mean if a function that runs in a normal request is written recursively, it will fail at high recursion levels only if the file it is in was not preloaded. So whether or not foo.php was run through opcache_compile_file() changes when and how code in that file will fail. That's the situation we want to avoid like the plague, IMO.

Separately I think it would make sense to provide a set of APIs to do
the most commonly needed things that people might want to modify the
AST for.

One example that I could envision that would make sense for preload but
probably not for runtime is an extension of the Reflection API that
would allow code to define classes, interfaces, traits, functions, etc.
Here is something I mocked up that I called "Projection" to illustrate
that concept:

https://gist.github.com/mikeschinkel/e07eb14a34ce83a96198744e18b0c961

I've seen similar concepts for code-generation libraries. I've even written a very basic one at one point. That seems orthogonal to whether the end result is writing the generated code to disk or compiling it directly into memory. If we do make runtime code generation a first class citizen then some sort of API like that would be a good idea (as the alternative is to build a string and run eval()).
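
For reference, the "build a string and run eval()" alternative looks roughly like this today. This is only a sketch: define_value_class() and the Point class are hypothetical names and are not taken from the Projection gist or from any PHP API.

```php
<?php
// Hypothetical helper: generates a simple class with public properties
// by building source code as a string and passing it to eval().
function define_value_class(string $name, array $props): void
{
    if (class_exists($name, false)) {
        return; // already defined, nothing to do
    }
    // Only accept plain identifiers, since eval() runs whatever it is given.
    foreach (array_merge([$name], $props) as $identifier) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $identifier)) {
            throw new InvalidArgumentException("Invalid identifier: $identifier");
        }
    }
    $code = "class $name {\n";
    foreach ($props as $prop) {
        $code .= "    public \$$prop;\n";
    }
    $code .= "}\n";
    eval($code);
}

define_value_class('Point', ['x', 'y']);
$p = new Point();
$p->x = 3;
```

A structured API would replace the string concatenation and the manual identifier check, which is where most of the risk in this approach sits.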

I do completely see your points about the AST and most PHP developers.

The unfortunate problem with PHP is there is no way to use 3rd party
code to extend PHP w/o being able to reconfigure the server by adding
extensions. It would be nice if we could come up with a way for 3rd
party developers writing in C or another language — people who are more
likely to test their code before distributing it — to extend PHP such
as by generating OpCode files that a userland developer could load.

That... sounds a lot like the new FFI extension? Write code in C or Rust or anything that can produce a .so file, plug it into PHP, go?

There's a whole lot of caveats on that, of course. I'm working on a blog post series on that very topic at work, so stay tuned if you're interested... :-)

--Larry Garfield

5 years ago by Mike Schinkel — view source

I get your concern about recursion, but if that kind of issue is really
a concern I don't see why we could not artificially limit recursion on
preload to a configurable amount, with 100 being the default?

It's not recursion itself that's an issue. It's that the behavior of the code changes between preload and non-preload contexts in subtle, non-obvious ways.

I don't mean a recursive function that runs during preload. I mean if a function that runs in a normal request is written recursively, it will fail at high recursion levels only if the file it is in was not preloaded. So whether or not foo.php was run through opcache_compile_file() changes when and how code in that file will fail. That's the situation we want to avoid like the plague, IMO.

With respect, that feels like you are concerned about some real edge cases, and ignoring the probability that most code run during preloading would be written bespoke for preloading.

ALSO, I think I may have caused confusion by calling it "preloading." In my mind that meant "run once" vs. "run for each page load" but I think you are (rightly) conflating what I have called "preloading" with the new preloading feature in PHP 7.4?

What I have been envisioning is a preload (or some other) keyword that we could apply to functions, methods, and expressions. When preload modified functions/methods/expressions are reached during OpCode generation, rather than generate a call to runtime code PHP would just run the code and then take the return value and generate OpCode for that return value.

class Foo {
    preload function bar() {
        return <a_really_time_consuming_expression>;
    }
}
$foo = new Foo();
echo $foo->bar(); // This would not actually execute anything at runtime.
                  // It would simply echo the constant value that was
                  // returned by <a_really_time_consuming_expression>.

I could be wrong but I find it hard to believe the runtime profile would be so different in this context as to create problems of the nature you are fearing.
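
Something close to this can be approximated in userland today, which may help make the idea concrete. The sketch below evaluates an expensive expression once, writes the result out as a PHP literal with var_export(), and has later calls include that file, which opcache can then cache like any other script. It assumes a writable temporary directory; compute_once() and the cache file name are hypothetical, and it only works for values that var_export() can represent.

```php
<?php
// Hypothetical helper: runs $producer once and stores its result as PHP source.
function compute_once(string $cacheFile, callable $producer)
{
    if (!is_file($cacheFile)) {
        $value = $producer();
        // var_export() emits a literal that PHP can parse back.
        $source = '<?php return ' . var_export($value, true) . ';' . PHP_EOL;
        // Write to a temporary name, then rename, so readers never see a partial file.
        $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
        file_put_contents($tmp, $source);
        rename($tmp, $cacheFile);
    }
    return include $cacheFile;
}

$cacheFile = sys_get_temp_dir() . '/compute_once_demo.php';
if (is_file($cacheFile)) {
    unlink($cacheFile); // start the demo from a clean state
}

$calls = 0;
$producer = function () use (&$calls) {
    $calls++; // stands in for the really time consuming expression
    return ['answer' => 6 * 7];
};

$first = compute_once($cacheFile, $producer);  // runs the producer
$second = compute_once($cacheFile, $producer); // served from the cache file
```

Unlike the proposed keyword, this still costs an include per request and needs explicit cache invalidation.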

Maybe someone like Dmitry who understands the OpCode generation process could weigh in?

I do completely see your points about the AST and most PHP developers.

The unfortunate problem with PHP is there is no way to use 3rd party
code to extend PHP w/o being able to reconfigure the server by adding
extensions. It would be nice if we could come up with a way for 3rd
party developers writing in C or another language — people who are more
likely to test their code before distributing it — to extend PHP such
as by generating OpCode files that a userland developer could load.

That... sounds a lot like the new FFI extension? Write code in C or Rust or anything that can produce a .so file, plug it into PHP, go?

Unless I am misunderstanding, using an .so requires a sysadmin to add an extension and update php.ini. Which is a non-starter on hosted sites like Pantheon, WPEngine, Kinsta, Pagely, Flywheel, PressLabs and similar.

What I was instead calling for was a way to extend PHP that a userland developer with no access to php.ini would be able to achieve.

Interestingly, after my above comment I googled and came across this PHP extension that can load a compiled Web Assembly (.wasm) file. Something like this in core would be very useful:

https://github.com/wasmerio/php-ext-wasm

-Mike

5 years ago by Larry Garfield — view source

I get your concern about recursion, but if that kind of issue is really
a concern I don't see why we could not artificially limit recursion on
preload to a configurable amount, with 100 being the default?

It's not recursion itself that's an issue. It's that the behavior of the code changes between preload and non-preload contexts in subtle, non-obvious ways.

I don't mean a recursive function that runs during preload. I mean if a function that runs in a normal request is written recursively, it will fail at high recursion levels only if the file it is in was not preloaded. So whether or not foo.php was run through opcache_compile_file() changes when and how code in that file will fail. That's the situation we want to avoid like the plague, IMO.

With respect, that feels like you are concerned about some real edge
cases, and ignoring the probability that most code run during
preloading would be written bespoke for preloading.

I think you're still missing my point here. AST manipulation during preload would exist to manipulate code that would be run later. That's the point. Sure, the code in the run_once/preload/whatever block would be written to run in that context, but its entire point is, presumably, to improve the running of the rest of the code later.

That's where the footgun exists. The recursion example was simply to demonstrate that even seemingly "safe" optimizations may introduce inconsistency, so I don't expect most userland developers to be able to get it right the first 5 times. (And again, I include myself in that.)

ALSO, I think I may have caused confusion by calling it "preloading."
In my mind that meant "run once" vs. "run for each page load" but I
think you are (rightly) conflating what I have called "preloading" with
the new preloading feature in PHP 7.4?

What I have been envisioning is a preload (or some other) keyword
that we could apply to functions, methods, and expressions. When
preload modified functions/methods/expressions are reached during
OpCode generation, rather than generate a call to runtime code PHP
would just run the code and then take the return value and generate
OpCode for that return value.

That... is a completely different thing, yes, that has nothing to do with the PHP 7.4 feature called preloading. Which would mean we've been talking about two different things this whole damned time. sigh

That... sounds a lot like the new FFI extension? Write code in C or Rust or anything that can produce a .so file, plug it into PHP, go?

Unless I am misunderstanding, using an .so requires a sysadmin to add
an extension and update php.ini. Which is a non-starter on hosted sites
like Pantheon, WPEngine, Kinsta, Pagely, Flywheel, PressLabs and
similar.

No, the whole point of the FFI extension is that a user-land developer can load a .so file into PHP directly; basically doing the lib->PHP glue code in PHP rather than in C. It does require the FFI extension itself to be enabled, and to do so safely requires using the preloader, which does mean setting an ini setting.
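
As a rough illustration of that glue-code-in-PHP idea, the sketch below binds the C library's abs() through FFI::cdef(). It assumes the FFI extension is enabled for the current context and a glibc-based Linux system where the library is named libc.so.6; ffi_abs() is a hypothetical wrapper, not part of FFI.

```php
<?php
// Hypothetical helper: calls abs() from the C library through FFI.
// Returns null when FFI is unavailable or the library cannot be loaded.
function ffi_abs(int $n): ?int
{
    if (!extension_loaded('ffi')) {
        return null;
    }
    try {
        // Assumption: glibc-based Linux, where libc is "libc.so.6".
        $libc = FFI::cdef('int abs(int j);', 'libc.so.6');
        return $libc->abs($n);
    } catch (Throwable $e) {
        // Raised, for example, when ffi.enable forbids FFI in this context.
        return null;
    }
}

var_dump(ffi_abs(-42));
```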

Not all hosts support that. The one I work for does, which is why I've been playing with it lately to document it all for our customers. :-) But the fact that such advanced features are not universally available is... exactly why I am extremely cautious about having functionality that results in subtle behavioral differences between those environments where it's available and those where it's not.

The existing preload logic has no impact on behavior other than skipping the autoloader. Any behavioral change is limited to those oddball libraries that do black magic during the autoloader, which are already well off the beaten path. That makes it a "safe" optimization.
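
For context, the PHP 7.4 feature described here is driven by the opcache.preload ini setting, which names a script that runs once at server startup; that script typically calls opcache_compile_file() on each file to preload. A minimal sketch of such a script follows. The src directory, the ini path in the comment, and the php_files_in() helper are placeholders.

```php
<?php
// preload.php, referenced from php.ini as: opcache.preload=/path/to/preload.php
// (the path and the src/ directory are placeholders)

// Hypothetical helper: lists every .php file below $dir.
function php_files_in(string $dir): array
{
    $files = [];
    if (!is_dir($dir)) {
        return $files;
    }
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

foreach (php_files_in(__DIR__ . '/src') as $path) {
    // Compiles the file into opcache without executing it.
    if (function_exists('opcache_compile_file')) {
        opcache_compile_file($path);
    }
}
```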

My underlying point, and then I will bow out of this thread as I am tired of repeating myself, is that I am all for enabling more "safe" optimizations, even letting userland developers build them, but we need to be extremely careful about them being "safe" and not "a time bomb that will introduce subtle behavioral differences between different run modes that a developer is not expecting, thus slowly creating libraries that will only function in one run mode or the other".

If we can enable more optimizations while avoiding that trap, I'm all for it. But avoiding that trap is more important than giving developers more foot-guns.

--Larry Garfield

5 years ago by Mike Schinkel — view source

I think you're still missing my point here. AST manipulation during preload would exist to manipulate code that would be run later. That's the point. Sure, the code in the run_once/preload/whatever block would be written to run in that context, but its entire point is, presumably, to improve the running of the rest of the code later.

And respectfully, I think I am not missing your point but that possibly you are misunderstanding what I was proposing.

I was not proposing any AST manipulation, at least not for what I was defining as "Userland Preloading." (Unless you define AST manipulation as using PHP to generate code to insert literals into OpCode, but if so I think that would be disingenuous.)

If you remember, I said this thread had gotten to the point where I am seeing three separate issues that should probably be three separate threads, which I will repeat here:

1. Userland preloading
2. Userland loadable extensions
3. Projection API and related

None of the above manipulate the AST directly, although #3 would do so indirectly. (There is an argument to be made that we could split #3 into higher level APIs and lower level AST manipulation which, IMO, should be two different debates, for a total of at least 4 different debates here.)

So again, when I propose userland preloading I am not proposing AST manipulation as part of it, although I believe Robert Hickman may have been. Maybe I should start a different thread?

That is why I find it hard to believe that PHP would generate code that would fail when PHP runs that code.

That... is a completely different thing, yes, that has nothing to do with the PHP 7.4 feature called preloading. Which would mean we've been talking about two different things this whole damned time. sigh

Yes. And I will take the blame for that. It did not occur to me until Robert mentioned to me that it was confusing.

No, the whole point of the FFI extension is that a user-land developer can load a .so file into PHP directly; basically doing the lib->PHP glue code in PHP rather than in C. It does require the FFI extension itself to be enabled, and to do safely requires using the preloader, which does mean setting an ini setting.

Interesting. So then maybe what I am asking for is to bundle FFI into core so that it is available everywhere.

Not all hosts support that. The one I work for does, which is why I've been playing with it lately to document it all for our customers. :-) But the fact that such advanced features are not universally available is... exactly why I am extremely cautious about having functionality that results in subtle behavioral differences between those environments where it's available and those where it's not.

Another interesting option would be to enable loading and running of WebAssembly in core. Do you have a position on that?

The existing preload logic has no impact on behavior other than skipping the autoloader. Any behavioral change is limited to those oddball libraries that do black magic during the autoloader, which are already well off the beaten path. That makes it a "safe" optimization.

So basically, I think this is what I have been asking for, but userland accessible.

I would envision that if it modified any environment, such as setting a preloaded or error levels etc. then all that environment would reset during normal execution.

My underlying point, and then I will bow out of this thread as I am tired of repeating myself, is that I am all for enabling more "safe" optimizations, even letting userland developers build them, but we need to be extremely careful about them being "safe" and not "a time bomb that will introduce subtle behavioral differences between different run modes that a developer is not expecting, thus slowly creating libraries that will only function in one run mode or the other".

I appreciate and agree with the desire not to have unsafe issues. But as I have felt we were talking about two different things and you confirmed that, maybe what I have actually been proposing is not something we need to be at odds about?

-Mike

5 years ago by Robert Hickman — view source

With regard to allowing AST introspection, Jonathan Blow just posted a
video on JAI which shows why it is useful. Around 11 mins into the
following video he introduces a few lines into the metaprogram (the
thing that interacts with the AST) which essentially 'queries' the
code and reports all occurrences of pointer math. I can think of many
uses for being able to query a program's source like this.

https://www.youtube.com/watch?v=0mQbBayzDPI
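
A much cruder version of this kind of query can already be written in PHP with the bundled tokenizer extension, which works on tokens rather than an AST. The sketch below assumes that extension is loaded and reports the lines on which eval is used (standing in for the pointer math query in the video); find_eval_calls() is a hypothetical helper.

```php
<?php
// Hypothetical helper: returns the line numbers on which eval is used.
function find_eval_calls(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        // Tokens are either single-character strings or [id, text, line] arrays.
        if (is_array($token) && $token[0] === T_EVAL) {
            $lines[] = $token[2];
        }
    }
    return $lines;
}

// Sample source to query; a real tool would read files from disk instead.
$source = <<<'CODE'
<?php
$a = 1;
eval('$a++;');
echo $a;
eval('echo $a;');
CODE;

print_r(find_eval_calls($source));
```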
