Hello internals,
Was there a previous discussion about the pros/cons of adding only the
syntax needed for generics, but not the functionality? So static
analyzers could use it, instead of docblocks. I looked at externals.io
but couldn't find anything specific.
Regards
Olle
Hi, yes there was, back in 2020: https://externals.io/message/111875
- Benjamin
On Mon, Oct 16, 2023 at 5:08 PM Olle Härstedt olleharstedt@gmail.com
wrote:
> Hello internals,
> Was there a previous discussion about the pros/cons of adding only the
> syntax needed for generics, but not the functionality? So static
> analyzers could use it, instead of docblocks. I looked at externals.io
> but couldn't find anything specific.
Hey. There was a somewhat recent discussion on GH, but without an experimental
features system, it's unlikely it would pass:
https://github.com/PHPGenerics/php-generics-rfc/issues/49
> Hello internals,
> Was there a previous discussion about the pros/cons of adding only the
> syntax needed for generics, but not the functionality? So static
> analyzers could use it, instead of docblocks. I looked at externals.io
> but couldn't find anything specific.
Hi Olle,
Since I haven't seen it expressed quite this way in the previous
discussion, I'd like to highlight what I think is a major downside to
this approach, at least as commonly proposed:
Using the same syntax for type information that is guaranteed to be true
(existing run-time checks) and type information that is "advisory only"
(new checks for optional static analysis) means users can no longer have
confidence in that type information.
This is one of the interesting things about the compromise over scalar
types - if you see a declaration "function foo(int $bar) { ... }", you
know that $bar will be an int at the start of every invocation of that
function, regardless of which mode the calling code uses. I think adding
exceptions to that certainty would be a bad direction for the language.
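To make that guarantee concrete, here is a minimal sketch in the default (weak) typing mode. The function name foo and the parameter $bar follow the declaration quoted above; the return value and the sample arguments are only illustrative. Whatever the caller passes, the body only ever sees an int, because the engine either coerces the argument or refuses the call.

```php
<?php
// Illustrative only: the engine enforces the scalar type at call time.
function foo(int $bar): string
{
    // By the time execution reaches this line, $bar is always an int.
    return gettype($bar);
}

// Weak mode coerces the numeric string to an int before foo() runs.
$fromNumericString = foo("42");

try {
    // This string cannot be coerced, so the call never enters foo().
    foo("not a number");
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

With declare(strict_types=1) in the calling file, the first call would be rejected as well, but the body of foo() would still never observe anything other than an int.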
On the other hand, I can see a "third way": if the problem with current
static analysis conventions is that they have to be parsed out of a
string-based docblock, we can provide dedicated syntax, without
unifying it with the standard type syntax. For instance, some of the
earlier discussions around introducing attributes suggested reflection
expose the AST of the attribute's arguments, rather than the resolved
expressions, allowing them to act a bit like Rust's "hygienic macros".
If that was added as an optional mode, you might be able to do something
like this:
#[RawAttribute]
class GenericType {
    public function __construct(AST\Node $typeInfo) { ... }
}

function foo(#[GenericType(int|float)] array $foo) {
    // array is the type guaranteed by the language
    // static analysis libraries can get the GenericType attribute from
    // reflection and receive an AST representing the type constraint int|float
}
The actual attributes could either be built-in, making them official
parts of the language, or managed in a library that static analysers
co-operate on, making them standardised but more agile.
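For comparison, here is a rough sketch of what attributes as they exist in PHP 8 already allow. No RawAttribute mode exists, so the constraint has to be passed as a string rather than as an AST. GenericType here is a hypothetical userland class, not something built in or provided by any analyser.

```php
<?php
// Hypothetical attribute: carries the advisory type as a plain string.
#[Attribute(Attribute::TARGET_PARAMETER)]
final class GenericType
{
    public function __construct(public string $typeInfo) {}
}

// `array` is what the engine enforces; the attribute is metadata only.
function foo(#[GenericType('int|float')] array $foo): void {}

// A tool can read both pieces of information through reflection.
$parameter = (new ReflectionFunction('foo'))->getParameters()[0];
$enforcedType = $parameter->getType()->getName();
$advisoryType = $parameter
    ->getAttributes(GenericType::class)[0]
    ->newInstance()
    ->typeInfo;
```

The engine validates that the attribute syntax is well formed, but it is left to the consuming tool to parse and interpret the string 'int|float'.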
Regards,
--
Rowan Tommins
[IMSoP]
2023-10-17 21:39 GMT+02:00, Rowan Tommins rowan.collins@gmail.com:
> [...]
> function foo(#[GenericType(int|float)] array $foo) {
> // array is the type guaranteed by the language
> // static analysis libraries can get the GenericType attribute from
> // reflection and receive an AST representing the type constraint int|float
> }
Not sure readability is improved here compared to existing @template
annotations. ;)
Olle
> Not sure readability is improved here compared to existing @template
> annotations. ;)
That's because readability isn't the problem I was suggesting it would solve.
As with attributes in general, the aim would be to have the core language validate the syntax, but libraries supply the semantics.
Such a feature would also allow the set of attributes to be standardised in core - although I don't think I agree that "standardised on php-internals" is automatically better than "standardised by a forum of people who write static analysers".
I do take the point that if we ever did solve the problems of full implementation, e.g. by shipping a mandatory static analyser (aka compile-time checks), we'd have to change syntax again. For me, that does not outweigh the cost of "type declarations can no longer be trusted".
Regards,
--
Rowan Tommins
[IMSoP]
On Wed, Oct 18, 2023 at 4:14 AM Olle Härstedt olleharstedt@gmail.com
wrote:
> Not sure readability is improved here compared to existing @template
> annotations. ;)
I won't be participating much in this discussion because I lack the
expertise to add much. I just wanted to speak for a small (and defeated)
minority of PHP developers who don't want or care for generics. I've been
working with TypeScript lately and I see generics only being useful for
library code, and even then, when I end up writing some valid generics code,
TypeScript's verbosity becomes so bloated that it cancels out the added value
of the functionality.
I truly can't understand how generics is the most requested PHP feature, so
I will just assume I will have to live with it one day, but mixing it with
attribute syntax seems to be a recipe for making it as bad as (or worse than)
the experience of using generics in TypeScript.
--
Marco Deleu
I also agree with Marco. Generics are a pain in the rear in languages
that have them. Usually, the PHP version of the same code written in
generic C# or TypeScript is much more concise and clear. The only
exception to that would be built-in arrays. If there is exactly one
thing that could be "generic", it would be those. Effectively, a
simpler syntax for this:
function onlyStrings(string ...$strings): array { return $strings; }
onlyStrings(...['array','of','strings']);
I'd be thrilled if I could just write:
function onlyStrings(array<string> $strings): array<string> { return $strings; }
onlyStrings(['array','of','strings']);
That is all I want whenever I think of Generics in PHP. The rest is
just complicated fluff in my humble opinion.
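As a small illustration of why the variadic form is used as a stand-in today, the sketch below shows that each spread element is checked against string at call time. The onlyStrings function is the one from the message above; the failing input is made up for the example.

```php
<?php
// The variadic workaround: every element is checked against `string` at call time.
function onlyStrings(string ...$strings): array
{
    return $strings;
}

$ok = onlyStrings(...['array', 'of', 'strings']);

try {
    // The second element is an array, which can never be coerced to string.
    onlyStrings(...['array', ['not', 'a', 'string']]);
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

Note that the return type is still a plain array, which is exactly the gap the array<string> wish would close.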
Robert Landers
Software Engineer
Utrecht NL
On 18/10/2023 at 17:37, Robert Landers wrote:
> That is all I want whenever I think of Generics in PHP. The rest is
> just complicated fluff in my humble opinion.
Where I respectfully disagree is about exactly this: you would
appreciate array<string>, I would very much appreciate
SomeDoctrineRepository<SomeEntity> so that my IDE would actually give me
the right autocompletion.
There's also another use where it would get even better:
EntityManager::find(Query<T>): T
class SomeQuery implements Query<SomeEntity> {}
// Later in code:
$someEntity = $entityManager->find(new SomeQuery());
assert($someEntity instanceof SomeEntity); // Is now useless, my IDE simply knows.
There are many simple use cases that can make our lives easier, beyond
simple arrays, lists, vectors, etc.
Even if type erasure is done and there are no runtime checks, it would
greatly help when done correctly in the right places by userland APIs.
ORMs are one place, but there are many others.
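For reference, this is roughly how the same idea is expressed today with the docblock annotations that PHPStan and Psalm understand. It is only a sketch: the Query, SomeQuery, SomeEntity and EntityManager names follow the pseudocode above, while entityClass() and the body of find() are invented stand-ins so the example can run.

```php
<?php
/**
 * @template T of object
 */
interface Query
{
    /** @return class-string<T> */
    public function entityClass(): string;
}

final class SomeEntity {}

/** @implements Query<SomeEntity> */
final class SomeQuery implements Query
{
    public function entityClass(): string
    {
        return SomeEntity::class;
    }
}

final class EntityManager
{
    /**
     * @template T of object
     * @param Query<T> $query
     * @return T
     */
    public function find(Query $query): object
    {
        // Stand-in for a real lookup: just instantiate the entity class.
        $class = $query->entityClass();
        return new $class();
    }
}

// A static analyser can infer SomeEntity here; the engine only knows `object`.
$someEntity = (new EntityManager())->find(new SomeQuery());
```

The runtime signature stays Query in, object out; everything more precise lives in comments, which is the situation the thread is debating.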
Regards,
--
Pierre
On Tue, Oct 17, 2023 at 10:40 PM Rowan Tommins rowan.collins@gmail.com
wrote:
> Using the same syntax for type information that is guaranteed to be true
> (existing run-time checks) and type information that is "advisory only"
> (new checks for optional static analysis) means users can no longer have
> confidence in that type information.
If a "syntax only" solution were temporary, then warning users through some
kind of opt-in mechanism (like with strict_types=1) may be enough - that
way users will know that generic type information is "advisory only" and
that this might change in the future. Some other languages (Kotlin, for
example) have opt-in mechanisms for experimental features - ones that are
possibly incomplete, unstable or non-final - and it's working quite well for
them.
> On the other hand, I can see a "third way": if the problem with current
> static analysis conventions is that they have to be parsed out of a
> string-based docblock, we can provide dedicated syntax, without
> unifying it with the standard type syntax. For instance, some of the
> earlier discussions around introducing attributes suggested reflection
> expose the AST of the attributes arguments, rather than the resolved
> expressions, allowing them to act a bit like Rust's "hygienic macros".
> If that was added as an optional mode, you might be able to do something
> like this:
The community has just now decided on the PHPDoc syntax for generics, has
just started widely adopting them in packages and has just got first-party
support from PHPStorm. I doubt that migrating to yet another temporary
solution (one that still doesn't address all of the concerns) is a good
idea right now.
On 18/10/2023 at 13:01, Alex Wells wrote:
> The community has just now decided on the PHPDoc syntax for generics, has
> just started widely adopting them in packages and has just got first-party
> support from PHPStorm. I doubt that migrating to yet another temporary
> solution (one that still doesn't address all of the concerns) is a good
> idea right now.
Documentation is not code, and you could have syntax errors in it
without ever knowing. Documentation is documentation, and static
analysis based upon documentation is fragile. Static analyzers may
not even be in sync with each other. When you contribute to projects you
have to learn their code style and conventions, but you also have to learn
their documentation style and conventions, and on top of that, generics
documentation conventions change from one project to another. Even though
you think it's a "community decided convention", not all tools are in sync,
sadly, and not all of the community is OK with it.
I don't use PHPStorm; it's not the only IDE out there, and some people
even use simple editors. Not all of them will have the same level of
support for the "community decided convention". Even worse, the most
advanced IDE implementation may actually drive the "community decided
convention" and steal the "democratic syntax vote" from the community.
It's a very bad move when people who make money by selling their IDE
become the ones driving the syntax and convention, which then
locks their users into that IDE, sometimes the only one
supporting the convention they created themselves. OK, it's probably not
entirely true, but at a very theoretical level, that's what it is.
When new syntax elements arise within a new PHP version, all editors,
IDEs and LSP backends will adopt them immediately, whereas that's definitely
not true for a "community decided convention". You have one, then two, then
some static analysis tool or some IDE adds new subtle changes or new
syntax. You never have, and never will have, at any point in time, a
situation where all tools are in sync.
I do love the idea of having the syntax at the PHP level with type erasure,
because it's no longer a "community decided convention" but a parser
syntax that the engine supports, which has been discussed and voted for.
It is validated and exposed in reflection, which makes it
resilient, solid, usable, and universal.
Having type erasure eliminates runtime checks, but it can still lay the
groundwork for real runtime checks later. I like the idea, even though I'm
not fully comfortable about its later feasibility: if the syntax is
wrong and things cannot be implemented later in the engine, then the
result would be catastrophic (everyone who has used it would have to
fix all their code later when another solution is chosen).
I'm aware this isn't an easy topic, and I have no solution for it. But a
"community decided convention" is not a solution either; it's at best
some kind of band-aid, and at worst a source of confusion, because
conventions differ from project to project and from tool to tool,
which is terrible for developers.
Regards,
--
Pierre
> Documentation is not code, and you could have syntax errors within
> without ever knowing it. Documentation is documentation and static
> analysis based upon documentation is fragile. [...] Even though you
> think it's "community decided convention", not all tools are in sync,
> sadly, and not all the community is OK with it.
Agreed with everything. PHPDoc is and will always be a temporary solution,
there's no denying that, and I'd much prefer a better solution from PHP
itself - one that's easier to use and parse. All I'm saying is that I don't
believe it's feasible to implement another temporary solution that doesn't
have many benefits over what the community already has, especially
given PHP's very limited resources. That's why I'm advocating for either
fully type-erased generics (with proper, somewhat stable syntax for years
to come) or nothing at all.
> The community has just now decided on the PHPDoc syntax for generics, has
> just started widely adopting them in packages and has just got first-party
> support from PHPStorm. I doubt that migrating to yet another temporary
> solution (one that still doesn't address all of the concerns) is a good
> idea right now.
That's fine for systems already invested in building and maintaining a docblock parser. But that's a not-small lift.
In contrast, I built a serialization library that relies heavily on attributes. Most are opt-in, but for array properties, you really have to use one of #[SequenceField(Type::class)] or #[DictionaryField(Type::class, keyType: KeyType::String)] so the system knows if it's a sequence or dictionary, and what the types are. It just can't do most things without that knowledge.
I didn't want to add custom, proprietary attributes for that, but it was better than either writing a docblock parser myself or adding and figuring out how to use a 3rd party dependency. I am not happy with this solution.
If there were some universally recognized #[Array(Type::int)] (for a list of ints) or #[Array(Type::string, Foo::class)] (for a string-keyed dict of Foo objects), I could easily access that through the reflection/attribute API instead. That would make life a lot easier for both me and anyone using my library, as they wouldn't need to specify that information twice in two different forms. It would still not be a "real syntax", but it would make the de facto convention more accessible to projects other than PHPStan and Psalm.
That said, I agree that such an attribute-based convention, while it would be superior to the docblock approach in almost every way, would still not belong in the language itself. Save the language itself for if/when we figure out for-reals generics. That would be a project for PHPStan, Psalm, and PHPStorm to collaborate on in user-space. (FIG would be happy to host such a discussion if they were interested, but I don't know if they're at all interested.)
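A minimal sketch of how a library could consume such a shared attribute through reflection, with no docblock parser involved. ArrayOf, Point, Shape and describeArrayProperty() are hypothetical names invented for this example; they are not the attributes of the serialization library mentioned above, nor an existing standard.

```php
<?php
// Hypothetical shared attribute describing the contents of an array property.
#[Attribute(Attribute::TARGET_PROPERTY)]
final class ArrayOf
{
    public function __construct(
        public string $valueType,
        public ?string $keyType = null, // null means "sequence", otherwise a dictionary
    ) {}
}

final class Point {}

final class Shape
{
    #[ArrayOf(Point::class)]
    public array $points = [];

    #[ArrayOf(Point::class, keyType: 'string')]
    public array $namedPoints = [];

    public array $untyped = [];
}

// Returns a readable description of the element types, or null if none is declared.
function describeArrayProperty(string $class, string $property): ?string
{
    $attributes = (new ReflectionProperty($class, $property))->getAttributes(ArrayOf::class);
    if ($attributes === []) {
        return null;
    }
    $arrayOf = $attributes[0]->newInstance();

    return $arrayOf->keyType === null
        ? "list<{$arrayOf->valueType}>"
        : "array<{$arrayOf->keyType}, {$arrayOf->valueType}>";
}
```

The same metadata could be read by a serializer and by a static analyser, so users would only have to state it once.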
--Larry Garfield
Hi all,
Perhaps chiming into the conversation a bit too late, but I would personally like to say that I really do not like the idea of attribute-like, unvalidated generics syntax.
As a maintainer of Psalm and an active user of it both personally and at work, I absolutely adore the typesafety and ergonomics generics can provide, but all benefits are nullified if generics are not actually typechecked.
Generics can be typechecked by:
- The language itself, which requires all types to be correct
- Or by a static analysis tool which is specifically added to a given project for the purpose of typechecking generics: in this case, we can consider ourselves to be using a superset of PHP which similarly requires the static analysis to pass in order to "compile" the source code (i.e. if the Psalm pipelines are red, deployment is blocked).
Generics whose syntax is fully provided by the language, but then can only be optionally typechecked by installing third party unofficial tools are not, in my opinion, an improvement over the status quo.
Now, after reading this thread I was actually inspired to implement runtime typechecked generics in PHP itself, and https://github.com/nicelocal/php-src/tree/generics is the single-evening result of that: it features generics support in the parser, and some initial, partially committed runtime validation.
However, I paused the effort for the reasons listed in https://github.com/php/php-src/pull/8752: runtime generics typechecks are (needlessly) expensive.
The much better approach, one that I intend to maybe give a shot at this Christmas, is to add static analysis functionality to PHP itself (i.e. turn it into a truly statically typed language).
I have a hunch it may be easy enough to do by hooking into the type inference functionality provided by opcache, and throw compile-time exceptions instead of silently inserting runtime typechecks.
This functionality may be initially enabled through a declare(static_types=1); statement, and might actually also help find bugs in the type inference functionality which causes JIT bugs.
Personally I'm super excited about the possibility of introducing static typechecking in PHP itself, partially because it would also make it easy and cheap to implement generics.
Regards,
Daniil Gentili
Hi Daniil
> The much better approach, one that I intend to maybe give a shot at this Christmas, is to add static analysis functionality to PHP itself (i.e. turn it into a truly statically typed language).
> I have a hunch it may be easy enough to do by hooking into the type inference functionality provided by opcache, and throw compile-time exceptions instead of silently inserting runtime typechecks.
The optimizer, including type inference, is limited to the scope of
the current file (along with internal functions/classes). Each file is
considered a "single compilation unit". When classes from different
files reference each other, modifying one file does not require
recompiling the other. However, this does mean that we cannot rely on
information from other files as they may change at any point.
Preloading is the exception, where all preloaded files can assume not
to be changed after PHP has booted.
We can obviously not limit type checking to preloaded files. We could
make type checking a CLI step but how is that really better than
PHPStan or Psalm at that point, other than having the official PHP
stamp? PHPStan and Psalm are arguably successful because they are
written in PHP, making them much easier to maintain and contribute to.
I'd also like to add that tools like PHPStan and Psalm have much more
accurate type representations. PHP does not accurately represent
arrays in the optimizer and has no notion of array shapes. The
optimizer's types are biased towards speed rather than accuracy.
Another issue, specifically pertaining to generics, is that PHP has
type coercion. In both weak and strict typing mode, a float function
parameter will coerce an integer value. However, if generic types are
erased at runtime then the VM cannot do coercion for foo<float>($int)
(where function foo<T>(T $var)). This will require either accepting
inaccurate runtime types, or establishing stricter static rules that
do not match the existing behavior.
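The coercion issue can be seen with a short sketch. takesFloat() shows the existing behaviour for a declared float parameter; identity() stands in for an erased generic function, where T survives only in the docblock. Both function names are made up for the illustration.

```php
<?php
// A declared float parameter widens an int argument, in weak and strict mode alike.
function takesFloat(float $value): float
{
    return $value;
}

/**
 * Stand-in for an erased generic: at runtime T is gone and the engine sees `mixed`.
 *
 * @template T
 * @param T $value
 * @return T
 */
function identity(mixed $value): mixed
{
    return $value;
}

$declared = takesFloat(1); // the engine converts the int to a float
$erased = identity(1);     // nothing tells the engine that T was meant to be float
```

If the second call had been written as foo<float>(1) under erasure, the value inside the function would still be an int, which is the mismatch described above.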
Ilija
Hi Ilija,
> The optimizer, including type inference, is limited to the scope of
> the current file (along with internal functions/classes). [...]
> Preloading is the exception, where all preloaded files can assume not
> to be changed after PHP has booted.
Well, I was actually thinking of enabling these compile-time checks only
if preloading is enabled, to ensure the closed-world guarantee.
> We could make type checking a CLI step but how is that really better than
> PHPStan or Psalm at that point
Yeah, I don't really like the idea of a separate "typechecking" step
either; I'd really love for this to be an integral part of the execution
process, not a separate static analysis step.
> PHPStan and Psalm are arguably successful because they are
> written in PHP, making them much easier to maintain and contribute to.
Agreed, however if at least some static analysis functionality is
already implemented, I see no reason to not make use of it, even if
> PHP does not accurately represent arrays in the optimizer and has no notion of array shapes.
personally I really like array shapes but only as a stopgap measure
while switching to full object DTOs, so I'm comfortable with the idea of
not having static analysis for array shapes (actually, I would
absolutely love to just treat all array values as mixed during static
analysis, though array/list generics might be nice and easy to
implement...).
> Another issue, specifically pertaining to generics, is that PHP has
> type coercion. In both weak and strict typing mode, a float function
> parameter will coerce an integer value. However, if generic types are
> erased at runtime then the VM cannot do coercion for foo<float>($int)
> (where function foo<T>(T $var)). This will require either accepting
> inaccurate runtime types, or establishing stricter static rules that
> do not match the existing behavior.
Well I'm all for stricter rules :)
Regards,
Daniil Gentili.
> Well, I was actually thinking of enabling these compile-time checks only if preloading is enabled, to ensure the closed-world guarantee.
Doesn't that just take us back to the "generic types are only optionally checked" scenario you were trying to avoid?
Right now, I suspect the use of preloading is actually less common than the use of offline static analysers. Unlike OpCache itself, it's not something you can just switch on for an existing code base: you've got to switch from rules for loading classes at runtime to generating a static preload list in advance.
To be reliable for static analysis, that preload list would need to be comprehensive, defining every symbol up front, then making a second pass of cross-file analysis. That makes it look very much like a separate build step, rather than something integrated into the normal execution pipeline. I've said before that this might be the way PHP should go - a native "module" concept, with full pre-compilation - but it would be quite a radical change.
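To give an idea of what "generating a static preload list" involves, here is a rough sketch of the first half of a preload script: collecting every PHP file under a source directory. listPhpFiles() and the src directory are assumptions made for the example; the cross-file analysis pass discussed above is not shown and does not exist in PHP today.

```php
<?php
/**
 * Collect every .php file under $dir, recursively.
 *
 * @return list<string> sorted file paths
 */
function listPhpFiles(string $dir): array
{
    $paths = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $paths[] = $file->getPathname();
        }
    }
    sort($paths);

    return $paths;
}

// In a real preload script (the file named by opcache.preload in php.ini),
// each path would then be compiled up front, for example:
//
//     foreach (listPhpFiles(__DIR__ . '/src') as $path) {
//         opcache_compile_file($path);
//     }
```

Even this simple version has to be regenerated or re-run whenever files are added, which is part of why it starts to resemble a build step.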
Perhaps an alternative is to change the rules of when autoloading can be invoked, by allowing the preload script to recursively autoload every symbol it can identify, rather than waiting for the relevant code to be executed. I think this is how current offline static analysers work.
That doesn't solve the "optional checks" problem, though, if preloading remains optional. We could produce an error if any generic/extended type was encountered without being preloaded, but that poses a dilemma for libraries: either don't use the new types, or impose a requirement for applications to use preloading. I'm not convinced there's an easy compromise here.
Regards,
--
Rowan Tommins
[IMSoP]
Hi,
> Doesn't that just take us back to the "generic types are only optionally checked" scenario you were trying to avoid?
Of course, my idea would also require bundling and enabling opcache by default with PHP and actually requiring the preloading+compilation step in order to run any code, like for any other statically typed compiled language.
I.e. no more REPL-like, open world assumptions: all referenced classes and types must be present (or autoloadable) at compile time, with no way provided to disable compile-time type checks, like for any statically typed language (of course this would be somewhat of a breaking change, PHP 9-worthy).
you've got to switch from rules for loading classes at runtime to generating a static preload list in advance.
To be reliable for static analysis, that preload list would need to be comprehensive, defining every symbol up front, then making a second pass of cross-file analysis.
The preload list does not have to be generated manually; PHP itself could autonomously autoload all required classes at compile time (this differs from the current behavior, where autoloading is only triggered when executed code mentions a class that is not yet loaded).
That makes it look very much like a separate build step, rather than something integrated into the normal execution pipeline.
Well yeah, but it doesn't have to be separate from the execution step: I don't see why running a script using php a.php should not be able to autoload all reachable classes at compile time instead of at runtime, i.e. because they're mentioned in typehints, not because the function using the typehint is being executed.
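For reference, a minimal sketch of the current behaviour being contrasted here: a class that appears only in a type declaration is not autoloaded when the function is compiled, only once executed code actually needs it. The `Widget` class, the `describe()` function, and the `eval`-based autoloader are hypothetical and serve only to make the example self-contained.

```php
<?php
// Record every class name the autoloader is asked for.
$requested = [];

spl_autoload_register(function (string $class) use (&$requested): void {
    $requested[] = $class;
    // Hypothetical class, defined on demand purely for this demo.
    if ($class === 'Widget') {
        eval('final class Widget {}');
    }
});

// Declaring this function does not trigger the autoloader,
// even though Widget appears in its signature.
function describe(Widget $w): string
{
    return get_class($w);
}

// Snapshot taken before any code has needed the class.
$afterDeclaration = $requested;

// The autoloader only runs here, when `new Widget()` is executed.
$name = describe(new Widget());

// Snapshot taken after the class was actually used.
$afterCall = $requested;
```

Compile-time autoloading as described above would move that lookup forward to the point where `describe()` is compiled.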
I've said before that this might be the way PHP should go - a native "module" concept, with full pre-compilation - but it would be quite a radical change.
This is a different, unrelated matter, and it would actually be super cool to generate a single executable using function JIT, bundling both the compiled php code and php itself; I've actually already done this as an experiment with another JIT language based on dynasm, it's quite simple really.
Perhaps an alternative is to change the rules of when autoloading can be invoked, by allowing the preload script to recursively autoload every symbol it can identify, rather than waiting for the relevant code to be executed. I think this is how current offline static analysers work.
Precisely.
That doesn't solve the "optional checks" problem, though, if preloading remains optional. We could produce an error if any generic/extended type was encountered without being preloaded, but that poses a dilemma for libraries: either don't use the new types, or impose a requirement for applications to use preloading. I'm not convinced there's an easy compromise here.
No need to preload, just autoload at compile time like you said, and polyfill missing classes that cannot be autoloaded.
Regards,
Daniil Gentili.
(of course this would be somewhat of a breaking change, PHP 9-worthy).
I think this is rather an understatement. This would be a HUGE change for many users of the language, and I'd expect a lot of pushback for any such proposal. There are still a lot of people who use a cheap shared host, edit a file directly on the live environment, and expect it to just work without needing any kind of rebuild or restart step.
This is a different, unrelated matter, and it would actually be super cool to generate a single executable using function JIT, bundling both the compiled php code and php itself; I've actually already done this as an experiment with another JIT language based on dynasm, it's quite simple really.
When I talk about "compiling PHP", I mean the existing compilation to op codes; maybe stabilising them for deployment, maybe just using the existing OpCache preloading.
If you use the JIT in advance, it becomes a very different tool - the point of JIT compilation is to make use of runtime information about hot paths, the actual types of dynamic arguments, etc.
Regards,
Rowan Tommins
[IMSoP]
Hi,
There are still a lot of people who use a cheap shared host, edit a file directly on the live environment, and expect it to just work without needing any kind of rebuild or restart step.
Nothing will change in that sense, the only difference is the addition of compile-time autoloading and type checking.
No preloading (in the sense of no opcache configuration needed), nothing special.
If you use the JIT in advance, it becomes a very different tool - the point of JIT compilation is to make use of runtime information about hot paths, the actual types of dynamic arguments, etc.
Not with function JIT, function JIT is the closest thing we currently have to an AOT compiler in php (minus the actual executable generation, since it all happens in memory); it just compiles opcodes without any optimizations based on runtime type information (unlike tracing JIT, which does use runtime type information).
Regards,
Daniil Gentili.
Nothing will change in that sense, the only difference is the addition of compile-time autoloading and type checking.
No preloading (in the sense of no opcache configuration needed), nothing special.
OK; just to explain the confusion, that's not what you were saying earlier:
I was actually thinking of enabling these compile-time checks only if preloading is enabled
Cross-file analysis on demand would also be quite a big change though - changing one file would need to invalidate the entire cache, or recursively follow tracked dependencies, maybe re-running the autoloader for all those possible references. The ability for live changes to one file to cause compilation errors in another might also be confusing.
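To make the "recursively follow tracked dependencies" option concrete, here is a small sketch of how the set of stale files could be computed when one file changes. The file names, the shape of the dependency map, and the `filesToInvalidate()` helper are all assumptions for illustration; nothing like this exists in OpCache today, and the sketch conservatively treats every transitive dependent as stale.

```php
<?php
/**
 * @param array<string, list<string>> $dependsOn file => files whose symbols it references
 * @return list<string> every file whose cached checks would need to be redone
 */
function filesToInvalidate(array $dependsOn, string $changed): array
{
    // Invert the graph: file => files that reference it.
    $dependents = [];
    foreach ($dependsOn as $file => $deps) {
        foreach ($deps as $dep) {
            $dependents[$dep][] = $file;
        }
    }

    // Breadth-first walk from the changed file through its dependents.
    $stale = [$changed => true];
    $queue = [$changed];
    while ($queue !== []) {
        $current = array_shift($queue);
        foreach ($dependents[$current] ?? [] as $dependent) {
            if (!isset($stale[$dependent])) {
                $stale[$dependent] = true;
                $queue[] = $dependent;
            }
        }
    }

    $result = array_keys($stale);
    sort($result);
    return $result;
}

// Hypothetical project: Controller uses Service, Service uses Repository.
$graph = [
    'Controller.php' => ['Service.php'],
    'Service.php'    => ['Repository.php'],
    'Repository.php' => [],
    'Unrelated.php'  => [],
];

$stale = filesToInvalidate($graph, 'Repository.php');
```

Editing `Repository.php` marks `Service.php` and `Controller.php` as stale as well, which is how a live change to one file could surface a compilation error in another.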
That's why it's tied up in my mind with a native "module" concept - you could have some kind of isolation around a module that allowed preloading or cache invalidation at that level, rather than the extremes of one file or an entire application.
Not with function JIT, function JIT is the closest thing we currently have to an AOT compiler in php
Even function JIT will select which functions to compile based on what's executed frequently. There can also be certain types of function or operation that it can't compile, or for which it would produce worse performance, so it leaves them to the op code interpreter.
I suppose you could "warm" the JIT in advance as part of preloading, but that seems to be mostly orthogonal to what we're talking about here in terms of static analysis.
Regards,
--
Rowan Tommins
[IMSoP]