Hi!
Arnaud, Larry, and I have been working on an article describing the
state of generics and collections, and related "experiments".
You can find this article on the PHP Foundation's Blog:
https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
cheers,
Derick
Nice! It is awesome to see some movement here. Just one thing:
Invariance would make arrays very difficult to adopt, as a library can not start type hinting generic arrays without breaking user code, and users can not pass generic arrays to libraries until they start using generic arrays type declarations.
This seems like a strawman argument, to a degree. It seems like you could combine static arrays and fluid arrays to accomplish what you are seeking to do: use static arrays, but allow casting to treat them as "fluid."
In other words, simply cast to get your example to compile:
function f(array<int> $a) {}
function g(array $a) {}
$a = (array<int>) [1]; // array unless cast
f($a); // ok
g((array)$a); // ok
And the other way:
function f(array<int> $a) {}
function g(array $a) {}
$a = [1];
f((array<int>)$a); // ok, type check done during cast
g($a); // ok
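For readers who want to try the idea today, the checked cast can be approximated in userland. This is only a sketch under assumptions: cast_to_int_array() is a hypothetical stand-in for the proposed (array<int>) cast, and f() is declared with a plain array parameter because array<int> is not valid syntax in any released PHP version.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the proposed (array<int>) cast: validates every
 * element once and returns the same array value unchanged.
 */
function cast_to_int_array(array $a): array
{
    foreach ($a as $key => $value) {
        if (!is_int($value)) {
            throw new TypeError("Element at key '$key' is not an int");
        }
    }
    return $a;
}

function f(array $a): void {} // would be f(array<int> $a) in the proposal
function g(array $a): void {}

$a = [1];
f(cast_to_int_array($a)); // ok, type check done during the "cast"
g($a);                    // ok, plain arrays need no cast
```

The check is O(n) in the number of elements, which is the cost a real checked cast would also have to pay at least once.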
— Rob
Hi Rob,
There is potential for breaking changes in both of your examples:
If f() is a library function that used to be declared as f(array $a),
then changing its declaration to f(array<int> $a) is a
breaking change in the Static Arrays flavour, as it would break
library users until they change their code to add casts.
Similarly, the following code would break (when calling g()) if h()
was changed to return an array<int>:
function h(): array { return [1]; }
function g(array $a) {}
$a = h();
g($a);
Casting would allow users to pass generic arrays to libraries that
don't support generics yet, but that's expensive as it requires a
copy.
Best Regards,
Arnaud
I don't think we should be scared of breaking changes; php 9.0 is coming 🔜 anyway. You could also consider it as "an array might be array<T>, but an array<T> is always an array"
Why does it require a copy? It should only require a copy if the contents are changed (CoW) and at that point, you can know what rules to apply based on the coerced/casted type. I'm doing a similar thing for the Literal Strings RFC, where it is a type that is also indistinguishable from a string until something happens to it and it is no longer a literal string.
So passing an array<int> to a function that only accepts an array shouldn't matter. Once inside that function, all type-checking can be disabled for that array. One approach to that could be to just smack a "type-check strategy" function pointer on zvals, potentially, as that would give the most flexibility for casting, aliases, generics, etc. Don't get me started on the current type checking; it is a mess and inconsistent depending on what is doing the checking (constructor promoted props, properties, method args, function args). Then you can just copy the zval, change a function pointer, but point it to the same array (which will CoW) and change the strategy during casting.
In other words, you could cheaply cast an array<int> to array<string> by (essentially) changing a couple of function pointers, but array<string> to array<int> would be expensive. So I imagine there would be strategies for changing strategies... probably. I don't know, I literally just thought of this off the top of my head, so it probably needs more work.
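A rough userland model of the "type-check strategy" idea might look like the following. Everything here is an assumption made for illustration: CheckedArray, withStrategy() and append() are hypothetical names, and a closure stands in for the function pointer that would live on the zval in the engine.

```php
<?php
declare(strict_types=1);

// Hypothetical model: the array payload is shared, and a "cast" only swaps
// the closure that validates newly written elements.
final class CheckedArray
{
    public function __construct(
        private array $items,
        private Closure $check,
    ) {}

    public function withStrategy(Closure $check): self
    {
        // Same $items value: copy-on-write keeps the storage shared until
        // one side writes to it.
        return new self($this->items, $check);
    }

    public function append(mixed $value): void
    {
        if (!($this->check)($value)) {
            throw new TypeError('Value rejected by the current type-check strategy');
        }
        $this->items[] = $value;
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

$ints  = new CheckedArray([1, 2], is_int(...));
$fluid = $ints->withStrategy(static fn (mixed $v): bool => true); // "cast" to plain array

$fluid->append('three'); // accepted: checking is disabled on this view
```

This sketch only covers the cheap direction, where checks are relaxed. Narrowing to a stricter element type would still need a full scan of the existing elements.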
— Rob
On Monday, 19 August 2024 at 20:11, Derick Rethans derick@php.net wrote:
Hi!
Arnaud, Larry, and I have been working on an article describing the
state of generics and collections, and related "experiments".
You can find this article on the PHP Foundation's Blog:
https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
cheers,
Derick
Hi! Thank you very much for the article.
In the "Fully Erased Type Declarations" section you mention that "It's
unclear what impact erased types would have on reflection, or libraries
that depend on reflection."
I wanted to share a thought that if code is analyzed with external tools
like Psalm and PHPStan, it might make sense for reflection to be handled by
external tools as well.
For example, BetterReflection (https://github.com/Roave/BetterReflection)
offers native reflection functionality statically and is already used by
PHPStan and Rector.
Additionally, I maintain a project called Typhoon Reflection
(https://github.com/typhoon-php/typhoon/blob/0.4.x/docs/reflection.md) that
supports phpDoc types and is capable of resolving generics and type aliases.
If PHP moves toward a “fully erased type system,” it’s possible that in the
future, we could see tools that both analyze code, and provide reflection.
--
Best regards,
Valentin
Hey Derick,
The fluid Arrays section says "A PoC has been implemented, but the
performance impact is still uncertain". Where may I find that PoC for my
curiosity? I'm imagining the implementation of the array types as a
counted collection of types of the entries. But without the PoC I may
only guess.
It also says "Another issue is that [...] typed properties may not be
possible.". Why would that be the case? Essentially a typed property
would just be a static array, which you describe in the section right below.
Also you are mentioning references. References to static arrays (typed
property case) are trivial. References to fluid arrays would probably
require runtime lookup of the contained references to determine the
actual full type. Which may be a valid tradeoff, given that the vast
majority of arrays don't contain any or many references. ("Either you
don't use references or you pay an O(contained references) overhead when
passing around.")
So, reading the conclusion, I'm a bit disappointed by:
- Halt efforts on typed arrays, as our current thoughts are that it
is probably not worth doing, due to the complexities of how arrays
work, and the minimal functionality that it would bring.
I'd truly appreciate more investigation into the topic, as I feel the
functionality would definitely not be minor to PHP users.
Regarding the Collections PR, I personally really don't like it:
- It implements something which would be trivial if we had reified
generics. If this ever gets merged, and generics happen later, it
would probably be an outdated quirk the language has to carry
around.
- It's not powerful, but rather a quite limited implementation. No
overrides of the built-in methods possible. No custom operations ("I
want a dict where a specific property on the key is the actual
unique key", "I want a custom callback be executed for each
modification"). It's okay as a PoC, but far from a complete enough
implementation.
- It's a very specialized structure/syntax, not extensible for
userland at all. Some functionality like generic traits, where you'd
actually monomorphize the contained methods would be much more
flexible. E.g. class Articles { use Sequence<Article>; }. Much less
specialized syntax, much more extensible. And generic traits would
be doable, regardless of the rest of the generics investigation.
In fact, generic traits (essentially statically replacing the
generic arguments at link-time) would be a useful feature which
would remain useful even if we had fully reified generics.
I recognize that some functionality will need support of internal
zend_object_handlers. But that's not a blocker, we might provide
some default internal traits with PHP, enabling the internal class
handlers.
So to summarize, I would not continue on that path, but really invest
into monomorphizable generic traits instead.
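To make the monomorphization idea concrete, here is a hand-written sketch of what a hypothetical `trait Sequence<T>` could expand to at link time for T = Article. The names Sequence_Article, add() and get() are invented for illustration and are not part of any proposal. The code is plain PHP that runs today, since generic trait syntax does not exist yet.

```php
<?php
declare(strict_types=1);

final class Article
{
    public function __construct(public string $title) {}
}

// Hypothetical result of substituting T = Article into `trait Sequence<T>`.
trait Sequence_Article
{
    /** @var list<Article> */
    private array $items = [];

    public function add(Article $item): void
    {
        $this->items[] = $item;
    }

    public function get(int $index): Article
    {
        return $this->items[$index];
    }

    public function count(): int
    {
        return count($this->items);
    }
}

final class Articles
{
    use Sequence_Article; // stands in for `use Sequence<Article>;`
}

$articles = new Articles();
$articles->add(new Article('State of Generics'));
echo $articles->get(0)->title, "\n";
```

Because the substituted methods carry ordinary type declarations, the engine checks them with its existing machinery. The using class can also add or override methods as it can with any trait.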
Remains the last point about erased generics being acceptable:
- If we ever end up adding actual reified generics (maybe due to a
renewed investigation in 5 years), we'll most likely want to retain
the syntax. There may be some syntax which cannot be supported
though, or semantics which would have to break existing code.
- Docblocks are sort of an extensible and modifiable standard. Some type
checkers allow e.g. List<positive-int>. But PHP certainly won't
support it. So you will end up in a hybrid state where some
functions use generics and some use only docblocks, because native
generics are not powerful enough. Further, if you use both (e.g. List<int> in
definition, List<positive-int> in docblock), you also have to make
sure to keep them in sync, because the generic type doesn't get
verified through execution.
- We're used to "all types specified are checked". And that's a good
thing. It sets expectations.
Now imagine we're introducing type aliases. "type IntList =
List<int>;". Function signature "function processIntegers(IntList
$list)". This looks like I could expect something actually being an
IntList. There's no generic immediately in sight telling me that
this is only going to provide me a List of arbitrary values. I will
expect an IntList. Just like I will expect any bare "int" type to
also give me an integer.
So, overall, I think erased generics set the wrong expectations and have
quite a risk to be a bad decision in light of possible future improvements.
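The expectation gap can already be seen with docblock generics, which behave like erased types at runtime. The snippet below is a minimal illustration using current PHP. The function name processIntegers comes from the example above; its body is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * @param list<int> $list Only static analysers read this; the engine sees `array`.
 */
function processIntegers(array $list): int
{
    $sum = 0;
    foreach ($list as $value) {
        // The callee must defend itself, because nothing checked the elements.
        $sum += is_int($value) ? $value : 0;
    }
    return $sum;
}

// No TypeError is raised even though the documented element type is violated.
echo processIntegers([1, 'two', 3]), "\n"; // prints 4
```

With erased native generics, a signature such as processIntegers(IntList $list) would behave the same way at runtime, even though it no longer looks like a mere annotation.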
I'd also like to leave a small side note on this question:
What generic features are acceptable to leave out to make the
implementation more feasible?
I think this asks the wrong question. First, figure out, what generic
features really cannot make it, then figure out whether omitting these
features is acceptable.
Thanks all for investing time into this topic, I'm sure it will bring
the language forward!
Bob
I think we weren't that clear on that section, then. The intent is that dedicated collection classes are, well, classes. They can contain additional methods, and probably can override the parent methods; though the latter may have some trickiness if trying to access the internal data structure, which may or may not look array-ish. (That's why it's just a PoC and we're asking for feedback if it's worth trying to investigate further.)
Interesting. I have no idea why Arnaud has mainly been investigating reified generics rather than monomorphized, but a monomorphized trait has potential, I suppose. That naturally leads to the question of whether monomorphized interfaces would be possible, and I have no idea there. (I still hold out hope that Levi will take another swing at interface-default-methods.)
Though this still wouldn't be a path to full generics, as you couldn't declare the inner type of an object at creation time, only code time. Still, it sounds like an area worth considering.
--Larry Garfield
I assumed so, as I said "okay as a PoC" :-)
Nikita did the investigation into monomorphized generics a long time ago
(https://github.com/PHPGenerics/php-generics-rfc/issues/44). So it was
mostly concluded that reified generics would be the way to go. The
primary issue Arnaud is currently investigating is the propagation of
generic information via runtime behaviour, inference etc.
It would solve a large number of problems if you had to fully
specify the specific instance of a generic every time you instantiate
one. But PHP is at heart a dynamic language where typing is generally
opt-in (also when constructing new objects of generic classes for
example). And we want to avoid "new List<Entry<Foo<Something>,
WeakReference<GodObject>>>()"-style nesting where not necessary.
"Monomorphization of interfaces" does not really make a lot of sense as
a concept. Ultimately in an interface, all you do is provide
information for classes to type check against, which happens at link
time, once. (Unless you mean interface-default-methods, but that would
just be an implicitly implemented trait implementation wise, really.)
But sure, generic interfaces and monomorphized generic traits are
perfectly implementable today. In fact, I'd definitely suggest we
start out by implementing these, orthogonally from actual class generics.
Bob
I generally follow the philosophy:
- get it working
- get it working well
- get it working fast
And inference seems like a type (2) task. In other words, I think people would be fine with generics, even if they had to type it out every single time. At least for a start. From there, you'd have multiple people able to tackle the inference part, proposing RFCs to make it happen, etc. vs. now where basically only one person on the planet can attempt to tackle a very complex problem that doesn't exist yet. That isn't to say it isn't useful research, because you want to write things in such a way that you can implement inference when you get to (2), but an actual implementation shouldn't be sought out yet, just understanding the problem and solution space is likely enough to do (1) while taking into account (2) -- such as choosing algorithms, op-codes, data structures, etc.
For a feature like this, perfect is very much the enemy of good.
"Monomorphization of interfaces" does not really make a lot of sense as a concept. Ultimately in an interface, all you do is providing information for classes to type check against, which happens at link time, once. (Unless you mean interface-default-methods, but that would just be an implicitly implemented trait implementation wise, really.)
Why doesn't it make sense?
interface Id<T> {
public T $id {
get => $this->id; // pretty sure this is the wrong syntax?
}
public function getId(): T;
public function setId(T $id): void;
}
class StringId implements Id<string> { /* ... */ }
class IntId implements Id<int> { /* ... */ }
For codebases (like the one I work with every day) identifiers may be a string or int and right now, that interface can't exist.
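For comparison, the closest approximation in current PHP is sketched below. It is an assumption about how such code is usually written today, not something taken from the thread: the interface has to fall back to int|string, return types can be narrowed in implementations, and parameter types cannot.

```php
<?php
declare(strict_types=1);

interface Id
{
    public function getId(): int|string;
    public function setId(int|string $id): void;
}

final class StringId implements Id
{
    private string $id = '';

    // Return types may be narrowed in an implementation (covariance)...
    public function getId(): string
    {
        return $this->id;
    }

    // ...but parameter types may not, so the check moves into the body.
    public function setId(int|string $id): void
    {
        if (!is_string($id)) {
            throw new TypeError('StringId expects a string');
        }
        $this->id = $id;
    }
}

$id = new StringId();
$id->setId('abc');
echo $id->getId(), "\n"; // prints abc
```

A generic interface such as Id<string> would let the signature itself express the restriction that setId() has to enforce by hand here.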
— Rob
Hi Larry,
Monomorphization as a solution to generic classes has a memory usage
issue (it requires duplicating the class entry, methods, props, and
also opcodes if method bodies can reference type parameters), and does
not solve all the complexity:
https://github.com/PHPGenerics/php-generics-rfc/issues/44.
This would be less of a problem for traits, as there is already some
amount of duplication.
Best Regards,
Arnaud
Hi Bob,
The fluid Arrays section says "A PoC has been implemented, but the performance impact is still uncertain". Where may I find that PoC for my curiosity? I'm imagining the implementation of the array types as a counted collection of types of the entries. But without the PoC I may only guess.
I may publish the PoC at some point, but in the meantime here is a
short description of how it's implemented:
- The zend_array has a zend_type member representing the type of its elements
- Every time we add or update a member, we union its type with the
array type. For simple types it's just a |= operation. For arrays with
a single class it's also simple. For complex types it's more expensive
currently, but it may be possible to cache transitions to make this
cheaper.
- Updating the array type on deletes requires either maintaining a
counter of every type, or re-computing the type entirely every time.
Both are probably too expensive. Instead, we don't update the type on
deletes, but we re-compute the type entirely when a type check fails.
This is based on two hypotheses: 1. A delete rarely changes an array's
type in practice, and 2. Type checks rarely fail
- References are treated as mixed, so adding a reference to an array
or taking a reference to an element changes its type to mixed. Passing
an array<mixed> to a more specific array<something> will cause a
re-compute, which also de-refs every reference.
- Updating a nested element requires updating the type of every parent
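The first three points can be modelled in userland to get a feel for the trade-off. This is a sketch only: the actual PoC is unpublished C code on zend_array, and the class FluidArray, its bit constants, and its method names are hypothetical. Class types, references and nested arrays are left out.

```php
<?php
declare(strict_types=1);

// Hypothetical userland model of the described bookkeeping.
final class FluidArray
{
    public const T_INT = 1;
    public const T_STRING = 2;
    public const T_OTHER = 4;

    private array $items = [];
    private int $typeMask = 0; // union of element types seen so far

    private static function typeOf(mixed $v): int
    {
        return match (true) {
            is_int($v) => self::T_INT,
            is_string($v) => self::T_STRING,
            default => self::T_OTHER,
        };
    }

    public function set(int|string $key, mixed $value): void
    {
        $this->items[$key] = $value;
        $this->typeMask |= self::typeOf($value); // cheap union on add/update
    }

    public function delete(int|string $key): void
    {
        unset($this->items[$key]); // mask is deliberately left stale
    }

    public function accepts(int $allowedMask): bool
    {
        if (($this->typeMask & ~$allowedMask) === 0) {
            return true; // fast path: recorded types already fit
        }
        // Slow path: the mask may be stale after deletes, so recompute once.
        $this->typeMask = 0;
        foreach ($this->items as $v) {
            $this->typeMask |= self::typeOf($v);
        }
        return ($this->typeMask & ~$allowedMask) === 0;
    }
}

$a = new FluidArray();
$a->set(0, 1);
$a->set(1, 'x');
$a->delete(1);
var_dump($a->accepts(FluidArray::T_INT)); // bool(true), after one recompute
```

The full scan only happens when the fast check fails, which matches the two hypotheses above: deletes rarely change the type, and type checks rarely fail.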
It also says "Another issue is that [...] typed properties may not be possible.". Why would that be the case? Essentially a typed property would just be a static array, which you describe in the section right below.
It becomes complicated when arrays contain references or nested
arrays. Type constraints must be propagated to nested arrays, but also
removed when an array is not reachable via a typed property anymore.
E.g.
class C {
public array<array<int>> $prop;
}
$a = &$c->prop[0];
$a[] = 'string'; // must be an error
unset($c->prop[0]);
$a[] = 'string'; // must be accepted
$b = &$c->prop[1];
$b[] = 'string'; // must be an error
$c->prop = [];
$a[] = 'string'; // must be accepted
I don't remember all the possible cases, but I didn't find a way to
support this that didn't involve recursively scanning an array at some
point. IIRC, without references it's less of an issue, so a possible
way forward would be to forbid references to members of typed
properties. Unfortunately this breaks pass-by-reference, e.g.
sort($c->prop). out/inout parameters may be part of a solution, but
with more array separations than pass-by-ref.
Best Regards,
Arnaud
Hey Arnaud,
The fluid Arrays section says "A PoC has been implemented, but the performance impact is still uncertain". Where may I find that PoC for my curiosity? I'm imagining the implementation of the array types as a counted collection of types of the entries. But without the PoC I may only guess.
I may publish the PoC at some point, but in the meantime here is a
short description of how it's implemented:
- The zend_array has a zend_type member representing the type of its elements
- Every time we add or update a member, we union its type with the
array type. For simple types it's just a |= operation. For arrays with
a single class it's also simple. For complex types it's more expensive
currently, but it may be possible to cache transitions to make this
cheaper.
- Updating the array type on deletes requires either maintaining a
counter of every type, or re-computing the type entirely every time.
Both are probably too expensive. Instead, we don't update the type on
deletes, but we re-compute the type entirely when a type check fails.
This is based on two hypotheses: 1. A delete rarely changes an array's
type in practice, and 2. Type checks rarely fail
That sounds like a clever way to do it. I like this approach.
- References are treated as mixed, so adding a reference to an array
or taking a reference to an element changes its type to mixed. Passing
an array<mixed> to a more specific array<something> will cause a
re-compute, which also de-refs every reference.
Classifying a reference as mixed certainly makes this work and I guess
it's probably an acceptable overhead. References into (big) arrays are
not that common. Short of doing a foreach by-ref, but that's anyway an
O(n) operation generally.
- Updating a nested element requires updating the type of every parent
Does it actually? It just requires updating the type of the parent, if
the own type is actually changed. But types of arrays don't change all
the time, so that's likely an amortized constant time operation with
respect to inserts/updates.
It also says "Another issue is that [...] typed properties may not be possible.". Why would that be the case? Essentially a typed property would just be a static array, which you describe in the section right below.
It becomes complicated when arrays contain references or nested
arrays. Type constraints must be propagated to nested arrays, but also
removed when an array is not reachable via a typed property anymore.
E.g.
class C {
public array<array<int>> $prop;
}
$a = &$c->prop[0];
$a[] = 'string'; // must be an error
unset($c->prop[0]);
$a[] = 'string'; // must be accepted
In this case $a will decay from a RC=1 reference to a normal value.
During the unreferencing operation the type restrictions can be dropped.
That operation is only O(n) if it actually contains other references.
$b = &$c->prop[1];
$b[] = 'string'; // must be an error
$c->prop = [];
$a[] = 'string'; // must be accepted
I don't remember all the possible cases, but I didn't find a way to
support this that didn't involve recursively scanning an array at some
point. IIRC, without references it's less of an issue, so a possible
way forward would be to forbid references to members of typed
properties. Unfortunately this breaks pass-by-reference, e.g.
sort($c->prop). out/inout parameters may be part of a solution, but
with more array separations than pass-by-ref.
Yes, you'll have to scan the array recursively, but only if it contains
references (which you know thanks to array<mixed> or
array<array<mixed>>). And you also only need to descend into arrays
which contain references.
If something contains a reference, you just slap a property type onto it
- like "foreach entry in array { if entry is reference {
add_type_source(inner type of entry) } }" - thus, in case of
array<array<int>>, you slap array<int> onto it. This operation is only
O(n) if the array type actually contains references (i.e. it will
mismatch due to array<mixed>, and you have to iterate anyway).
So it will just work like references to property types do: these can
also never violate the type containing them. At least in my mind.
I'd also be happy to chat more about it off-list, but possibly easier
too once the patch is public.
Best Regards,
Arnaud
Overall I would not treat the "reference into array" case as too much
of a blocker. It should work, but it's fine if it comes
with a couple rough edges regarding performance. I don't think arrays
where you hold a reference into them are commonly passed around or big.
There are a few edge cases like state machines built with array
references, but the solution to these is ... don't type the property
containing it then. And if it really becomes a problem, we may still
invest time into it after landing it.
Thanks,
Bob
Hi Bob,
The fluid Arrays section says "A PoC has been implemented, but the
performance impact is still uncertain". Where may I find that PoC for my
curiosity? I'm imagining the implementation of the array types as a counted
collection of types of the entries. But without the PoC I may only guess.
I may publish the PoC at some point, but in the meantime here is a
short description of how it's implemented:
- The zend_array has a zend_type member representing the type of its
elements
- Every time we add or update a member, we union its type with the
array type. For simple types it's just a |= operation. For arrays with
a single class it's also simple. For complex types it's more expensive
currently, but it may be possible to cache transitions to make this
cheaper.
- Updating the array type on deletes requires either maintaining a
counter of every type, or re-computing the type entirely every time.
Both are probably too expensive. Instead, we don't update the type on
deletes, but we re-compute the type entirely when a type check fails.
This is based on two hypotheses: 1. A delete rarely changes an array's
type in practice, and 2. Type checks rarely fail
- References are treated as mixed, so adding a reference to an array
or taking a reference to an element changes its type to mixed. Passing
an array<mixed> to a more specific array<something> will cause a
re-compute, which also de-refs every reference.
- Updating a nested element requires updating the type of every parent
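For illustration only: the bookkeeping described in the list above can be modelled in userland PHP. This is not the PoC (which lives in the engine, in C). The class name FluidArray, its constants, and its methods are all hypothetical, and only a few scalar types are modelled. It shows the |= union on insert/update, the deliberately stale type on delete, and the full re-compute that happens only when a type check fails.

```php
<?php
// Hypothetical userland model of a "fluid" array. Illustrative names only.
final class FluidArray
{
    public const T_INT = 1;
    public const T_STRING = 2;
    public const T_FLOAT = 4;
    public const T_OTHER = 8;

    private array $items = [];
    private int $typeMask = 0; // union of the element types seen so far

    private static function maskOf(mixed $value): int
    {
        return match (true) {
            is_int($value) => self::T_INT,
            is_string($value) => self::T_STRING,
            is_float($value) => self::T_FLOAT,
            default => self::T_OTHER,
        };
    }

    public function set(int|string $key, mixed $value): void
    {
        $this->items[$key] = $value;
        $this->typeMask |= self::maskOf($value); // cheap union on insert/update
    }

    public function delete(int|string $key): void
    {
        unset($this->items[$key]); // the mask is left stale on purpose
    }

    // Fast path on the cached mask; full O(n) re-compute only on failure.
    public function accepts(int $allowedMask): bool
    {
        if (($this->typeMask & ~$allowedMask) === 0) {
            return true;
        }
        $this->typeMask = 0;
        foreach ($this->items as $value) {
            $this->typeMask |= self::maskOf($value);
        }
        return ($this->typeMask & ~$allowedMask) === 0;
    }
}

$a = new FluidArray();
$a->set(0, 1);
$a->set(1, 'x');
$mixedOk = $a->accepts(FluidArray::T_INT);   // false: a string is present
$a->delete(1);
$intOnlyOk = $a->accepts(FluidArray::T_INT); // true, after one re-compute
```

The second check succeeds only because the failed fast path triggers a re-scan, which is the trade-off the two hypotheses in the list are betting on.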
It also says "Another issue is that [...] typed properties may not be
possible.". Why would that be the case? Essentially a typed property would
just be a static array, which you describe in the section right below.
It becomes complicated when arrays contain references or nested
arrays. Type constraints must be propagated to nested arrays, but also
removed when an array is not reachable via a typed property anymore.
E.g.
class C {
    public array<array<int>> $prop;
}
$a = &$c->prop[0];
$a[] = 'string'; // must be an error
unset($c->prop[0]);
$a[] = 'string'; // must be accepted
$b = &$c->prop[1];
$b[] = 'string'; // must be an error
$c->prop = [];
$a[] = 'string'; // must be accepted
I don't remember all the possible cases, but I didn't find a way to
support this that didn't involve recursively scanning an array at some
point. IIRC, without references it's less of an issue, so a possible
way forward would be to forbid references to members of typed
properties. Unfortunately this breaks pass-by-reference, e.g.
sort($c->prop). out/inout parameters may be part of a solution, but
with more array separations than pass-by-ref.
Best Regards,
Arnaud
Another one that I don't see mentioned that naturally follows from a
conversation I had with you a few weeks ago is operators on arrays. Namely,
the behavior of the + operator when used with arrays. How this would
interact with generics, and with different approaches to generics and
arrays, is probably something that will require attention. Operators in
general present some challenges (though not unsolvable ones, just
complicated ones) to languages that try to use both generics and loose
types, because operators generally don't have a way for the programmer to
help the engine with typing during the evaluation.
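As a small reminder of the current behavior in question: the + operator on arrays is a key-wise union that keeps the left-hand value for any key present in both operands. The variable names below are illustrative. If the operands were hypothetically typed array<int> and array<string>, the element type of the result would depend on which keys happen to overlap at runtime.

```php
<?php
$ints = [1, 2];             // keys 0 and 1
$strings = ['a', 'b', 'c']; // keys 0, 1 and 2

// Keys 0 and 1 come from the left operand; only key 2 comes from the right.
$union = $ints + $strings;

var_dump($union === [1, 2, 'c']); // bool(true)
```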
Jordan
As an experiment, a while ago, I went a different route for reified generics by 'hacking' type aliases (which I was also experimenting with), such that a generic gets compiled into a concrete implementation with a dangling type alias:
class Box<T> {
    function __construct(T $thing) {}
}
is essentially compiled to
class Box {
    use alias __Box_T => ???;
    function __construct(__Box_T $thing) {}
}
This just gets a T type alias (empty-ish, with a mangled name) that gets filled in during runtime (every instance gets its own type alias table, and uses that along with the file alias table). There shouldn't be any performance impact this way (or at least, none worse than that of type aliases in general, which are also an oft-requested feature).
Thus, when you create a new Box<int> it just fills in that type alias for T as int. Nesting still works too: Box<Box<int>> is just an int type alias on the inner Box, and the outer Box alias is just Box. Type-checking basically works just like it does today (IIRC, Box<int> literally got stored as "Box<int>" for fast checking), and reflection just looks up the type aliases and unmangles them -- though I know for certain I never finished reflection and got bogged down in GC shenanigans.
There were probably some serious cons in that approach, but I ran out of free time to investigate. If you are doing experiments, it is probably worth looking into.
FYI though, people seemed really turned off by file-level type aliases (at least exposed to user-land, so I never actually pursued it).
— Rob
As an experiment, awhile ago, I went a different route for reified generics by 'hacking' type aliases (which I was also experimenting with). Such that a generic becomes compiled into a concrete implementation with a dangling type alias:
class Box<T> {
function __construct(T $thing) {}
}
is essentially compiled to
class Box {
use alias __Box_T => ???;
function __construct(__Box_T $thing) {}
}
This just gets a T type alias (empty-ish, with a mangled name) that gets filled in during runtime (every instance gets its own type alias table, and uses that along with the file alias table). There shouldn't be any performance impact this way (or at least, as bad as using type aliases, in general; which is also an oft-requested feature).
From what I understand this is essentially how Go implements Generics. So +1 for considering this approach.
FYI though, people seemed really turned off by file-level type aliases (at least exposed to user-land, so I never actually pursued it).
Shame. Type aliases are super useful in practice in other languages, with many used for single-file scope in my experience.
-Mike
Great job on providing so much detail in your blog post.
JMTCW, but I am less of a fan of boil-the-ocean generics and more of a fan of focused pragmatic solutions like you proposed with the Collection types. The former can result in really complex to read and understand code whereas the latter — when done well — results in easier to read and understand code.
It seems Java-style Generics are viewed as the proper archetype for Generics in PHP? I would challenge the wisdom of taking that road considering how different the compilers and runtimes are between Java and PHP. PHP should seek out solutions that are a perfect fit for its nature and not pursue parity with Java.
As PHP is primarily a web development language — vs. a systems language like C or Rust, or an enterprise application language like Java or C# — reducing code complexity for reading and understanding is a very important attribute of the language.
PHP is also a unique language and novel solutions benefit a unique language. PHP should pursue solutions that result in less complex code even if not found in other languages. Your collections idea is novel — which is great — but there are probably even more novel solutions to address other needs vs. going full-on with Java-style generics.
Consider whether adding type aliases, or augmenting, enhancing, or even merging classes, interfaces, and/or traits, could address the needs Java-style generics would otherwise address. I would work on some examples but I think you are more likely to adopt the features you come up with on your own.
As for type-erasure, I am on the fence, but I find the proposed "how" problematic. I can see wanting some code to be type-checked and other code not, but I think more often developers would want code type-checked during development and testing but not for staging or production. And if the switch for that behavior is in every file that means modifying every file during deployment. IMO that is just a non-starter.
If you are going to pursue type-erasure I recommend introducing a file in the root, call it .php.config or similar, that contains a wildcard-enabled tree-map of code with attributes settable for each file, directory, group of files and/or group of directories, where one attribute is type-checked and other attributes are reserved for future use. This config file should also be able to delegate to the .php.config files found elsewhere, such as config files for each package in the vendor directory. It would be much better and easier to swap out a few .php.config files during CI/CD than to update all files.
Additionally PHP could use an environment variable as prescribed by 12 Factor apps to identify the root config file. That way a hosting company could allow someone to configure their production server to point to .php.production.config instead of .php.development.config.
-Mike
P.S. Also consider offering the ability for a function or class method to "type" a parameter or variable based on an interface and then allow values that satisfy that interface structurally[1] but not necessarily require the class to explicitly implement the interface.
This is much like how Stringable is just automatically implemented by any class that has a __toString() method, but making this automatic implementation available to userland. Then these automatically-declared interfaces can cover some of the use-cases for generics without the complexity of generics.
For example, to allow you to visualize, consider a Printable interface that defines a print()void method. If some PHP library has a class Foo and it has a method with signature print()void then we could write a function to use it, maybe like so:
interface Printable {
    print()void
}
// The prefix ? on Printable means $printer just has to match the Printable interface's signature
function doSomething($printer ?Printable) {
    $printer->print();
}
$foo = new Foo();
doSomething($foo);
Something to consider?
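For comparison, a rough approximation of such a structural check can be written in userland today with reflection. This is only a sketch: satisfies() is a hypothetical helper, it compares method names, visibility and parameter counts rather than full signatures, and it pays the reflection cost on every call.

```php
<?php
interface Printable
{
    public function print(): void;
}

// Hypothetical helper: true if $object has a public method for every method
// the interface declares, callable with the interface's parameter count.
function satisfies(object $object, string $interface): bool
{
    foreach ((new ReflectionClass($interface))->getMethods() as $method) {
        if (!method_exists($object, $method->getName())) {
            return false;
        }
        $impl = new ReflectionMethod($object, $method->getName());
        if (!$impl->isPublic()
            || $impl->getNumberOfRequiredParameters() > $method->getNumberOfParameters()) {
            return false;
        }
    }
    return true;
}

// Foo never declares "implements Printable".
class Foo
{
    public function print(): void
    {
        echo "Foo\n";
    }
}

class Bar
{
}

function doSomething(object $printer): void
{
    if (!satisfies($printer, Printable::class)) {
        throw new TypeError('Argument does not structurally match Printable');
    }
    $printer->print();
}

doSomething(new Foo());
```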
Hi Mike,
It seems Java-style Generics are viewed as the proper archetype for Generics in PHP? I would challenge the wisdom of taking that road considering how different the compilers and runtimes are between the Java and PHP. PHP should seek out solutions that are a perfect fit for its nature and not pursue parity with Java.
As PHP is primarily a web development language — vs. a systems language like C or Rust, or an enterprise application language like Java or C# — reducing code complexity for reading and understanding is a very important attribute of the language.
PHP is also a unique language and novel solutions benefit a unique language. PHP should pursue solutions that result in less complex code even if not found in other languages. Your collections idea is novel — which is great — but there are probably even more novel solutions to address other needs vs. going full-on with Java-style generics.
Consider if adding type aliases; or augmenting, enhancing, or even merging classes, interfaces, and/or traits to address the needs Java-style generics would otherwise provide. I would work on some examples but I think you are more likely to adopt the features you come up with on your own.
Part of the appeal for Java/C#/Kotlin-like generics is that they are
well understood and their usefulness is not to be proven. Also they
fit well with the object-oriented aspect of the language, and many PHP
projects already use them via PHPStan/Psalm. More experimental
alternatives would be more risky. I would be interested to see
suggestions or examples, however.
As for type-erasure, I am on the fence, but I find the proposed "how" problematic.
I can see wanting some code to be type-checked and other code not, but I think more often developers would want code type-checked during development and testing but not for staging or production. And if the switch for that behavior is in every file that means modifying every file during deployment. IMO that is just a non-starter.
The reason for this "how" is that type checking is also coercing, so
disabling it "from the outside" may break a program that's not
designed for that. That's why this is something that should be enabled
on a per-file basis, and can probably not be switched on/off depending
on the environment.
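A minimal example of the coercion in question, in the default (non-strict) mode. The function names are illustrative, and describeErased() merely stands in for what the same function would do if its parameter type were erased.

```php
<?php
// No declare(strict_types=1): scalar arguments are coerced to the declared type.
function describe(int $n): string
{
    return gettype($n) . ':' . $n;
}

// Same body without the parameter type, modelling an "erased" declaration.
function describeErased($n): string
{
    return gettype($n) . ':' . $n;
}

echo describe("42"), "\n";       // integer:42
echo describeErased("42"), "\n"; // string:42
```

Any code downstream that relies on $n being an int (strict comparisons, match arms, array keys and so on) would see a different value once the check is removed.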
P.S. Also consider offering the ability for a function or class method to "type" a parameter or variable based on an interface and then allow values that satisfy that interface structurally[1] but not necessarily require the class to explicitly implement the interface.
Unfortunately, I believe that structural types would be very expensive
to implement at runtime. Static analysers could support this, however
(PHPStan/Psalm support some structural types already).
Best Regards,
Arnaud
Hi Mike,
It seems Java-style Generics are viewed as the proper archetype for Generics in PHP? I would challenge the wisdom of taking that road considering how different the compilers and runtimes are between the Java and PHP. PHP should seek out solutions that are a perfect fit for its nature and not pursue parity with Java.
As PHP is primarily a web development language — vs. a systems language like C or Rust, or an enterprise application language like Java or C# — reducing code complexity for reading and understanding is a very important attribute of the language.
PHP is also a unique language and novel solutions benefit a unique language. PHP should pursue solutions that result in less complex code even if not found in other languages. Your collections idea is novel — which is great — but there are probably even more novel solutions to address other needs vs. going full-on with Java-style generics.
Consider if adding type aliases; or augmenting, enhancing, or even merging classes, interfaces, and/or traits to address the needs Java-style generics would otherwise provide. I would work on some examples but I think you are more likely to adopt the features you come up with on your own.
Part of the appeal for Java/C#/Kotlin-like generics is that they are
well understood and their usefulness is not to be proven.
Yes, they are well understood by programmers who develop in a significantly more complex language. So while I acknowledge that appeal, I think the complexity provides little benefit for most PHP developers.
Also they fit well with the object-oriented aspect of the language,
Even more importantly, PHP is not Java and what works for a compiled and strongly typed language does not necessarily work for an interpreted language with looser typing and where only one file can be seen by the compiler at a time.
and many PHP projects already use them via PHPStan/Psalm.
As an aside, it is an interesting data point that such a small percent of PHP developers actually use those tools.
Could it be because of their complexity? I cannot say for certain that is why, but it surely is a factor to ponder.
More experimental alternatives would be more risky.
Fair point
I would be interested to see suggestions or examples, however.
Two examples were already shown and/or mentioned: the collections class and automatic interface implementation based on structural typing.
I am sure there are more, and if I am able to identify any as the topic is discussed I will bring them up.
As for type-erasure, I am on the fence, but I find the proposed "how" problematic.
I can see wanting some code to be type-checked and other code not, but I think more often developers would want code type-checked during development and testing but not for staging or production. And if the switch for that behavior is in every file that means modifying every file during deployment. IMO that is just a non-starter.
The reason for this "how" is that type checking is also coercing, so
disabling it "from the outside" may break a program that's not
designed for that.
AFAIK if you are using type checking then the code is never correct if the types do not match, the errors just may go unreported. Thus I do not see how the code that uses code with types could not be designed for code with types; disabling it from the outside does not change that.
Disabling type checking is not like changing the syntax that is allowed by strict mode, AFAIK.
type checking is also coercing
However, I do not understand your claim here. Is there some form of typing that would modify code behavior if the types were erased? Would allowing that even make sense? Can you give an example of this?
That's why this is something that should be enabled
on a per-file basis, and can probably not be switched on/off depending
on the environment.
I reserve my opinion on this awaiting your example(s).
P.S. Also consider offering the ability for a function or class method to "type" a parameter or variable based on an interface and then allow values that satisfy that interface structurally[1] but not necessarily require the class to explicitly implement the interface.
Unfortunately, I believe that structural types would be very expensive
to implement at runtime. Static analysers could support this, however
(PHPStan/Psalm support some structural types already).
But would it really be too expensive? Has anyone ever pursued considering it, or just dismissed it summarily? Seems to me it could be handled rather inexpensively with bitmaps.
-Mike
Thanks for sharing this research work.
Instead of having to choose between fully reified generics and erased type
declarations, couldn't we have both? A new option in php.ini could allow to
enable the “erased” mode as a performance, production-oriented optimization.
In development, and on projects where performance isn't critical, types
(including generics) will be enforced at runtime, but users will have the
option of opting to disable these checks for production environments.
If this is not possible, the inline caches presented in the article,
combined with “worker” runtimes such as FrankenPHP, Swoole, RoadRunner,
etc., could make the cost of enforcing generics negligible: technically,
types will be computed once and reused for many HTTP requests (because they
are handled by the same long-running PHP script under the hood). As working
runtimes already provide a significant performance improvement over FPM, we
could say that even if non-performance-critical applications (most
applications) will be a bit slower because of the new checks, people
working on performance-sensitive applications will have the opportunity to
reduce the cost of checks to virtually nothing by switching to a
performance-oriented runtime.
Cheers,
Thanks for sharing this research work.
Instead of having to choose between fully reified generics and erased
type declarations, couldn't we have both? A new option in php.ini could
allow to enable the “erased” mode as a performance, production-oriented
optimization.
In development, and on projects where performance isn't critical, types
(including generics) will be enforced at runtime, but users will have
the option of opting to disable these checks for production
environments.
Strictly speaking, yes, a disabled-types mode could be made regardless of what happens with generics. But the downsides of that approach remain the same. I'm personally against type erasure generally, in large part because I don't know what it would break in terms of reflection, and in part because I know people will turn it off for dev, too, and then end up writing buggier code.
If this is not possible, the inline caches presented in the article,
combined with “worker” runtimes such as FrankenPHP, Swoole, RoadRunner,
etc., could make the cost of enforcing generics negligible:
technically, types will be computed once and reused for many HTTP
requests (because they are handled by the same long-running PHP script
under the hood). As working runtimes already provide a significant
performance improvement over FPM, we could say that even if
non-performance-critical applications (most applications) will be a bit
slower because of the new checks, people working on
performance-sensitive applications will have the opportunity to reduce
the cost of checks to virtually nothing by switching to a
performance-oriented runtime.
From talking to Arnaud, the main issue here is the file-at-a-time compilation. I'm not entirely clear if a persistent process would side-step that, with the delayed resolution bits, or if those would have to be re-resolved each time. (That's an Arnaud question.) Another possibility that's been floated a bit, tangentially, is allowing some kind of multi-file loading, which would allow for a larger scope to be included at once as an opcache segment, and thus the optimizer could do more.
That said, I suspect the benefits of the JIT when using a worker-mode runner would be larger anyway.
Also, speaking for me personally and no one else, I am still very much in favor of official steps to improve worker-mode options in php-src directly. What form that takes I'm not sure, but I would very much favor making worker-mode a first-class citizen, or at least a one-and-a-half class citizen, rather than its current second-class status.
--Larry Garfield
Thank you Arnaud, Derick, Larry for the article.
Have you considered the path of not adding generics to the core at all? In
fact, this path has implicitly been taken over the last few years. So maybe it
makes sense to enforce that status quo?
Potential steps:
- Make the current status quo official by recognizing generics PHPDoc
syntax as The Generics for PHP. Just adding a php.net manual page will
do.
- Recognize Composer as the official PHP tool. It's currently not
mentioned on php.net at all.
- Suggest using PHPStan or Psalm for generics and type checks.
- Add an official specification for generics in the PHP manual to
eliminate semantic variances between tools.
This will keep the core simple and reduce the maintenance burden, not
increase it.
Moreover, it does not contradict with any other implementation
mentioned in the article, should they happen. In fact, it could be a
first baby-step for any of them.
There is also an attempt to do generics via attributes –
https://github.com/php-static-analysis/attributes – it could
potentially be a better alternative of recognising “official” syntax,
because unlike PHPDocs, attributes can be available in core and the
syntax is checked.
What do you folks think?
-Roman
On Fri, Aug 23, 2024 at 8:51 AM Roman Pronskiy
roman.pronskiy@thephp.foundation wrote:
Do you consider the path of not adding generics to the core at all? In
fact, this path is implicitly taken during the last years. So maybe it
makes sense to enforce that status quo?
Potential steps:
- Make the current status quo official by recognizing generics PHPDoc
syntax as The Generics for PHP. Just adding a php.net manual page will
do.
- Recognize Composer as the official PHP tool. It's currently not
mentioned on php.net at all.
- Suggest using PHPStan or Psalm for generics and type checks.
- Add an official specification for generics in the PHP manual to
eliminate semantic variances between tools.
This will keep the core simple and reduce the maintenance burden, not
increase it.
Moreover, it does not contradict with any other implementation
mentioned in the article, should they happen. In fact, it could be a
first baby-step for any of them.
There is also an attempt to do generics via attributes –
https://github.com/php-static-analysis/attributes – it could
potentially be a better alternative of recognising “official” syntax,
because unlike PHPDocs, attributes can be available in core and the
syntax is checked.
What do you folks think?
-Roman
Seems like a great plan, to be honest. I find it rather odd that Generics
is the most requested PHP feature because it seems like we have survived
without it for so long and I don't subscribe to the concept that this would
be the best thing for PHP since electricity.
Having an official "syntax" (Attributes, Docblock) could increase adoption.
On the other hand, I feel like a likely outcome is that folks will still
consider it as "something that doesn't exist yet" and will keep requesting
it.
Something else that is worth mentioning, I like that Collection is being
discussed as a small step as well. It's a very common use of Generics and
would be a great addition to the language if something solid comes out of
it.
--
Marco Deleu
The null option is always an option, yes. The thing to understand is that today, we already have erased generics, via PHPStan/Psalm. That's one reason I am, personally, against erased generics in the language proper. They don't really offer anything we don't have already.
Moving those definitions to attributes is certainly possible, though AFAIK both the PHPStan and Psalm devs have expressed zero interest in it. Part of the challenge is that such an approach will either still involve string parsing, or will involve a lot of deeply nested attribute classes. For instance, if today you'd write:
/**
 * @var array<string, Dict<string, Foo>>
 */
protected array $foos;
(An entirely reasonable lookup table for some circumstances). What would that be in attributes?
This would still need string parsing:
#[GenericType('string', 'Dict<string, Foo>')]
And a form that doesn't need string parsing:
#[DictType('string', new Dict('string', Foo::class))]
Which is getting kinda ugly fast.
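To make the no-string-parsing form concrete, here is a minimal sketch of how it could be declared and read back with reflection. DictType, Dict, Registry and render() are all hypothetical names for illustration; none of them exist in PHP, PHPStan, or Psalm. It assumes PHP 8.1+ for `new` inside attribute arguments and readonly properties.

```php
<?php
// Hypothetical type-description classes. Illustrative only.
final class Dict
{
    public function __construct(
        public readonly string $key,
        public readonly string|Dict $value,
    ) {}
}

#[Attribute(Attribute::TARGET_PROPERTY)]
final class DictType
{
    public function __construct(
        public readonly string $key,
        public readonly string|Dict $value,
    ) {}
}

class Foo
{
}

class Registry
{
    #[DictType('string', new Dict('string', Foo::class))]
    protected array $foos = [];
}

// Render the nested objects back into the familiar angle-bracket notation.
function render(DictType|Dict $type): string
{
    $value = $type->value instanceof Dict ? render($type->value) : $type->value;
    $name = $type instanceof DictType ? 'array' : 'Dict';
    return "{$name}<{$type->key}, {$value}>";
}

$attr = (new ReflectionProperty(Registry::class, 'foos'))
    ->getAttributes(DictType::class)[0]
    ->newInstance();

echo render($attr), "\n"; // array<string, Dict<string, Foo>>
```

No string parsing is needed, but every extra level of nesting costs another `new` expression at the declaration site.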
All else equal, if we have to keep generics to implicit/erased, I'd favor going all the way to the latter (no-string-parsing attributes), and revising the syntax along the way. (The current syntax used by SA tools is decidedly weird compared to most generic languages, making it hard to follow.)
If instead we used attributes for reified generics, then we have all the same challenges that make reified generics hard, just with a different syntax. As I understand it (again, Arnaud is free to correct me), these two syntaxes would be equally straightforward to parse, but also equally complex to implement the runtime logic for:
#[DictType('string', new Dict('string', Foo::class))]
protected array $foo;
protected Dict<string, Dict<string, Foo>> $foo;
The latter is more compact and typical of our sibling languages, but once the parser is done, all of the other challenges are the same.
As for making docblock generics "official", one, as I noted I hate the current syntax. :-) Two, that seems unwise as long as PHP still has an option to remove comments/docblocks at compile time. Even if it's not used much anymore, the option is still there, AFAIK.
And that's before we even run into the long-standing Internals aversion to even recognizing the existence of 3rd party tools for fear of "endorsing" anything. (With the inexplicable exception of DokuWiki.)
--Larry Garfield
On Fri, Aug 23, 2024 at 4:27 PM Larry Garfield larry@garfieldtech.com
wrote:
Moving those definitions to attributes is certainly possible, though AFAIK
both the PHPStan and Psalm devs have expressed zero interest in it.
Part of the challenge is that such an approach will either still involve
string parsing,
That's not really a challenge and would help somewhat with the current
status quo where we have to guess where the type ends and the textual part
of the comment begins. But it gets ugly for any type that has to include
quotes (literal strings, array keys, etc). Technically one can use nowdocs,
but it's not much better: https://3v4l.org/4hpte
or will involve a lot of deeply nested attribute classes.
Yeah, that would look like Lisp's S-exprs, but much worse - which, in my
opinion, would harm adoption.
All in all, in my opinion attribute-based solutions are less ergonomic than
what we already have now in docblocks.
--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com
Thank you Larry for expressing some of the problems. Is there any reason nesting has to be supported out of the gate? Think about type hints. It started with some basic functionality and then grew over time. There is no reason we have to have a new kitchen sink, oven, dishwasher and stove when all we want is a new refrigerator.
— Rob
While I understand the temptation to "just do part of it", which comes up very often, I must reiterate once again that can backfire badly. That is only sensible when:
- There's a very clear picture to get from A->Z.
- The implementation of C and D cannot interfere with the design or implementation of J or K.
- The steps along the way offer clear self-contained benefits, such that if nothing else happens, it's still a "complete" system and a win.
- The part being put off to later isn't just putting off the "hard part".
In practice, the level at which you get all four is quite coarse, much coarser than it seems most people on this list think.
Examples of where we have done that:
- Enums. The initial Enum RFC is part one of at least 3 steps. Step 2 is pattern matching, Step 3 is ADTs/tagged unions. Those are still coming, but all three were spec'ed out in advance (1), we're fairly confident that the enum design will play nice with tagged unions (2), and enums step 1 has very clearly been hugely positive for the language (3, 4).
- Property hooks and aviz. These were designed together. They were originally a single planning document, way back in Nikita's original RFC. After effectively doing all the design work of both together, we split up the implementations to make them easier. Hooks was still a large RFC, but that was after we split things up. That meant we had a clear picture of how the two would fit together (1, 2), either RFC on its own would have been beneficial to the language even if they're better together (2, 3), and both were substantial tasks in themselves (4).
- Gina's ongoing campaign to make PHP's type juggling have some passing resemblance to logic.
With generics, the syntax isn't the hard part. The hard part is type inference, or accepting that generic-using code will just be extraordinarily verbose and clumsy. There is (as I understand from Arnaud, who again can correct me if I'm wrong) not a huge amount of difference in effort between supporting only Foo<Bar> and supporting Foo<Bar<Baz>>. The nesting isn't the hard part. The hard part is not having to type Foo<Bar> 4 times across 2 files every time you do something with generics. If that can be resolved satisfactorily (and performantly), then the road map to reified generics is reasonably visible.
So for any intermediate generics implementation, it would need to have a very clear picture to get from that initial state to the final state (without the landmines that something like readonly gave us), we'd need to be confident we're not adding any landmines, each step would need to be useful in its own right, and it would have to be breaking up the "hard work" into reasonable chunks, not just punting the hard work for later.
Leaving out nested generics doesn't achieve those.
This is also why the dedicated collections work that Derick and I were looking into has been on pause, because adding a dedicated collections syntax, and then getting full reified generics later, would lead to a very ugly mess of inconsistency. Better to wait and try to get full generics first, or confirm once and for all that it's impossible.
Strategies that MIGHT make sense in that framework, and the ones on which we are specifically requesting feedback, include:
- Type-erased generics, with the expectation that they would become enforced at some point in the future. (Though this could lead to lots of "working" code suddenly not working once the enforcement was turned on.)
- No type inference. Generics are just very verbose, deal with it. Type inference could, potentially, be added later. (Maybe; it's not guaranteed that it could be done effectively, as the writeup discusses).
- Only allow generics over simple types, not union/intersection types. Unlike nested generics, union types do increase the cost of determining compatibility considerably, so making them performant is a much bigger challenge. Maybe that challenge could be punted for later? (And if later turns into never, or 10 years from now, is that still an acceptable end-state?)
The acceptability of each of these strategies is what we were hoping to determine in feedback to the writeup.
--Larry Garfield
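For readers weighing the type-erased strategy, the following sketch contrasts what "erased" means in practice with a runtime-enforced userland approximation. It is not how the engine would implement either option. `ErasedBox` and `CheckedBox` are hypothetical classes, and the string-based type tag in `CheckedBox` is an assumption made to keep the example short (it matches exact types only, not subclasses).

```php
<?php
declare(strict_types=1);

/**
 * Erased: the element type exists only in the docblock.
 *
 * @template T
 */
final class ErasedBox
{
    /** @var list<T> */
    private array $items = [];

    /** @param T $item */
    public function add(mixed $item): void
    {
        $this->items[] = $item; // nothing is checked at runtime
    }

    public function count(): int
    {
        return count($this->items);
    }
}

/**
 * Enforced (userland approximation): the element type is carried at runtime
 * and compared against get_debug_type(), e.g. "int" or a class name.
 */
final class CheckedBox
{
    private array $items = [];

    public function __construct(private string $type) {}

    public function add(mixed $item): void
    {
        if (get_debug_type($item) !== $this->type) {
            throw new TypeError("Expected {$this->type}, got " . get_debug_type($item));
        }
        $this->items[] = $item;
    }

    public function count(): int
    {
        return count($this->items);
    }
}

/** @var ErasedBox<int> $erased */
$erased = new ErasedBox();
$erased->add(1);
$erased->add('oops'); // a static analyser can flag this; the engine accepts it

$checked = new CheckedBox('int');
$checked->add(1);
try {
    $checked->add('oops');
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true; // the mismatch is caught when the value is added
}

var_dump($erased->count(), $checked->count(), $rejected);
```

The caveat in the first strategy is visible here: code like the `ErasedBox` calls runs today and would start failing if enforcement were switched on later.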
While I understand the temptation to "just do part of it", which comes up very often, I must reiterate once again that can backfire badly. That is only sensible when:
- There's a very clear picture to get from A->Z.
- The implementation of C and D cannot interfere with the design or implementation of J or K.
- The steps along the way offer clear self-contained benefits, such that if nothing else happens, it's still a "complete" system and a win.
- The part being put off to later isn't just putting off the "hard part".
In practice, the level at which you get all four is quite coarse, much coarser than it seems most people on this list think.
I wasn't intending to just say "just do it," but rather, is it "good enough." As I mentioned in another email on this topic, right now there is only one person in the world who can work on the problem. Sure, we can leave drive-by comments and our own experiences/opinions here and on github, but ultimately, the knowledge of how it works and how it can be improved exists solely within one (or thereabouts) person's brain; on the entire planet.
This is sort-of how when we write software, we try to keep small PRs. Small PRs can be reviewed quickly and merged. Other developers on the team can start interacting with the code, even if the feature that it pertains to is incomplete. From that point forward, other developers can improve that code, separately from the person who is working on the feature. The knowledge of how it works is shared and people with different perspectives and experiences can make it better.
Examples of where we have done that:
Enums. The initial Enum RFC is part one of at least 3 steps. Step 2 is pattern matching, Step 3 is ADTs/tagged unions. Those are still coming, but all three were spec'ed out in advance (1), we're fairly confident that the enum design will play nice with tagged unions (2), and enums step 1 has very clearly been hugely positive for the language (3, 4).
Property hooks and aviz. These were designed together. They were originally a single planning document, way back in Nikita's original RFC. After effectively doing all the design work of both together, we split up the implementations to make them easier. Hooks was still a large RFC, but that was after we split things up. That meant we had a clear picture of how the two would fit together (1, 2), either RFC on its own would have been beneficial to the language even if they're better together (2, 3), and both were substantial tasks in themselves (4).
Gina's ongoing campaign to make PHP's type juggling have some passing resemblance to logic.
With generics, the syntax isn't the hard part. The hard part is type inference, or accepting that generic-using code will just be extraordinarily verbose and clumsy. There is (as I understand from Arnaud, who again can correct me if I'm wrong) not a huge amount of difference in effort between supporting only Foo<Bar> and supporting Foo<Bar<Baz>>. The nesting isn't the hard part. The hard part is not having to type Foo<Bar> 4 times across 2 files every time you do something with generics. If that can be resolved satisfactorily (and performantly), then the road map to reified generics is reasonably visible.
Ok. But wasn't there something about nesting causing super-linear performance issues? So, disable nesting and don't worry about inference. Obviously, people will want these things. I do remember a time in PHP where you had to type out function() use (stuff) {} for every little anonymous function call. It was super annoying. However, I don't think PHP would have ever even existed if someone didn't say "it is good enough for now."
Maybe Arnaud can solve inference, all by themselves with whatever resources can be thrown at them, but what if it comes to people actually using it ... and they hate inference? PHP is a verbose language, so it isn't that unthinkable. But seriously, that would be quite a waste of time and effort.
Instead of striving for perfection from the beginning, strive for getting something working that people are actually asking for (generics—nobody is asking for inference) and gather feedback. Maybe nobody actually wants inference, or they want a type of inference that only makes sense once you start using it that you can't even guess about right now.
So for any intermediate generics implementation, it would need to have a very clear picture to get from that initial state to the final state (without the landmines that something like readonly gave us), we'd need to be confident we're not adding any landmines, each step would need to be useful in its own right, and it would have to be breaking up the "hard work" into reasonable chunks, not just punting the hard work for later.
Leaving out nested generics doesn't achieve those.
See above. But if the actual issue is inference, then leave that out.
This is also why the dedicated collections work that Derick and I were looking into has been on pause, because adding a dedicated collections syntax, and then getting full reified generics later, would lead to a very ugly mess of inconsistency. Better to wait and try to get full generics first, or confirm once and for all that it's impossible.
Strategies that MIGHT make sense in that framework, and the ones on which we are specifically requesting feedback, include:
- Type-erased generics, with the expectation that they would become enforced at some point in the future. (Though this could lead to lots of "working" code suddenly not working once the enforcement was turned on.)
Personally, this would be interesting because you'd have go-like semantics. It wouldn't actually have to implement the interface, just have the same methods/properties.
It also leads to interesting thoughts in that you don't even need to do type-checking at function/method boundaries, only when/if it is used. But that would probably require some pretty big changes to zvals/type juggling. I don't work here, so all I can give is advice and a little bit of time. But, I suspect that if someone wanted to, type checking/juggling could be 5-10x faster than it currently is.
- No type inference. Generics are just very verbose, deal with it. Type inference could, potentially, be added later. (Maybe; it's not guaranteed that it could be done effectively, as the writeup discusses).
IIRC, typescript didn't have proper inference until years later. For example, initially, it could only infer primitive types (similar to PHP, really).
- Only allow generics over simple types, not union/intersection types. Unlike nested generics, union types do increase the cost of determining compatibility considerably, so making them performant is a much bigger challenge. Maybe that challenge could be punted for later? (And if later turns into never, or 10 years from now, is that still an acceptable end-state?)
Ah, this is what I was thinking of. Thank you. Yeah, instead of "nesting" prior, I was referring to union types.
The acceptability of each of these strategies is what we were hoping to determine in feedback to the writeup.
--Larry Garfield
— Rob
With generics, the syntax isn't the hard part. The hard part is type inference, or accepting that generic-using code will just be extraordinarily verbose and clumsy. There is (as I understand from Arnaud, who again can correct me if I'm wrong) not a huge amount of difference in effort between supporting only Foo<Bar> and supporting Foo<Bar<Baz>>. The nesting isn't the hard part. The hard part is not having to type Foo<Bar> 4 times across 2 files every time you do something with generics. If that can be resolved satisfactorily (and performantly), then the road map to reified generics is reasonably visible.
Ok. But wasn't there something about nesting causing super-linear performance issues? So, disable nesting and don't worry about inference.
[...]
Ah, this is what I was thinking of. Thank you. Yeah, instead of "nesting" prior, I was referring to union types.
Rob, with all the kindness I can give, please condense your emails to have a semblance of sense.
This is not a bar where you are having a one on one conversation.
You are sending emails to thousands of people on a mailing list that can read you.
It would be appreciated if you could go over everything you read, digest the content, and then form a reply.
Or at the minimum, if you realize that a previous remark you made does not apply, redraft the email.
And possibly even sit on it for a bit before sending it, as you routinely come up with a point you forgot to include in your email.
Reading the mailing list is an exhausting task, especially when the volume is excessive.
As a reminder to everyone, we have rules: https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md
However, in your case, please note the following rule:
If you notice that your posting ratio is much higher than that of other people, double-check the above rules. Try to wait a bit longer before sending your replies to give other people more time to digest your answers and more importantly give you the opportunity to make sure that you aggregate your current position into a single mail instead of multiple ones.
For the past 2–3 months, you have sent the vast majority of emails on this list, this is not what I would consider normal nor expected for your level of "seniority" (for the lack of better word) on the project.
This is not to say to stop posting and replying, just to do it in a more conscious manner for the rest of us reading you.
Best regards,
Gina P. Banyard
Hi Gina!
I hope this email finds you well. Sincerely, thank you for your feedback; it's clear that you are addressing this issue with the best intentions.
I want to say that I understand the importance of this rule and keeping the mailing list conversations relevant, especially given the large audience. I want to also acknowledge that I have occasionally responded quickly without fully considering the impact on readability. Moving forward, I will make a conscious effort to ensure my emails are more thoroughly reviewed.
Regarding your point about condensing emails, I see where you are coming from. However, my approach has been to respond within the same thread to maintain context, which I believe helps keep the discussion more organized for threaded readers. I understand that there is probably a balance there and will be more mindful in the future.
For the past 2–3 months, you have sent the vast majority of emails on this list, this is not what I would consider normal
To understand just how bad I was breaking this rule, I created https://email.catcounter.guru/ for anyone on the list to see where they currently stand with their post-ratio in comparison to others. It is updated every two hours, and you can enter an email address in the top-right to unmask an email address, otherwise the email addresses are anonymous.
Best regards,
Rob
For the past 2–3 months, you have sent the vast majority of emails on
this list, this is not what I would consider normal
To understand just how bad I was breaking this rule, I created
https://email.catcounter.guru/ for anyone on the list to see where
they currently stand with their post-ratio in comparison to others. It
is updated every two hours, and you can enter an email address in the
top-right to unmask an email address, otherwise the email addresses
are anonymous.
lol, nice. I somehow only managed to guess the top two. I think it's
bugged. No way I can't figure out another in the top 5 😏
At least you found something to keep yourself busy!
Cheers,
Bilge
Hey Bilge,
You are in the top 10 for the day ;) but not yet past a 10% posting ratio at some point in the range, which is how to end up on the graph. Otherwise it gets too noisy with very many people only sending in one or two emails per day. If you change the window to 7 days, you should show up since you've been pretty active in the last week.
If you get involved more than 14 days straight, you are almost guaranteed to appear on that graph. If you have an RFC, you are almost guaranteed to end up on that graph (due to replying to many people). If you get into an argument at some point, you are almost guaranteed to end up on that graph. If you do all three ... well, due to my GMP RFC (https://wiki.php.net/rfc/operator_overrides_lite), I am the top poster on the list for the last 3 months, by a huge margin. It was very unpopular and I fought hard. (https://email.catcounter.guru/?range=52&window=80)
At least you found something to keep yourself busy!
I have more than enough to do; just not enough time. :D Pretty standard problems. The last week of the month is pretty hectic, and I usually get little time to read emails and reply to them until the next month.
— Rob
You are in the top 10 for the day ;) but not yet past a 10% posting ratio at some point in the range, which is how to end up on the graph. Otherwise it gets too noisy with very many people only sending in one or two emails per day. If you change the window to 7 days, you should show up since you've been pretty active in the last week.
This is off topic. Please don't spoil this thread with this. Remember that over a thousand people receive this.
cheers
Derick
The null option is always an option, yes. The thing to understand is that today, we already have erased generics, via PHPStan/Psalm. That's one reason I am, personally, against erased generics in the language proper. They don't really offer anything we don't have already.
As for making docblock generics "official", one, as I noted I hate the current syntax. :-) Two, that seems unwise as long as PHP still has an option to remove comments/docblocks at compile time. Even if it's not used much anymore, the option is still there, AFAIK.
It seems you answered your own point here. Erased generics do bring
better syntax.
And that's before we even run into the long-standing Internals aversion to even recognizing the existence of 3rd party tools for fear of "endorsing" anything. (With the inexplicable exception of Docuwiki.)
Are you referring to this page: https://wiki.php.net/wiki/dokuwiki? It
doesn’t appear to be an endorsement and could be removed easily.
Additionally, there are several third-party extensions documented on
php.net, such as Swoole, Ds, Yaf, etc., which are, in a way, endorsed
by their inclusion.
Hi!
Arnaud, Larry, and I have been working on an article describing the
state of generics and collections, and related "experiments".
You can find this article on the PHP Foundation's Blog:
https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
cheers,
Derick
Hello,
I had an idea I wanted to share:
Since PHP Opcache now has an Intermediate Representation (IR), could we potentially store this IR in a file and recompile it on demand? This approach could make PHP ahead-of-time compiled (though I believe this already happens in-memory, to some extent, via opcache).
I'm curious to hear your thoughts on the potential benefits and drawbacks of this idea. Could this solve some existing issues (especially in regards to generics/etc), or might it introduce new ones? Additionally, would it make sense to consider breaking PHP into separate components: the runtime, the libraries/extensions, and the compiler?
Regards,
— Rob
Hi!
Arnaud, Larry, and I have been working on an article describing the
state of generics and collections, and related "experiments".
You can find this article on the PHP Foundation's Blog:
https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/
cheers,
Derick
To offer my own answers to the questions posed:
- I am against erased generics in the language, on the grounds that we already have them via docblocks and user-space tooling. Pushing the entire ecosystem to move that existing syntax into another existing syntax that doesn't really offer the user anything new is not worth the effort or disruption it would cause. Reified, enforced generics would be worth the hassle.
- Even if I could be convinced of erased generics as a stop-gap, I would want to see official, first-party support for validating them in the PHP linter command, or similar. If that is surprisingly hard (it may be), then that would preclude erased generics for me.
- I am open to punting on type inference, or having only limited type inference for now (eg, leaving out union types). Primarily, doing so would be forward-compatible. Saying for now that you must type foo<A>(new A()) even if it's redundant doesn't preclude the language in the future allowing foo(new A()) that figures out the generic type for you, and the explicit code would still continue to work indefinitely. So this seems like an area that is safe to allow to grow in the future.
- I am also OK if there is a (small) performance hit for generic code, or for especially esoteric generic code. Eg, if new Foo<A> has a 1% performance hit vs new Foo, and new Foo<A|B> has a 5% performance hit, I can live with that. new Foo<A|B> is a zany edge case anyway, so if that costs a bit more, I'm fine. It would not be fine if adding generics made new Foo 30% slower, or if new Foo<A> was 30% slower than the non-generic version.
- My sense at the moment is that in/out markers for variance would not be a blocker, so I'd prefer to include those from the start. That's caveated on them not being a blocker; mainly I want to make sure we ensure they can be done without breaking things in the future, and I suspect the difference between "make sure we can do it" and "just do it" is small. (I could be wrong on that, of course.)
- I feel the same about class Foo<A extends Something> (Foo is generic over A, but A must implement the Something interface): We should make sure it's doable, and I suspect verifying that is the same as just doing it, so let's just do it. But if we can verify but it will be a lot more work to do, then we could postpone that.
- I could deal with the custom collection syntax, but I'd much rather have real reified generics and then build on top of that. That would be better for everyone. I'm willing to wait for it. (That gives me more time to design collections anyway. :-) )
--Larry Garfield
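The forward-compatibility argument for punting on inference can be sketched in today's PHP with a userland stand-in. This is only an analogy: `foo()` and class `A` are hypothetical, and the optional `$type` parameter plays the role that an explicit type argument such as foo<A>(...) would play in real generic syntax.

```php
<?php
declare(strict_types=1);

final class A {}

/**
 * Userland stand-in for a generic function foo<T>(T $value).
 * When $type is omitted, the "inference" is simply the class of the value.
 */
function foo(object $value, ?string $type = null): string
{
    $type ??= $value::class; // inferred when not spelled out

    if (!$value instanceof $type) {
        throw new TypeError("Expected $type, got " . $value::class);
    }

    return $type;
}

echo foo(new A(), A::class), PHP_EOL; // explicit, in the spirit of foo<A>(new A())
echo foo(new A()), PHP_EOL;           // inferred, in the spirit of foo(new A())
```

Adding the inferred form later does not invalidate the explicit call, which is why starting with the verbose form looks like a safe place to begin.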
To throw one more question into the pot, I'd like to raise the
possibility of wanting typedefs. (To use the old C/C++ term, the latter
also allows a "type alias" syntax that does much the same thing.)
To use an example from the blog post:
function f(List<Entry<int,BlogPost>> $entries): Map<int, BlogPost>
{...}
there need not be an explicit Map class; instead something like
type Map<X,Y> = List<Entry<X,Y>>;
(to use Rust syntax).
Then one could write Map<int,BlogPost> and have it mean
List<Entry<int,BlogPost>>. The blog example constructed a new Map
given the List of Entries, but with the alias that would become a no-op.
Meanwhile BlogPost might itself be an alias for
StructuredText<\DomDocument>. Without the aliasing therefore we're
looking at List<Entry<int,StructuredText<\DomDocument>>> and who is
going to write that over and over? It would muffle use of generics.
I seem to recall that C#'s (or was it Java?) early support of generics
didn't offer any sort of abbreviation mechanism and this led to long
awkward concrete types that no-one wanted to use.
Calling them "aliases" implies that they're expanded before types are
matched up. If Wibble<string> and Splunge<int> both ultimately
expand to Foo<Bar<int>,Baz<string>> then they would be considered to
be the same type, despite appearances.
Since I expect that type aliases would become a desired feature, I would
hesitate to make decisions about type inference (allowing or disallowing
particular inferences) that could block their introduction. Alias
resolution would be an early stage, but that means it has to be able to
identify every use of a type in order to check whether it is a type
alias. I can't think of any specific examples where this would be a
problem, but it depends on how they actually end up looking - they
should look like types.
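The closest thing PHP has today to this "expand first, then compare" behaviour is class_alias(), which only covers plain (non-generic) classes. The sketch below uses it as an analogy, not as a proposal; `StructuredText`, `BlogPost`, `Wibble`, `Splunge`, and `render()` are hypothetical names borrowed from or modelled on the examples above.

```php
<?php
declare(strict_types=1);

final class StructuredText {}

// Three names, one underlying class.
class_alias(StructuredText::class, 'BlogPost');
class_alias(StructuredText::class, 'Wibble');
class_alias(StructuredText::class, 'Splunge');

// The parameter is declared with the alias, but the check is made
// against the class the alias resolves to.
function render(BlogPost $post): string
{
    return $post::class;
}

$resolved = render(new StructuredText());
$sameType = (new Wibble()) instanceof Splunge;

var_dump($resolved, $sameType);
```

Two aliases that resolve to the same class are indistinguishable to the type checks, which is the "same type, despite appearances" property described above.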