Hi Internals,
I'd like to kick off a conversation to capture everyone else's thoughts on
tweaking / improving typed properties for arrays (for a PHP 8.x release).
With all the work done lately to greatly improve the type support in PHP
(which is amazing by the way), I'm finding for the most part, I'm no longer
needing to Docblock as much of my code which is lovely.
That said, there's a common use case that keeps me going back to them which
I think would be a good thing for PHP to try and solve as a language
feature - better typing of arrays to type their properties.
IDEs like PHPStorm handle this structure already hence sticking to that as
a starting point...
@returns []int
This would designate the return of an array where all its elements are of
the int type, but it works for any type.
With that in mind, it might also make sense to allow a shorthand array
alias for array types anyway - array -> [].
To use actual PHP examples, this would mean the following would be
supported:
// Typed array properties ...values would follow any existing PHP type
function returnsIntArray(): []int;
function returnsClassArray(): []Class;
// The same outcome
function returnsArray(): array;
function returnsArray(): [];
I welcome all your thoughts on this proposal.
Many thanks,
Aran
On 17.01.2020 at 08:50, Aran Reeks wrote:
@returns []int
int[] etc. is common-place, but I have never seen []int.
Hello all
It's been a much-requested feature for years and years. My first thought was "we need generics, not this" but then I took 5 minutes to actually think about it. While the same, and much more, can be achieved with generics, they are a difficult feature to implement. There have been several RFCs for generics in the past, which failed. I know Levi Morrison was, at one point, looking at adding support for generics only in traits, because it's difficult to add them in other places.
While I still think generics would be a great feature, I now also believe it's worth looking at an "array of" type as something standalone. I've got no clue about the technical implications, but maybe supporting "array of" syntax is a lot easier than full-blown generics? Looking at my day-to-day work with PHP, I'd say "array of" types would solve ~80% of my problems with PHP's current type system, and I figure there are lots of developers in a similar situation. If I remember correctly from my college days, Java also supports both styles: int[] and ArrayList<Integer>.
All that to say that maybe it's worth the effort looking at "array of" types as something different than generics?
Kind regards
Brent
Hi,
So essentially we are talking about generics. I think it's the best time to
do so... Maybe our wishes come true soon? ;)
Cheers,
Máté
So essentially we are talking about generics. I think it's the best time to
do so... Maybe our wishes come true soon? ;)
Given that the general trend is towards making PHP more statically
typed and very java/C# like, why not just ditch PHP and use one of the
aforementioned languages?
So essentially we are talking about generics. I think it's the best time to
do so... Maybe our wishes come true soon? ;)
Given that the general trend is towards making PHP more statically
typed and very java/C# like, why not just ditch PHP and use one of the
aforementioned languages?
Who's this?
How does this feature mean PHP is becoming a more statically typed language?
Does adding strict typing features remove any dynamic typing features of the
language?
Nope, this is still dynamic typing because it can do both as the need demands.
So essentially we are talking about generics. I think it's the best time to
do so... Maybe our wishes come true soon? ;)
Given that the general trend is towards making PHP more statically
typed and very java/C# like, why not just ditch PHP and use one of the
aforementioned languages?
Because those languages suck for scripted use. For shared-nothing scripting, PHP beats the pants off of them.
That doesn't mean we can't continue PHP's fine tradition of stealing good ideas liberally from every language we can find. We can and should do so. (Whether we adopt Generics in the Java/C#/C++ style or pull from some other language is a separate debate.)
cf: https://24daysindecember.net/2019/12/06/growing-gradually-in-php/
--Larry Garfield
That said, there's a common use case that keeps me going back to them which
I think would be a good thing for PHP to try and solve as a language
feature - better typing of arrays to type their properties.
I for one would be a big +1 for this, with caveats.
IDEs like PHPStorm handle this structure already hence sticking to that as
a starting point...
@returns []int
As previously noted, I assume you meant int[]?
Having the ability to type array elements would cover ~90% of the cases where I cannot properly type parameters or return values in PHP 7.4.
The caveat is that it would seem that to check dynamically would be an expensive proposition such as when an array that was not typed is passed to or returned from a function or assigned to a variable of a declared type, e.g.
function foo( $myarray ): Foo[] {
    return $myarray; // This would need to be dynamically checked, I think?
}
Of course we could limit this type of typing to only arrays that are already known to be typed, meaning this would always fail:
function bar( Foo[] $myarray ) {
    // Do whatever
}
function baz( $myarray ) {
    bar( $myarray ); // Fails here because $myarray not known to be Foo[] even when it is
}
baz( [ new Foo() ] );
Alternately we could have a global option that would do type checking or not for these type hints, so they could be dynamically checked for all code prior to production code, where checking could be turned off.
Another option could be to check only when the number of array elements is small (<100?) and skip the check otherwise, but this feels all kinds of wrong.
My vote, if I had one, would be to add new typing for array elements, but also add a global type-checking option that can be in one of 3 states:
- Static checks only,
- Dynamic checks only, or
- No checking of array elements.
#jmtcw
-Mike
P.S. Or maybe there is an inexpensive way to keep track of the types of the entire array on element assignment?
P.P.S. Can someone please explain and give an example of how generics would make this need moot? I do not get why that would be the case...
Hi Mike,
Thanks for your support, and yes, you're correct, I did mean to structure
the type prior to the [].
I'm unsure of exactly how this might work so defer to an Internals expert,
but having previously read Nikita Popov's great
post on PHP's arrays, I did wonder if by knowing the data type within an
array and that it'd conform to a strict structure, could the array itself
be stored in an alternative way in C? Perhaps a more memory efficient way,
or one that's faster to iterate over rather than just a hash table?
Link to this article for reference:
https://nikic.github.io/2012/03/28/Understanding-PHPs-internal-array-implementation.html
Cheers,
Aran
I'm unsure of exactly how this might work so defer to an Internals expert,
but having previously read Nikita Popov's great
post on PHP's arrays, I did wonder if by knowing the data type within an
array and that it'd conform to a strict structure, could the array itself
be stored in an alternative way in C? Perhaps a more memory efficient way,
or one that's faster to iterate over rather than just a hash table?
Link to this article for reference:
https://nikic.github.io/2012/03/28/Understanding-PHPs-internal-array-implementation.html
A significant improvement that could be made to memory efficiency is to reduce the minimum array size for dynamically created arrays.
Currently, it's always 8 and must be a power of 2, even if there's only one element or the array is packed (i.e. a list without gaps)
- Array constants cached in opcache already have this optimization.
I have a PR that reduces the minimum size to 2 - the largest blockers were getting a realistic idea of whether commonly used applications
would see a decrease or increase in runtime, and needing code review.
("I'd want to know how this would be benchmarked before adding more changes, in case additional optimizations somehow turn out to be a performance regression.")
For example, Phan's memory usage for self-analysis (./phan --print-memory-usage-summary)
went from 444MB/576MB to 387MB/475MB (end memory/max memory) with this patch.
(This is a static analyzer for PHP which heavily uses small arrays)
I did wonder if by knowing the data type within an
array and that it'd conform to a strict structure
could the array itself be stored in an alternative way in C?
It might theoretically be possible to optimize the memory of an int[]
(64 bits per element) without references or gaps in elements.
If a reference in an element got added, or it stopped being a packed array without gaps (i.e. a list),
then such an implementation would use an unoptimized version that also enforces the type constraint
(128 bits per list element with zvals).
It would probably require changing a lot of APIs and macros, and I'd have no idea how to implement it.
I think JavaScript engines do something similar by having specializations for JS arrays of integers,
arrays of floats, arrays of small (e.g. 16-bit) integers, etc.
I have a PR that reduces the [minimum capacity of a packed array from 8] to 2, noticeably decreasing memory -
the largest blockers were getting a realistic idea of whether commonly used applications
would see a decrease or increase in runtime, and needing code review.
Sorry, I forgot to include the link to it, it was https://github.com/php/php-src/pull/4783
- Tyson
Hi Aran,
Did you read through the previous discussions on this topic?
https://externals.io/message/100946 in particular comes to mind.
The primary concern about the previous typed array proposal was the O(n)
cost of type checks, which required iterating over the whole array and
checking the type of individual elements. Any new proposal in this area
must address this concern.
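To make the O(n) concern concrete, here is a minimal userland sketch of the work a boundary check implies. The names isArrayOf(), takesFooArray() and FooItem are hypothetical and only illustrate the cost; they are not part of any proposal.

```php
<?php
declare(strict_types=1);

final class FooItem {}

/**
 * Hypothetical helper: true only if every element is an instance of $class.
 * The loop visits each element, so the cost grows linearly with array size.
 */
function isArrayOf(array $items, string $class): bool
{
    foreach ($items as $item) {
        if (!($item instanceof $class)) {
            return false;
        }
    }
    return true;
}

/**
 * Stand-in for a declaration like `function f(FooItem[] $items)`:
 * the full scan would have to run on every call.
 */
function takesFooArray(array $items): int
{
    if (!isArrayOf($items, FooItem::class)) {
        throw new TypeError('Expected FooItem[]');
    }
    return count($items);
}
```

Passing the same large array through several such functions repeats the full scan each time, which is the cost the earlier proposal was criticised for.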
As far as I know, the only viable way to do that is to make the array
intrinsically typed, which means that types are validated when elements are
inserted into the array, not when it is passed across a function boundary.
In other words, array generics.
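As a rough userland analogy for "validated on insert", the sketch below wraps an array in a class that checks each value as it is written. TypedList and Widget are hypothetical names; a real intrinsically typed array would be implemented in the engine, not as an object like this.

```php
<?php
declare(strict_types=1);

final class Widget {}

/**
 * Hypothetical stand-in for an intrinsically typed array: the element
 * type is checked once per write, never when the list is passed around.
 */
final class TypedList implements ArrayAccess, Countable
{
    private array $items = [];
    private string $class;

    public function __construct(string $class)
    {
        $this->class = $class;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->items[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->items[$offset];
    }

    public function offsetSet($offset, $value): void
    {
        // The only place a type check happens: O(1) per inserted element.
        if (!($value instanceof $this->class)) {
            throw new TypeError("Expected {$this->class}");
        }
        if ($offset === null) {
            $this->items[] = $value;
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->items[$offset]);
    }

    public function count(): int
    {
        return count($this->items);
    }
}

$list = new TypedList(Widget::class);
$list[] = new Widget();   // accepted
// $list[] = 'oops';      // would throw TypeError at the point of insertion
```

A function receiving such a list can trust its contents without scanning them, which is what removes the O(n) boundary check.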
Regards,
Nikita
Hi Nikita,
Did you read through the previous discussions on this topic?
https://externals.io/message/100946 in particular comes to mind.
Thanks for this link. It was very insightful.
The primary concern about the previous typed array proposal was the O(n)
cost of type checks, which required iterating over the whole array and
checking the type of individual elements. Any new proposal in this area
must address this concern.
Agreed.
As far as I know, the only viable way to do that is to make the array
intrinsically typed, which means that types are validated when elements are
inserted into the array, not when it is passed across a function boundary.
In other words, array generics.
Reading the prior discussion, there appeared to be several other potential approaches, but none were followed to a conclusion. Of course it devolved into bikeshedding, but I digress...
One approach mentioned by Andrea Faulds was to extend the hashtable (ref: your article[1]) and count types as assigned just like we currently count references. So a 10,240 element array of ints could have an internal tracker showing that the array contains type(s): ['int' => 10240]. Append a string value to the array and then the types would be ['int' => 10240, 'string' => 1].
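A minimal userland model of that counting idea might look like the following. TypeCountingArray is a hypothetical name, the sketch is append-only (overwrites and unsets would also need to decrement counts), and it assumes PHP 8.0+ for get_debug_type().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of "count types as assigned". A real version would
 * live inside the engine's hashtable rather than in a userland class.
 */
final class TypeCountingArray
{
    private array $items = [];

    /** @var array<string,int> type name => number of elements of that type */
    private array $typeCounts = [];

    public function append($value): void
    {
        $this->items[] = $value;
        $type = get_debug_type($value); // e.g. "int", "string", or a class name
        $this->typeCounts[$type] = ($this->typeCounts[$type] ?? 0) + 1;
    }

    /** Cost depends on the number of distinct types, not on the element count. */
    public function containsOnly(string $type): bool
    {
        foreach (array_keys($this->typeCounts) as $seen) {
            if ($seen !== $type) {
                return false;
            }
        }
        return true;
    }

    public function typeCounts(): array
    {
        return $this->typeCounts;
    }
}

$a = new TypeCountingArray();
for ($i = 0; $i < 10240; $i++) {
    $a->append($i);
}
$before = $a->typeCounts();   // ['int' => 10240]
$a->append('x');
$after = $a->typeCounts();    // ['int' => 10240, 'string' => 1]
```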
Mark Randall mentioned that this would not work if the array contained references, but no one discussed the potential of simply disallowing arrays with references at compile time when the arrays are typehinted, which seems like it could solve the proverbial 80/20 scenario. Need references in arrays? Don't typehint them.
Also, Rowan Collins mentioned that checks in Go can be disabled for runtime checking; maybe we could support an option that disables said checking so that production sites could run w/o checks but we could run checks in development, testing and staging. We could also have an option to disable checking of array types above a given size of array, maybe defaulting to 1024? Clearly both of these would be no worse than what we have today.
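For comparison, PHP's existing assert() already behaves this way in userland: with zend.assertions=1 the expression is evaluated, and with zend.assertions=-1 it is not compiled in at all. The sketch below uses it to model both ideas (a switchable check and a size cut-off). The helper names are hypothetical and the 1024 limit is just the figure floated above.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: true if every element is an int (O(n) scan). */
function allInts(array $items): bool
{
    foreach ($items as $item) {
        if (!is_int($item)) {
            return false;
        }
    }
    return true;
}

function sumInts(array $ints): int
{
    // Checked only when assertions are enabled (zend.assertions=1), and
    // only for arrays of at most 1024 elements; larger arrays skip the scan.
    assert(count($ints) > 1024 || allInts($ints), 'expected int[]');
    return array_sum($ints);
}
```

This gives development-time checking with no production cost, but it is a convention inside the function body rather than part of the signature, so tools and callers cannot see it.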
I think this could add a major improvement to PHP all without having to finalize the design and implementation of generics, no?
Are none of these viable options? I am asking that as a legitimate question — as I am not (yet) a PHP internals developer — and not just assuming they are viable options.
-Mike
[1] https://nikic.github.io/2012/03/28/Understanding-PHPs-internal-array-implementation.html
P.S. There was also the mention by Levi Morrison that the type[] syntax was a poor one because of ambiguity between (?int)[] or ?(int[]). I would argue that the latter would likely occur orders of magnitude more often than the former, so I would argue that ?int[] should interpret as ?(int[]), and if they want (?int)[] then the developer should use parentheses.
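The two readings accept different values, which the following sketch spells out with two hypothetical validator functions (neither syntax exists in PHP today):

```php
<?php
declare(strict_types=1);

/** ?(int[]): the value itself may be null; if it is an array, every element must be an int. */
function isNullableArrayOfInt($value): bool
{
    if ($value === null) {
        return true;
    }
    if (!is_array($value)) {
        return false;
    }
    foreach ($value as $element) {
        if (!is_int($element)) {
            return false;
        }
    }
    return true;
}

/** (?int)[]: the value must be an array; each element may be an int or null. */
function isArrayOfNullableInt($value): bool
{
    if (!is_array($value)) {
        return false;
    }
    foreach ($value as $element) {
        if ($element !== null && !is_int($element)) {
            return false;
        }
    }
    return true;
}
```

So null passes only the first reading, while [1, null] passes only the second.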
P.S. There was also the mention by Levi Morrison that the type[] syntax was a poor one because of ambiguity between (?int)[] or ?(int[]). I would argue that the latter would likely occur orders of magnitude more often than the former, so I would argue that ?int[] should interpret as ?(int[]), and if they want (?int)[] then the developer should use parentheses.
As a thought, perhaps the syntax '[Type]' for an array of Type. That way, you could write ?[int], or [?int], or even ?[?int] and there would be no ambiguity, and no need for parentheses since the array brackets would serve that purpose.
If we also wanted to allow typing array keys, this syntax could be extended to [string : Type] and [int: Type], and it would continue to remain unambiguous, even with nested arrays, and with using a more similar syntax than the docblock syntax array<string, Type>. (It might also be reasonable to support both variants as aliases of each other.)
Both of these are the syntax Swift uses for arrays and dictionaries, so the syntax has precedent in another language. Swift also supports both syntaxes as described above ([KeyType : ValueType] is exactly the same as Dictionary<KeyType, ValueType>), but the shorter bracket syntax is preferred for readability.
-John
As a thought, perhaps the syntax '[Type]' for an array of Type. That way, you could write ?[int], or [?int], or even ?[?int] and there would be no ambiguity, and no need for parentheses since the array brackets would serve that purpose.
That syntax was what someone suggested on the prior discussion. I personally dislike it because I have used PHPDoc syntax of type[]
for so long and would rather see us stick with that.
But if I'm honest about it, debate over syntax is probably just bikeshedding at this point.
The more important question IMO is, can we actually implement typed arrays to enough voters' satisfaction and w/o a significant performance penalty?
-Mike
One approach mentioned by Andrea Faulds was to extend the hashtable (ref: your article[1]) and count types as assigned just like we currently count references. So a 10,240 element array of ints could have an internal tracker showing that the array contains type(s): ['int' => 10240]. Append a string value to the array and then the types would be ['int' => 10240, 'string' => 1].
This would work really well for simple types like 'int' and 'string',
but loses its advantage fast with things like interfaces and
pseudo-types. For instance, if you have an array with objects of 20
different classes, and need to check it against a constraint of
SomeInterface[], you still have to test all 20 classes to see if they
implement that interface.
The overhead is also rather high, because you have to allocate memory
for this list on every array, and keep it up to date on every write,
even if it's never used.
I've had a similar idea in the past, but rather than trying to list the
types in advance, just cache them after passing (or failing) a type
check, so more like [ 'SomeInterface[]' => true, 'SomeOtherInterface[]'
=> false ]. Even if you just wiped the cache completely on every write,
I think that would give a decent boost, because there will often be
cases where a value is passed through a series of related functions all
expecting the same type.
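A small userland sketch of that cache-after-check idea, using the simplest policy of wiping the cache on every write. CheckCachingArray is a hypothetical name, and the public $fullScans counter exists only to make the caching observable.

```php
<?php
declare(strict_types=1);

final class CheckCachingArray
{
    private array $items = [];

    /** @var array<string,bool> class or interface name => result of last full check */
    private array $cache = [];

    /** Number of O(n) scans actually performed. */
    public int $fullScans = 0;

    public function append($value): void
    {
        $this->items[] = $value;
        $this->cache = []; // any write invalidates every cached result
    }

    public function isListOf(string $class): bool
    {
        if (!array_key_exists($class, $this->cache)) {
            $this->fullScans++;
            $ok = true;
            foreach ($this->items as $item) {
                if (!($item instanceof $class)) {
                    $ok = false;
                    break;
                }
            }
            $this->cache[$class] = $ok; // failures are cached as well
        }
        return $this->cache[$class];
    }
}

$values = new CheckCachingArray();
$values->append(new ArrayObject());
$first = $values->isListOf(Countable::class);   // scans the items
$second = $values->isListOf(Countable::class);  // answered from the cache
$scansSoFar = $values->fullScans;               // 1
```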
The worst case pseudotype is probably "callable[]", because it's
actually context-dependent (e.g. [$object, 'privateMethod'] is only
"callable" inside the same class as $object) so can't be pre-calculated
or cached. That would be problematic even with full generics -
logically, List<callable> would check each member was callable when it
was added to the list, but it might turn out not to be callable when it
was accessed later.
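The scope dependence is easy to reproduce with is_callable() today. In this sketch Greeter and its methods are made-up names:

```php
<?php
declare(strict_types=1);

final class Greeter
{
    private function secret(): string
    {
        return 'hi';
    }

    public function callableFromInside(): bool
    {
        // Evaluated in Greeter's scope, where the private method is visible.
        return is_callable([$this, 'secret']);
    }
}

$greeter = new Greeter();
$inside = $greeter->callableFromInside();        // true
$outside = is_callable([$greeter, 'secret']);    // false: private, checked from global scope
```

The same array value is therefore "callable" or not depending on where the question is asked, so a result computed at insertion time cannot be trusted later.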
Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
Also, Rowan Collins mentioned that checks in Go can be disabled for
runtime checking; maybe we could support an option that disables said
checking so that production sites could run w/o checks but we could run
checks in development, testing and staging. We could also have an option to
disable checking of array types above a given size of array, maybe
defaulting to 1024? Clearly both of these would be no worse than what we
have today.
You are getting into static analysis territory here with that. There are
already static analysis tools that do exactly this type of array type
checking during development. For example, there are three type mistakes in
this code:
1 <?php
2 class C {
3 /**
4 * @param int[] $ints
5 * @param string[] $strings
6 * @return array<int,string>
7 */
8 static function f(array $ints, array $strings):array {
9 return array_combine($strings, $ints);
10 }
11 }
12 print_r(C::f([3,2,'1'], ['abc', 'def', 42]));
Running Phan on it produces:
array.php:9 PhanTypeMismatchReturn Returning type array<string,int> but f()
is declared to return array<int,string>
array.php:12 PhanTypeMismatchArgument Argument 1 ($ints) is
array{0:3,1:2,2:'1'} but \C::f() takes int[] defined at array.php:8
array.php:12 PhanTypeMismatchArgument Argument 2 ($strings) is
array{0:'abc',1:'def',2:42} but \C::f() takes string[] defined at
array.php:8
The code itself would run in production without errors, of course, and
would produce:
Array
(
[abc] => 3
[def] => 2
[42] => 1
)
But at Etsy, at least, this code would never make it to production because
static analysis checks are run by all developers and also run automatically
during staging prior to a production push.
Really expensive checks like this belong at the static analysis stage. And
yes, it would be amazing to have a static analyzer built into PHP, which is
basically what you are asking for here, but that is a huge task and goes
way beyond just this particular check.
-Rasmus
You are getting into static analysis territory here with that. There are already static analysis tools that do exactly this type of array type checking during development. For example, there are three type mistakes in this code:
<snip>
Running Phan on it produces:
Understood. But in my experience a large number of PHP developers do not use Phan. At least not in the WordPress realm.
For my current project we tried for two days to get Phan to work, but it generated so many errors that were not actually errors that we gave up. I am sure that if we had had the time and expertise to configure it correctly we could have gotten it working, but I would be surprised if we were unique in that respect.
IOW, if a tool is very complex to get working, its existence is not a solution except for advanced teams and use-cases where the benefits are so overwhelming that team managers are willing to fund the time it takes to implement.
Really expensive checks like this belong at the static analysis stage. And yes, it would be amazing to have a static analyzer built into PHP, which is basically what you are asking for here,
Expensive checks would not be a problem if they could be run once during OpCode generation without affecting day-to-day code generation, right?
But at Etsy, at least, this code would never make it to production because static analysis checks are run by all developers and also run automatically during staging prior to a production push.
To be fair, I would say Etsy is an extreme outlier.
Few businesses across the economy are fully web-based, have the revenue of Etsy, and thus face the financial downside Etsy experiences when there is a problem on their website. Etsy is exactly the type of use-case I was referring to where the benefits of using tools like Phan are so overwhelming that management understands the need.
But many other companies won't see such an overwhelming benefit and thus managers often just don't appreciate the need to work on it.
#justsaying
but that is a huge task and goes way beyond just this particular check.
Understood.
But my above comments are to point out that the existence of Phan is not a panacea.
-Mike
As far as I know, the only viable way to do that is to make the array
intrinsically typed, which means that types are validated when elements are
inserted into the array, not when it is passed across a function boundary.
In other words, array generics.
What if we left the array type alone, and instead focussed on "list<Foo>"
type and "dict<string, Foo>", "dict<int, Bar>" types?
That would allow a clear break from previous behaviour, and would allow you
to introduce other changes (e.g. removing string -> int coercion for
numeric string keys).
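For reference, this is the existing array behaviour that a separate dict type could drop: string keys that look like canonical decimal integers are silently converted to int keys.

```php
<?php
declare(strict_types=1);

$map = ['1' => 'a', '01' => 'b', 'x' => 'c'];
$keys = array_keys($map);

// '1' became int 1; '01' and 'x' are not canonical integers and stay strings.
$converted = ($keys === [1, '01', 'x']);   // true

// Both spellings therefore address the same element.
$sameSlot = ($map[1] === $map['1']);       // true
```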
What if we left the array type alone, and instead focussed on "list<Foo>"
type and "dict<string, Foo>", "dict<int, Bar>" types?
That would allow a clear break from previous behaviour, and would allow you
to introduce other changes (e.g. removing string -> int coercion for
numeric string keys).
Can't agree more.
— Benjamin
On 21.01.2020 at 22:21, Matthew Brown wrote:
What if we left the array type alone, and instead focussed on "list<Foo>"
type and "dict<string, Foo>", "dict<int, Bar>" types?
That would allow a clear break from previous behaviour, and would allow you
to introduce other changes (e.g. removing string -> int coercion for
numeric string keys).
Just to make sure I understand you correctly: are you proposing new data
structures, named list and dict, in addition to array that can bring more
specific / strict semantics?
Yes!
Though I don't necessarily think they need to be genericised (e.g.
list<int>) in the language itself – just having those alternate
datatypes would, I think, be a boon to the language itself – with list (a
subtype of array) more useful to me than dict.
Thus far we have discussed that implementation of type checking for arrays would be too costly from a performance perspective and that there is no good solution that is not extremely complicated to implement.
Given that, can we consider an alternative?
ALLOW the use of a syntax for typed arrays — whether it be type[] or [type] — but only validate that it is an array and that the "type" is in fact a type, but don't actually validate that each element is the correct type.
This would allow those of us who want to start documenting specific usage using type hints to do so, instead of what we currently have to do, which is to document the element type in PHPDoc and type hint with "array". It would also allow IDEs like PhpStorm to add support.
Is this something the PHP community would consider?
-Mike
ALLOW the use of a syntax for typed arrays — whether it be type[] or [type] — but only validate that it is an array and that the "type" is in fact a type, but don't actually validate that each element is the correct type.
I don't really see much point in that. Tools are happily reading this
information from docblocks already, so all I can see this achieving is:
- Misleading users into thinking the language will guarantee something
when it won't.
- Making it harder to use that syntax for a different purpose later,
because code will be out there which lists such constraints but violates
them, and will suddenly fail if the checks are enforced.
Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
Thus far we have discussed that implementation of type checking for arrays would be too costly from a performance perspective and that there is no good solution that is not extremely complicated to implement.
Given that, can we consider an alternative?
ALLOW the use of a syntax for typed arrays — whether it be type[] or [type] — but only validate that it is an array and that the "type" is in fact a type, but don't actually validate that each element is the correct type.
This would allow those of us who want to start documenting specific usage using type hints to do so, instead of what we currently have to do, which is to document the element type in PHPDoc and type hint with "array". It would also allow IDEs like PhpStorm to add support.
Is this something the PHP community would consider?
-Mike
My opinion is that if you're going to declare the type of a variable, you have to have some way of enforcing that the type really is what you say it is. Otherwise, the type information is basically a lie, and you're much better off without it. If PHP's type system can't or won't enforce the type fully as declared, then it's much better to have phpdocs that assert what the type "really" is, like we do now.
I have some ideas on what might be reasonably performant array type enforcement, but having not fully read the thread, I'll keep that to myself until I have a chance to see that I have anything that's actually novel.
-John