Apologies for the double post - I missed a tag and I'm not sure the list
server will send it along because of that mistake.
I would like to propose a clean way to add some strong typing to PHP in a
manner that is almost fully backward compatible (there is a behavior change
with PHP 7 type declarations). As I don't have access to add RFCs to the
wiki, I'll place this here.
Before I begin detailing this I want to emphasize this syntax is optional
and lives alongside PHP's default scalar variables. If variables aren't
declared using the syntax detailed below then nothing changes. This is not
only for backwards compatibility, but it's also to keep the language easy
to learn as understanding datatypes can be a stumbling block (I know it was
for me at least).
VARIABLE DECLARATION
Currently the var keyword is used to formally declare a variable. The
keyword will now allow a type argument before the var name, like so:
var [type] $varname;
If the type is omitted, scalar is assumed. If Fleshgrinder's scalar RFC is
accepted then it would make sense to allow programmers to explicitly
declare the variable as a scalar, but in any event when the type is omitted
scalar must be assumed for backwards compatibility.
The variables created by this pattern auto cast anything assigned to them
without pitching an error. So...
var string $a = 5.3;
The float of 5.3 will be cast as a string.
For some this doesn't go far enough - they'd rather have a TypeError thrown
when the assignment isn't going to work. For them there is this syntax
string $a = "Hello";
Note that the var keyword isn't used.
FUNCTION DECLARATION
PHP 7 introduced type declarations. This RFC calls for these to become
binding for consistency, which introduces the only backward compatibility
break of the proposal. Consider the following code.
function foo ( string $a ) {
$a = 5;
echo is_int($a) ? 'Yes' : 'No';
}
Under this RFC "No" is printed because 5 is cast to a string when assigned
to $a. Currently "Yes" would be printed since a scalar takes the type that
makes sense for the last assignment.
I believe this is an acceptable break for two reasons. 1, the type
declaration syntax is relatively new. 2, changing the type of a variable
mid-function is a bad pattern anyway.
OBJECT TYPE LOCKING
Currently there is no way to prevent a variable from being changed from an
object to something else. Example.
$a = new SomeClass();
$a = 5;
If objects are allowed to follow the same pattern outlined above, though,
this problem is mostly solved:
SomeClass $a = new SomeClass();
var SomeClass $a = new SomeClass();
QUESTION: How do we handle the second auto casting case? $a is not allowed
to not be a SomeClass() object, but there are no casting rules. We have
three options:
- Throw an error on illegal assign.
- Allow a magic __cast function that will cast any assignment to the
object.
- Create a PHP Internal interface the object can implement that will
accomplish what 2 does without the magic approach.
Note that 1 will need to occur when neither 2 nor 3 is implemented. 2 and 3
are not mutually exclusive, though my understanding is that PHP is moving
away from magic functions.
CLASS DECLARATION
Again, by default class members are scalars. The syntax translates over
here as might be expected.
class SomeClass {
public var string $a;
protected int $b;
private SomeOtherClass $c;
public var SomeThirdClass $d;
}
Note a default value doesn't need to be provided. In the case of object
members, these types are only checked for on assignment to prevent
recursion sending the autoloader into an infinite loop.
Also note that one of the functions of setters - guaranteeing correct type
assignment - comes free of charge with this change.
COMPARISON BEHAVIOR
When a strongly typed variable (autocasting or not) is compared to a scalar
variable only the scalar switches types. The strict comparison operator is
allowed though it only blocks the movement of the scalar.
Comparisons between strongly typed variables are always strict and a
TypeError results if their types don't match. This actually provides a way
to force the greater than, lesser than, and spaceship operation to be
strict.
FUNCTION CALLING
When a strong typed variable is passed to a function that declares a
variable's type then autocasting will occur so long as the pass is not by
reference. For obvious reasons a TypeError will occur on a by-reference
assignment.
function bar( string $a) {}
function foo( string &$a ) {}
$a = 5.3;
foo( $a ); // Works, $a is a scalar, so it type adjusts.
var bool $b = false;
foo( $b ); // TypeError, $b is boolean, function expects to receive a
           // string by reference.
bar($b); // Works since the pass isn't by reference, so the type can be
         // adjusted for the local scope.
CONCLUSION
I believe that covers all the bases needed. This will give those who want
to use strong typing better tools, and those who don't are free to ignore
them.
Second Draft based on the feedback upstream.
Target version: PHP 8.
This is a proposal to strengthen the dynamic type checking of PHP during
development.
Note - this is not a proposal to change PHP to a statically typed language
or to remove PHP's current loose typing rules. PHP is a weakly typed
language for a reason, and will remain so subsequent to this RFC. This RFC
is concerned with providing tools to make controlling variable types
stronger when the programmer deems this necessary.
VARIABLE DECLARATION
PHP currently has no keyword to initialize a variable - it is simply
created when it is first referenced. The engine infers the appropriate type
for the variable, and this may be later cast to other types depending on
the context of the code. Objects can have magic functions to carry out this
casting such as __toString.
It is sometimes useful to explicitly state a variable's type. One case is
when the engine might incorrectly infer the type. For example "073117" is a
valid octal integer but also a date string in mmddyy format, so a
comparison with another date string in the same format could be... amusing.
While there is a string comparison function, that function's presence is
born of the fact that we can't reliably compare "073117" with, say,
"010216" because of the int casting possibility.
Since the scalar types have already been reserved as keywords they can be
used to declare variables in a manner not unlike C or Java.
int $a = 073117;
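For reference, a quick check of how current PHP treats the "073117" example. This is only a sketch of behavior I am confident of: two numeric strings compared with == are compared as decimal numbers, while === and strcmp() compare the strings themselves, and a leading zero only means octal for an integer literal in source. The variable names are just for illustration.

```php
<?php
// "073117" is the mmddyy date string from the text; the second string is a
// different string that happens to be numerically equal to it.
$dateA = "073117";
$dateB = "73117";

// Loose comparison: both sides are numeric strings, so PHP compares numbers.
$loose = ($dateA == $dateB);

// Strict comparison and strcmp() compare the strings themselves.
$strict = ($dateA === $dateB);
$viaStrcmp = (strcmp($dateA, $dateB) === 0);

// An integer literal with a leading zero is octal; a string cast is decimal.
$literal = 073117;
$fromString = (int) $dateA;

var_dump($loose, $strict, $viaStrcmp, $literal, $fromString);
// In order: bool(true), bool(false), bool(false), int(30287), int(73117)
```

So the surprise with date strings comes from the numeric comparison rather than from octal parsing, which is the kind of case an explicit string type would rule out.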
The var keyword is still around from PHP 4 days but is going unused. In
JavaScript var is used to formally declare a variable though it isn't
required (It remains important because without it JavaScript will search
the scope chain of the current closure all the way to the top scope. If it
doesn't find the reference it only then creates one. This can lead to huge
headaches so the creation of variables without using var is strongly
discouraged in JavaScript).
Since the keyword is available, let's make use of it.
var $a = "123";
What I propose this will do is formally declare $a, infer its type, then
LOCK the type from casting. If further assignments are made to the variable
the value being assigned will be cast to the locked type if possible,
otherwise a TypeError will be raised.
var string $a = $_POST['date'];
This syntax allows the programmer to choose the type rather than allowing
the engine to infer it. Here $_POST['date'] might be provided as a date
string that could be confused for an octal int.
This magical casting is suggested because it follows the spirit of PHP, but
it may not be strict enough. For those who want stricter behavior, the type
can be explicitly declared without using the var keyword as follows.
int $a = 4;
In this event a type error will occur on any attempt to assign a value to
$a that isn't an int.
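As a point of comparison only, and not part of this proposal: typed class properties (available from PHP 7.4, so newer than this draft) behave much like the two modes described above. Without strict_types an assignment is coerced where possible, which resembles var string $a, and a value that cannot be coerced raises a TypeError. The Holder class below is hypothetical.

```php
<?php
// Hypothetical holder class; typed properties need PHP 7.4 or later.
final class Holder
{
    public string $text = '';
    public int $count = 0;
}

$h = new Holder();

// Coercive mode (no strict_types): the float is cast to a string.
$h->text = 5.3;

// A value that cannot be coerced to int raises a TypeError and the
// property keeps its previous value.
$rejected = false;
try {
    $h->count = "not a number";
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($h->text, $h->count, $rejected);
// In order: string(3) "5.3", int(0), bool(true)
```

With declare(strict_types=1) at the top of the file the first assignment would also throw, which is closer to the non-var form.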
In both cases the variable can still be re-declared, like so:
var $a = 4;
string $a = "Hello";
The var keyword can be combined with the new keyword to lock an object
variable so it doesn't accidentally change:
var $a = new SomeClass();
As noted above a deliberate redeclare can still change the type of $a.
ARRAYS
All members of an array can be cast to one type using this syntax
var string array $a = [ 'Mary', 'had', 'a', 'little', 'lamb' ];
int array $b = [1,2,3,5];
Or members can be individually cast
var $a = [ var 'Todd', var 'Alex' ];
$b = [string 'id' => int 1, 'name' => string 'Chad'];
Again, this follows rules similar to the above. The main reason for doing
this is to ensure smooth interaction with the pack and splat operators.
function foo (var string array $a = ...);
And speaking of functions, that's the next section.
FUNCTION DECLARATION
Variables are also declared as arguments to functions. I propose using the
var keyword to lock the resulting variable and perform a cast if possible.
function foo( var string $a, var $b ) {}
Note that, for consistency, using var without explicitly naming a type will
be legal, if rarely used. Also, someone might have a use for an argument
whose type could be anything, but won't change after it is received.
The type can also be inferred from the default.
function foo( var $a = "hello" ) {}
This syntax is essentially doing a redeclare of the variable. This could be
very troublesome with references, so a TypeError will result if this is
tried.
function foo ( var &$a = "Hello" ) {}
$b = 3;
foo($b);
With objects the var keyword can be used to prevent the function from
changing the object.
function foo ( var SomeClass $a ) {}
CLASS MEMBER DECLARATION
Variables also appear as object members. Following the pattern established
above their types can be locked. A couple of notes, though:
class SomeClass {
var $a = '3';
public var $b = 'hello';
}
For backwards compatibility the var keyword by itself must be equivalent to
"public". It is only when a visibility keyword (public, protected or
private) is present that var takes on its new meaning in this context.
Magic __set and __get cannot access variables with locked types because,
well, it will be a bloody mess. Basic getter/setter behavior (ensuring the
datatype is correct) is accomplished just with the ability to type lock.
Beyond that explicit getters and setters will be needed, and once again an
inbuilt interface will be invoked. The interface is a little magical though,
like the ArrayAccess interface.
class SomeClass implements AccessorInterface {
protected $a = '';
protected var $b = "string";
protected int $c = 5;
public function get_a() { return $this->a; }
public function set_b( $val ) { $this->b = (string) $val; }
public function get_c():int { return $this->c; }
}
Unlike userland interfaces, the AccessorInterface gets its potential method
names, as well as their signatures, from the properties of the class.
These methods follow the get_[varname] or set_[varname] pattern. Getters
must return the same type as the underlying var if specified. Setters don't
have to take a matching argument as often their job is conversion.
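To make the get_[varname] / set_[varname] lookup concrete, here is a rough userland emulation using today's magic methods. It is only a sketch: AccessorSketch is a hypothetical trait, and the real proposal would have the engine do this lookup rather than going through __get and __set.

```php
<?php
// Hypothetical trait emulating the get_/set_ lookup with magic methods.
trait AccessorSketch
{
    public function __get($name)
    {
        $method = 'get_' . $name;
        if (method_exists($this, $method)) {
            return $this->$method();
        }
        throw new Error("No getter for property '$name'");
    }

    public function __set($name, $value)
    {
        $method = 'set_' . $name;
        if (method_exists($this, $method)) {
            $this->$method($value);
            return;
        }
        throw new Error("No setter for property '$name'");
    }
}

class SomeClass
{
    use AccessorSketch;

    protected $b = 'string';
    protected $c = 5;

    public function set_b($val) { $this->b = (string) $val; }
    public function get_b() { return $this->b; }
    public function get_c(): int { return $this->c; }
}

$obj = new SomeClass();
$obj->b = 42;          // routed to set_b(), which converts to string
var_dump($obj->b);     // string(2) "42"
var_dump($obj->c);     // int(5)
```

Because the properties are protected, access from outside the class falls through to the magic methods, which then dispatch by name. Writing to $obj->c would throw here since no set_c() exists.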
CASTING INTERFACES
As these elements are introduced the ability of objects to control how they
are cast into scalars needs improving. I propose interfaces in the vein of
ArrayAccess with the pattern below.
interface IntegerCastable {
public function CastToInteger():int;
}
PHP will call the function for the appropriate casting operation. Also, if
an object with at least one of these interfaces is echoed then that cast
will be performed, in the priority order string, float, int, bool.
Note - The magic __toString and the StringCastable interface are mutually
exclusive - trying to create an object with both will trip a ParseError.
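A minimal sketch of how the casting interface might be used, assuming the interface shape given above. Percentage and cast_to_int() are hypothetical; the helper only stands in for what the engine's (int) cast would do under this proposal.

```php
<?php
// Interface as written in the proposal.
interface IntegerCastable
{
    public function CastToInteger(): int;
}

// Hypothetical example class.
final class Percentage implements IntegerCastable
{
    private $ratio;

    public function __construct(float $ratio)
    {
        $this->ratio = $ratio;
    }

    public function CastToInteger(): int
    {
        return (int) round($this->ratio * 100);
    }
}

// Hypothetical helper standing in for the engine's (int) cast.
function cast_to_int($value): int
{
    if ($value instanceof IntegerCastable) {
        return $value->CastToInteger();
    }
    return (int) $value;
}

var_dump(cast_to_int(new Percentage(0.42)));  // int(42)
var_dump(cast_to_int("17"));                  // int(17)
```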
COMPARISON BEHAVIOR
Controlling variable types gives us more granular control over comparisons
by determining which of the variables in a comparison can be coerced. When
a variable with a locked type is compared to another variable only that
other variable can be coerced.
Even better, comparisons between strongly typed variables are always strict
and a TypeError results if their types don't match. This actually provides
a way to force the greater than, lesser than, etc. to be strict.
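The strict ordering described above can be approximated in userland today. The sketch below assumes "types match" means gettype() returns the same string for both operands; strict_compare() is a hypothetical helper, not something the engine provides.

```php
<?php
// Hypothetical helper: compare only when both operands share a type.
function strict_compare($left, $right): int
{
    if (gettype($left) !== gettype($right)) {
        throw new TypeError(sprintf(
            'Cannot compare %s with %s',
            gettype($left),
            gettype($right)
        ));
    }
    return $left <=> $right;
}

var_dump(strict_compare(2, 10));   // int(-1)

$threw = false;
try {
    strict_compare("2", 10);       // string vs integer
} catch (TypeError $e) {
    $threw = true;
}
var_dump($threw);                  // bool(true)
```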
PERFORMANCE IMPLICATIONS - TURNING IT OFF.
I imagine implementing all of the above will incur a performance hit. Yet,
pretty much all of this could be done with strategic use of assert(). PHP
is happy to not do this checking - much of it is for program testing and
peace of mind. So the last piece of the proposal is to ensure all of the
above can be disabled.
I'm personally in favor of turning it off using the existing
zend.assertions flag since all these checks are part of Design by Contract
anyway. In addition to turning off all of the above I recommend allowing
other function type declarations to be turned off by zend.assertions.
There are reasons not to do that and use a separate flag for one or both of
these methods. While I can live with that I would like to point out that
debug flag proliferation can lead to confusion.
The crux of my argument for using the zend.assertions flag is that these
type checks are all, at the end of the day, engine level assertions. PHP
ships with zend.assertions set to 1, and with PHP 8 we can keep that default
and recommend that providers not assume it's safe to set it to -1, since
there is a small, but not insignificant, chance that old code relying on
type declarations being enforced might corrupt user data. I admit this would
be painful in the short term, but it is better for the long term health of
the language and parser.
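For comparison, this is roughly what the same kind of check looks like when written by hand with assert() today. It is only a sketch; record_age() is a hypothetical function, and the assert() call stands in for the engine-level check the proposal would add.

```php
<?php
// Hypothetical function; the assert() stands in for an engine-level type check.
function record_age($age): int
{
    // Evaluated only when zend.assertions = 1; compiled out when it is -1.
    assert(is_int($age), 'age must be an int');
    return $age;
}

var_dump(record_age(30));  // int(30)
```

What a failed assertion does (warning or thrown AssertionError) depends on the PHP version and the assert.exception setting, so no failing call is shown here.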
CONCLUSION
I believe that covers all the bases needed. This will give those who want
to use strong typing better tools, and those who don't are free to ignore
them.
2018-01-04 3:37 GMT+01:00 Michael Morris tendoaki@gmail.com:
Second Draft based on the feedback upstream.
Target version: PHP 8.
This is a proposal to strengthen the dynamic type checking of PHP during
development.
Note - this is not a proposal to change PHP to a statically typed language
or to remove PHP's current loose typing rules. PHP is a weakly typed
language for a reason, and will remain so subsequent to this RFC. This RFC
is concerned with providing tools to make controlling variable types
stronger when the programmer deems this necessary.
VARIABLE DECLARATION
PHP currently has no keyword to initialize a variable - it is simply
created when it is first referenced. The engine infers the appropriate type
for the variable, and this may be later cast to other types depending on
the context of the code. Objects can have magic functions to carry out this
casting such as __toString.
It is sometimes useful to explicitly state a variable's type. One case is
when the engine might incorrectly infer the type. For example "073117" is a
valid octal integer but also a date string in mmddyy format, so a
comparison with another date string in the same format could be... amusing.
While there is a string comparison function, that functions presence is
borne of the fact that we can't reliably compare "073117" with say,
"010216" because of the int casting possibility.
Since the scalar types have already been reserved as keywords they can be
used to declare variables in a manner not unlike C or Java.
int $a = 073117;
The var keyword is still around from PHP 4 days but is going unused. In
JavaScript var is used to formally declare a variable though it isn't
required (It remains important because without it JavaScript will search
the scope chain of the current closure all the way to the top scope. If it
doesn't find the reference it only then creates one. This can lead to huge
headaches so the creation of variables without using var is strongly
discouraged in JavaScript).
Since the keyword is available, let's make use of it.
var $a = "123";
What I propose this will do is formally declare $a, infer it's type, then
LOCK the type from casting. If further assignments are made to the variable
the quantity being assigned will be cast to type desired if possible,
otherwise a type error will be raised.
var string $a = $_POST['date'];
This syntax allows the programmer to choose the type rather than allowing
the engine to infer it. Here $_POST['date'] might be provided in date
string that might be confused for an octal int.
This magical casting is suggested because it follows the spirit of PHP, but
it may not be strict enough. For those the type can be explicitly declared
without using the var keyword as follows.
int $a = 4;
In this event a type error will occur on any attempt to assign a value to
$a that isn't an int.
The variable can still be re-declared in both cases so.
var $a = 4;
string $a = "Hello";
The var keyword can be combined with the new keyword to lock an object
variable so it doesn't accidentally change
var $a = new SomeClass();
As noted above a deliberate redeclare can still change the type of $a.
If $a is declared with an int type, shouldn't it be enough to simply freeze
its type to int? The var keyword was used in PHP 4 and PHP 5 and I suppose
no one uses it in PHP 7 anymore, so why not deprecate it? IMO it should be
burned and buried. If every variable declaration with a type locked its
type then the var keyword would be useless, am I right?
ARRAYS
All members of an array can be cast to one type using this syntax
var string array $a = [ 'Mary', 'had', 'a', 'little', 'lamb' ];
int array $b = [1,2,3,5];
Personally I really don't like the proposed syntax. There is some work in
progress on the subject of generics, and IMO that should be the right way
to declare generic types.
Or members can be individually cast
var $a = [ var 'Todd', var 'Alex' ];
$b = [string 'id' => int 1, 'name' => string 'Chad'];
Again, following rules similar to the above. The main reason for doing
this is to insure smooth interaction with the pack and splat operators.
function foo (var string array $a = ...);
Here again, why not just lock its type if we expect $a to be an int? I
assume that if someone declares it as array or string they did it with some
purpose.
And speaking of functions, that's the next section.
FUNCTION DECLARATION
Variables are also declared as arguments to functions. I propose using the
var keyword to lock the resulting variable and perform a cast if possible.
function foo( var string $a, var $b ) {}
Note that using var without explicitly calling type will be legal if rarely
used for consistency reasons. Also, someone might have a use for an
argument who's type could be anything, but won't change after it is
received.
The type can also be inferred from the default.
function foo( var $a = "hello" ) {}
This syntax is essentially doing a redeclare of the variable. This could be
very troublesome with references, so a Type error will result if this is
tried.
function foo ( var &$a = "Hello" ) {}
$b = 3;
foo($b);
With objects the var keyword can be used to prevent the function from
changing the object.
function foo ( var SomeClass $a ) {}
CLASS MEMBER DECLARATION
Variables also appear as object members. Following the pattern established
above their types can be locked. A couple note though
class SomeClass {
var $a = '3';
public var $b = 'hello';
}
This is awkward; it seems like returning to PHP 4 again. public explicitly
points out that the $a member is public, and I suppose everyone has got
used to it.
For backwards compatibility the var keyword by itself must be equivalent to
"public". It is only when a scope operator is present that var takes on its
new meaning in this context.
Magic __set and __get cannot access variables with locked types because,
well, it will be a bloody mess. Basic getter/setter behavior (insuring the
datatype is correct) is accomplished just with the ability to type lock.
Beyond that explicit getters and setters will be needed, and once again an
inbuilt interface will be invoked. The interface is a little magical though
like the ArrayAccess interface.
class SomeClass implements AccessorInterface {
protected $a = '';
protected var $b = "string";
protected int $c = 5;
public function get_a() { return $this->a; }
public function set_b( $val ) { $this->b = (string) $val; }
public function get_c():int { return $this->c; }
}
IMO there are better proposals for getters/setters among the PHP RFCs.
Here you rely on specific function naming, which may collide or interfere
with popular naming conventions.
Unlike userland interfaces, the AccessorInterface gets its potential method
names from the properties of the members as well as their signatures.
These methods follow the get_[varname] or set_[varname]. Getters must
return the same type as the underlying var if specified. Setters don't have
to take a matching argument as often their job is conversion.
CASTING INTERFACES
As these elements are introduced the ability of objects to control how they
are cast into scalars needs better improving. I propose interfaces in the
vein of ArrayAccess with the pattern below.
interface IntegerCastable {
public function CastToInteger():int;
}
PHP will call the function for the appropriate casting operation. Also, if
the object with at least one of these interfaces is echo'ed out then that
cast will be performed, in the priority order string, float, int, bool.
Note - The magic __toString and the StringCastable interface are mutually
exclusive - trying to create an object with both will trip a parseError.
COMPARISON BEHAVIOR
Controlling variable types gives us more granular control over comparisons
but determining which of the variables in a comparison can be coerced. When
a variable with a locked type is compared to another variable only that
other variable can be coerced.
Even better, comparisons between strongly typed variables are always strict
and a TypeError results if their types don't match. This actually provides
a way to force the greater than, lesser than, etc. to be strict.
PERFORMANCE IMPLICATIONS - TURNING IT OFF.
I imagine implementing all of the above will incur a performance hit. Yet,
pretty much all of this could be done with strategic use of assert(). PHP
is happy to not do this checking - much of it is for program testing and
peace of mind. So the last piece of the proposal is to insure all of the
above can be disabled.
I'm personally in favor if turning it off using the existing zend.assertion
flag since all these checks are part of Design by Contract anyway. In
addition to turning off all of the above I recommend allowing other
function type declarations to be turned off by zend.assertion.
There are reasons not to do that and use a separate flag for one or both of
these methods. While I can live with that I would like to point out that
debug flag proliferation can lead to confusion.
The crux of my argument for using the zend.assertion flag is that these
type checks are all, at the end of day, engine level assertions. PHP ships
with zend.assertion set to 1, and with PHP 8 we can keep that default and
recommend to providers to not assume it's safe to set it to -1 since there
is a small, but not insignificant, chance that old code relying on Type
declarations to be on might corrupt user data. I admit this would be
painful in the short term, but it is better for the long term health of the
language and parser.
CONCLUSION
I believe that covers all the bases needed. This will give those who want
things to use strong typing better tools, and those who don't can be free
to ignore them.
Please register a wiki account and add the proposed RFC to the RFC list.
--
regards / pozdrawiam,
Michał Brzuchalski
about.me/brzuchal
brzuchalski.com