[RFC][DISCUSSION] Strong Typing Syntax

7 years ago by Michael Morris

I would like to propose a clean way to add some strong typing to PHP in a
manner that is almost fully backward compatible (there is a behavior change
with PHP 7 type declarations). As I don't have access to add RFCs to
the wiki, I'll place this here.

Before I begin detailing this, I want to emphasize that this syntax is optional
and lives alongside PHP's default scalar variables. If variables aren't
declared using the syntax detailed below, then nothing changes. This is not
only for backward compatibility; it's also to keep the language easy
to learn, as understanding datatypes can be a stumbling block (I know it was
for me at least).

VARIABLE DECLARATION

Currently the var keyword is used to formally declare a variable. The
keyword will now allow a type argument before the variable name, as follows:

var [type] $varname;

If the type is omitted, scalar is assumed. If Fleshgrinder's scalar RFC is
accepted then it would make sense to allow programmers to explicitly
declare the variable as a scalar, but in any event when the type is omitted
scalar must be assumed for backwards compatibility.

The variables created by this pattern automatically cast anything assigned
to them instead of raising an error. So...

var string $a = 5.3;

The float 5.3 will be cast to the string "5.3".

For some this doesn't go far enough; they'd rather have a TypeError thrown
when the assignment isn't going to work. For them there is this syntax:

string $a = "Hello";

Note that the var keyword isn't used.
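Neither syntax exists in PHP today, so here is a minimal userland sketch of the two proposed assignment modes for scalar types. The helper names `assign_autocast()` and `assign_strict()` are hypothetical. The first relies on the existing settype() conversion rules. The second simply refuses a value of the wrong type.

```php
<?php
// Hypothetical helpers, not part of PHP or of the proposal: they mimic
// "var string $a = ..." (auto-cast) and "string $a = ..." (TypeError).

// Auto-cast mode: convert the value using PHP's own settype() rules.
function assign_autocast(string $type, $value)
{
    settype($value, $type);   // e.g. 5.3 becomes "5.3" when $type is 'string'
    return $value;
}

// Strict mode: only accept a value that already has the declared type.
function assign_strict(string $type, $value)
{
    $checks = [
        'string' => 'is_string',
        'int'    => 'is_int',
        'float'  => 'is_float',
        'bool'   => 'is_bool',
    ];
    if (!isset($checks[$type])) {
        throw new InvalidArgumentException("Unsupported type: $type");
    }
    if (!$checks[$type]($value)) {
        throw new TypeError(sprintf('Cannot assign %s to a %s variable', gettype($value), $type));
    }
    return $value;
}

$a = assign_autocast('string', 5.3);   // like: var string $a = 5.3;
var_dump($a);                           // string(3) "5.3"

try {
    $b = assign_strict('string', 5.3);  // like: string $b = 5.3;
} catch (TypeError $e) {
    echo $e->getMessage(), PHP_EOL;     // Cannot assign double to a string variable
}
```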

FUNCTION DECLARATION

PHP 7 introduced type declarations. This RFC calls for these to become
binding for consistency, which introduces the only backward compatibility
break of the proposal. Consider the following code.

function foo ( string $a ) {
    $a = 5;
    echo is_int($a) ? 'Yes' : 'No';
}

Under this RFC "No" is echoed because 5 is cast to a string when assigned
to $a. Currently "Yes" is echoed, since a scalar variable simply takes on
the type of the last value assigned to it.
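For comparison, the current behaviour can be checked with a runnable version of the example above. The only change is that foo() returns the answer instead of echoing it, so it can be tested.

```php
<?php
// Current behaviour: the declared type is only checked when foo() is
// called; inside the body $a is an ordinary local that can change type.
function foo(string $a): string
{
    $a = 5;                            // no cast, no error: $a is now an int
    return is_int($a) ? 'Yes' : 'No';
}

echo foo('hello'), PHP_EOL;            // Yes
```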

I believe this is an acceptable break for two reasons. 1, the type
declaration syntax is relatively new. 2, changing the type of a variable
mid-function is a bad pattern anyway.

OBJECT TYPE LOCKING

Currently there is no way to prevent a variable from being changed from an
object to something else. For example:

$a = new SomeClass();
$a = 5;

If objects are allowed to follow the same pattern outlined above, though,
this problem is mostly solved:

SomeClass $a = new SomeClass();
var SomeClass $a = new SomeClass();

QUESTION: How do we handle the second (auto-casting) case? $a must always
hold a SomeClass object, but there are no casting rules for objects. We have
three options:

1. Throw an error on illegal assignment.
2. Allow a magic __cast function that will cast any assignment to the
   object.
3. Create a PHP internal interface the object can implement that will
   accomplish what 2 does without the magic approach.

Note that option 1 will need to be the behavior for classes that implement
neither 2 nor 3. Options 2 and 3 are not mutually exclusive, though my
understanding is that PHP is moving away from magic functions.
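Options 1 and 3 can be sketched in userland to make the question concrete. Everything here is hypothetical: `CastsAssignments`, `assign_locked()` and the `Money` class are invented for illustration, and PHP has no such internal interface.

```php
<?php
// Hypothetical stand-in for option 3: a class opts in to auto-casting
// by implementing this interface. PHP itself has no such interface.
interface CastsAssignments
{
    public static function castFrom($value): CastsAssignments;
}

final class Money implements CastsAssignments
{
    public $cents;

    public function __construct(int $cents)
    {
        $this->cents = $cents;
    }

    // The class decides how a foreign value becomes a Money object.
    public static function castFrom($value): CastsAssignments
    {
        return new self((int) $value);
    }
}

// What "var SomeClass $a = <value>;" might do on each assignment.
function assign_locked(string $class, $value)
{
    if ($value instanceof $class) {
        return $value;                      // already the right type
    }
    if (is_subclass_of($class, CastsAssignments::class)) {
        return $class::castFrom($value);    // option 3: class-defined cast
    }
    // Option 1: no casting rule available, so refuse the assignment.
    throw new TypeError('Cannot assign ' . gettype($value) . " to $class");
}

$m = assign_locked(Money::class, 250);
var_dump($m->cents);                        // int(250)
```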

CLASS DECLARATION
Again, by default class members are scalars. The syntax translates over
here as might be expected.

class SomeClass {
    public var string $a;
    protected int $b;
    private SomeOtherClass $c;
    public var SomeThirdClass $d;
}

Note that a default value doesn't need to be provided. In the case of object
members, these types are only checked on assignment, to prevent recursion
from sending the autoloader into an infinite loop.

Also note that one of the functions of setters - guaranteeing correct type
assignment - comes free of charge with this change.
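For reference, this is the kind of setter the proposal would make unnecessary. Today the parameter type declaration is the only thing guarding the untyped property. The class names follow the example above and are placeholders.

```php
<?php
class SomeOtherClass {}

class SomeClass
{
    private $c;   // untyped today: any value could be stored here

    // The setter exists mainly so that its parameter type can guard $c.
    public function setC(SomeOtherClass $c): void
    {
        $this->c = $c;
    }

    public function getC(): SomeOtherClass
    {
        return $this->c;
    }
}

$obj = new SomeClass();
$obj->setC(new SomeOtherClass());   // accepted

try {
    $obj->setC(5);                  // rejected before the assignment happens
} catch (TypeError $e) {
    echo get_class($e), PHP_EOL;    // TypeError
}
```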

COMPARISON BEHAVIOR
When a strongly typed variable (autocasting or not) is compared to a scalar
variable, only the scalar switches types. The strict comparison operator is
allowed, though it only prevents the scalar from switching types.

Comparisons between strongly typed variables are always strict, and a
TypeError results if their types don't match. This actually provides a way
to force the greater-than, less-than, and spaceship operators to be
strict.
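Here is a minimal sketch of the proposed strict comparison between two strongly typed values, written as a hypothetical helper `typed_compare()`. It only adds the TypeError on mismatched types. When both operands are numeric strings, PHP's `<=>` still compares them numerically, so a full implementation would need more than this.

```php
<?php
// Hypothetical helper: compare two values as if both were strongly typed.
function typed_compare($left, $right): int
{
    if (gettype($left) !== gettype($right)) {
        throw new TypeError(sprintf(
            'Cannot compare %s with %s',
            gettype($left),
            gettype($right)
        ));
    }
    return $left <=> $right;   // same type: ordinary spaceship result
}

var_dump(1 <=> '1');                // int(0): today the string is coerced
var_dump(typed_compare(1, 2));      // int(-1)

try {
    typed_compare(1, '1');
} catch (TypeError $e) {
    echo $e->getMessage(), PHP_EOL; // Cannot compare integer with string
}
```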

FUNCTION CALLING
When a strongly typed variable is passed to a function that declares a
parameter's type, autocasting will occur so long as the pass is not by
reference. For obvious reasons, a TypeError will occur on a by-reference
pass:

function bar( string $a ) {}
function foo( string &$a ) {}

$a = 5.3;
foo( $a ); // Works, $a is a scalar, so it type adjusts.
var bool $b = false;
foo( $b ); // TypeError, $b is boolean, function expects to receive a
           // string by reference.
bar( $b ); // Works since the pass isn't by reference, so the type can be
           // adjusted for the local scope.
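For reference, current PHP in its default coercive mode (no declare(strict_types=1)) already converts an untyped variable passed by reference, which is the "Works" case above. The function bodies are filled in here only so the effect can be observed.

```php
<?php
// Default coercive mode: no declare(strict_types=1) in this file.
function bar(string $a): string
{
    return $a;                 // the local copy is a string
}

function foo(string &$a): void
{
    // Nothing to do: the coercion happens on entry.
}

$a = 5.3;
foo($a);
var_dump($a);                  // string(3) "5.3": the caller's variable was converted

$b = false;
var_dump(bar($b));             // string(0) "": only the local copy is converted
var_dump($b);                  // bool(false)
```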

CONCLUSION
I believe that covers all the bases needed. This will give those who want
to use strong typing better tools, and those who don't are free
to ignore them.

7 years ago by Niklas Keller

Hey Michael,

I don't think the BC break is acceptable. You argue that scalar type
declarations are relatively new, but in fact they're already years old now.
They're used in most PHP 7+ packages. Even if changing types might be
discouraged, it still happens a lot.

Regards, Niklas

7 years ago by Michael Morris

Hey Michael,

I don't think the BC break is acceptable. You argue that scalar type
declarations are relatively new, but in fact they're already years old now.
They're used in most PHP 7+ packages. Even if changing types might be
discouraged, it still happens a lot.

Hmm. Well, that aspect of this can be dropped. What about the rest of it?

7 years ago by Andreas Hennings

This proposal contains some interesting ideas, which I see as separate:

1. A syntax to declare the type of local variables.
2. A syntax to declare the type of object properties.
3. Preventing local variables, object properties and parameters from
   changing their type after initialization/declaration.

For me, point 3 is the most interesting one.
I think the other points are already discussed elsewhere in some way,
although they are clearly related to 3.

Point 3 would be a BC break if we introduced it for parameters.
Current behavior: https://3v4l.org/bjaLQ

Local variables and object properties currently cannot be typed, so
point 3 would not be a BC break for them if we introduce it together
with 1 and 2.
But then we would have an inconsistency between parameters and local
vars / object properties.

What we could do to avoid a BC break is to introduce
declare(fixed_parameter_types=1) in addition to
declare(strict_types=1).
For local variables and object properties, the type would always be fixed.
But for parameters, it would only be fixed if the
declare(fixed_parameter_types=1) is active.

Maybe to make it less verbose, we could say declare(strict_types=2),
which would mean the combination of both those things?
Or some other type of shortcut.
I think we will have to think about shortcuts like this if we
introduce more "modes" in the future.
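For context, this is what the existing declare(strict_types=1) mode does for parameters today. The proposed fixed_parameter_types setting is hypothetical and cannot be shown. Even in strict mode, the parameter can still be reassigned to another type inside the body.

```php
<?php
declare(strict_types=1);   // must be the first statement in the file

function describe(string $a): string
{
    $a = 5;                // still allowed: strict_types does not lock $a
    return gettype($a);
}

try {
    describe(5);           // rejected: int passed where string is declared
} catch (TypeError $e) {
    echo 'TypeError', PHP_EOL;
}

echo describe('x'), PHP_EOL;   // integer
```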

Currently the var keyword is used to formally declare a variable.

Are you talking about local variables?
In which PHP version? https://3v4l.org/o0PFg

Afaik, currently var is only used for class/object properties from the
time when people did not declare the visibility as
public/protected/private.

If the type is omitted, scalar is assumed. If Fleshgrinder's scalar RFC is
accepted then it would make sense to allow programmers to explicitly
declare the variable as a scalar, but in any event when the type is omitted
scalar must be assumed for backwards compatibility.

If no type is specified, then "mixed" should be assumed, not "scalar".
Assuming "scalar" would be a BC break, and it would be confusing.

Hey Michael,

I don't think the BC break is acceptable. You argue that scalar type
declarations are relatively new, but in fact they're already years old now.
They're used in most PHP 7+ packages. Even if changing types might be
discouraged, it still happens a lot.

Hmm. Well, that aspect of this can be dropped. What about the rest of it?

7 years ago by Andreas Hennings

Another idea I have when reading this proposal is "implicit" typing
based on the initialization.

E.g.

$x = 5;
$x = 'hello'; // -> Error: $x was initialized as integer, and cannot
              // hold a string.

$x = $a + $b;
$x = 'hello'; // -> Error: $x was initialized as number (int|float),
              // and cannot hold a string.

To me this is only acceptable if the implicit type can be determined
at compile time.
So:

if ($weather_is_nice) {
    $x = 5;
}
else {
    $x = 'hello'; // -> Error: $x would be initialized as int
                  // elsewhere, so cannot be initialized as string.
}

This change would be controversial and leave a lot of questions.
It would be a BC break, unless we introduce yet another declare()
setting, e.g. declare(implicit_types=1).

It could be tricky for global variables, or in combination with
include/require, where the variable can be seen from outside a
function body, and outside the range of the declare() statement.

I only mention it here because it relates to the proposal. I do not
have a strong opinion on it atm.
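Andreas's idea is a compile-time check, which userland cannot express. As a rough runtime analogue, a hypothetical container class can record the type of the first value assigned to each property and refuse later values of a different type. The class name `ImplicitlyTyped` is invented for this sketch.

```php
<?php
// Hypothetical runtime analogue of implicit typing: the first assignment
// to a name fixes its type; later assignments must match it.
final class ImplicitlyTyped
{
    private $values = [];
    private $types = [];

    public function __set(string $name, $value): void
    {
        $type = gettype($value);
        if (isset($this->types[$name]) && $this->types[$name] !== $type) {
            throw new TypeError(sprintf(
                '$%s was initialized as %s, and cannot hold a %s',
                $name,
                $this->types[$name],
                $type
            ));
        }
        $this->types[$name] = $type;
        $this->values[$name] = $value;
    }

    public function __get(string $name)
    {
        return $this->values[$name];
    }
}

$scope = new ImplicitlyTyped();
$scope->x = 5;
$scope->x = 7;             // fine: still an integer

try {
    $scope->x = 'hello';   // like the first example above
} catch (TypeError $e) {
    echo $e->getMessage(), PHP_EOL; // $x was initialized as integer, and cannot hold a string
}
```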

7 years ago by Michael Morris

On Wed, Jan 3, 2018 at 12:21 PM, Andreas Hennings andreas@dqxtech.net
wrote:

Another idea I have when reading this proposal is "implicit" typing
based on the initialization.

E.g.

$x = 5;
$x = 'hello'; // -> Error: $x was initialized as integer, and cannot
              // hold a string.

No, no, no. I don't think I'd like that always-on approach. However, I just
had an idea...

Let's step back. Way back. PHP/FI days back.

Back in the day, Rasmus chose to put variables off on their own symbol table
for performance reasons. This isn't as necessary now, but variables in PHP
are still always written as $something.

Now, I don't know what implementation can of worms this would open, but what
if this were changed for the locked-type variables? That would distinguish
them greatly:

int x = 5;

Here x is a locked-type variable of the integer type. Since it's also on
the same symbol table as the classes, functions, constants et al., I presume
it is namespace-bound as well.

var x = 5;

If allowed what would this mean?

And what to do with class members is an open question.

Anyway, I'm looking for an implementation that allows loose and strong
typing to coexist even within a given file. I use loosely typed variables
most of the time myself.

7 years ago by Michael Morris

On Wed, Jan 3, 2018 at 12:10 PM, Andreas Hennings andreas@dqxtech.net
wrote:

This proposal contains some interesting ideas, which I see as separate:

1. A syntax to declare the type of local variables.

2. A syntax to declare the type of object properties.

3. Preventing local variables, object properties and parameters from
   changing their type after initialization/declaration.

For me, point 3 is the most interesting one.
I think the other points are already discussed elsewhere in some way,
although they are clearly related to 3.

Point 3 would be a BC break if we introduced it for parameters.
Current behavior: https://3v4l.org/bjaLQ

Local variables and object properties currently cannot be typed, so
point 3 would not be a BC break for them if we introduce it together
with 1 and 2.
But then we would have an inconsistency between parameters and local
vars / object properties.

What we could do to avoid a BC break is to introduce
declare(fixed_parameter_types=1) in addition to
declare(strict_types=1).
For local variables and object properties, the type would always be fixed.
But for parameters, it would only be fixed if the
declare(fixed_parameter_types=1) is active.

Maybe to make it less verbose, we could say declare(strict_types=2),
which would mean the combination of both those things?
Or some other type of shortcut.
I think we will have to think about shortcuts like this if we
introduce more "modes" in the future.

There will be occasions where having an unfixed variable alongside normal
ones will be desirable.

Currently the var keyword is used to formally declare a variable.

Are you talking about local variables?
In which PHP version? https://3v4l.org/o0PFg

Sorry, I was confusing PHP with JavaScript. I forgot that the var keyword was
only used in PHP 4 for class members. For some reason my brain assumed it
was usable in a local scope.

Afaik, currently var is only used for class/object properties from the
time when people did not declare the visibility as
public/protected/private.

If no type is specified, then "mixed" should be assumed, not "scalar".
Assuming "scalar" would be a BC break, and it would be confusing.

Ok. I'm misusing the term scalar to mean "a variable whose type can be
changed at will depending on context." Sorry.

7 years ago by Rowan Collins

Hi Michael,

I would like to propose a clean way to add some strong typing to PHP in a
manner that is almost fully backward compatible (there is a behavior change
with PHP 7 type declarations). As I don't have access to add RFCs to
the wiki, I'll place this here.

Thanks for putting this together. Perhaps unlike Andreas, I think it is
good to look at typing changes as a unified framework, rather than
considering "typed properties", "typed variables", etc, as separate
concerns. If we don't, there is a real risk we'll end up making
decisions now that hobble us for future changes, or over-complicating
things in one area because we're not yet ready to make changes in another.

My own thoughts on the subject from a while ago are here:
http://rwec.co.uk/q/php-type-system In that post, I borrowed the term
"container" from Perl6 for the conceptual thing that type constraints
are stored against; in PHP's case, this would include variables, object
properties, class static properties, function parameters, and return
values. I think a good plan for introducing typing is one that considers
all of these as equals.

The biggest issue with any proposal, though, is going to be performance.
I don't think this is an incidental detail to be dealt with later, it is
a fundamental issue with the way type hints in PHP have evolved. PHP is
extremely unusual, if not unique, in exclusively enforcing type
constraints at runtime. Other languages with "gradual typing" such as
Python, Hack, and Dart, use the annotations only in separate static
analysers and/or when a runtime debug flag is set (similar to enabling
assertions).

Extending that to all containers means every assignment operation would
effectively need to check the value on the right-hand-side against the
constraint on the left-hand-side. Some of those checks are non-trivial,
e.g. class/interface constraints, callable; or in future maybe "array of
Foo", "Foo | Bar | int", "Foo & Bar", etc. There are ways to ease this a
bit, like passing around a cache of type constraints a value has passed,
but I think we should consider whether the "always-on runtime
assertions" model is the one we want in the long term.

If the type is omitted, scalar is assumed.

As Andreas pointed out, you mean "mixed" here (accepts any value),
rather than "scalar" (accepts int, string, float, and bool).

The variables created by this pattern auto cast anything assigned to them
without pitching an error.

My initial thought was that this makes the assignment operator a bit too
magic for my taste. It's conceptually similar to the "weak mode" for
scalar type hints (and could perhaps use the same setting), but those
feel less magic because they happen at a clear scope boundary, and the
cast only happens once. But on reflection, the consistency makes sense,
and assigning to an object property defined by another library is
similar to calling a method defined by another library, so the
separation of caller and callee has similar justification.

PHP 7 introduced type declarations.

This is incorrect, and leads you to a false conclusion. PHP 7 introduced
scalar type declarations, which extended an existing system which had
been there for years, supporting classes, interfaces, the generic
"array" constraint, and later pseudo-types like "callable".

I don't think it's tenable to change the meaning of this syntax, but it
would certainly be possible to bikeshed some modifier to simultaneously
declare "check type on function call, and declare corresponding local
variable as fixed type".

OBJECT TYPE LOCKING

[...]

QUESTION: How do we handle the second auto casting case? $a is not allowed
to not be a SomeClass() object, but there are no casting rules.

There are actually more than just object and scalar type hints -
"callable" is a particularly complex check - but currently they all just
act as assertions, so it would be perfectly consistent for "locking" to
also only have the one mode.

COMPARISON BEHAVIOR
When a strongly typed variable (autocasting or not) is compared to a scalar
variable only the scalar switches types. The strict comparison operator is
allowed though it only blocks the movement of the scalar.

Comparisons between strongly typed variables are always strict and a
TypeError results if their types don't match. This actually provides a way
to force the greater than, lesser than, and spaceship operation to be
strict.

I like this idea. The over-eager coercion in comparisons is a common
criticism of PHP.

In general I really like the outline of this; there's a lot of details
to work out, but we have to start somewhere.

Regards,

--
Rowan Collins
[IMSoP]

7 years ago by Michael Morris

On Wed, Jan 3, 2018 at 3:26 PM, Rowan Collins rowan.collins@gmail.com
wrote:

Hi Michael,

I would like to propose a clean way to add some strong typing to PHP in a
manner that is almost fully backward compatible (there is a behavior change
with PHP 7 type declarations). As I don't have access to add RFCs to
the wiki, I'll place this here.

Thanks for putting this together. Perhaps unlike Andreas, I think it is
good to look at typing changes as a unified framework, rather than
considering "typed properties", "typed variables", etc, as separate
concerns. If we don't, there is a real risk we'll end up making decisions
now that hobble us for future changes, or over-complicating things in one
area because we're not yet ready to make changes in another.

My thoughts exactly. PHP already has enough warts born of piecemeal design;
a cursory look at the PHP string functions shows this very well. We have
functions with haystack/needle argument order and others with needle/haystack.
Some function names are underscore-delimited, some aren't (some may have
been meant to be camel-cased, which is hard to tell since PHP function
names aren't case-sensitive), and so on. When I see an RFC based
on types it worries me precisely because without a core plan of action we
are inviting more language fragmentation.
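The following standard functions illustrate the point across the string and array functions. The argument order flips between strpos() and in_array()/array_search(), and the naming style differs between str_replace() and strtolower().

```php
<?php
$haystack = 'haystack';

// String search takes the haystack first...
var_dump(strpos($haystack, 'st'));         // int(3)

// ...while array searches take the needle first.
var_dump(in_array('b', ['a', 'b', 'c']));  // bool(true)
var_dump(array_search('b', ['a', 'b']));   // int(1)

// Naming styles are mixed too: underscore-delimited vs run-together.
var_dump(str_replace('a', 'o', 'cat'));    // string(3) "cot"
var_dump(strtolower('CAT'));               // string(3) "cat"
```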

My own thoughts on the subject from a while ago are here:
http://rwec.co.uk/q/php-type-system In that post, I borrowed the term
"container" from Perl6 for the conceptual thing that type constraints are
stored against; in PHP's case, this would include variables, object
properties, class static properties, function parameters, and return
values. I think a good plan for introducing typing is one that considers
all of these as equals.

That was one of the most enjoyable reads I've had in a while and I can't
think of anything there I disagree with. I'm still working through your
references for how Python is handling things and the treatise on the nature
of types.

The biggest issue with any proposal, though, is going to be performance. I
don't think this is an incidental detail to be dealt with later, it is a
fundamental issue with the way type hints in PHP have evolved. PHP is
extremely unusual, if not unique, in exclusively enforcing type constraints
at runtime. Other languages with "gradual typing" such as Python, Hack, and
Dart, use the annotations only in separate static analysers and/or when a
runtime debug flag is set (similar to enabling assertions).

Has the bus already left the station forever on this?

I think it's clear that what we are discussing here can't go into effect
before PHP 8. Further, it could very well be one of, if not the, key feature
of PHP 8. In major versions, backward compatibility breaks are considered
where warranted.

I'm not as familiar with the Zend Engine as I probably should be. I bring the
perspective of an end user. From what you've posted, am I correct in stating
that PHP type hints / scalar type declarations are in truth syntactic sugar
for asserting the type checks? That is, we write this

function foo( ClassA $a, ClassB $b, string $c ) {}

But the engine has to do the work of this...

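function foo ( $a, $b, $c ) {
    assert( $a instanceof ClassA, new TypeError('$a must be an instance of ClassA') );
    assert( $b instanceof ClassB, new TypeError('$b must be an instance of ClassB') );
    assert( is_string($c), new TypeError('$c must be of type string') );
}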

If that is indeed the case, why not disable these checks according to the
zend.assertions flag, or, if that's too bold a move, create a php.ini flag
that allows them to be disabled in production?

Existing code would be unaffected if it has been fully debugged because, in
accordance with the principles of Design by Contract, a call with an
illegal type should be impossible. For code that isn't up to par, though,
there is a possibility of data corruption as execution proceeds past the
call to wherever the type hint actually mattered. I'll hazard that most of
the time that will be a "call to a method on a non-object" error or something
similar.

PHP programmers however would need to get used to the idea that their type
hints mean nothing when assertions are turned off (or if handled by a
separate flag, when that flag is turned off). I'm ok with this, but I'm a
big proponent of Design by Contract methodology as a supplement to Test
Driven Design.

Another thing to consider is that if the existing type hints are so
expensive, this change might grant a welcome speed boost.

Extending that to all containers means every assignment operation would
effectively need to check the value on the right-hand-side against the
constraint on the left-hand-side. Some of those checks are non-trivial,
e.g. class/interface constraints, callable; or in future maybe "array of
Foo", "Foo | Bar | int", "Foo & Bar", etc. There are ways to ease this a
bit, like passing around a cache of type constraints a value has passed,
but I think we should consider whether the "always-on runtime assertions"
model is the one we want in the long term.

If the type is omitted, scalar is assumed.

As Andreas pointed out, you mean "mixed" here (accepts any value), rather
than "scalar" (accepts int, string, float, and bool).

Yes. I admitted to him in a previous post that I had made that mistake.

The variables created by this pattern auto cast anything assigned to them
without pitching an error.

My initial thought was that this makes the assignment operator a bit too
magic for my taste. It's conceptually similar to the "weak mode" for scalar
type hints (and could perhaps use the same setting), but those feel less
magic because they happen at a clear scope boundary, and the cast only
happens once. But on reflection, the consistency makes sense, and assigning
to an object property defined by another library is similar to calling a
method defined by another library, so the separation of caller and callee
has similar justification.

PHP 7 introduced type declarations.

This is incorrect, and leads you to a false conclusion. PHP 7 introduced
scalar type declarations, which extended an existing system which had
been there for years, supporting classes, interfaces, the generic "array"
constraint, and later pseudo-types like "callable".

I don't think it's tenable to change the meaning of this syntax, but it
would certainly be possible to bikeshed some modifier to simultaneously
declare "check type on function call, and declare corresponding local
variable as fixed type".

Or go back to using the underutilized assert() statement :D

Or, if it's really important to the programmer, they can re-declare the
variable to lock the type down. I only suggested this change to bring about
consistency.

COMPARISON BEHAVIOR

When a strongly typed variable (autocasting or not) is compared to a scalar
variable only the scalar switches types. The strict comparison operator is
allowed though it only blocks the movement of the scalar.

Comparisons between strongly typed variables are always strict and a
TypeError results if their types don't match. This actually provides a way
to force the greater than, lesser than, and spaceship operation to be
strict.

I like this idea. The over-eager coercion in comparisons is a common
criticism of PHP.

In general I really like the outline of this; there's a lot of details to
work out, but we have to start somewhere.

Well, after this post I'm going to write a second draft based on what
you and Andreas have taught me, addressing some of the concerns that have
been raised.

7 years ago by Rowan Collins

I'm not familiar with the Zend Engine as I probably should be. I bring the
perspective of an end user. From what you've posted am I correct in stating
that PHP Type Hints / scalar Type Declarations are in truth syntactic sugar
for asserting the type checks.

This is how I've always pictured it, but I've never dug into the
implementation before, so I had a look. (If anyone's curious how I found
it, I started by searching for "callable", because it's a keyword that
should only show up in type hints, then clicked through on LXR to
everything that looked promising.)

It looks like the actual "assertion" is the function
zend_verify_arg_type [1] which calls zend_check_type [2] and formats an
appropriate Error if the type check returns false.

zend_check_type has to do various things depending on the type hint the
user specified, which I'm guessing are classified when the function is
compiled:

- Null values are checked against nullable type markers and null default
  values.
- A class name traverses up through the inheritance hierarchy of the
  argument until it finds a match or reaches the end [3], while an
  interface name has to recursively check all interfaces that might be
  indirectly implemented [4].
- The "callable" type hint has to check all sorts of different formats,
  and is scope-dependent [5].
- Strict array and scalar type hints are just a comparison of bit fields.
- Weak scalar type hints which aren't a direct match end up in
  zend_verify_scalar_type_hint to perform coercion if possible [6].
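A rough userland analogue of the branches listed above may help show where the cost goes. It is only a sketch under simplifying assumptions: a single type name, strict mode only, and no weak-mode coercion. The function `check_type()` is hypothetical and is not the engine's code.

```php
<?php
// Hypothetical userland analogue of the branches in zend_check_type.
// Strict mode only; weak-mode scalar coercion is deliberately left out.
function check_type($value, string $type, bool $nullable = false): bool
{
    if ($value === null) {
        return $nullable;                 // nullable marker or null default
    }
    switch ($type) {
        // Cheap cases: essentially a type-tag comparison.
        case 'int':    return is_int($value);
        case 'float':  return is_float($value) || is_int($value); // int widens to float even in strict mode
        case 'string': return is_string($value);
        case 'bool':   return is_bool($value);
        case 'array':  return is_array($value);
        // Expensive cases.
        case 'callable':
            return is_callable($value);   // strings, arrays, closures, invokable objects...
        default:
            return $value instanceof $type; // walks parent classes and interfaces
    }
}

var_dump(check_type(new ArrayObject([]), 'Countable')); // bool(true)
var_dump(check_type('strlen', 'callable'));             // bool(true)
var_dump(check_type(null, 'int'));                      // bool(false)
```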

When talking about additional type checks for assignment to properties,
or "locked" local variables, etc, this is the code we're saying needs to
be run more often. For simple types, in strict mode, it's not too bad,
but checking classes, interfaces, and complex pseudotypes like
"callable" seems pretty intensive. This is likely to get more complex
too: proposed additions include union types ("Foo|Bar"), intersection
types ("Foo&Bar"), typed arrays ("int[]"), generics ("Map<int,Foo>"),
and others.

So I guess I'm agreeing with Rasmus and Dan Ackroyd that thinking
there's an easy optimisation here is naive.

For the same reason, I am supportive of the idea of having type checks,
at least those we don't have yet, only enabled with an off-by-default INI
setting, treating them like assertions or DbC, not as part of the normal
runtime behaviour.

[1] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_execute.c#zend_verify_arg_type
[2] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_execute.c#zend_check_type
[3] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_operators.c#instanceof_class
[4] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_operators.c#instanceof_interface
[5] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_API.c#zend_is_callable_impl
[6] https://php-lxr.adamharvey.name/source/xref/master/Zend/zend_execute.c#zend_verify_scalar_type_hint

--
Rowan Collins
[IMSoP]

7 years ago by Andreas Hennings

Hi Michael,

I would like to propose a clean way to add some strong typing to PHP in a
manner that is almost fully backward compatible (there is a behavior change
with PHP 7 type declarations). As I don't have access to add RFCs to
the wiki, I'll place this here.

Thanks for putting this together. Perhaps unlike Andreas, I think it is good
to look at typing changes as a unified framework, rather than considering
"typed properties", "typed variables", etc, as separate concerns. If we
don't, there is a real risk we'll end up making decisions now that hobble us
for future changes, or over-complicating things in one area because we're
not yet ready to make changes in another.

I think the best strategy is to develop a greater vision of where we
want to go, and then identify manageably small steps that move us in
this direction, and that do not create conflicts in the future. This
means we are both right.

I still think the following are good "small steps":

- typed properties with type lock
- typed local variables with type lock
- discussion whether and when parameters should be type-locked in the
  function body.

Of course there should be consistency between those steps.

You are right, we also need to consider when these types should be
validated, and/or how the variables would be implemented.
Perhaps we could actually create a system where type-locked variables
use less memory, because they no longer need to store the type of the
variable?
E.g. a type-locked integer would only use the 64 bits, or whichever size
we currently use, to store the actual number.

The biggest issue with any proposal, though, is going to be performance. I don't think this is an incidental detail to be dealt with later, it is a fundamental issue with the way type hints in PHP have evolved. PHP is extremely unusual, if not unique, in exclusively enforcing type constraints at runtime. Other languages with "gradual typing" such as Python, Hack, and Dart, use the annotations only in separate static analysers and/or when a runtime debug flag is set (similar to enabling assertions).

A system where all variables are type-locked could in fact be faster
than a system with dynamically typed variables.
Depends on the implementation, of course. I imagine it would be a lot
of work to get there.

7 years ago by Rasmus Lerdorf

> A system where all variables are type-locked could in fact be faster
> than a system with dynamically typed variables.
> Depends on the implementation, of course. I imagine it would be a lot
> of work to get there.

I think you, and many others, commenting here, should start by looking at the engine implementation. Any successful RFC needs to have a strong implementation behind it, or at the very least a very detailed description of how the implementation would mesh with the existing engine code.

The reason we don’t have typed properties/variables is that it would require adding type checks on almost every access to the underlying zval. That is a huge perf hit compared to only doing it on method/function egress points as we do now.

-Rasmus

7 years ago by Michael Morris

>> A system where all variables are type-locked could in fact be faster
>> than a system with dynamically typed variables.
>> Depends on the implementation, of course. I imagine it would be a lot
>> of work to get there.
>
> I think you, and many others, commenting here, should start by looking at
> the engine implementation. Any successful RFC needs to have a strong
> implementation behind it, or at the very least a very detailed description
> of how the implementation would mesh with the existing engine code.
>
> The reason we don’t have typed properties/variables is that it would
> require adding type checks on almost every access to the underlying zval.
> That is a huge perf hit compared to only doing it on method/function egress
> points as we do now.

I’ve been thinking on this during my drive today to a new job and city. I
promise to read over the current implementation before going further, but a
quick question - what if the underlying zval wasn’t a zval but a separate
class specific to the data type but implementing the same interface as
zval? The compiler would choose to use the alternate classes when it
encounters new syntax calling for their use, in effect adding a static
typing layer that augments the existing dynamic typing layer.

> -Rasmus

7 years ago by Dan Ackroyd

> what if the underlying zval wasn’t a zval but a separate
> class specific to the data type but implementing the same interface as
> zval?

I believe the only sensible answer to that is 'mu', as that question
is based on a misunderstanding.

The internals of the PHP engine are written in C, and zvals are structs, not
classes, and so there is no interface. In userland, classes are also
zvals. http://www.phpinternalsbook.com/php7/internal_types/zvals/basic_structure.html

Also, I think people who try to guess at how to make changes to the
engine are doing a small disservice to people who have already tried
to implement this. The current contributors are a bunch of clever
people, and if there was an obvious way to implement it, they would
have implemented it already. It's not the case that there is going to be
an easy solution that has been overlooked, which someone cleverer is
going to be able to guess at.

cheers
Dan
Ack

7 years ago by Andreas Hennings

>> what if the underlying zval wasn’t a zval but a separate
>> class specific to the data type but implementing the same interface as
>> zval?
>
> I believe the only sensible answer to that is 'mu', as that question
> is based on a misunderstanding.
>
> The internals of the PHP engine are written in C, and zvals are structs, not
> classes, and so there is no interface. In userland, classes are also
> zvals. http://www.phpinternalsbook.com/php7/internal_types/zvals/basic_structure.html

I think a good beginner's intro is this:
http://php.net/manual/de/internals2.variables.intro.php

Yes, these things are structs, and there are no interfaces.

It would be possible, in theory, to create a different struct for
type-locked variables, where the type is not stored with each
instance, but in the opcode.
Or perhaps separate structs per type.

This would obviously be a huge amount of work, and a radical change to
the language, so I do not imagine this is going to happen any time soon.
Every place in code that currently deals with the _zval_struct would
then have to consider all other structs.
The opcode could then be optimized for such type-locked variables, and
this would reduce cost in memory and performance.

The next best thing would be to keep the existing _zval_struct also
for type-locked variables, and still try to optimize the opcode as if
the type is known at compile time.
Still a lot of work, I imagine, because it still affects every place
where we deal with a variable.

The third option is to keep the implementation as if all types are
dynamic, and only add some type checks here and there, which can be
globally enabled or disabled.
This is what other gradually typed languages do, as pointed out by
Rowan Collins,

> The biggest issue with any proposal, though, is going to be performance. I don't think this is an incidental detail to be dealt with later, it is a fundamental issue with the way type hints in PHP have evolved. PHP is extremely unusual, if not unique, in exclusively enforcing type constraints at runtime. Other languages with "gradual typing" such as Python, Hack, and Dart, use the annotations only in separate static analysers and/or when a runtime debug flag is set (similar to enabling assertions).

> Also, I think people who try to guess at how to make changes to the
> engine are doing a small disservice to people who have already tried
> to implement this.

This is a dilemma.
I think there are some people with valuable opinions on language
design who have not yet found the time to study the engine
implementation.
So, either we risk occasional ignorant ideas, or we will miss some
valuable contributions.

I personally want to eventually study the engine in more detail, but I
don't think I need to completely self-censor until then.
Instead, I have to make a judgement call each time if my limited
understanding is sufficient to allow a meaningful contribution to the
discussion.

7 years ago by Christoph M. Becker

> On 5 January 2018 at 11:35, Dan Ackroyd danack@basereality.com wrote:
>
>> The internals of the PHP engine are written in C, and zvals are structs, not
>> classes, and so there is no interface. In userland, classes are also
>> zvals. http://www.phpinternalsbook.com/php7/internal_types/zvals/basic_structure.html
>
> I think a good beginner's intro is this:
> http://php.net/manual/de/internals2.variables.intro.php

The internals2 part of the PHP manual is about PHP 5. The best info for
PHP 7 regarding the internals is the phpinternalsbook already pointed
at by Dan.

--
Christoph M. Becker

7 years ago by Lester Caine

> The reason we don’t have typed properties/variables is that it would require adding type checks on almost every access to the underlying zval. That is a huge perf hit compared to only doing it on method/function egress points as we do now.

I think that in hindsight all I have been looking for out of this is that
'zval' has additional capability to standardise validation. 'Simply'
adding a crude type check with its overheads does not remove the
validation requirements which still need to be handled much of the time.
If the type check ALSO included validation, then the performance hit
would be mitigated by the reduction in user-side code. But 'error' may
not be the right response EVEN with just the simple type check, and that
is why current typing hacks don't fit MY method of working. I have
validation on key paths, but each is isolated from other paths, while a
core standard method of validation would simplify things in a way
'strong typing' does not!

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

7 years ago by Rowan Collins

> 'Simply' adding a crude type check with its overheads does not remove the
> validation requirements which still need to be handled much of the time.

Yes, I'd love to be able to define custom types like "integer in the
range 0 to 100" or whatever.

> But 'error' may not be the right response EVEN with just the simple type check

I think one of the big distinctions is between validation of a value
being received from somewhere (can only happen at runtime), versus
verifying the validity of a piece of code (would ideally happen at
compile time).

Maybe we could have a different syntax to define a function with a
compile-time constraint that the value was provably of the right type.
With custom types, that could then prove that some validation check had
already passed.

That's kind of what the offline type checkers that other languages offer
do, it's just up to you to run them before deploying your code somewhere
important.

Regards,

--
Rowan Collins
[IMSoP]

7 years ago by Michael Morris

Before I begin, and without picking on anyone specific, I want to say that
it is generally unhelpful to say that, because I or others do not know how
the engine is set up, it is impossible to make any meaningful
contributions to the list or on this issue specifically. My clients don't
understand HTML. If I told them they needed to study how HTML works before
trying to give me input on the sites I'm building for them, I'd likely be
fired. As a theater major I know quite a bit more than most of the people
on this list about what makes a good play or movie actually work, but I
don't pretend that knowledge is a prerequisite to knowing if a play or movie
is good. It either works, or it doesn't.

If the fallback to all suggestions is "Shut up, that's impossible to do
given current engine architecture." then I'm afraid that PHP is doomed to
become the next COBOL - a language with a lot of important legacy programs,
but no new developers or future as those old systems finally give up the
ghost. Also, given that HHVM has implemented at least one aspect of this
proposal with classes, the argument that it's impossible carries a rather
large spoonful of salt.

This said, I will refrain from offering any more input on how this might be
implemented as it is clearly not wanted. I will instead focus on the
desired end state.

Much of what follows is based on Michal Brzuchalski's comments. His
commentary can be largely summed up with "using the var keyword in a
counter-intuitive way is just going to make matters worse. Also,
getter/setter debates are quite a bit out of scope."

Third Draft.

Target version: PHP 8.

This is a proposal to strengthen the dynamic type checking of PHP during
development.

Note - this is not a proposal to change PHP to a statically typed language
or to remove PHP's current typing rules. PHP is typed the way it is for a
reason, and will remain so subsequent to this RFC. This RFC is concerned
with providing tools to make controlling variable types stronger when the
programmer deems this necessary.

DEFINITIONS

Before a meaningful discussion on types and type handling can be performed,
some terms must be defined explicitly, especially since their definitions
in common parlance may change from language to language, and even
programmer to programmer.

Static Typing: This typing is performed by the compiler either explicitly
or implicitly.
Dynamic Typing: This typing is performed by the runtime. Unlike static
typing it allows for varying degrees of variable coercion.
Strong/Weak typing: These terms typically refer to the amount of latitude
the run time has to coerce variables - the more latitude the "weaker" the
typing.

VARIABLE DECLARATION (GLOBAL AND FUNCTION SCOPE)

PHP currently has no keyword to initialize a variable - it is simply
created when it is first referenced. The engine continually coerces the
variables into the required types for each operation. While this is a very
powerful ability, it runs into problems with comparisons
(http://phpsadness.com/sad/52). As a result of this issue, PHP (like many
languages that share this problem, such as JavaScript) provides the strict
comparison operator. Avoiding this issue is one reason it can be useful to
have some amount of control over a variable's type. This can be
accomplished with two keywords, one old and one new: var and strict.

var $a = 5;
strict $b = 9;

Together these are "mutability operators" - that is they control if a
variable can mutate, be coerced, recast or what have you between types. The
strict keyword removes the mutability of a variable between types. The var
keyword restores it. The keywords perform these operations even in the
absence of an assignment, though using strict without any assignment will
lead to an error since the type will be unknown. Examples:

var $a; // $a will be created and be NULL.
strict $a; // TypeError - strict variables must have an explicit type or a value from which a type can be inferred.
$b = 5;
strict $b; // $b locks down to integer since it was already declared. Its value remains 5.
var $b; // $b's ability to be coerced is restored.
strict int $c; // Works. $c will be empty - any assignment must be the specified type.
strict $d = 'Hello'; // Works. Type of string can be inferred.

Both keywords allow comma delimited lists of declarations (for
consistency), but strict will be the one to use it most frequently:

strict $a = 1, $b = "Hello", $c = 3.14, $d = [];

$a is inferred to int, $b to string, $c to float and $d to array.

FUNCTION DECLARATION

The strict keyword in a function declaration locks down the argument var
for the remainder of the function (or until var is used). For consistency
it is recommended that var be allowed as well, but it wouldn't do anything
beyond cuing IDEs that mixed will be accepted.

function foo (strict int $a, strict string $b, var $c, strict $d = true) {}

ARRAYS

If an array is strict, all of its keys and values will be strict and
inferred on assignment.

strict $a = [
'id' => 1,
'name' => 'Mark',
];

CLASS MEMBERS

The var keyword appeared in PHP 4 to declare class members and found itself
deprecated. As the mutability operator it is still allowed, and now allowed
alongside the scope operator.

class SomeClass {
var $a = '3';
public var $b = 'hello';
public strict $c = 3.14;
protected strict int $d;
}

Interfaces can also lock member types following the above pattern.

COMPARISON BEHAVIOR

As mentioned above, comparisons are the area where the most stability gains
are to be had. When strict variables are compared to dynamic or "var"
variables, only the var variable will be coerced. If two strict variables
of different types are compared, a TypeError will be raised unless an
explicit cast is used:

strict $a = 123;
strict $b = '123';

if ($b == (string) $a) {}

7 years ago by Rasmus Lerdorf

> Before I begin, and without picking on anyone specific, I want to say that
> it is generally unhelpful to say that, because I or others do not know how
> the engine is set up, it is impossible to make any meaningful
> contributions to the list or on this issue specifically. My clients don't
> understand HTML. If I told them they needed to study how HTML works before
> trying to give me input on the sites I'm building for them, I'd likely be
> fired. As a theater major I know quite a bit more than most of the people
> on this list about what makes a good play or movie actually work, but I
> don't pretend that knowledge is a prerequisite to knowing if a play or movie
> is good. It either works, or it doesn't.

The difference here is that the end syntax is something like 10% of the
problem. 90% of it is fitting it into the engine in an efficient manner
given that it is affecting the very core of the engine. An RFC on this
issue that doesn't address the bulk of the problem isn't all that helpful.

-Rasmus

7 years ago by Michael Morris

> The difference here is that the end syntax is something like 10% of the
> problem. 90% of it is fitting it into the engine in an efficient manner
> given that it is affecting the very core of the engine. An RFC on this
> issue that doesn't address the bulk of the problem isn't all that helpful.

It makes absolutely NO sense to do that 90% of the work to have it all
burned up when the proposal fails to carry a 2/3rds vote because the syntax
is disliked.

Also, drawing the architectural drawings for a skyscraper is also like only
10% of the work, but it's a damn important 10%.

That the implementation will be a major pain in the ass to do is all the
more reason to create and pass a planning RFC before doing any related
code/implementation RFCs. It will encourage people to do the research to
try to figure out how to get this done because they know the syntax is
approved and they aren't fiddling around in the dark trying to figure out
how to do something that may not be accepted for inclusion at all, which is
a huge waste of time.

7 years ago by Rasmus Lerdorf

> Also, drawing the architectural drawings for a skyscraper is also like only
> 10% of the work, but it's a damn important 10%.

Wow, that's rather insulting to the amazing work Dmitry, Nikita, Xinchen
and others are doing working on the core of PHP. Describing the syntax/UI
for a feature like this is nothing like the architectural drawings for a
skyscraper. The architectural drawings for a skyscraper are extremely
detailed and describe exactly how to build it including all materials,
tolerances, etc. The analogy here is more like you saying you would like a
blue skyscraper with 30 windows and a door and then complaining that the
idiot construction crew should stop complaining and just build the thing.

There are plenty of things where the UI/syntax description is all that is
needed because the implementation is trivial and flows straight from such a
description. This doesn't happen to be one of those.

-Rasmus

7 years ago by Sebastian Bergmann

On 10.01.2018 at 16:04, Rasmus Lerdorf wrote:

> Wow, that's rather insulting to the amazing work Dmitry, Nikita, Xinchen
> and others are doing working on the core of PHP.

I agree.

IIRC, last time optional type declarations for attributes were discussed,
Dmitry optimized/refactored something in the engine that would reduce the
performance hit.

Do we have a guess at how big that performance hit would be? I, for one,
would gladly trade a couple of percent of performance (considering the
huge gains in performance PHP 7 brought) to be able to use these type
declarations in my code.

7 years ago by Levi Morrison

> On 10.01.2018 at 16:04, Rasmus Lerdorf wrote:
>
>> Wow, that's rather insulting to the amazing work Dmitry, Nikita, Xinchen
>> and others are doing working on the core of PHP.
>
> I agree.
>
> IIRC, last time optional type declarations for attributes were discussed,
> Dmitry optimized/refactored something in the engine that would reduce the
> performance hit.
>
> Do we have a guess at how big that performance hit would be? I, for one,
> would gladly trade a couple of percent of performance (considering the
> huge gains in performance PHP 7 brought) to be able to use these type
> declarations in my code.

An additional issue is typed references. I believe Bob Weinand did
some work in that area; maybe he can share more insight.

7 years ago by Michael Morris

> On Wed, Jan 10, 2018 at 5:27 AM, Michael Morris tendoaki@gmail.com
> wrote:
>
>> Also, drawing the architectural drawings for a skyscraper is also like
>> only 10% of the work, but it's a damn important 10%.
>
> Wow, that's rather insulting to the amazing work Dmitry, Nikita, Xinchen
> and others are doing working on the core of PHP.

No insult was intended here. I apologize if any is taken.

> Describing the syntax/UI for a feature like this is nothing like the
> architectural drawings for a skyscraper.

In terms of time and effort spent it is. It often takes years to complete
plans drawn up over the span of weeks. The analogy becomes more firm when
you compare the man hours on each side - an architect can draw up plans for
a house in less than 100 hours (unless it's a freaking huge house). The
contractor labor hours will be 100 times that at a minimum. If anything, I'm
off on the scale, but I was being anecdotal - I wasn't aiming for precise
accuracy.

Plans still must precede work, and if the ramifications of those plans are
to be far reaching they need to be agreed upon as early as possible.

7 years ago by Rowan Collins

>> Describing the syntax/UI for a feature like this is nothing like the
>> architectural drawings for a skyscraper.
>
> In terms of time and effort spent it is. It often takes years to complete
> plans drawn up over the span of weeks. The analogy becomes more firm when
> you compare the man hours on each side - an architect can draw up plans for
> a house in less than 100 hours (unless it's a freaking huge house).

I don't think Rasmus was saying architects' plans aren't important, or
making any comment about the scale of the task. I think he was saying
that things like syntax and UI are not the appropriate part of the
process to compare to architects' plans. Architects know how buildings
work, and spend those weeks making sure the subsequent years aren't
going to be wasted because the plausible-looking shape the client asked
for can't actually support its own weight.

And just to be clear, this particular feature IS a freaking huge house.
Worse, it's a type of skyscraper nobody has ever tried to build before.
Sketching the kinds of shapes it might have is interesting; getting hung
up on what size the windows are (the exact keywords to use) is probably
a waste of time until we've figured out if there's a material that bends
that way. And saying "hey, could you make it out of carbon nanotubes?"
is a fun conversation to have over a beer, but probably isn't going to
be that helpful to people who are experts on skyscrapers and material
science.

Apologies for extending the metaphor somewhat beyond stretching point,
but I think it acts as a reasonable illustration of where people are
coming from in this thread.

> Plans still must precede work, and if the ramifications of those plans are
> to be far reaching they need to be agreed upon as early as possible.

Absolutely, and unfortunately, the biggest ramifications of this
particular type of change are going to be in the very core of the engine.
That's not true of every feature, but for this particular feature, one
of the parts that needs planning and agreeing as early as possible is
"how are we going to do this without killing performance".

Regards,

--
Rowan Collins
[IMSoP]

7 years ago by Ryan Jentzsch

I agree with Michael (to a large degree) and I think I see clearly
Michael's point:
Under the current system I will NEVER create an RFC (or find someone with
the Zend engine coding chops to help me) because the RISK vs. REWARD with
the current RFC system is too likely to be a colossal waste of everyone's
time.
Currently the tail wags the dog (implementation details govern top level
policy). The current process nearly insists I spend valuable time coding up
front with a good chance that if/when the RFC goes up for a vote someone
will still be bleating about syntax, or using tabs vs. spaces, or some
other minor detail -- with a 2/3 vote needed it may shoot all my
preliminary hard work to hell. No thanks.

> On Wed, Jan 10, 2018 at 12:53 AM, Rasmus Lerdorf rasmus@lerdorf.com
> wrote:
>
>> The difference here is that the end syntax is something like 10% of the
>> problem. 90% of it is fitting it into the engine in an efficient manner
>> given that it is affecting the very core of the engine. An RFC on this
>> issue that doesn't address the bulk of the problem isn't all that helpful.
>
> It makes absolutely NO sense to do that 90% of the work to have it all
> burned up when the proposal fails to carry a 2/3rds vote because the syntax
> is disliked.
>
> Also, drawing the architectural drawings for a skyscraper is also like only
> 10% of the work, but it's a damn important 10%.
>
> That the implementation will be a major pain in the ass to do is all the
> more reason to create and pass a planning RFC before doing any related
> code/implementation RFCs. It will encourage people to do the research to
> try to figure out how to get this done because they know the syntax is
> approved and they aren't fiddling around in the dark trying to figure out
> how to do something that may not be accepted for inclusion at all, which is
> a huge waste of time.

7 years ago by Rasmus Lerdorf

On Wed, Jan 10, 2018 at 10:11 AM, Ryan Jentzsch ryan.jentzsch@gmail.com
wrote:

> I agree with Michael (to a large degree) and I think I see clearly
> Michael's point:
> Under the current system I will NEVER create an RFC (or find someone with
> the Zend engine coding chops to help me) because the RISK vs. REWARD with
> the current RFC system is too likely to be a colossal waste of everyone's
> time.
> Currently the tail wags the dog (implementation details govern top level
> policy). The current process nearly insists I spend valuable time coding up
> front with a good chance that if/when the RFC goes up for a vote someone
> will still be bleating about syntax, or using tabs vs. spaces, or some
> other minor detail -- with a 2/3 vote needed it may shoot all my
> preliminary hard work to hell. No thanks.

There is a middle ground here. I agree that doing months of work on a
rock-solid implementation doesn't make sense if you don't know the RFC will
pass. On the other end of the spectrum, RFCs that are essentially feature
requests with no specifics on the actual implementation also don't make any
sense. A good RFC strikes a happy balance between the two. For many/most
things, the actual work in figuring out the implementation isn't that bad.
As Sara said, a full implementation isn't needed, but a rough sketch of
what changes are needed along with their potential impact on the existing
code definitely is. And yes, unfortunately, if your RFC touches the basic
building block of PHP, the zval, then that rough sketch becomes even more
important. If you stay away from trying to change a 25-year old loosely
typed language into a strictly typed one, then the RFC becomes much simpler.

-Rasmus

7 years ago by Michael Morris

> If you stay away from trying to change a 25-year old loosely typed
> language into a strictly typed one, then the RFC becomes much simpler.
>
> -Rasmus

I have REPEATEDLY stated that is not the goal. I don't misrepresent what
you say, please do not do that to me.

I want to see strict typing as an option, not a requirement.

Arggh...

I said I'd stay away from implementation, but would this work? Working
this into zval in any way is problematic. So, store it elsewhere?

Create a symbol table that holds the strict variables and the types they
are locked into. The strict keyword pushes them onto that table, the var
keyword pulls them off. When an operation that cares about type occurs,
check that table - if the var appears there, then authenticate it.

I would hope that if a programmer doesn't want strict typing the overhead
of checking an empty table would be minimal, even if repeated a great many
times.

7 years ago by Rasmus Lerdorf

> On Wed, Jan 10, 2018 at 12:27 PM, Rasmus Lerdorf rasmus@lerdorf.com
> wrote:
>
>> If you stay away from trying to change a 25-year old loosely typed
>> language into a strictly typed one, then the RFC becomes much simpler.
>>
>> -Rasmus
>
> I have REPEATEDLY stated that is not the goal. I don't misrepresent what
> you say, please do not do that to me.
>
> I want to see strict typing as an option, not a requirement.

But the point is that whether it is an option or not, it still has to touch
the zval. Which means everything changes whether the option is enabled or
not. If you store this information elsewhere, that other location has to be
checked on every zval access. Basically the work is identical to the work
required to make PHP strictly typed. Making it optional might actually be
harder because we have to build both and add more checks in that case.

The only viable place I see to store this optionally is outside the runtime
in a static analyzer like Phan (which already does this) which matches how
HHVM solved it. Of course, there may be a cleaner way to do it. But that is
why an RFC on this topic has to give a clear plan towards this cleaner
implementation.

Now if the RFC was a plan for baking a compile-time static analysis engine
into PHP itself, that would be interesting. But that is a massive project.

-Rasmus

7 years ago by Andreas Hennings

> Now if the RFC was a plan for baking a compile-time static analysis engine
> into PHP itself, that would be interesting. But that is a massive project.

Even with my limited understanding of the engine, I can imagine this
to be a lot of work.
But it sounds much better to me than adding more expensive runtime type checks.
I think it would be worth exploring as a long-term direction.

7 years ago by Andreas Hennings

Whether we work with runtime type checks or compile-time static analysis:
The user-facing language design questions would still be the same, right?
E.g. we would still have to distinguish type-locked parameter values
vs dynamically typed parameter values.

>> Now if the RFC was a plan for baking a compile-time static analysis engine
>> into PHP itself, that would be interesting. But that is a massive project.
>
> Even with my limited understanding of the engine, I can imagine this
> to be a lot of work.
> But it sounds much better to me than adding more expensive runtime type checks.
> I think it would be worth exploring as a long-term direction.

7 years ago by Ryan Jentzsch

In my opinion the Strong Typing Syntax RFC will have less of a chance of
passing a vote than https://wiki.php.net/rfc/typed-properties, since the
typed-properties RFC was confined to properties on a class (and
looking at the code it appears to me that it wasn't too difficult to
implement the type strictness constraints). Sadly, even after it was shown
to have minimal effect on performance, the RFC was still shot down.

Strong Typing Syntax I would think is even more complicated given this
touches ALL zval processing internally. The concern of "expensive run-time
checks" can of course be mitigated by requiring declare(strict_types=1) to
enable/allow strong typing syntax.
I'd love to see Strong Typing Syntax in PHP but realistically, given the
past history, this RFC will need to target version 8.

On Wed, Jan 10, 2018 at 12:25 PM, Andreas Hennings andreas@dqxtech.net
wrote:

> Whether we work with runtime type checks or compile-time static analysis:
> The user-facing language design questions would still be the same, right?
> E.g. we would still have to distinguish type-locked parameter values
> vs dynamically typed parameter values.
>
>>> Now if the RFC was a plan for baking a compile-time static analysis engine
>>> into PHP itself, that would be interesting. But that is a massive project.
>>
>> Even with my limited understanding of the engine, I can imagine this
>> to be a lot of work.
>> But it sounds much better to me than adding more expensive runtime type checks.
>> I think it would be worth exploring as a long-term direction.

7 years ago by Stanislav Malyshev

Hi!

> I want to see strict typing as an option, not a requirement.

You seem to be under impression that this somehow makes things easier.
It does not. To explain: let's say you design a strictly typed language,
like Java. The compiler knows which variable is of which type at every
point, and if it's not clear for some reason, it errors out. You can
build a compiler on top of those assumptions.
Now let's say you design a loosely typed language, like Javascript. The
compiler knows variables have no types, only values have them, and builds
on top of that (as in, it doesn't need to implement type tracking for
variables).
Now, you come in and say - let's make the compiler have both
assumptions - that sometimes it's strict and sometimes it's not.
Sometimes you need to track variable types and sometimes you don't.
Sometimes you have type information and can rely on it, and sometimes
you don't and have to type-juggle.
Do you really think this just made things easier? To implement both
Java and Javascript inside the same compiler, with radically different
kinds of assumptions? If you have the desire to answer "yes", then a) please
believe me, it is not true, and b) please try to implement a couple of
compilers and see how easy it is.

Having two options is not even twice as hard as having one. It's much
more. So the "optional" part adds all the work that needs to be done to support
strict typing in PHP, and on top of that, you also have to add work that
needs to be done to support cases where half of the code is typed and
the other half is not. And this is not only code writing work - this is
conceptual design work, testing work, documenting work, etc.

Without even going into the merits of the proposal itself, it certainly
looks to me like you are seriously underestimating what we're talking
about, complexity-wise. I am not saying it's not possible at all - a lot
of things are possible. It's just that "it's merely an option" is exactly
the wrong position to take.

Create a symbol table that holds the strict variables and the types they
are locked into. The strict keyword pushes them onto that table, the var
keyword pulls them off. When an operation that cares about type occurs,
check that table - if the var appears there, then enforce its locked type.

And now every function and code piece that works with symbol tables
needs to be modified to account for the fact that there are two of them.
Every lookup is now two lookups, and I have no idea how $$var would even
work at all.
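To make the "every lookup is now two lookups" point concrete, here is a toy C sketch of the two-table idea quoted above. It is not engine code: toy_table, toy_scope, toy_assign, toy_declare_strict and probe_count are hypothetical names invented for illustration, and linear arrays stand in for the engine's real data structures. It only counts table probes, to show that every write must consult the strict table before it touches the ordinary one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: the real engine uses HashTables, not small arrays. */
typedef enum { T_NULL, T_INT, T_STRING } toy_type;

typedef struct {
    const char *name;
    toy_type type;
} toy_entry;

typedef struct {
    toy_entry entries[16];
    size_t count;
} toy_table;

/* Hypothetical scope with the proposed second table of locked types. */
typedef struct {
    toy_table vars;    /* ordinary symbol table */
    toy_table strict;  /* name -> type the variable is locked into */
} toy_scope;

static size_t probe_count = 0; /* counts table probes to expose the cost */

static toy_entry *table_find(toy_table *t, const char *name) {
    probe_count++;
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) {
            return &t->entries[i];
        }
    }
    return NULL;
}

/* The "strict" keyword: push the name and its type onto the lock table. */
static void toy_declare_strict(toy_scope *s, const char *name, toy_type t) {
    assert(s->strict.count < 16);
    s->strict.entries[s->strict.count].name = name;
    s->strict.entries[s->strict.count].type = t;
    s->strict.count++;
}

/* Returns 0 where the proposal would raise a TypeError, 1 otherwise. */
static int toy_assign(toy_scope *s, const char *name, toy_type t) {
    /* Lookup 1: is this name locked to a type? Needed on EVERY write. */
    toy_entry *lock = table_find(&s->strict, name);
    if (lock != NULL && lock->type != t) {
        return 0;
    }
    /* Lookup 2: the ordinary symbol table write. */
    toy_entry *slot = table_find(&s->vars, name);
    if (slot == NULL) {
        assert(s->vars.count < 16);
        slot = &s->vars.entries[s->vars.count++];
        slot->name = name;
    }
    slot->type = t;
    return 1;
}
```

In this toy, once one strict variable exists, assigning to an unrelated, untyped name still costs two probes instead of one. With $$var the name is only known at runtime, so neither probe can be resolved ahead of time.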

--
Stas Malyshev
smalyshev@gmail.com

7 years ago by Andreas Hennings

I will not

Hi!

I want to see strict typing as an option, not a requirement.

You seem to be under the impression that this somehow makes things easier.
It does not. To explain: let's say you design a strictly typed language,
like Java. The compiler knows which variable is of which type at every
point, and if it's not clear for some reason, it errors out. You can
build a compiler on top of those assumptions.
Now let's say you design a loosely typed language, like Javascript. The
compiler knows variables have no types, only values have them, and builds
on top of that (as in, it doesn't need to implement type tracking for
variables).
Now, you come in and say - let's make the compiler have both
assumptions - that sometimes it's strict and sometimes it's not.
Sometimes you need to track variable types and sometimes you don't.
Sometimes you have type information and can rely on it, and sometimes
you don't and have to type-juggle.
Do you really think this just made things easier? To implement both
Java and Javascript inside the same compiler, with radically different
kinds of assumptions? If you have the desire to answer "yes", then a) please
believe me, it is not true, and b) please try to implement a couple of
compilers and see how easy it is.

I do not doubt that it would be a lot of work, possibly so much that
it becomes unrealistic.

There are languages which have a number of strict types and then a
"variant" type.
https://en.wikipedia.org/wiki/Variant_type
https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/variant-data-type

In PHP, to allow a mix of strict statically typed variables and
dynamically typed variables, we could adopt such a model, where all
dynamically typed variables would have the type "variant".

The strict types would become the basic model, and the variant type
would be a special case within that model.

This does not make it any easier to implement, but it seems more
promising than seeing this as two parallel systems which have to be
maintained separately.

7 years ago by Rowan Collins

Before I begin, and without picking on anyone specific, I want to say that
it is generally unhelpful to say that, because I or others do not know how
the engine is set up, it is impossible to make any meaningful
contributions to the list or on this issue specifically. My clients don't
understand HTML. If I told them they needed to study how HTML works
before giving me input on the sites I'm building for them, I'd likely be fired.

While I understand your frustration, I don't think anyone here is saying you shouldn't offer any input, only to be aware of your own limitations when presenting it.

To use your analogy, imagine if a client came to you and said "we think it would be cool if the page changed colour as the user looked at different parts of the screen". You probably wouldn't ask them for details of what colours they wanted, with a vague idea that you'd research if it was possible to implement eye-tracking in browser JS later; more likely, you'd say "yes, that would be cool, but I'm pretty sure it's not possible". If they went ahead and gave you a 10-page spec "in case you work out how to do it after all", that would be a waste of everyone's time.

So, back to the subject at hand: it is useful to share ideas on the typing strategy PHP should be taking, things like which types of value you'd like to see checked, whether we need both auto-casting and type errors, whether all of this should be switched off in production, and the implications for the user of those various decisions. But there's always the possibility that those ideals won't be possible, so details like the exact keywords to use for each type of variable are probably best left vague and sorted out later.

I'll also echo a previous request that you apply for a wiki account to make your document more readable; or maybe just put it as a github gist or on your own website, and treat it as more of a wishlist and discussion piece than a spec that core developers are going to commit to.

Regards,

--
Rowan Collins
[IMSoP]

7 years ago by Christoph M. Becker

I'll also echo a previous request that you apply for a wiki account to make your document more readable; or maybe just put it as a github gist or on your own website, and treat it as more of a wishlist and discussion piece than a spec that core developers are going to commit to.

That is, however, not necessarily sufficient. There are already several
accepted RFCs with pending implementations[1], the two oldest of which
were accepted in 2011 and 2012, respectively.

[1] https://wiki.php.net/rfc#pending_implementation

--
Christoph M. Becker

7 years ago by Rowan Collins

I'll also echo a previous request that you apply for a wiki account to
make your document more readable; or maybe just put it as a github gist or
on your own website, and treat it as more of a wishlist and discussion
piece than a spec that core developers are going to commit to.

That is, however, not necessarily sufficient. There are already several
accepted RFCs with pending implementations[1], the two oldest of which
were accepted in 2011 and 2012, respectively.

Sufficient for what? I was just saying it would be easier to read online in
a versioned doc than in the bodies of a series of e-mails.

--
Rowan Collins
[IMSoP]

7 years ago by Christoph M. Becker

Sorry for the bad quoting. I fully agree that using a medium other than
this list for drafting the RFC would be preferable. However, I
wanted to point out that it might not even make sense to do so until
someone has been found who is willing and able :) to actually write a
suitable implementation.

--
Christoph M. Becker

7 years ago by Sara Golemon

I think you, and many others commenting here, should start by looking
at the engine implementation. Any successful RFC needs to have a strong
implementation behind it, or at the very least a very detailed description of
how the implementation would mesh with the existing engine code.

The reason we don’t have typed properties/variables is that it would
require adding type checks on almost every access to the underlying
zval. That is a huge perf hit compared to only doing it on method/function
egress points as we do now.

I'm going to underline Rasmus' comment here. zval assignment is a
deep/core element of what the engine does. Even when it's not

7 years ago by Sara Golemon

I think you, and many others commenting here, should start by looking
at the engine implementation. Any successful RFC needs to have a strong
implementation behind it, or at the very least a very detailed description of
how the implementation would mesh with the existing engine code.

The reason we don’t have typed properties/variables is that it would
require adding type checks on almost every access to the underlying
zval. That is a huge perf hit compared to only doing it on method/function
egress points as we do now.

**agh-mistabbed into a send

I'm going to underline Rasmus' comment here. zval assignment is a
deep/core element of what the engine does. Even when it's not a
literal $x = "foo"; in userspace, zvals are flying around the engine
constantly. Adding so much as a Z_TYPEINFO_P(val) & ZVAL_FLAG_STRICT
check to EVERY ONE OF THOSE accesses is both heavyweight and
massively complex, on the order of the php-ng rewrite in complexity,
because EVERY assignment needs to be dealt with, and there WILL be
some misbehaving extension out there which gets it wrong.

The implementation essentials are not a trivial part of such a
feature. You don't need to have the entire implementation written and
tested, but you do need to have a clear plan for how and what will be
done and, vitally, what the impact of that plan will be. You can't
just wave your hands and say: "We'll sort this out..."

"How does HackLang do this?" has been asked of me offline, so I want
to put my answer here: IT DOESN'T.

HackLang relies on static analysis to prove "$x will never be assigned
a non-integer, so we can always assume it's an integer". This is done
by the static analysis tool before the site is ever run, not at
runtime. Why? Because the HHVM team could see the same thing Rasmus is
telling you: runtime type enforcement is damned expensive.
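By contrast, the static-analysis approach moves the check ahead of execution. The toy checker below walks a flat list of assignments once, before anything runs, and reports the first one whose type doesn't match its variable's declared type; if it finds none, the program can then run with no per-assignment checks at all. It assumes every right-hand side's type is already known, which a real checker such as Hack's has to infer across control flow and function calls. check_program, decl and assign_stmt are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TY_INT, TY_STRING } ty;

/* A toy "program": assignments whose right-hand-side types are known
 * at analysis time (e.g. literals). */
typedef struct {
    const char *var;
    ty rhs;
} assign_stmt;

/* A declared, type-locked variable. */
typedef struct {
    const char *var;
    ty declared;
} decl;

/* Analysis pass, run once before execution. Returns the index of the
 * first offending statement, or -1 if every assignment matches its
 * variable's declared type. Undeclared variables are not checked. */
static int check_program(const decl *decls, size_t ndecls,
                         const assign_stmt *prog, size_t nstmts) {
    assert(decls != NULL || ndecls == 0);
    assert(prog != NULL || nstmts == 0);
    for (size_t i = 0; i < nstmts; i++) {
        for (size_t j = 0; j < ndecls; j++) {
            if (strcmp(prog[i].var, decls[j].var) == 0 &&
                prog[i].rhs != decls[j].declared) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

The cost is paid once per program rather than once per assignment, which is the trade-off described above; the hard part a real analyzer faces is computing the right-hand-side types that this sketch takes as given.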

-Sara

--
Lester Caine - G8HFL