Reviving scalar type hints

10 years ago by Larry Garfield

Zeev,

On Fri, Feb 20, 2015 at 10:24 AM, Zeev Suraski zeev@zend.com wrote:

On 20 Feb 2015, at 16:55, Anthony Ferrara ircmaxell@gmail.com wrote:

verification and static analysis aren't enough?

Anthony,

While IMHO they're not enough to warrant substantial deviation from PHP's behavior, this is a subjective question that others might answer differently.

But there's also an objective issue. There's a serious question mark over whether the type of hint (strict, coercive or otherwise) can have any sort of implications for one's ability to conduct static analysis, JIT or AOT (I'm bringing those up again since they're closely related in terms of what you can or cannot infer).

Now, I'll contend that even though I don't think we are, perhaps we're missing something. But at the very least it should be clear to the list there's serious doubt on whether there's any extra value there even if they do seem static analysis critical. If there is, it's likely to be very, very limited in scope.

Let's simply agree to disagree here :-)

That's not saying you should want to use statically typed for
everything. And nor would I support PHP moving to pure statically
typed (which is why the proposal I'm backing doesn't).

We're on the same page here. But the kinds of static analysis benefits you seem to believe we can get from strict type hints would require that - strong typing, variable declarations, perhaps changes to casting rules - not just around that narrow interface between callers and callees. Thankfully that's not on the table.

That's also not necessary in most cases. You can infer a lot about the
types of variables just having arguments declared. In most cases, you
can infer enough for static analysis to work. In the cases you can't,
that's actually a valid result of the analysis because you may have
undefined behavior. Example:

function foo(string $a): int {
    return $a + 1;
}

You can't infer the type of $a+1 because the conversion of $a->numeric
that happens is unstable from a type perspective. But PHP's type
changes are predictable enough where the majority of sane cases are
predictable.

Both Swift and Go behave like this. Where you only need explicit
declarations on the arguments, the rest can be inferred. And where it
can't infer, it raises a type error.
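
A small PHP sketch of the instability described above: the result type of `$a + 1` depends on the runtime contents of the string, not on its declared type. The function name `typeOfIncrement` is hypothetical and used only for illustration; it is not part of the original message.

```php
<?php
// Hypothetical helper: reports the runtime type of $a + 1 for a string argument.
function typeOfIncrement(string $a): string
{
    // The declared type of $a is always string, but the type of the sum
    // is decided by what the string happens to contain.
    return gettype($a + 1);
}

$fromIntegerString = typeOfIncrement("5");   // integer-like numeric string
$fromFloatString = typeOfIncrement("5.5");   // float-like numeric string

echo $fromIntegerString, PHP_EOL;
echo $fromFloatString, PHP_EOL;
```

An analyzer that only knows `$a` is a string therefore cannot decide between int and float for the sum, which is the case where it would have to report an error rather than guess.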

Anthony

OK, performance is out then. :-)

Anthony, please explain this to me next:

Assume for a moment that:

we tighten up the default passing rules, such that "32" passed to an int works but "32 footsteps" errors (which I think most are on board with doing)
we include a strict types mode as proposed, including the logical type widening cases
Zend Engine gets whatever static analysis tools you feel are appropriate.

In what circumstances would I as a Drupal core developer (a few hundred thousand lines of code, need high degree of correctness, 2500 developers), contrib developer (a few thousand lines of code, medium correctness guarantee, 3 developers), or a client consultant (a few hundred lines of custom one-off code, 1 developer) want to enable strict mode?

Using strict typing would put more work on me. How much varies with the part of the application. (Closer to IO, more work.) Having it in the language means I will run into it sometime, even if not in code I originate. If potential performance is not a factor, then what is my carrot? What day to day benefit would I get from doing so other than pedantry?[1] "You can infer things" doesn't make it clear what the day to day impact is for me. Concrete examples, please.

If it would only matter to the Facebooks of the world, well, they've got Hack already. How would this practically help Drupal, Symfony, Zend Framework, OwnCloud, Doctrine, phpMyAdmin, etc?

[1] I'm hardly one to object to pedantry, of course, but that in itself is not enough to push 2500 developers to use it.

10 years ago by Anthony Ferrara

Larry,

Assume for a moment that:

we tighten up the default passing rules, such that "32" passed to an int works but "32 footsteps" errors (which I think most are on board with doing)

we include a strict types mode as proposed, including the logical type widening cases

Zend Engine gets whatever static analysis tools you feel are appropriate.

In what circumstances would I as a Drupal core developer (a few hundred thousand lines of code, need high degree of correctness, 2500 developers), contrib developer (a few thousand lines of code, medium correctness guarantee, 3 developers), or a client consultant (a few hundred lines of custom one-off code, 1 developer) want to enable strict mode?

So, in what circumstances? Well, I think there are a few cases we need
to talk about:

In legacy systems (WordPress, existing versions, etc):

In these cases, strict mode would likely go in only extremely
sensitive corners. So areas dealing with cryptography, random number
generation, session security, etc. And it would go in slowly, as
someone revamps it. Not all at once.

Just like unit testing, scalar types would be refactored in very
slowly, in the critical corners of legacy systems. It's unlikely that
more than 1-2% of the entire system would ever become typed.

In non-legacy systems it will likely come down to developer experience
coupled with benefits of the tooling to the problem.

A few hundred LOC script would likely never enable strict mode, and
would be just fine because of it (you can mentally keep a few hundred
LOC in your head at one time).

The larger the project, the more the contributors, the more the
benefits to using strict mode. That's not to say that large projects
would immediately go full strict. It's just pointing out that the
tradeoffs would need to be weighed by the authors.

The proposals that Andrea and now I have put forward give the choice
to the authors, and give the power to them. If they are not convinced
to go strict, then they won't. And there's nothing wrong with that.
But those who do want to go strict can.

The real key that you should be thinking about is when errors are
detected. The stricter the type system, the earlier the errors can be
detected. In a lot of cases, a strict-mode type system can detect the
vast majority (if not all) of type-related errors at compile time.
That's a big benefit for large projects with many moving parts.

Using strict typing would put more work on me. How much varies with the part of the application. (Closer to IO, more work.) Having it in the language means I will run into it sometime, even if not in code I originate. If potential performance is not a factor, then what is my carrot? What day to day benefit would I get from doing so other than pedantry?[1] "You can infer things" doesn't make it clear what the day to day impact is for me. Concrete examples, please.

It's more work on you when you write a single line of code. It's less
work when you're making a single line change in a 1-million-loc app.
It's less work because the type system can verify that the change
won't cause type errors when running (where even unit tests won't give
you the same level of confidence).
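
To make the trade-off concrete, here is a minimal sketch using `declare(strict_types=1)` as it eventually shipped in PHP 7.0, which may differ in detail from the RFC draft under discussion. The function names `secondsUntilExpiry` and `describeCall` are hypothetical and not taken from any project named in this thread.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: both arguments must be real ints in strict mode.
function secondsUntilExpiry(int $ttl, int $elapsed): int
{
    return $ttl - $elapsed;
}

// Hypothetical helper: runs a call and reports whether it raised a TypeError.
function describeCall(callable $call): string
{
    try {
        return 'ok: ' . $call();
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

// An int passed to an int parameter is accepted.
$good = describeCall(function () {
    return secondsUntilExpiry(3600, 120);
});

// A numeric string is rejected in strict mode instead of being coerced.
$bad = describeCall(function () {
    return secondsUntilExpiry('3600', 120);
});

echo $good, PHP_EOL;
echo $bad, PHP_EOL;
```

Here the mismatch surfaces at runtime, at the call boundary. Because strict mode has no coercion step, a static analyzer could in principle flag the second call without running the code at all.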

Beyond that, I'd highly recommend reading this article:
http://blog.steveklabnik.com/posts/2010-07-17-what-to-know-before-debating-type-systems

If it would only matter to the Facebooks of the world, well, they've got Hack already. How would this practically help Drupal, Symfony, Zend Framework, OwnCloud, Doctrine, phpMyAdmin, etc?

Hack hasn't been out to the public for that long of a time, yet look
at the buzz it's generating. Look at the frameworks that are popping
up with it. Look at the libraries.

I think if anything, the appearance of Hack (and its adoption) shows
that people want static typing, at least to some level...

Anthony

10 years ago by Rasmus Lerdorf

I think if anything, the appearance of Hack (and its adoption) shows
that people want static typing, at least to some level...

To be perfectly transparent here though, you should mention that your
proposed RFC goes well beyond the strict typing that is in Hack because
in Hack the internal API is largely untyped while your proposal is to
default the entire internal API to strict types in strict mode. Also, in
Hack there is a distinction between the off-line hh_client type-checker
and the runtime.

Hack examples all using <?hh // strict

echo number_format('1000');
echo htmlspecialchars(1000);
echo md5(1000);

These are all fine both as far as the type-checker is concerned as well
as the runtime, of course, but they are runtime fatals in your proposed RFC.
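
For comparison, a sketch of how those three calls behave under `declare(strict_types=1)` as it shipped in PHP 7.0, which may differ in detail from the RFC draft being discussed. Each call is wrapped in a closure so the `TypeError` can be caught and recorded; the `$calls` and `$results` arrays are illustrative names only.

```php
<?php
declare(strict_types=1);

// The three calls from the message above, each deferred in a closure.
$calls = [
    'number_format' => function () {
        return number_format('1000');   // expects a float, given a string
    },
    'htmlspecialchars' => function () {
        return htmlspecialchars(1000);  // expects a string, given an int
    },
    'md5' => function () {
        return md5(1000);               // expects a string, given an int
    },
];

$results = [];
foreach ($calls as $name => $call) {
    try {
        $call();
        $results[$name] = 'accepted';
    } catch (TypeError $e) {
        $results[$name] = 'TypeError';
    }
}

foreach ($results as $name => $outcome) {
    echo $name, ': ', $outcome, PHP_EOL;
}
```

Without the `declare` line, the same file runs in the default coercive mode and all three calls are accepted.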

And if you only go by the runtime and ignore the out-of-band type
checker there are almost no strictness rules applied to the internal API
in Hack.

eg. explode(0, 1000);

Here the hh_client type checker will complain that it is expecting
strings, but the runtime will run it nicely.

So when you say, and as I have heard other people say, that people want
Hack-like strict typing, you have to be a bit careful about what is
meant by that. Even in the cases where the internal API is typed in
Hack, it is still not a runtime fatal if they are called with the wrong
types. Now whether that is a good thing or not is debatable, of course,
my point is simply that if you are going to use Hack adoption as a sign
"that people want static typing" you should clearly explain that your
approach is quite different from what Hack is doing.

-Rasmus

10 years ago by Josh Watzman

I think if anything, the appearance of Hack (and its adoption) shows
that people want static typing, at least to some level...

To be perfectly transparent here though, you should mention that your
proposed RFC goes well beyond the strict typing that is in Hack because
in Hack the internal API is largely untyped while your proposal is to
default the entire internal API to strict types in strict mode. Also, in
Hack there is a distinction between the off-line hh_client type-checker
and the runtime.

This distinction is going away pretty soon. The typechecker is an integral part of Hack, and you shouldn't be able to ignore its errors. The runtime will soon consider them fatal errors too. If your code doesn't pass the hh_client typechecker, then its behavior when run on HHVM is completely undefined.

Hack examples all using <?hh // strict

echo number_format('1000');
echo htmlspecialchars(1000);
echo md5(1000);

These are all fine both as far as the type-checker is concerned as well
as the runtime, of course, but they are runtime fatals in your proposed RFC.

And they should be errors in Hack too. The reason they aren't is to ease transitions from PHP to Hack. I'd expect them to be more strongly typed eventually.

And if you only go by the runtime and ignore the out-of-band type
checker there are almost no strictness rules applied to the internal API
in Hack.

eg. explode(0, 1000);

Here the hh_client type checker will complain that it is expecting
strings, but the runtime will run it nicely.

So when you say, and as I have heard other people say, that people want
Hack-like strict typing, you have to be a bit careful about what is
meant by that. Even in the cases where the internal API is typed in
Hack, it is still not a runtime fatal if they are called with the wrong
types.

See above -- relying on (or even really thinking about) this distinction is relying on implementation details which are going to change.

Josh

10 years ago by Rasmus Lerdorf

Hack examples all using <?hh // strict

echo number_format('1000');
echo htmlspecialchars(1000);
echo md5(1000);

These are all fine both as far as the type-checker is concerned as well
as the runtime, of course, but they are runtime fatals in your proposed RFC.

And they should be errors in Hack too. The reason they aren't is to ease transitions from PHP to Hack. I'd expect them to be more strongly typed eventually.

Right, you are doing a gradual transition of an API that wasn't written
to be strict. It was written with the assumption that type coercion
would take place. If there is a good reason to ease the transition from
PHP to Hack there is an even stronger reason to ease the transition from
PHP to PHP.

-Rasmus

10 years ago by Anthony Ferrara

Rasmus,

To be perfectly transparent here though, you should mention that your
proposed RFC goes well beyond the strict typing that is in Hack because
in Hack the internal API is largely untyped while your proposal is to
default the entire internal API to strict types in strict mode. Also, in
Hack there is a distinction between the off-line hh_client type-checker
and the runtime.

In addition to what Josh said, I want to make one point here. This
distinction is what led me to push out 0.5 instead of where 0.4 was
going. Let me explain:

Let's say we don't type internal functions and release 7.1 with the
rest of the dual mode type system.

Then we're bound to never strictly type internal functions unless we
introduce a NEW declare setting (declare(strict_types=2) or
declare(internal_strict_types=1) or whatever). Which is a bit out
there considering people already are testy about this one.

So that practically means if we don't allow strict now, we can never
tighten it again.

However, if we do allow typed now, then we can expand and loosen in
the future. If an API is found to be overly strict, it can be loosened
(using a union type for example). We have the ability to loosen over
time, but not strengthen.

That's why I chose to apply the same typing to internal functions as
user-land. Not doing so would be a major mistake IMHO. So that's why I'm
moving forward with it. I will add this to discussion points in the
RFC.

So when you say, and as I have heard other people say, that people want
Hack-like strict typing, you have to be a bit careful about what is
meant by that. Even in the cases where the internal API is typed in
Hack, it is still not a runtime fatal if they are called with the wrong
types. Now whether that is a good thing or not is debatable, of course,
my point is simply that if you are going to use Hack adoption as a sign
"that people want static typing" you should clearly explain that your
approach is quite different from what Hack is doing.

It's not "quite different". It's subtly different in a few details.
But conceptually it's the same.

Right, you are doing a gradual transition of an API that wasn't written
to be strict. It was written with the assumption that type coercion
would take place. If there is a good reason to ease the transition from
PHP to Hack there is an even stronger reason to ease the transition from
PHP to PHP.

And that's why the current proposal has two modes: weak (coercive) and
strict (error inducing). The default mode will not change things for
anyone. Then they can start adding types, and things will just work.
When they are ready, then they can turn on strict mode, one file at a
time. Heck, they can run a strict-mode static analyzer on non-strict
code to see where potential problems will be (assuming the analyzer
has that mode) so they can fix it before committing to strict types in
their production app.

If that's not the definition of a "gradual transition", I'm not sure
what else can be done without fundamentally disallowing the ability to
strictly type.

Anthony

10 years ago by Anthony Ferrara

Point of clarification: it exists in the RFC already:
https://wiki.php.net/rfc/scalar_type_hints_v5#internal_functions_should_opt-in_to_typing
