Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:83529
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain zend.com designates 209.85.213.177 as permitted sender)
References: <2e4694f9805ee81ea0b2c79eab06c2d6@mail.gmail.com>
	<CAAyV7nExeZ1xouJAsACm9AvAGuQw0e-0FgVdEOAyzsC5OEn5_Q@mail.gmail.com>
	<83921f861c3378dfc6ea34b6681f2edd@mail.gmail.com>	<CAAyV7nEhdeXQbUi9jy10M1NyfaV-w1kF=nWeSW0zd8UQh2_dJQ@mail.gmail.com>
	<26a5bb62bd37a3f610b2a6c10f84d855@mail.gmail.com>	<CAAyV7nEf_q_NkoQrWN1pa_ot8Ype-SCdvJOdpevUsd4y8Kv6pA@mail.gmail.com>
	<6d317b6cce0cc1aaded1dae11218234d@mail.gmail.com>	<CAAyV7nFcGWJOj-R3ZEn8XkGFqzuw1Ef3tx3Ykm3AXg6KxB+wHg@mail.gmail.com>
	<7d58f9b893c257c7289166b31bbdd9ac@mail.gmail.com> <CAAyV7nEoRfPi9u+2OhfZeuTpNJ+innmyajiEYRGSbX+zOP_4Dg@mail.gmail.com>
In-Reply-To: <CAAyV7nEoRfPi9u+2OhfZeuTpNJ+innmyajiEYRGSbX+zOP_4Dg@mail.gmail.com>
MIME-Version: 1.0
Thread-Index: AQIDXD7ehmaYKzbAddTs7O1sB4MzmgIw+OLdAfG9oisCHTTwrQKy9+dWAdPf8sIB5yj4egFvLJYPAS37ZyYB8FQrZ5wNepzA
Date: Mon, 23 Feb 2015 03:59:02 +0200
Message-ID: <b251384fc2f38f31cc36a7b79b01c5b2@mail.gmail.com>
To: Anthony Ferrara <ircmaxell@gmail.com>
Cc: PHP internals <internals@lists.php.net>
Content-Type: text/plain; charset=UTF-8
Subject: RE: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
From: zeev@zend.com (Zeev Suraski)

> -----Original Message-----
> From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
> Sent: Monday, February 23, 2015 3:21 AM
> To: Zeev Suraski
> Cc: PHP internals
> Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
> RFC)
>
> Zeev,
>
>
> >> Partially.
> >>
> >> The static analysis and compilation would be pure AOT. So the errors
> >> would be told to the user when they try to analyze the program, not run
> >> it. Similar to HHVM's hh_client.
> >
> > How about that then:
> >
> > 1. The developer runs a static analyzer on the program.
> > 2. It fails because the static analyzer detects float being fed to an
> > int.
> > 3. The user changes the code to convert the input to int.
> > 4. You can now optimize the whole flow better, since you know for a
> > fact it's an int.
> >
> > Is that an accurate flow?
>
> Yes. At least for what I was talking about in this thread.

OK.

So the code after the fix would look like this:
<?php declare(strict_types=1);
function foo(int $int): int {
    return $int + 1;
}

function bar(int $something): int {
    // (int) or whatever else makes it clear it's an int
    $x = (int) ($something / 2);
    return foo($x);
}
?>

Let me explain how this could play out with coercive type hints:
<?php
function foo(int $int): int {
    return $int + 1;
}

function bar(int $something): int {
    $x = $something / 2;
    return foo($x);
}

We can all agree that determining the types of just about anything here is
ultra-easy, so easy you could do it with a static analyzer, as you
suggested.  $int and $something are integers, while $x is either an integer
or a float.  We also know that both foo() and bar() expect integers.

What's the optimal code we could generate here?
First, on the function body of foo(), we can clearly and easily translate
the whole body into machine code, as we know we'll get a long and need to
return a long.
Moving to the caller scope in bar(), given we know $x is either a float or
an integer, we could either generate code that calls coerce_to_int($x), or
even some optimized machine code that checks zval.type and either uses the
lval or converts the dval.  This can be done in AOT, no need to wait for
runtime.  Once we know for a fact we have an integer in our hands - we can
make the call directly to the optimized foo(), a C level call without the
overhead of a PHP function call.
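
To make the call-site logic concrete, here is a rough userland PHP sketch of
what the generated code would conceptually do.  The coerce_to_int() below is
only a hypothetical stand-in for the engine-level helper, and it assumes,
purely for illustration, that a float is accepted only when it has no
fractional part; the real rules would be whatever the RFC ends up specifying.

```php
<?php
// Hypothetical stand-in for the engine-level coercion helper named above.
// Assumption for illustration only: a float is accepted when it is finite,
// has no fractional part and fits in the integer range.
function coerce_to_int($value): int {
    if (is_int($value)) {
        return $value;                  // the "use the lval" fast path
    }
    if (is_float($value) && is_finite($value) && floor($value) === $value
        && $value >= (float) PHP_INT_MIN && $value < (float) PHP_INT_MAX) {
        return (int) $value;            // the "convert the dval" path
    }
    throw new TypeError('value cannot be coerced to int');
}

function foo(int $int): int {
    return $int + 1;
}

// What the compiled call site in bar() would conceptually do: $x is known to
// be int|float, so only those two cases are handled before the direct call.
function bar(int $something): int {
    $x = $something / 2;                // int for even input, float for odd
    return foo(coerce_to_int($x));
}
```

Under that assumption, bar(4) returns 3, while bar(5) raises a TypeError,
where an explicit (int) cast would have silently truncated 2.5 to 2.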

If you look at the generated code, it's going to be remarkably similar
between the two cases.  If the developer chooses to pick the casting route,
it will look almost identical - except it will be convert_to_long() that is
called instead of coerce_to_int(), the former being more aggressive than the
latter.

Can you see anything impossible or otherwise wrong with my description of
how the AOT compiler would work in this case, with coercive type hints?  If
not, there are no performance benefits for the strictly typed version after
the user alters their code to behave similarly to what coercive type hints
would bring.

Based on our Twitter discussion, I think I may not have made my position
clear regarding where our differences are.  I'm not claiming that you can't
do the optimizations you say you can do.  Not at all.  My point is that we
can do the very same optimizations with coercive types as well - basically,
that there is no delta.
>
> >> However, there could be a "runtime compiler" which compiles in PHP's
> >> compile flow (leveraging opcache, etc). In that case, if the type
> >> assertion isn't stable, the function wouldn't be compiled (the
> >> external analyzer would error, here it just doesn't compile). Then
> >> the code would be run under the Zend engine (and error when called).
> >
> > Got you.  Is it fair to say that if we got to that case, it no longer
> > matters what type of type hints we have?
>
> Once you get to the end, no. Recki-CT proves that.

Do you mean that the statement is unfair or that it no longer matters?   If
it's the former, can you elaborate as to why?

>
> The difference though is the journey. The static analyzer can reason about
> far more code with strict types than it can without (due to the limited
> number of possibilities presented at each call). So this leaves the
> dilemma: compiled code that behaves slightly differently (what Recki does)
> or whether it always behaves the same.
>
> >> So think of it as a graph. When you start the type analysis, there's
> >> one edge between $input and foo() with type mixed. Looking at foo's
> >> argument, you can say that the type of that graph edge must be int.
> >> Therefore it becomes an int. Then, when you look at $input, you see
> >> that it can be other things, and therefore there are unstable states
> >> which can error at runtime.
> >
> > So when you say it 'must be an int', what you mean is that you assume
> > it needs to be an int, and attempt to either prove that or refute
> > that.  Is that correct?
> > If you manage to prove it - you can generate optimal code.
> > If you manage to refute that - the static analyzer will emit an error.
> > If you can't determine - you defer to runtime.
> >
> > Is that correct?
>
> Basically yes.

Let me describe here too how it may look with coercive hints.  Instead of
beginning with the assertion that it must be an int, we make no guess as to
what it may be(*).  We would use the very same methods you would use to
prove or refute that it's an int, to determine whether it's an int.  Our
ability to deduce that it's an int is going to be identical to your ability
to prove that it's an int.  If we see that it comes from an int type hint,
from an int-typed function, etc. - we'd be able to generate the same
ultra-optimized C-level call.  If we manage to deduce that it may be an int
or a float, we can still create ultra-optimized calling code that would deal
with just these two cases, or call coerce_to_int().  If we deduce that it's
a type that cannot be converted to an int (e.g. array or resource) - we can
emit a compile-time error.  And if we have no idea what it is, we emit a
regular function call.  Going back to that (*) from earlier, even if we're
unable to deduce what it is, we can actually assume/hope that it'll be an
integer and, if it is, pass it on directly to the C implementation with a
C-level function call; and if not, go with the regular function call.
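
A rough sketch of that decision procedure, written in userland PHP only for
readability.  The function name, the strategy labels and the two type lists
are made up for illustration; a real compiler would work on its own
intermediate representation rather than on arrays of type names, and the set
of coercible types would follow the actual coercion rules.

```php
<?php
// Hypothetical planner: maps the set of types inferred for an argument that
// is passed to an int parameter onto the call strategy described above.
// null means the analyzer could not deduce anything about the value.
function plan_int_call(?array $inferred): string {
    $coercible = ['int', 'float'];      // assumption: handled inline
    $never     = ['array', 'resource']; // cannot be converted to an int

    if ($inferred === null || $inferred === []) {
        // Unknown: hope for an int, guard on it, otherwise call normally.
        return 'guarded direct call, else regular call';
    }
    if (array_diff($inferred, ['int']) === []) {
        return 'direct call';
    }
    if (array_diff($inferred, $coercible) === []) {
        return 'coerce, then direct call';
    }
    if (array_diff($inferred, $never) === []) {
        return 'compile-time error';
    }
    // Mixed bag (assumption: not covered above, so defer to runtime).
    return 'regular call';
}
```

For example, plan_int_call(['int', 'float']) picks the coerce-then-call
path that applies to $x in bar(), and plan_int_call(['array']) is rejected at
compile time.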

The machine code you're left with is pretty much equivalent in case we
reached the conclusion that the variable is an integer (which would be
roughly in the same cases where you're able to prove that it is).  The
difference would be that it allows for the non-integer types to be accepted
according to the coercion rules, which is a functional difference, not a
performance difference.

Again, I'm not at all saying you can't do the optimizations you're saying
you're going to do or are already doing.  In a way I'm saying the opposite - of
course you can do them.  We can do them as well with coercive type hints.

Thanks,

Zeev