Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:83508
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.192.180 as permitted sender)
Message-ID: <54EA6A99.5010609@gmail.com>
Date: Sun, 22 Feb 2015 15:47:37 -0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Anthony Ferrara <ircmaxell@gmail.com>
CC: Zeev Suraski <zeev@zend.com>, Jefferson Gonzalez <jgmdev@gmail.com>, 
 PHP internals <internals@lists.php.net>
References: <2e4694f9805ee81ea0b2c79eab06c2d6@mail.gmail.com>	<CAAyV7nExeZ1xouJAsACm9AvAGuQw0e-0FgVdEOAyzsC5OEn5_Q@mail.gmail.com>	<54EA5EDA.8010605@gmail.com> <CAAyV7nFt3KL_uH4HuA5xQ7HY+3_3=fq=pg0_U1aMrdirETC6LQ@mail.gmail.com>
In-Reply-To: <CAAyV7nFt3KL_uH4HuA5xQ7HY+3_3=fq=pg0_U1aMrdirETC6LQ@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
From: smalyshev@gmail.com (Stanislav Malyshev)

Hi!

> You can tell because you know the function foo expects an integer. So
> you can infer that $x will have to have the type integer due to the
> future requirement. Which means the expression "$something / 2" must
> also be an integer. We know that's not the case, so we can raise an
> error here.

OK, so your claim is that a compiler with strict typing can detect
some situations that a dynamic one cannot, and reject some of the
code. Without going too much into detail, I agree with this; it is an
obvious difference between strict and dynamic. However, this is not a
performance advantage, since you are comparing running code with
non-running code: your model just accepts less code. This works if the
rejected code was wrong, and doesn't work if it was not. But I thought
we were talking about running code.

> At that point the developer has the choice to explicitly cast or put
> in a floor() or one of a number of options.

That's exactly what I claim would be the defect of the strict model:
people would start putting in excessive casts, which guarantees there
would be cases where information is lost. For example, assume we knew
$something is even:

function bar(int $something): int {
    assert($something % 2 == 0);
    $x = $something / 2;
    return foo($x);
}

Now everything is fine (ignoring the typing for a second), right? We're
dealing with integers, /2 always divides evenly, all is great. Now we
introduce strictness, so we'd need to say something like:

function bar(int $something): int {
    assert($something % 2 == 0);
    $x = $something / 2;
    return foo((int)$x);
}

Now assume somebody messed up a routine code-reformatting merge and
the code somehow ended up like:

function bar(int $something): int {
    $x = $something / 2;
    return foo((int)$x);
}

Do you see what the problem is? We have now lost the check for $something
being even, but we would never know about it, since the type system forced
us to insert (int) (which we didn't need) and thus disabled the check that
would have caught $something not being even (which we did need).
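
To make the failure mode concrete, here is a minimal sketch. It assumes a
PHP build that has the declare(strict_types=1) directive from the strict
typing proposal, where the check happens at call time rather than in a
static compiler. foo() is a hypothetical stand-in for the function under
discussion, and bar_unchecked() and bar_cast() are names made up for this
illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the foo() discussed in this thread.
function foo(int $x): int {
    return $x;
}

// No cast: an odd input yields a float, and foo() refuses it.
function bar_unchecked(int $something): int {
    $x = $something / 2;   // int for an even input, float for an odd one
    return foo($x);        // raises TypeError when $x is a float
}

// With the cast that strictness pushes us to add.
function bar_cast(int $something): int {
    $x = $something / 2;
    return foo((int)$x);   // a fractional part is silently truncated
}
```

With an odd argument, bar_unchecked() fails loudly while bar_cast() quietly
returns the truncated value, which is exactly the lost check described above.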

But the more important question is: with (int), the coercive model can use
this information too, so what's the difference from the strict model on
that code? There seems to be none.
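
As a small illustration of that point: the result of an (int) cast has type
integer whatever went in, and no strictness directive is needed to know that.
The helper name type_after_cast() is made up for this sketch.

```php
<?php
// Hypothetical helper: reports the type of a value after an explicit (int) cast.
function type_after_cast($value): string {
    return gettype((int)$value);
}

// Inputs of three different types, all ending up as integers after the cast.
$samples = [6 / 2, 7 / 2, "7"];
foreach ($samples as $sample) {
    echo gettype($sample), " -> ", type_after_cast($sample), "\n";
}
```

So whatever an optimizer can conclude about the argument of foo((int)$x), it
can conclude from the cast alone.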

> Without strict typing this code is always stable, but you still need
> to generate full type assertions in a compiled version of foo() and
> use ZVALs for $x, hence reducing the effect of the optimization
> significantly.

Wait, you said "this code is invalid", so no code will be generated. Did
you mean the code after introducing (int)? Then strict has no advantage
anymore, as we can derive the info from (int) anyway.
Otherwise, I can't see how you can avoid generating type checks in foo()
unless the only place it can ever be called from is bar(). I don't see
how you can ensure that in PHP, and if you could, I don't see why the
weak model could not reach the same conclusions on the same code.
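
A sketch of why foo() cannot drop its own checks: any code can reach it by
name at run time. This assumes a PHP version where a failed parameter type
check surfaces as a TypeError; foo() is again a hypothetical stand-in and
rejects() is a helper invented for this example.

```php
<?php
// Hypothetical stand-in for the foo() from this thread.
function foo(int $x): int {
    return $x + 1;
}

// Returns true if calling the function named $name with $arg raises TypeError.
function rejects(string $name, $arg): bool {
    try {
        $name($arg);       // dynamic call: the callee is only known at run time
        return false;
    } catch (TypeError $e) {
        return true;
    }
}

var_dump(rejects('foo', 4));       // an int is accepted
var_dump(rejects('foo', 'abc'));   // a non-numeric string is refused by foo() itself
```

The compiler never sees a call site for these invocations, so the check has
to live inside foo(), whichever typing model is in use.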

So far the only "advantage" I've seen seems to be that your compiler
would reject code that looks suspicious to it and thus force the
programmer to coerce the variables into the right types manually, by
(int) or floor(), which is something the coercive model would do for you
automatically. Once coerced, the same code would have the same type info
(and thus the same potential optimizations) in both models. I don't think
that is a gain in general, and I don't think forcing people to modify
their code qualifies as a "JIT performance gain".
-- 
Stas Malyshev
smalyshev@gmail.com