Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:83526
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.215.51 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <54EA7F15.9030606@gmail.com>
References: <2e4694f9805ee81ea0b2c79eab06c2d6@mail.gmail.com>
	<CAAyV7nExeZ1xouJAsACm9AvAGuQw0e-0FgVdEOAyzsC5OEn5_Q@mail.gmail.com>
	<54EA5EDA.8010605@gmail.com>
	<CAAyV7nFt3KL_uH4HuA5xQ7HY+3_3=fq=pg0_U1aMrdirETC6LQ@mail.gmail.com>
	<54EA6A99.5010609@gmail.com>
	<CAAyV7nE2Cfr0ohpPnXWnf5agmyGfeirNH-3UGyHEhsDaMvmsoQ@mail.gmail.com>
	<54EA7F15.9030606@gmail.com>
Date: Sun, 22 Feb 2015 20:43:29 -0500
Message-ID: <CAAyV7nHqeKUcnRBEhrSbxyCg-okxs4S9G95PsEhFyA-k-iDnUA@mail.gmail.com>
To: Stanislav Malyshev <smalyshev@gmail.com>
Cc: Zeev Suraski <zeev@zend.com>, Jefferson Gonzalez <jgmdev@gmail.com>, 
	PHP internals <internals@lists.php.net>
Content-Type: text/plain; charset=UTF-8
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)
From: ircmaxell@gmail.com (Anthony Ferrara)

Stas,

>> It is still a performance advantage, because since we know the types
>> are stable at compile time, we can generate far more optimized code
>> (no variant types, native function calls, etc).
>
> I don't see where it comes from. So far you said that your compiler
> would reject some code. That doesn't generate any code, optimized or
> otherwise. For the code your compiler does not reject, still no
> advantage over dynamic model.

It rejects code because doing code generation on the dynamic case is
significantly harder and more resource intensive. Could that be built
in? Sure. But it's a very significant difference from generating the
static code.

And even if we generated native code for the dynamic code, it would
still need variants, and hence ZPP at runtime. Hence the static code
has a significant performance benefit in that we can indeed bypass
type checks as shown in the PECL example a few messages up (more than
a few).
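
To make the variant point concrete, here is a minimal C sketch. The
`value` struct, the tag names and both `add_one_*` functions are
hypothetical stand-ins made up for illustration; they are not the real
zval layout or any engine API. The sketch only shows where the runtime
check lives in each model:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variant: a tag plus a payload, loosely in the spirit of
 * a zval. This is an illustration, not the real Zend structure. */
typedef enum { T_LONG, T_DOUBLE } value_tag;

typedef struct {
    value_tag tag;
    union { long l; double d; } as;
} value;

/* Dynamic path: the callee has to inspect the tag on every call. */
bool add_one_dynamic(const value *in, long *out) {
    if (in == NULL || in->tag != T_LONG) {
        return false;               /* runtime type error */
    }
    *out = in->as.l + 1;
    return true;
}

/* Static path: the type was proven before the call was emitted, so
 * this is a plain native call with no tag and no check. */
long add_one_native(long in) {
    return in + 1;
}
```

The dynamic version pays for the tag check on every call. The native
version has nothing left to check, which is the benefit being claimed.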

>> Actually, in this case, the int cast does tell us something. It says
>> that the result (truncation) is explicitly wanted. Not to the compiler
>> (tho that happens), but to the developer.
>
> No, it doesn't say that in this case. The developer didn't actually want
> truncation. They just wanted to call foo(). You forced them to use
> truncation because that's the only way to call foo() in your compiler.
> They said it's ok since truncation is over value that is int anyway, and
> they are true - except when it stops to be true in the future. That
> generates brittle code because it forces the developer to take risks
> they otherwise wouldn't take - such as use much stronger forced
> conversions instead of more appropriate dynamic ones.

Look at the RFC that Zeev proposed:
https://wiki.php.net/rfc/coercive_sth#user-land_additions

Passing a float to an integer parameter would result in a runtime
E_RECOVERABLE_ERROR if the float has "dataloss".

So in the case I cited: foo($someint / 2), that will generate an
E_RECOVERABLE_ERROR in Zeev's proposal, as well as in the static
typing mode of mine.

Hence to say "casts are needed" is a bit over-stating this proposal...

>> With coercive typing as proposed in Ze'ev's RFC, that would need to
>> happen anyway. In both proposals that would generate a runtime error.
>
> No, it wouldn't need to happen since no-DL conversion is allowed.

Sure it would. 3/2 is 1.5. Which would fatal if I passed it to
foo(int) under Zeev's RFC. Because of data loss.
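
As a rough sketch of what a "no data loss" check amounts to, here it is
in C. The helper name and the exact rule are assumptions modelled on
the RFC's wording, not on its patch or on anything in the engine:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: accept a double for an int parameter only when
 * the conversion loses nothing. Returns false where the coercive rule
 * would raise its error. */
bool coerce_double_to_long(double d, long *out) {
    /* Reject NaN (d != d) and anything outside the range of long. */
    if (d != d || d < (double)LONG_MIN || d >= -(double)LONG_MIN) {
        return false;
    }
    long truncated = (long)d;       /* safe: d is known to be in range */
    if ((double)truncated != d) {
        return false;               /* a fractional part would be lost */
    }
    *out = truncated;
    return true;
}
```

Under this rule 4/2 is accepted and 3/2 is rejected, and the outcome
depends entirely on the runtime value.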

>> The difference is, with strict types, we can detect the error ahead of
>> time and warn about it.
>
> Static analyzer can warn about it regardless of type model. The only
> difference in strict model is that when compiling - not ahead of time,
> but in runtime - it would produce hard error even in case of even
> number, which can work just fine without it.

In this very particular case, yes, because of the simplicity of the
types involved. But with strict typing you only need to look at one
success case, whereas with coercive typing you need to look at many more.

Also, in many (I'd argue most) cases coercive has to either issue a
warning (it doesn't know) or error on valid and functioning code.
Example:

function isdivisibleby2(string $foo): bool {
    if (preg_match('(\D)', $foo)) {
        return false;
    }
    return 0 == ($foo % 2);
}

function something2(string $foo): int {
    if (!isdivisibleby2($foo)) {
        return 10;
    }
    return foo($foo / 2);
}

This code would never raise a runtime error in Zeev's coercive
proposal. However, when looking at it statically, you can't tell
(unless you've got a regex decompiler).

So static analysis on dynamic types will either error on valid code,
or not error on invalid code (and I'm not even talking about the
halting problem here).

Whereas with strict typing, the error would appear in both cases
(static and runtime). And you could fix it.

>> In this precise example there is none, because division is not type
>
> That's what I am saying - if the code runs, there's no difference. The
> only difference your model runs less code, and forces (or, rather,
> strongly incentivizes) people to write more dangerous one because some
> of the non-dangerous one is not allowed.

More dangerous?

>> stable (it depends on the values of its arguments). Let's take a
>> different example
>>
>> function foo(float $something): int {
>>     return $something + 0.5;
>> }
>>
>> With coercive types, you can't tell ahead of time if that will error
>> or not. With static types, you can.
>
> I'm not sure what this proves. Yes, of course there are cases where
> strict typing (please let's not confuse it with static typing - these
> are different things, static typing is when everything's type is known
> in advance and this is not happening in PHP, that's kind of the whole
> point) would disallow some code that dynamic typing allows. Nobody
> argues with that. What I am arguing with is that this difference is
> somehow useful - especially for JIT optimizations.

I've shown it a few times in this thread. So far nobody has said "not
possible" to the code sample I showed above. But I'll quote it here
again:

PHP_FUNCTION(test_strict) {
    zend_bool valid_return = 0;
    if (zend_parse_parameters(...) == FAILURE) {
        return;
    }
    internal_test_strict(&valid_return);
}

void internal_test_strict(zend_bool *valid_return) {
    //outer_code
    zend_bool foo_valid = 0;
    internal_strict_foo(x, &foo_valid);
    if (!foo_valid) {
        throw_error();
    }
}

That has a demonstrable performance benefit. And while it may be
possible with a limited subset of dynamic types, the analyzer is
significantly harder to build (and uses more resources) to determine
the types as effectively as you'd need to with strict types.


>> No, I was talking about trying to do the same trick (using native
>> function calls) with coercive types.
>
> I'm not sure what you are comparing to what. You provide some code and
> say "in my compiler, this code A would not work, while in dynamic model
> it would. Instead, you should write code B. This code B would run faster
> in my compiler". But that is not a proof your compiler is better!
> Because code B would also run faster in dynamic model, and in addition,
> code A would also run (though indeed not faster than B).
>
>> Actually, no. Coercive as proposed by Ze'ev would cause 8.5 to error
>> if passed to an int type hint. So you'd need the cast there as well.
>> Either that, or error at runtime as well.
>
> We were talking about the case where the argument was even, you must
> have missed that part. If the argument is not even, indeed both models
> would produce the same error, no difference there. The only difference
> in your model vs. dynamic model so far is that you forced the developer
> to do manual (int) instead of doing much smarter coercive check on
> entrance of foo(). There's no performance improvement in that and
> there's reliability decrease.

Unless the static analyzer can prove that it's even, it's still an
error. That's the point of a type system.

Today, you write the code and assume it's always even. Great. Everything works.

Tomorrow, another dev on your team hooks into your code and calls that
function with an odd integer.

Now you're getting a type error because a precondition wasn't
expressed in code.

You want to write non-robust code, great! That's what weak/coercive
mode is for. You want the ability to have some type sanity? That's
what strict mode is for.
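
For illustration, expressing that precondition in code could look
roughly like the C sketch below. Both function names are hypothetical,
and `foo` is just a stand-in for the integer-only callee discussed in
this thread:

```c
#include <stdbool.h>

/* Hypothetical callee that only accepts whole numbers. */
long foo(long n) {
    return n;
}

/* The "argument is even" assumption is written down instead of being
 * left implicit, so an odd input is rejected at this boundary rather
 * than surfacing later as a type error inside foo(). */
bool call_foo_with_half(long n, long *out) {
    if (n % 2 != 0) {
        return false;               /* precondition violated */
    }
    *out = foo(n / 2);              /* exact: n is known to be even */
    return true;
}
```

The other dev who passes an odd integer now gets a failure at the call
boundary, where the assumption was made.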

>> Hence, in both cases casts would be required. One could tell you ahead
>> of time where you forgot a cast, the other would wait until runtime
>> (when the edge-case was hit).
>
> You imply it's always the case of "forgot" and the casts always should
> be there, which is not the case - actually, as I already said, I think
> this is the main defect of your model, forcing manual casts everywhere.

NOOO, don't misunderstand me. The majority of type mismatches indicate
that you're doing something wrong. In fact, I'd argue there are only
2 reasons to use an explicit cast:

1. Being explicit that you want to drop precision: (int) ($x / $y)
2. Because of internal functions returning improper types: (int) floor($x + $y)

For the majority of type errors, the correct action is not to add a
cast but to fix the types.

Look at basically every other typed language. Do you see casts
everywhere? No. Because type errors are **fixed**. Not just blindly
cast.

> OK, so I got it right and the "JIT advantage" was in fact not in JIT
> working better but in static analyzer forcing people to write the code
> that looks more JIT-friendly by rejecting some of the code that does
> what developer wants but the compiler/analyzer can't figure it out? But
> nothing prevents you from having the same JIT-friendly code in the first
> place, moreover - nothing prevents you from having static analyzer that
> generates alerts on code that you think may be non-JIT friendly and let
> the developer decide if that's what they want. All that does not require
> strict typing at runtime in any form.

No, but it does VASTLY increase the complexity and resource
consumption of such an analyzer (dealing with coercive types).

> I of course completely disagree in correctness part - I think making
> people use forced conversions (which are OK with data loss in many
> cases) is worse than using smarter coercive typing.

And we can disagree there. That's an opinion. I do think there's
enough evidence of its benefits not to dismiss it entirely, but
that's your interpretation.

>> Their code is buggy today. All the static type system does is show it
>> to them ahead of time, rather than relying on test failures or bugs to
>> show it.
>
> You seem to be talking about static type system, but PHP has no static
> type system and strict typing of function parameters does not introduce
> one.
>
>> And to be fair I haven't really been talking generic JIT, but generic
>> AOT (which can include local-JIT compilation).
>
> Even for AOT, I don't see any advantage for strict typing on the same
> code. The only difference is that strict AOT compiler would reject some
> code and some of that code may be non-JIT-friendly. On accepted code,
> again, I see no difference.

I showed you several. I'm not going to go in circles because we have a
failure in communication.

Anthony