JIT (was RE: Coercive Scalar Type Hints RFC)

10 years ago by Zeev Suraski

-----Original Message-----
From: Jefferson Gonzalez [mailto:jgmdev@gmail.com]
Sent: Sunday, February 22, 2015 4:25 PM
To: Etienne Kneuss; Anthony Ferrara; Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC

Jefferson,

Please note that Anthony, the lead of the Dual Mode RFC, said this earlier
on this thread, referring to the claim that Strict STH can improve JIT/AOT
compilation:

"A statement which I said on the side, and I said should not impact RFC or
voting in any way. And is in no part in my RFC at all."

Please also see:

marc.info/?l=php-internals&m=142439750614527&w=2

So while Anthony and I don't agree on whether there are performance gains
to be had from Strict STH, both of us agree that it's not at a level that
should influence our decision regarding the RFCs on the table.

I wholeheartedly agree with that stance, which is why I also listed the
apparently extremely widespread misconception (IMHO) that Strict STH can
meaningfully help JIT/AOT in my RFC.

Despite that, as your email suggests, there are still (presumably a lot)
of people out there that assume that there are, in fact, substantial gains
to be had from JIT/AOT if we introduce Strict STH. I'm going to take
another stab at explaining why that's not the case.

A JIT or AOT machine code generator IMHO will never have a decent use of
system resources without some sort of strong/strict typed rules, somebody
explain if that's not the case.

It kind of is and kind of isn't.

There's consensus, I think, that if PHP was completely strongly typed -
i.e., all variables need to be declared and typed ahead of time, cannot
change types, etc. - we'd clearly be able to create a lot of optimizations
in AOT that we can't do today. That's the part that 'is the case'. But
nobody is suggesting that we do that. The discussion on the table is
very, very narrow:

-- Can the code generated for a strict type hint somehow be optimized
significantly better than the code generated for a dynamic/coercive type
hint?

And here, I (as well as Dmitry, who actually wrote a JIT compiler for PHP)
claim that it isn't the case. To be fair, there's no consensus on this
point.

Let me attempt, again, to explain why we don't believe there are any gains
associated with Strict STH, be it with the regular engine, JIT or AOT.

Consider the following code snippet:

function strict_foo($x)
{
    if (!is_int($x)) {
        trigger_error();
    }
    .inner_code.
}

function very_lax_foo($x)
{
    $x = (int) $x;
    .inner_code.
}

function test_strict()
{
    .outer_code.
    strict_foo($x);
}

function test_lax()
{
    .outer_code.
    very_lax_foo($x);
}

test_strict();
test_lax();

strict_foo() implements a pretty much identical check to the one that a
Strict integer STH would perform.
very_lax_foo() implements an explicit type conversion to int, that can
pretty much never fail - which is significantly more lax than what is
proposed for weak types in the Dual Mode RFC, and even more so compared to
the Coercive STH RFC.
.inner_code. is identical between the two foo() functions, and
.outer_code. is identical between the two tester functions.

The claim that strict types can be more efficiently optimized than more
lax types, suggests it should be possible to optimize the code flow for
test_strict()/strict_foo() significantly better than for very_lax_foo()
using JIT/AOT.

Let's dive in. Beginning with the easy part, that's been mentioned
countless times - it's clear that it's just as easy (or hard) to optimize
the .inner_code. block in the two implementations of foo(). It can bank
on the exact same assumptions - $x would be an integer. So we can
optimize the two function bases to exactly the same level. For example,
if we're sure that $x inside the function never changes type - it can be
optimized down to a C-level long. That's oversimplifying things a bit,
but the important thing here is that it can be easily proven that the two
function bodies can be optimized to the exact same level, for better or
worse. The only difference between them is how they handle non-integer
inputs: the strict implementation errors out if it gets a non-integer
typed value, while the lax version happily accepts everything. But that's
a functionality difference, not a performance one (i.e., if you want the
value to be accepted in the strict case, you'd manually conduct the
conversion before the call is made, or sooner - resulting in roughly the
same behavior and performance).

Now the slightly trickier part - the .outer_code. block. What can we say
about the type of $x, without knowing what code is in there? Not a whole
lot. We know that if $x isn't going to be of type int at the end of this
block, test_strict() is going to fail, but that doesn't mean $x will truly
be an int. The fact I want to be young and healthy doesn't mean I'm going
to magically become young and healthy :)

Let's dive further. Assuming we don't have strong variable type
declarations (i.e., int $x; $x = "foo"; // fails!), there are two
possible outcomes from analyzing the .outer_code. block:

1. We can infer (with varying levels of confidence) that $x is going to be
   an int right before the call to foo() is made. Whether we can infer that
   or not has nothing to do with any implementation detail of foo(),
   including whether it's using Strict type hints, Weak type hints, or
   conducts nuclear simulations. It has only to do with the code in
   .outer_code., which means we can do it equally well (or not so well) in
   both test_strict() and test_lax(). Before anybody asks, the levels of
   confidence would also not vary between the two flavors; they too have
   only to do with what's written in the .outer_code. block.
2. We cannot determine what type $x is going to be right before the call
   to foo() is made. Here too, our ability or inability to determine that is
   identical between test_strict() and test_lax(), and has only to do with
   our ability to analyze .outer_code., nothing else.

Now, let's continue to dive into the first scenario, as the second one can
obviously not be optimized in any meaningful way.

To simplify things, let's assume we can know - with absolute confidence,
that $x is an int right before the call. Can we somehow optimize
test_strict()/strict_foo() better than test_lax()/very_lax_foo()? The
answer is - no, not really. We could optimize them down to the exact same
machine level code - bypassing the is_int() check in the strict_foo()
case, and the explicit cast in very_lax_foo() case. With absolute
confidence that $x is an int, we could have a single long pass the
caller-callee boundary in both cases. This is easier said than done -
JIT/AOT are quite complex - but it's equally hard in both cases, and the
end result is identical.

As I see it, as an example, if the JIT generated C++ code to then generate
the machine code:

function calc(int $val1, int $val2) : int {return $val1 + $val2;}

How does that handle the situation where $val1 is a float? Or a string?
Or an array?
Here you are already assuming that you KNOW $val1 and $val2 are ints, but
nothing about declaring $val1/$val2 as 'strict int' implies anything
regarding the value/type that will actually be passed to the function.

On weak mode I see the generated code would be something like this:

Variant* calc(Variant& val1, Variant& val2) {
    if (val1.isInt() && val2.isInt())
        return new Variant(val1.toInt() + val2.toInt());
    else if (val1.isFloat() && val2.isFloat())
        return new Variant(val1.toInt() + val2.toInt());
    else
        throw new RuntimeError();
}

Technically it'd be more like this:

Variant* calc(Variant& val1, Variant& val2) {
    // type checking
    if (!val1.coerceToInt()) {
        throw new RuntimeError();
    }
    if (!val2.coerceToInt()) {
        throw new RuntimeError();
    }

    // function body begins here
    Variant* result = new Variant(val1.intValue() + val2.intValue());
    return result;
}

But the code that's generated for strict typing would actually not look
very different, if it can't assume that val1 & val2 are ints. A generic
implementation that has no insight about the incoming values would look
almost identical:

Variant* calc(Variant& val1, Variant& val2) {
    // type checking
    if (!val1.isInt()) {
        throw new RuntimeError();
    }
    if (!val2.isInt()) {
        throw new RuntimeError();
    }

    // function body begins here
    Variant* result = new Variant(val1.intValue() + val2.intValue());
    return result;
}

Between both cases, the actual code body (after the 'function body begins
here' comment), can be optimized in exactly the same way, and probably all
the way down to this:

{
    long val1, val2;

    return val1+val2;
}

In the strict case, if we somehow know ahead of time or just in time that
val1.isInt() and val2.isInt() are true (for a particular instance of a
call to calc()) - we can create a more efficient function calling code
that will do away with all of type checking, and go directly to the
function body.

But is the Dynamic version any different? Not at all.
In fact, if val1.isInt() and val2.isInt() are true, we can equally bypass
the coerceToInt() calls in the dynamic version - as we already know it's
an int! What are we left with? The exact same code, down to the last
bit.
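
The argument can be sketched as compilable C++. This is only an illustration under stated assumptions, not code from either RFC or from any real engine: the Variant alias, calc_body(), strict_arg() and coercive_arg() are hypothetical names, and the coercion rule shown (accept floats with no fractional part, reject strings) is a deliberate simplification. The two entry points validate their arguments differently, but both hand the same two longs to the same body.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical boxed value, standing in for a zval / Variant.
using Variant = std::variant<long, double, std::string>;

// Fully typed function body: shared by both flavors.
long calc_body(long val1, long val2) { return val1 + val2; }

// Strict entry check: only a value that is already an integer passes.
long strict_arg(const Variant& v) {
    if (const long* p = std::get_if<long>(&v)) return *p;
    throw std::invalid_argument("expected int");
}

// Coercive entry check (simplified assumption): integers pass through,
// floats pass only if they are in range and have no fractional part.
long coercive_arg(const Variant& v) {
    if (const long* p = std::get_if<long>(&v)) return *p;
    if (const double* d = std::get_if<double>(&v)) {
        const double lo = static_cast<double>(std::numeric_limits<long>::min());
        const double hi = static_cast<double>(std::numeric_limits<long>::max());
        if (*d >= lo && *d < hi) {
            long as_long = static_cast<long>(*d);
            if (static_cast<double>(as_long) == *d) return as_long;
        }
    }
    throw std::invalid_argument("cannot coerce to int");
}

// Generic entry points: they differ only in the argument check.
long calc_strict(const Variant& a, const Variant& b) {
    return calc_body(strict_arg(a), strict_arg(b));
}
long calc_coercive(const Variant& a, const Variant& b) {
    return calc_body(coercive_arg(a), coercive_arg(b));
}
```

A caller that has already proven both arguments to be integers could call calc_body() directly in either flavor, which is the "exact same code" case described above.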

To summarize, the difference between the different flavors of STH simply
has no meaningful impact on the performance of the current non-JITted PHP
as well as potential future JIT/AOT engines. If you can infer the caller
argument types - you can reach the same level of performance in all the
different types of STH. If you can't infer the types - the code would
look remarkably similar between the two, with the big difference being
behavioral - not performance. All types of STH give you the exact same
input for optimizing the callee, down to the last bit.

Could weak mode provide the required rules to implement a JIT with
a sane level of memory and CPU usage?

It would use the exact same amount of memory and CPU as strict mode. The
hard part remains inferring types - and as I hope I illustrated, there's
no difference between the different flavors of STH on that front.

I see that the proponents of dual weak/strict modes are offering to write an
AOT implementation if strict makes it, and the impressive work of Joe (JITFU)
and Anthony on recki-ct with the strict mode could be taken to another level
of integration with PHP and performance. IMHO it is harder and more resource
hungry to implement a JIT/AOT using weak mode. With that said, if a JIT
implementation is developed, will the story of the ZendOptimizer being a
commercial solution be repeated, or would this JIT implementation be part of
the core?

Actually, what I'm seeing is the proponents of dual weak/strict mode
saying that JIT/AOT should not be a part of the discussion around the
RFCs, and people refusing to accept that! :)
But more importantly, as early as when we announced PHPNG, we said we'd
want to look into JIT solutions once it ships. We'd obviously want to
cooperate with everyone interested to try and create the best JIT
implementation possible once PHP 7 is out the door, as a part of an open
effort. If it's ever any good, it'll make it into the core, if accepted.

That's all that comes to mind now, and while many people don't care about
performance, IMHO a programming language mainly targeted at the web
should show some care in this department.

I think it's fair to say that Dmitry - who led the PHPNG effort - cares a
lot about performance. I'm sure you'd agree. I tend to think that I also care
a lot about performance, and so does Xinchen. We all spent substantial
parts of our lives working to speed PHP up. It's not whether we think
performance is important - it is (although we do believe we should build
optimizers for languages, more so than languages for optimizers). It's
just that we all fail to see how the flavor of STH can have any meaningful
influence on performance.

Thanks for the feedback!

Zeev

10 years ago by jgmdev@gmail.com

2015-02-22 14:37 GMT-04:00 Zeev Suraski zeev@zend.com:

I think it's fair to say that Dmitry - who led the PHPNG effort - cares a
lot about performance. I'm sure you'd agree. I tend to think that I also care
a lot about performance, and so does Xinchen. We all spent substantial
parts of our lives working to speed PHP up. It's not whether we think
performance is important - it is (although we do believe we should build
optimizers for languages, more so than languages for optimizers). It's
just that we all fail to see how the flavor of STH can have any meaningful
influence on performance.

Thanks for the feedback!

Zeev

Thanks for the insightful response! Now it would be nice to also see the
opinions of the other camp.

10 years ago by Lester Caine

Variant* calc(Variant& val1, Variant& val2) {
    // type checking
    if (!val1.coerceToInt()) {
        throw new RuntimeError();
    }
    if (!val2.coerceToInt()) {
        throw new RuntimeError();
    }
    // function body begins here
    Variant* result = new Variant(val1.intValue() + val2.intValue());
    return result;
}

A more practical example would be to replace coerceToInt() with
inRange() which includes an int check/'coerce' as part of the range
check, and produce a number of errors based on the result.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by Anthony Ferrara

Zeev,

-----Original Message-----
From: Jefferson Gonzalez [mailto:jgmdev@gmail.com]
Sent: Sunday, February 22, 2015 4:25 PM
To: Etienne Kneuss; Anthony Ferrara; Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC

Jefferson,

Please note that Anthony, the lead of the Dual Mode RFC, said this earlier
on this thread, referring to the claim that Strict STH can improve JIT/AOT
compilation:

"A statement which I said on the side, and I said should not impact RFC or
voting in any way. And is in no part in my RFC at all."

Please also see:

marc.info/?l=php-internals&m=142439750614527&w=2

So while Anthony and I don't agree on whether there are performance gains
to be had from Strict STH, both of us agree that it's not at a level that
should influence our decision regarding the RFCs on the table.

I wholeheartedly agree with that stance, which is why I also listed the
apparently extremely widespread misconception (IMHO) that Strict STH can
meaningfully help JIT/AOT in my RFC.

So you agree we shouldn't discuss it, then you go ahead and discuss
it. I guess that shouldn't surprise me.

Despite that, as your email suggests, there are still (presumably a lot)
of people out there that assume that there are, in fact, substantial gains
to be had from JIT/AOT if we introduce Strict STH. I'm going to take
another stab at explaining why that's not the case.

A JIT or AOT machine code generator IMHO will never have a decent use of
system resources without some sort of strong/strict typed rules, somebody
explain if that's not the case.

It kind of is and kind of isn't.

There's consensus, I think, that if PHP was completely strongly typed -
i.e., all variables need to be declared and typed ahead of time, cannot
change types, etc. - we'd clearly be able to create a lot of optimizations
in AOT that we can't do today. That's the part that 'is the case'. But
nobody is suggesting that we do that. The discussion on the table is
very, very narrow:

There's no consensus there. As I've pointed out to you more than once,
plenty of other languages manage this through type inference or
reconstruction. Many (like Go) only require explicit types on the
parameters, not on variables.

Heck, I did exactly that in Recki-CT. So please don't dismiss
something that's being done IN THE PHP WORLD just because you
don't think it's possible.

-- Can the code generated for a strict type hint somehow be optimized
significantly better than the code generated for a dynamic/coercive type
hint?

And here, I (as well as Dmitry, who actually wrote a JIT compiler for PHP)
claim that it isn't the case. To be fair, there's no consensus on this
point.

And me, who wrote an AOT compiler that does exactly this, claim
that it is the case. Along with other people who've worked on
compilers. See the reply in a private thread you started that shows
the tradeoffs, specifically in generated code efficiency and memory
usage.

You can keep ignoring the arguments, but PLEASE don't keep spreading
them as "fact".

Also: if Dmitry worked on a JIT compiler, why isn't that code out in
the open? And if the code isn't out, why isn't the knowledge open? Are
we just supposed to rely on a single person's experience (especially
when more than one other person's shared experiences differ)?

Let me attempt, again, to explain why we don't believe there are any gains
associated with Strict STH, be it with the regular engine, JIT or AOT.

Consider the following code snippet:

function strict_foo($x)
{
    if (!is_int($x)) {
        trigger_error();
    }
    .inner_code.
}

function very_lax_foo($x)
{
    $x = (int) $x;
    .inner_code.
}

function test_strict()
{
    .outer_code.
    strict_foo($x);
}

function test_lax()
{
    .outer_code.
    very_lax_foo($x);
}

test_strict();
test_lax();

strict_foo() implements a pretty much identical check to the one that a
Strict integer STH would perform.
very_lax_foo() implements an explicit type conversion to int, that can
pretty much never fail - which is significantly more lax than what is
proposed for weak types in the Dual Mode RFC, and even more so compared to
the Coercive STH RFC.
.inner_code. is identical between the two foo() functions, and
.outer_code. is identical between the two tester functions.

The claim that strict types can be more efficiently optimized than more
lax types, suggests it should be possible to optimize the code flow for
test_strict()/strict_foo() significantly better than for very_lax_foo()
using JIT/AOT.

Assuming that they were split in the files appropriately, you are
missing THE key thing we've been trying to tell you this entire
time. Looking at a single function, yes there is no difference if it's
strict or not (well, you can save some time on the next function call
inside, but it's small). However we're not talking about looking at a
single function.

In your code, a compiler could compile test_strict into a "php"
function. Which would (assuming that it accepted arguments) accept a
ZVAL. Then it would do any type assertions necessary. At this point
there's no difference between the approaches.

However, since test_strict() is compiled, there's no reason to
dispatch back up to PHP functions for strict_foo(). In fact, that
would be exceedingly slow. So instead, we'd compile strict_foo() as a
C function, and do a native function call to it. Never having to check
types because they are passed on the C stack.

That's precisely what I'm doing with Recki-CT when generating PECL
extensions from PHP code. It would look like this:

PHP_FUNCTION(test_strict) {
    zend_bool valid_return = 0;
    /* zend_parse_parameters() returns FAILURE on invalid arguments */
    if (zend_parse_parameters(...) == FAILURE) {
        return;
    }
    internal_test_strict(&valid_return);
}

void internal_test_strict(zend_bool *valid_return) {
    //outer_code
    zend_bool foo_valid = 0;
    internal_strict_foo(x, &foo_valid);
    if (!foo_valid) {
        throw_error();
    }
}

And then generate the same proxy structure for strict_foo. So there's
no need to ever do type assertions after the first external call (from
weak mode).

So that means there's no need to build up a ZVAL from native code.
There's no reason to type assert it. There's no reason to dispatch a
PHP-level function.

And note that this can only work with strict types since you can do
the necessary type inference and reconstruction (both forward from a
function call, and backwards before it).

With lax (weak, coercive) types, the ability to do type reconstruction
drops significantly. Because you can no longer do any backwards
inference from other function calls. Which means you can't prove if a
type is stable in most cases (won't change). Therefore, you'll always
have to allocate a ZVAL, and then the optimizations I showed above
would stop working.
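
The proxy structure described above can be sketched as compilable C++. This is a hedged illustration only and is not Recki-CT's actual output: the Zval alias and the php_test_strict(), internal_test_strict() and internal_strict_foo() functions are hypothetical stand-ins, the entry point is assumed to take one integer argument, and the arithmetic is a placeholder for .outer_code. and .inner_code.. The boxed value is inspected exactly once, at the external entry point; every call after that passes a native long.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical boxed value, standing in for a ZVAL.
using Zval = std::variant<long, double, std::string>;

// Compiled internals: native signatures, no boxing, no type assertions.
long internal_strict_foo(long x) {
    return x + 1;  // placeholder for .inner_code.
}

long internal_test_strict(long x) {
    long doubled = x * 2;                 // placeholder for .outer_code.
    return internal_strict_foo(doubled);  // native call, argument passed as a long
}

// External entry point: the only place where a boxed value is checked.
Zval php_test_strict(const Zval& arg) {
    const long* x = std::get_if<long>(&arg);
    if (x == nullptr) {
        throw std::invalid_argument("test_strict() expects an int");
    }
    return Zval{internal_test_strict(*x)};
}
```

Whether the same structure is available with coercive hints is exactly what the two sides of this thread disagree on; the sketch only shows what the strict-side proposal looks like.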

To simplify things, let's assume we can know - with absolute confidence,
that $x is an int right before the call. Can we somehow optimize
test_strict()/strict_foo() better than test_lax()/very_lax_foo()? The
answer is - no, not really. We could optimize them down to the exact same
machine level code - bypassing the is_int() check in the strict_foo()
case, and the explicit cast in very_lax_foo() case. With absolute
confidence that $x is an int, we could have a single long pass the
caller-callee boundary in both cases. This is easier said than done -
JIT/AOT are quite complex - but it's equally hard in both cases, and the
end result is identical.

Actually, I just demonstrated how, yes, we can optimize the entire
thing better than test_lax.

Could weak mode provide the required rules to implement a JIT with
a sane level of memory and CPU usage?

It would use the exact same amount of memory and CPU as strict mode. The
hard part remains inferring types - and as I hope I illustrated, there's
no difference between the different flavors of STH on that front.

This was demonstrated to be false both by my code above (in which
calls from compiled code would use significantly less memory due to
not needing zvals at all), and by mails in private that show existing
JIT compilers and their memory overheads (specifically around V8's
engine vs others).

I see that the proponents of dual weak/strict modes are offering to write an
AOT implementation if strict makes it, and the impressive work of Joe (JITFU)
and Anthony on recki-ct with the strict mode could be taken to another level
of integration with PHP and performance. IMHO it is harder and more resource
hungry to implement a JIT/AOT using weak mode. With that said, if a JIT
implementation is developed, will the story of the ZendOptimizer being a
commercial solution be repeated, or would this JIT implementation be part of
the core?

Actually, what I'm seeing is the proponents of dual weak/strict mode
saying that JIT/AOT should not be a part of the discussion around the
RFCs, and people refusing to accept that! :)

You made it part of the conversation by having it in your RFC. You
make it part of the conversation by repeating the "It's our opinion
that..." rhetoric.

It should be removed from your RFC completely (including any reference
to performance or compilation other than of the direct implementation
being provided). It's not in my RFC at all. The fact that it's in
yours does raise some questions.

But more importantly, as early as when we announced PHPNG, we said we'd
want to look into JIT solutions once it ships. We'd obviously want to
cooperate with everyone interested to try and create the best JIT
implementation possible once PHP 7 is out the door, as a part of an open
effort. If it's ever any good, it'll make it into the core, if accepted.

Do you want to cooperate? Then share the work that was done by Dmitry
already. Continuing to bring it up while refusing to share it is the
exact opposite of cooperation.

Anthony

10 years ago by Stanislav Malyshev

Hi!

-- Can the code generated for a strict type hint somehow be optimized
significantly better than the code generated for a dynamic/coercive type
hint?
And me, who wrote an AOT compiler that does exactly this, claim

Sorry, did exactly what? Here a bit more explanation would help.

However, since test_strict() is compiled, there's no reason to
dispatch back up to PHP functions for strict_foo(). In fact, that
would be exceedingly slow. So instead, we'd compile strict_foo() as a
C function, and do a native function call to it. Never having to check
types because they are passed on the C stack.

Doesn't that assume strict_foo() is always called with the right type of
arguments? What exactly ensures that it does in fact happen? Shouldn't
you have the type check somewhere to be able to claim this happens?
test_strict() doesn't do any checks, so what ensures $x is of the right
type for C? And if the check is there, how is it better?

And note that this can only work with strict types since you can do
the necessary type inference and reconstruction (both forward from a
function call, and backwards before it).

I don't get the backwards part - I think you claimed it last time we
discussed it but I haven't seen your answer explaining why it's OK to
just ignore cases when the variable is of the wrong type. Right now, it
looks like you claim that if somebody has a call strict_foo($x) and
strict_foo() accepts integers, that magically makes $x integer and you
can generate code everywhere (not only inside strict_foo but outside)
assuming $x is integer without actually needing a check. I don't see how
this can work.

Stas Malyshev
smalyshev@gmail.com

10 years ago by Anthony Ferrara

Stas,

-- Can the code generated for a strict type hint somehow be optimized
significantly better than the code generated for a dynamic/coercive type
hint?
And me, who wrote an AOT compiler that does exactly this, claim

Sorry, did exactly what? Here a bit more explanation would help.

Optimized statically typed PHP functions. Or more specifically
function calls inside of compiled code are treated strictly (so trying
to pass a float to an int typed function would error at compile time).
The outer function call (from non-compiled PHP) is parsed using ZPP
rules, but once it's inside it's strict.

https://github.com/google/recki-ct/blob/master/doc/0_introduction.md
https://github.com/google/recki-ct/blob/master/doc/2_basic_operation.md

However, since test_strict() is compiled, there's no reason to
dispatch back up to PHP functions for strict_foo(). In fact, that
would be exceedingly slow. So instead, we'd compile strict_foo() as a
C function, and do a native function call to it. Never having to check
types because they are passed on the C stack.

Doesn't that assume strict_foo() is always called with the right type of
arguments? What exactly ensures that it does in fact happen? Shouldn't
you have the type check somewhere to be able to claim this happens?
test_strict() doesn't do any checks, so what ensures $x is of the right
type for C? And if the check is there, how is it better?

Yes it does check the types, but at compile time. My AOT compiler
backend has no concept of a "mixed" or ZVAL type. All types are
determined at compile time, and in the very few cases it can't it will
error. The type inference engine uses all available information (prior
context, current context, future context) to determine what the type
is.

It does also detect type changes (via assignment) and is able to
correctly generate code based on that as well.

And note that this can only work with strict types since you can do
the necessary type inference and reconstruction (both forward from a
function call, and backwards before it).

I don't get the backwards part - I think you claimed it last time we
discussed it but I haven't seen your answer explaining why it's OK to
just ignore cases when the variable is of the wrong type. Right now, it
looks like you claim that if somebody has a call strict_foo($x) and
strict_foo() accepts integers, that magically makes $x integer and you
can generate code everywhere (not only inside strict_foo but outside)
assuming $x is integer without actually needing a check. I don't see how
this can work.

Ok, let's take another example:

<?php declare(strict_types=1);
function foo(int $int): int {
    return $int + 1;
}

function bar(int $something): int {
    $x = $something / 2;
    return foo($x);
}

^^ In that case, without strict types, you'd have to generate code for
both integer and float paths. With strict types, this code is invalid.

You can tell because you know the function foo expects an integer. So
you can infer that $x will have to have the type integer due to the
future requirement. Which means the expression "$something / 2" must
also be an integer. We know that's not the case, so we can raise an
error here.

At that point the developer has the choice to explicitly cast or put
in a floor() or one of a number of options.

The function "bar" itself didn't give us that information. We needed
to use the type information from foo() to infer the type of $x prior
to foo()'s call. Or more specifically, we inferred the only stable
type that it could be. Which let us determine that $x's assignment was
where the error was (since it wasn't a stable assignment).

Without strict typing this code is always stable, but you still need
to generate full type assertions in a compiled version of foo() and
use ZVALs for $x, hence reducing the effect of the optimization
significantly.
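
Why "$something / 2" is not type stable can be shown with a small C++ model. It assumes only PHP's documented behavior for "/" on two integers (an int result when the division is exact, a float otherwise); the Number alias and the php_div() and int_div() helpers are hypothetical names for this sketch, and the LONG_MIN / -1 overflow case is ignored for brevity.

```cpp
#include <cassert>
#include <stdexcept>
#include <variant>

// Result of PHP's `/` on two ints: either an int or a float.
using Number = std::variant<long, double>;

// Models PHP's `/` operator for integer operands: the result type
// depends on the runtime values, so it cannot be fixed at compile time.
Number php_div(long a, long b) {
    if (b == 0) throw std::domain_error("division by zero");
    if (a % b == 0) return Number{a / b};
    return Number{static_cast<double>(a) / static_cast<double>(b)};
}

// Type-stable alternative, comparable to PHP's intdiv(): always an int,
// truncating toward zero.
long int_div(long a, long b) {
    if (b == 0) throw std::domain_error("division by zero");
    return a / b;
}
```

With php_div(), a compiler has to keep a boxed value (or emit both an int path and a float path) for $x; with an operation like int_div() the result can stay a native long.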

Anthony

10 years ago by Stanislav Malyshev

Hi!

You can tell because you know the function foo expects an integer. So
you can infer that $x will have to have the type integer due to the
future requirement. Which means the expression "$something / 2" must
also be an integer. We know that's not the case, so we can raise an
error here.

OK, so your claim is that the compiler with strict typing can detect
some situations which the dynamic one can not and reject some of the
code. Without going too much into details, I agree with this, this is an
obvious difference between strict and dynamic. However, this is not a
performance advantage, obviously - since you are comparing running code
with non-running one - your model just accepts less code. Obviously,
this works if non-accepted code was wrong - and doesn't work if it was
not. But we talked about running code, I thought.

At that point the developer has the choice to explicitly cast or put
in a floor() or one of a number of options.

That's exactly what I claim would be the defect of the strict model -
people would start putting excessive casts ensuring there would be cases
where information is lost. For example, assume we knew $something is even:

function bar(int $something): int {
    assert($something % 2 == 0);
    $x = $something / 2;
    return foo($x);
}

Now everything is fine (ignoring the typing for a second), right? We're
dealing with integers, /2 always divides evenly, all is great. Now we
introduce strictness, so we'd need to say something like:

function bar(int $something): int {
    assert($something % 2 == 0);
    $x = $something / 2;
    return foo((int)$x);
}

Now assume somebody messed up on the routine code reformatting merge and
the code somehow ended up like:

function bar(int $something): int {
    $x = $something / 2;
    return foo((int)$x);
}

Do you see what the problem is? Now we lost the check for $something
being even, but we would never know about it since type system forced us
to insert (int) (which we didn't need) and thus disabled the controls
for the bug of $something not being even (which we did need).
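
The failure mode can be reproduced in a few lines of C++. This is a sketch under stated assumptions: foo(), bar_cast() and bar_checked() are hypothetical translations of the PHP snippets above, with the (int) cast modeled as a truncating static_cast and the lost assert modeled as an explicit exception.

```cpp
#include <cassert>
#include <stdexcept>

long foo(long n) { return n + 1; }

// bar() after the bad merge: the evenness check is gone, and the cast
// silently truncates the fractional part of an odd input.
long bar_cast(long something) {
    double x = static_cast<double>(something) / 2.0;
    return foo(static_cast<long>(x));
}

// bar() with the evenness requirement kept as a runtime check, so an
// odd input is reported instead of being truncated.
long bar_checked(long something) {
    if (something % 2 != 0) {
        throw std::invalid_argument("expected an even value");
    }
    return foo(something / 2);
}
```

Calling bar_cast(7) returns 4 without any complaint, while bar_checked(7) throws; for even inputs the two agree.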

But the more important question is - with (int) the coercive model can use
this information too, so what's the difference from strict model on that
code? There seems to be none.

Without strict typing this code is always stable, but you still need
to generate full type assertions in a compiled version of foo() and
use ZVALs for $x, hence reducing the effect of the optimization
significantly.

Wait, you said "this code is invalid" so no code will be generated. Did
you mean code after introducing (int)? Then strict has no advantage
anymore as we can derive the info from (int) anyway.
Otherwise, I can't see how you can avoid generating typechecks in foo()
unless the only place it can ever be called from is bar() - but I don't
see how you can ensure that in PHP, and if you could, I don't see why
weak model could not make the same conclusions on the same code.

So far the only "advantage" I've seen seems to be that your compiler
would reject code that looks suspicious to it and thus force the
programmer to coerce the variables into the types manually - by (int) or
floor() - something that the coercive model would do for you
automatically. Once coerced, the same code would have the same type info
(and thus same potential optimizations) in both models. I don't think it
is a gain in general, and I don't think forcing people to modify their
code qualifies as "JIT performance gain".

Stas Malyshev
smalyshev@gmail.com

10 years ago by Anthony Ferrara — view source

Stas,

Hi!

You can tell because you know the function foo expects an integer. So
you can infer that $x will have to have the type integer due to the
future requirement. Which means the expression "$something / 2" must
also be an integer. We know that's not the case, so we can raise an
error here.

OK, so your claim is that the compiler with strict typing can detect
some situations which the dynamic one can not and reject some of the
code. Without going too much into details, I agree with this, this is an
obvious difference between strict and dynamic. However, this is not a

Alright, we're getting somewhere.

performance advantage, obviously - since you are comparing running code
with non-running one - your model just accepts less code. Obviously,
this works if non-accepted code was wrong - and doesn't work if it was
not. But we talked about running code, I thought.

It is still a performance advantage, because since we know the types
are stable at compile time, we can generate far more optimized code
(no variant types, native function calls, etc).

And yes, it accepts less code. It refuses to accept code that is not
type stable. More on that in a second:

At that point the developer has the choice to explicitly cast or put
in a floor() or one of a number of options.

That's exactly what I claim would be the defect of the strict model -
people would start adding excessive casts, guaranteeing cases where
information is lost. For example, assume we knew $something is even:

function bar(int $something): int {
    assert($something % 2 == 0);
    $x = $something / 2;
    return foo($x);
}

Now everything is fine (ignoring the typing for a second), right? We're
dealing with integers, /2 always divides evenly, all is great. Now we
introduce strictness, so we'd need to say something like:

function bar(int $something): int {
    assert($something % 2 == 0);
    $x = $something / 2;
    return foo((int)$x);
}

Now assume somebody messed up on the routine code reformatting merge and
the code somehow ended up like:

function bar(int $something): int {
    $x = $something / 2;
    return foo((int)$x);
}

Do you see what the problem is? Now we have lost the check for $something
being even, but we would never know about it, since the type system forced
us to insert (int) (which we didn't need) and thus disabled the controls
for the bug of $something not being even (which we did need).

Actually, in this case, the int cast does tell us something. It says
that the result (truncation) is explicitly wanted. Not to the compiler
(tho that happens), but to the developer.

With coercive typing as proposed in Ze'ev's RFC, that would need to
happen anyway. In both proposals that would generate a runtime error.
The difference is, with strict types, we can detect the error ahead of
time and warn about it.

But the more important question is - with (int) the coercive model can
use this information too, so what's the difference from the strict model
on that code? There seems to be none.

In this precise example there is none, because division is not type
stable (it depends on the values of its arguments). Let's take a
different example

function foo(float $something): int {
    return $something + 0.5;
}

With coercive types, you can't tell ahead of time if that will error
or not. With static types, you can.
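
The distinction between the two examples can be checked directly. A small
sketch (the variable names are illustrative): the result type of int / int
depends on the operand values, while float + float always yields a float,
so whether the result matches a declared int return type can be judged
without knowing the values.

```php
<?php
// int / int: the result type depends on the values.
$even = 6 / 2;      // evenly divisible, the result is an int
$odd  = 7 / 2;      // not evenly divisible, the result is a float

// float + float: the result is a float whatever the values are,
// even when the sum happens to be a whole number.
$sum = 1.5 + 0.5;
```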

Without strict typing this code is always stable, but you still need
to generate full type assertions in a compiled version of foo() and
use ZVALs for $x, hence reducing the effect of the optimization
significantly.

Wait, you said "this code is invalid" so no code will be generated. Did
you mean code after introducing (int)? Then strict has no advantage
anymore as we can derive the info from (int) anyway.
Otherwise, I can't see how you can avoid generating typechecks in foo()
unless the only place it can ever be called from is bar() - but I don't
see how you can ensure that in PHP, and if you could, I don't see why
weak model could not make the same conclusions on the same code.

No, I was talking about trying to do the same trick (using native
function calls) with coercive types.

So far the only "advantage" I've seen seems to be that your compiler
would reject code that looks suspicious to it and thus force the
programmer to coerce the variables into the types manually - by (int) or
floor() - something that the coercive model would do for you
automatically. Once coerced, the same code would have the same type info

Actually, no. Coercive as proposed by Ze'ev would cause 8.5 to error
if passed to an int type hint. So you'd need the cast there as well.
Either that, or error at runtime as well.

Hence, in both cases casts would be required. One could tell you ahead
of time where you forgot a cast, the other would wait until runtime
(when the edge-case was hit).

(and thus same potential optimizations) in both models. I don't think it
is a gain in general, and I don't think forcing people to modify their
code qualifies as "JIT performance gain".

Sure it does. If their code is not type-stable, even simply telling
them that can give advantages both in performance and in correctness.
Their code is buggy today. All the static type system does is show it
to them ahead of time, rather than relying on test failures or bugs to
show it.

And to be fair I haven't really been talking generic JIT, but generic
AOT (which can include local-JIT compilation).

Anthony

10 years ago by Stanislav Malyshev — view source

Hi!

It is still a performance advantage, because since we know the types
are stable at compile time, we can generate far more optimized code
(no variant types, native function calls, etc).

I don't see where it comes from. So far you said that your compiler
would reject some code. That doesn't generate any code, optimized or
otherwise. For the code your compiler does not reject, still no
advantage over dynamic model.

Actually, in this case, the int cast does tell us something. It says
that the result (truncation) is explicitly wanted. Not to the compiler
(tho that happens), but to the developer.

No, it doesn't say that in this case. The developer didn't actually want
truncation. They just wanted to call foo(). You forced them to use
truncation because that's the only way to call foo() in your compiler.
They said it's OK since the truncation applies to a value that is an int
anyway, and they are right - except when that stops being true in the
future. That generates brittle code because it forces the developer to
take risks they otherwise wouldn't take - such as using much stronger
forced conversions instead of more appropriate dynamic ones.

With coercive typing as proposed in Ze'ev's RFC, that would need to
happen anyway. In both proposals that would generate a runtime error.

No, it wouldn't need to happen since no-DL (no data loss) conversion is
allowed.

The difference is, with strict types, we can detect the error ahead of
time and warn about it.

A static analyzer can warn about it regardless of the type model. The only
difference in the strict model is that when compiling - not ahead of time,
but at runtime - it would produce a hard error even in the case of an even
number, which can work just fine without it.

In this precise example there is none, because division is not type

That's what I am saying - if the code runs, there's no difference. The
only difference is that your model runs less code, and forces (or, rather,
strongly incentivizes) people to write more dangerous code because some
of the non-dangerous code is not allowed.

stable (it depends on the values of its arguments). Let's take a
different example

function foo(float $something): int {
    return $something + 0.5;
}

With coercive types, you can't tell ahead of time if that will error
or not. With static types, you can.

I'm not sure what this proves. Yes, of course there are cases where
strict typing (please let's not confuse it with static typing - these
are different things, static typing is when everything's type is known
in advance and this is not happening in PHP, that's kind of the whole
point) would disallow some code that dynamic typing allows. Nobody
argues with that. What I am arguing with is that this difference is
somehow useful - especially for JIT optimizations.

No, I was talking about trying to do the same trick (using native
function calls) with coercive types.

I'm not sure what you are comparing to what. You provide some code and
say "in my compiler, this code A would not work, while in dynamic model
it would. Instead, you should write code B. This code B would run faster
in my compiler". But that is not a proof your compiler is better!
Because code B would also run faster in dynamic model, and in addition,
code A would also run (though indeed not faster than B).

Actually, no. Coercive as proposed by Ze'ev would cause 8.5 to error
if passed to an int type hint. So you'd need the cast there as well.
Either that, or error at runtime as well.

We were talking about the case where the argument was even, you must
have missed that part. If the argument is not even, indeed both models
would produce the same error, no difference there. The only difference
in your model vs. the dynamic model so far is that you forced the
developer to do a manual (int) instead of a much smarter coercive check
on entry to foo(). There's no performance improvement in that and
there's a reliability decrease.

Hence, in both cases casts would be required. One could tell you ahead
of time where you forgot a cast, the other would wait until runtime
(when the edge-case was hit).

You imply it's always the case of "forgot" and the casts always should
be there, which is not the case - actually, as I already said, I think
this is the main defect of your model, forcing manual casts everywhere.
Otherwise, I agree - that's the only difference. I still struggle to see
any JIT gain. So far the only advantage demonstrated was the obvious one -
if you obviously pass an obvious non-int to an int parameter in the strict
model, this can be detected statically. It would be stupid to deny that,
as it pretty much immediately follows from the definition of the strict
model. But that's the only difference I see, and not much of an advantage
in my eyes, as a) patently obvious cases would be pretty rare, b) in
many cases they would also not be what the developer wanted, leading to
manual casts, and c) last but not least, a static analyzer doing that can
just as easily be written without having these strict rules in core PHP!

Sure it does. If their code is not type-stable, even simply telling
them that can give advantages both in performance and in correctness.

OK, so I got it right and the "JIT advantage" was in fact not in JIT
working better but in static analyzer forcing people to write the code
that looks more JIT-friendly by rejecting some of the code that does
what developer wants but the compiler/analyzer can't figure it out? But
nothing prevents you from having the same JIT-friendly code in the first
place, moreover - nothing prevents you from having static analyzer that
generates alerts on code that you think may be non-JIT friendly and let
the developer decide if that's what they want. All that does not require
strict typing at runtime in any form.

I of course completely disagree on the correctness part - I think making
people use forced conversions (which are OK with data loss in many
cases) is worse than using smarter coercive typing.

Their code is buggy today. All the static type system does is show it
to them ahead of time, rather than relying on test failures or bugs to
show it.

You seem to be talking about static type system, but PHP has no static
type system and strict typing of function parameters does not introduce
one.

And to be fair I haven't really been talking generic JIT, but generic
AOT (which can include local-JIT compilation).

Even for AOT, I don't see any advantage for strict typing on the same
code. The only difference is that strict AOT compiler would reject some
code and some of that code may be non-JIT-friendly. On accepted code,
again, I see no difference.

Stas Malyshev
smalyshev@gmail.com

10 years ago by Anthony Ferrara — view source

Stas,

It is still a performance advantage, because since we know the types
are stable at compile time, we can generate far more optimized code
(no variant types, native function calls, etc).

I don't see where it comes from. So far you said that your compiler
would reject some code. That doesn't generate any code, optimized or
otherwise. For the code your compiler does not reject, still no
advantage over dynamic model.

It rejects code because doing code generation on the dynamic case is
significantly harder and more resource intensive. Could that be built
in? Sure. But it's a very significant difference from generating the
static code.

And even if we generated native code for the dynamic code, it would
still need variants, and hence ZPP at runtime. Hence the static code
has a significant performance benefit in that we can indeed bypass
type checks as shown in the PECL example a few messages up (more than
a few).

Actually, in this case, the int cast does tell us something. It says
that the result (truncation) is explicitly wanted. Not to the compiler
(tho that happens), but to the developer.

No, it doesn't say that in this case. The developer didn't actually want
truncation. They just wanted to call foo(). You forced them to use
truncation because that's the only way to call foo() in your compiler.
They said it's OK since the truncation applies to a value that is an int
anyway, and they are right - except when that stops being true in the
future. That generates brittle code because it forces the developer to
take risks they otherwise wouldn't take - such as using much stronger
forced conversions instead of more appropriate dynamic ones.

Look at the RFC that Zeev proposed:
https://wiki.php.net/rfc/coercive_sth#user-land_additions

Passing a float to an integer parameter would result in a runtime
E_RECOVERABLE_ERROR if the float has "dataloss".

So in the case I cited: foo($someint / 2), that will generate an
E_RECOVERABLE_ERROR in Zeev's proposal, as well as in the static
typing mode of mine.
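
As a rough illustration of the "data loss" rule described above, here is
a userland approximation in PHP. The function name coerce_float_to_int()
is made up for this sketch, and it throws a TypeError rather than raising
E_RECOVERABLE_ERROR; it is not the RFC's implementation, only the same
idea: accept a float for an int parameter when the conversion is
lossless, and reject it otherwise.

```php
<?php
// Hypothetical helper, not part of PHP or of either RFC.
function coerce_float_to_int(float $value): int
{
    // Lossless means: finite, no fractional part, inside the int range.
    // (float) PHP_INT_MAX rounds up to 2^63 on 64-bit builds, hence "<".
    $lossless = is_finite($value)
        && floor($value) === $value
        && $value >= (float) PHP_INT_MIN
        && $value < (float) PHP_INT_MAX;

    if (!$lossless) {
        throw new TypeError("float $value cannot be converted to int without data loss");
    }
    return (int) $value;
}
```

Under this rule 4 / 2 passes and 3 / 2 is rejected, which is the even/odd
split the two sides keep returning to.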

Hence to say "casts are needed" is a bit over-stating this proposal...

With coercive typing as proposed in Ze'ev's RFC, that would need to
happen anyway. In both proposals that would generate a runtime error.

No, it wouldn't need to happen since no-DL (no data loss) conversion is
allowed.

Sure it would. 3/2 is 1.5. Which would fatal if I passed it to
foo(int) under Zeev's RFC. Because of data loss.

The difference is, with strict types, we can detect the error ahead of
time and warn about it.

A static analyzer can warn about it regardless of the type model. The only
difference in the strict model is that when compiling - not ahead of time,
but at runtime - it would produce a hard error even in the case of an even
number, which can work just fine without it.

This very particular case, yes, because of the simplicity of the types
involved. With strict typing you only need to look at 1 success case,
whereas with coercive typing you need to look at many more.

Also, in many (I'd argue most) cases coercive has to either issue a
warning (it doesn't know) or error on valid and functioning code.
Example:

function isdivisibleby2(string $foo): bool {
    if (preg_match('(\D)', $foo)) {
        return false;
    }
    return 0 == ($foo % 2);
}

function something2(string $foo): int {
    if (!isdivisibleby2($foo)) {
        return 10;
    }
    return foo($foo / 2);
}

This code would never raise a runtime error in Zeev's coercive
proposal. However, when looking at it statically, you can't tell
(unless you've got a regex decompiler).

So static analysis on dynamic types will either error on valid code,
or not error on invalid code (and I'm not even talking about the
halting problem here).

Whereas with strict typing, the error would appear in both cases
(static and runtime). And you could fix it.

In this precise example there is none, because division is not type

That's what I am saying - if the code runs, there's no difference. The
only difference is that your model runs less code, and forces (or, rather,
strongly incentivizes) people to write more dangerous code because some
of the non-dangerous code is not allowed.

More dangerous?

stable (it depends on the values of its arguments). Let's take a
different example

function foo(float $something): int {
    return $something + 0.5;
}

With coercive types, you can't tell ahead of time if that will error
or not. With static types, you can.

I'm not sure what this proves. Yes, of course there are cases where
strict typing (please let's not confuse it with static typing - these
are different things, static typing is when everything's type is known
in advance and this is not happening in PHP, that's kind of the whole
point) would disallow some code that dynamic typing allows. Nobody
argues with that. What I am arguing with is that this difference is
somehow useful - especially for JIT optimizations.

I've shown it a few times in this thread. So far nobody has said "not
possible" to the code sample I showed above. But I'll quote it here
again:

PHP_FUNCTION(test_strict) {
    zend_bool valid_return = 0;
    if (!zend_parse_parameters(...)) {
        return;
    }
    internal_test_strict(&valid_return);
}

void internal_test_strict(zend_bool *valid_return) {
    //outer_code
    zend_bool foo_valid = 0;
    internal_strict_foo(x, &foo_valid);
    if (!foo_valid) {
        throw_error();
    }
}

That has a demonstrable performance benefit. And while it may be
possible with a limited subset of dynamic types, the analyzer is
significantly harder to build (and uses more resources) to determine
the types as effectively as you'd need to with strict types.

No, I was talking about trying to do the same trick (using native
function calls) with coercive types.

I'm not sure what you are comparing to what. You provide some code and
say "in my compiler, this code A would not work, while in dynamic model
it would. Instead, you should write code B. This code B would run faster
in my compiler". But that is not a proof your compiler is better!
Because code B would also run faster in dynamic model, and in addition,
code A would also run (though indeed not faster than B).

Actually, no. Coercive as proposed by Ze'ev would cause 8.5 to error
if passed to an int type hint. So you'd need the cast there as well.
Either that, or error at runtime as well.

We were talking about the case where the argument was even, you must
have missed that part. If the argument is not even, indeed both models
would produce the same error, no difference there. The only difference
in your model vs. the dynamic model so far is that you forced the
developer to do a manual (int) instead of a much smarter coercive check
on entry to foo(). There's no performance improvement in that and
there's a reliability decrease.

Unless the static analyzer can prove that it's even, then it's still
an error. That's the point of a type system.

Today, you write the code and assume it's always even. Great. Everything works.

Tomorrow, another dev on your team hooks into your code and calls that
function with an odd integer.

Now you're getting a type error because a precondition wasn't
expressed in code.

You want to write non-robust code, great! That's what weak/coercive
mode is for. You want the ability to have some type sanity? That's
what strict mode is for.

Hence, in both cases casts would be required. One could tell you ahead
of time where you forgot a cast, the other would wait until runtime
(when the edge-case was hit).

You imply it's always the case of "forgot" and the casts always should
be there, which is not the case - actually, as I already said, I think
this is the main defect of your model, forcing manual casts everywhere.

NOOO, don't misunderstand me. The majority of the cases of a type
mismatch indicate that you're doing something wrong. In fact, I'd
argue there are only 2 reasons to use an explicit cast:

Being explicit that you want to drop precision: $x / $y
Because of internal functions returning improper types: floor($x + $y)

For the majority of type errors, the correct action is not to add a cast
but to fix the types.

Look at basically every other typed language. Do you see casts
everywhere? No. Because type errors are fixed. Not just blindly
cast.

OK, so I got it right and the "JIT advantage" was in fact not in JIT
working better but in static analyzer forcing people to write the code
that looks more JIT-friendly by rejecting some of the code that does
what developer wants but the compiler/analyzer can't figure it out? But
nothing prevents you from having the same JIT-friendly code in the first
place, moreover - nothing prevents you from having static analyzer that
generates alerts on code that you think may be non-JIT friendly and let
the developer decide if that's what they want. All that does not require
strict typing at runtime in any form.

No, but it does VASTLY increase the complexity and resource
consumption of such an analyzer (dealing with coercive types).

I of course completely disagree on the correctness part - I think making
people use forced conversions (which are OK with data loss in many
cases) is worse than using smarter coercive typing.

And we can disagree there. That's an opinion. Though I do think
there's enough evidence of benefits to it to not entirely dismiss it,
but that's your interpretation.

Their code is buggy today. All the static type system does is show it
to them ahead of time, rather than relying on test failures or bugs to
show it.

You seem to be talking about static type system, but PHP has no static
type system and strict typing of function parameters does not introduce
one.

And to be fair I haven't really been talking generic JIT, but generic
AOT (which can include local-JIT compilation).

Even for AOT, I don't see any advantage for strict typing on the same
code. The only difference is that strict AOT compiler would reject some
code and some of that code may be non-JIT-friendly. On accepted code,
again, I see no difference.

I showed you several. I'm not going to go in circles because we have a
failure in communication.

Anthony

10 years ago by Stanislav Malyshev — view source

Hi!

It rejects code because doing code generation on the dynamic case is
significantly harder and more resource intensive. Could that be built
in? Sure. But it's a very significant difference from generating the
static code.

I can appreciate that. Dynamic typing is hard to translate into
statically typed code efficiently. But I don't see how that is related
to PHP having strict types - surely even strict types do not make PHP
statically typed. In fact, I don't see how they improve much - so far
you've shown me code examples that your compiler wouldn't handle. I
don't see how not being able to handle code is an advantage. Could I see
examples of code that the strict model can handle and that work better
in that model?

And even if we generated native code for the dynamic code, it would
still need variants, and hence ZPP at runtime. Hence the static code
has a significant performance benefit in that we can indeed bypass
type checks as shown in the PECL example a few messages up (more than
a few).

I don't see how you can bypass type checks unless you know the variable
types at the time of the call, from some external source or some
information you collected about the code. If you know that, you could as
well generate the same check-less code for weak/dynamic model.
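
One way to picture this argument is as two entry points to the same
function: a generic one that checks its argument, and a specialized one
that a compiler may call directly wherever it has already established the
type. The PHP below is only an analogy with made-up names (is_even_fast,
is_even_generic, count_even_below), not generated code, and it assumes the
compiler's type knowledge comes from analyzing the call site.

```php
<?php
// Hypothetical names; a userland analogy, not compiler output.

// Specialized body: assumes the caller has already established an int.
function is_even_fast(int $x): bool
{
    return $x % 2 === 0;
}

// Generic entry point: checks once, then uses the specialized body.
function is_even_generic($x): bool
{
    if (!is_int($x)) {
        throw new TypeError('int expected, got ' . gettype($x));
    }
    return is_even_fast($x);
}

// At this call site $i is always an int (a loop counter bounded by an
// int), so the check in the generic entry point would be redundant.
function count_even_below(int $limit): int
{
    $count = 0;
    for ($i = 0; $i < $limit; $i++) {
        if (is_even_fast($i)) {
            $count++;
        }
    }
    return $count;
}
```

The knowledge that makes the direct call safe comes from the call site
(the loop counter), which is the information Stas argues is available to
either model.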

Passing a float to an integer parameter would result in a runtime
E_RECOVERABLE_ERROR if the float has "dataloss".

So in the case I cited: foo($someint / 2), that will generate an
E_RECOVERABLE_ERROR in Zeev's proposal, as well as in the static
typing mode of mine.

It sounds like you've missed the part of my reply where I was saying
that I am considering the case of even numbers.

With coercive typing as proposed in Ze'ev's RFC, that would need to
happen anyway. In both proposals that would generate a runtime error.

No, it wouldn't need to happen since no-DL (no data loss) conversion is
allowed.

Sure it would. 3/2 is 1.5. Which would fatal if I passed it to
foo(int) under Zeev's RFC. Because of data loss.

Again, you seem to miss the part where I said that we're considering a
non-DL case. For DL case, both behave the same so there's indeed no
difference (while you claimed there's some advantage for strict model?)

This very particular case, yes, because of the simplicity of the types
involved. With strict typing you only need to look at 1 success case,
whereas with coercive typing you need to look at many more.

I do not see why you can ignore the fact that your assumptions about the
variable types could be wrong with strict typing. PHP is not a statically
typed language, so unless you can prove definitively that the variable
absolutely cannot be anything other than the prescribed type (prior to
the call), you still need to have code that accounts for the other
possibility. If you can, however, prove that, both strict and dynamic
typing would behave exactly the same!

You could, of course, build your static analyzer in a way that would
reject any code where it cannot prove all types - however I hope you
understand that is not an option for PHP core?

Also, in many (I'd argue most) cases coercive has to either issue a
warning (it doesn't know) or error on valid and functioning code.
Example:

function isdivisibleby2(string $foo): bool {
    if (preg_match('(\D)', $foo)) {
        return false;
    }
    return 0 == ($foo % 2);
}

function something2(string $foo): int {
    if (!isdivisibleby2($foo)) {
        return 10;
    }
    return foo($foo / 2);
}

This code would never raise a runtime error in Zeev's coercive
proposal. However, when looking at it statically, you can't tell
(unless you've got a regex decompiler).
So static analysis on dynamic types will either error on valid code,
or not error on invalid code (and I'm not even talking about the
halting problem here).

True, but PHP is built on dynamic types, and neither proposal changes
that. So you either propose to make PHP fully statically typed (which I
hope you do not) or say static analysis is not perfect - with which I
wholeheartedly agree, but then again CS is full of unsolvable problems,
and static analysis is, unfortunately, ultimately reducible to one of
them, so no wonder here. The same would, of course, be true with
strict and non-strict runtime typing - simply because PHP is not
statically typed.

Whereas with strict typing, the error would appear in both cases
(static and runtime). And you could fix it.

If you are saying that you can construct code, containing an error,
which will be missed by coercive typing but would fail (not necessarily
because of this specific error, but because of type mismatch) with
strict typing, it is of course trivially true. But so what? This in no
way proves strict typing caught the error - to prove that, the type
failure should be causally connected to the error, in your examples it
is not.

Moreover, you somehow bring example of the code that is actually not
wrong, practically speaking (as it divides by 2 the number that is
actually divisible by 2) and say it produces an error and it is good? I
somehow miss the point of how it is a good thing.

More dangerous?

Yes, of course, explicit casts would be more dangerous since they may
hide errors, as I have shown you in one of the past emails. Explicit
casts are much more powerful than implicit ones, and thus are more
dangerous if used inappropriately (such as to override type system that
prevents one from doing the right thing).

I've shown it a few times in this thread. So far nobody has said "not
possible" to the code sample I showed above. But I'll quote it here
again:

PHP_FUNCTION(test_strict) {
    zend_bool valid_return = 0;
    if (!zend_parse_parameters(...)) {
        return;
    }
    internal_test_strict(&valid_return);
}

void internal_test_strict(zend_bool *valid_return) {
    //outer_code
    zend_bool foo_valid = 0;
    internal_strict_foo(x, &foo_valid);
    if (!foo_valid) {
        throw_error();
    }
}

That has a demonstrable performance benefit. And while it may be
possible with a limited subset of dynamic types, the analyzer is
significantly harder to build (and uses more resources) to determine
the types as effectively as you'd need to with strict types.

I'm not sure I understand - where exactly in this code does the
performance benefit happen? And how does internal_test_strict get the x
variable? What type does it have (in C)? What ensures it is indeed of
that type, and that the value that corresponds to it in PHP is always of
that type and not some other? Something is missing here.

Also the part which is missing - after all the above - is an
explanation why the same code can not be generated in coercive model.

You want to write non-robust code, great! That's what weak/coercive
mode is for. You want the ability to have some type sanity? That's
what strict mode is for.

No, in fact there's absolutely no difference in initial code for both
models with regard to odd numbers. Same for the first iteration (with
(int)). But in the final code, strict model is actually worse than
weak - because it was set up for failure by the addition of (int). It is
exactly the opposite of your claim - in your example, it is
weak/coercive model that is more robust, and (int) added by your model
is what hides the errors.

NOOO, don't misunderstand me. The majority of the cases of a type
mismatch indicate that you're doing something wrong. In fact, I'd

How do you know that? It looks like a case of circular logic - the
analyzer is good because if it says something is wrong then something is
wrong. We haven't proven that yet.

argue there are only 2 reasons to use an explicit cast:

Being explicit that you want to drop precision: $x / $y

Because of internal functions returning improper types: floor($x + $y)

These are valid reasons. But your model adds another one - "without
the cast the code just doesn't work even if I know the value is OK,
but the type does not match". That - as I repeat again and again - is
the worst flaw of this model: there are cases where people know the
values are right but your model doesn't, and the only way to make it know
is to use a sledgehammer - the explicit cast.

For the majority of type errors, the correct action is not to add a cast
but to fix the types.

Nope, at least not in dynamic language with majority string
inputs/outputs, like PHP is.

Look at basically every other typed language. Do you see casts
everywhere? No. Because type errors are fixed. Not just blindly
cast.

Actually, I do see a number of casts - e.g., when processing data from
files which contain sets of integers in Python, I'd have to do something
like x = [int(n) for n in x] all the time.
But that again does not relate to the advantage of strict typing - those
languages are fully strictly typed from the start, PHP is not, and
realistically you can't expect any production code to be even
majority-typed for the next 5 to 10 years.
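
The PHP counterpart of that Python idiom shows the same trade-off between
a forced cast and a validating conversion. In this sketch the input line
and the variable names are invented; intval() and filter_var() with
FILTER_VALIDATE_INT are standard PHP.

```php
<?php
// Made-up input: three comma-separated fields, the last one malformed.
$fields = explode(',', '10,20,3x');

// Forced conversion: every field becomes an int, malformed or not.
$cast = array_map('intval', $fields);

// Validating conversion: a malformed field comes back as false.
$validated = array_map(
    function ($field) {
        return filter_var($field, FILTER_VALIDATE_INT);
    },
    $fields
);
```

Here the forced conversion turns "3x" into 3, while the validating one
reports it as false and leaves the decision to the caller.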

No, but it does VASTLY increase the complexity and resource
consumption of such an analyzer (dealing with coercive types).

I'm sorry, but nothing you have said so far provides any proof of
such a VAST increase. In fact, I cannot see how it matters at all. Yes,
such an analyzer would not reject some code it otherwise would reject, but
that by itself is not a VAST increase in complexity. You need to account
for the dynamic nature of PHP anyway, if your analyzer is going to be
worth anything.

I showed you several. I'm not going to go in circles because we have a
failure in communication.

It is true that you showed me several examples of code, however I do not
see how these examples prove any of your claims except for the trivial
one that strict typing would reject some code that dynamic typing would not.

--
Stas Malyshev
smalyshev@gmail.com

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 3:43 AM
To: Stanislav Malyshev
Cc: Zeev Suraski; Jefferson Gonzalez; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Stas,

It is still a performance advantage, because since we know the types
are stable at compile time, we can generate far more optimized code
(no variant types, native function calls, etc).

I don't see where it comes from. So far you said that your compiler
would reject some code. That doesn't generate any code, optimized or
otherwise. For the code your compiler does not reject, still no
advantage over dynamic model.

It rejects code because doing code generation on the dynamic case is
significantly harder and more resource intensive. Could that be built in?
Sure.
But it's a very significant difference from generating the static code.

I hope I demonstrated in the other email, which lists how two use cases would
look with coercive type hints, that the strategies and implementation of
the optimizations for those cases where we can infer the type at compile
time would be similar in cost, complexity and resource consumption to the
optimizations you're talking about. Even if we keep the handling of all
other types as AOTless/JITless, it would still have performance equivalent
to the strict case, given inputs that the strict case accepts.

Either way, I'm happy we all agree that equally-efficient code can be
generated for the dynamic case, which is the point I was making in the
Coercive typing RFC. We still have the gap on whether it's truly a lot
harder and resource intensive - I don't think it is as we can do the very
same things in compile-time - but that's a smaller gap that I personally
care less about. I wanted it to be clear to everyone that we can reach the
same level of optimizations for Coercive type hints as we can for Strict.

And even if we generated native code for the dynamic code, it would still
need variants, and hence ZPP at runtime. Hence the static code has a
significant performance benefit in that we can indeed bypass type checks
as shown in the PECL example a few messages up (more than a few).

We can only eliminate the ZPP structure at compile time if we know with
certainty what the type is. If we do, we know that for both strict type
hints and coercive type hints (i.e. we either managed to prove it's an int
in the static analyzer in the strict case, or we managed to deduce what the
type is in the coercive case). If we don't, we keep the ZPP structure in
exactly the same way.

Thanks,

Zeev

10 years ago by Jefferson Gonzalez

We were talking about the case where the argument was even, you must
have missed that part. If the argument is not even, indeed both models
would produce the same error, no difference there. The only difference
in your model vs. dynamic model so far is that you forced the developer
to do manual (int) instead of doing much smarter coercive check on
entrance of foo(). There's no performance improvement in that and
there's reliability decrease.

How is coercive much smarter? Basically what coercive would do is
similar to what the intval(), floatval(), etc. set of functions do,
with some type checking in the mix to ensure a value matches some set of
rules.

How could casting with (int) be such a dangerous thing? Let's take for example
this code:

echo (int) "whats cooking!";
echo intval("whats cooking");

Both statements print 0, so how is casting unsafe?

10 years ago by Zeev Suraski

-----Original Message-----
From: Jefferson Gonzalez [mailto:jgmdev@gmail.com]
Sent: Monday, February 23, 2015 3:58 AM
To: Stanislav Malyshev; Anthony Ferrara
Cc: Zeev Suraski; Jefferson Gonzalez; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

How could casting with (int) be such a dangerous thing? Let's take for example
this code:

echo (int) "whats cooking!";
echo intval("whats cooking");

Both statements print 0, so how is casting unsafe?

One key premise behind both strict type hinting and coercive type hinting is
that conversions that lose data, or that 'invent' data, are typically
indicators of a bug in the code.

You're right that there's no risk of a segfault or buffer overflow from the
snippets you listed. But there are fair chances that if you fed $x into
round() and it contains "whats cooking" (string), your code contains a bug.

Coercive typing allows 'sensible' conversions to take place, so that if you
pass "35.7" (string) to round() it will be accepted without a problem.
Strict typing will disallow any input that is not of the exact type that the
function expects, so in strict mode, round() will reject it. The point that
was raised by Stas and others is that this is likely to push the user to
explicitly cast the string to float, which from that point onwards will
happily accept "whats cooking", keeping the likely bug undetected.
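
To make the difference concrete, here is a minimal sketch. The helper
coerce_float_sketch() is hypothetical and only approximates a coercive check
with is_numeric(); it is not the exact rule set of either RFC.

```php
<?php
// Hypothetical stand-in for a coercive float check (an assumption for
// illustration, not the rules proposed in either RFC).
function coerce_float_sketch($value): float
{
    if (is_int($value) || is_float($value)) {
        return (float) $value;
    }
    if (is_string($value) && is_numeric($value)) {
        return (float) $value; // "35.7" is a well-formed number
    }
    throw new InvalidArgumentException('not a well-formed number');
}

// An explicit cast accepts anything and silently invents a value.
var_dump((float) "35.7");
var_dump((float) "whats cooking");

// The coercive-style check accepts the sensible input...
var_dump(coerce_float_sketch("35.7"));

// ...and refuses the nonsense input instead of turning it into 0.
try {
    coerce_float_sketch("whats cooking");
} catch (InvalidArgumentException $e) {
    echo "rejected: ", $e->getMessage(), "\n";
}
```

The cast and the check agree on "35.7"; they only differ on the input that
most likely indicates a bug.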

Zeev

10 years ago by Jefferson Gonzalez

One key premise behind both strict type hinting and coercive type hinting is
that conversions that lose data, or that 'invent' data, are typically
indicators of a bug in the code.

You're right that there's no risk of a segfault or buffer overflow from the
snippets you listed. But there are fair chances that if you fed $x into
round() and it contains "whats cooking" (string), your code contains a bug.

Coercive typing allows 'sensible' conversions to take place, so that if you
pass "35.7" (string) to round() it will be accepted without a problem.
Strict typing will disallow any input that is not of the exact type that the
function expects, so in strict mode, round() will reject it. The point that
was raised by Stas and others is that this is likely to push the user to
explicitly cast the string to float, which from that point onwards will
happily accept "whats cooking", keeping the likely bug undetected.

That's true, but I think where most problems will arise is when dealing
with user input, for example:

Good url
myurl.com/?id=10

Bad url
myurl.com/?id=something+else

So in the URL example neither coercive nor strict is safe. IMHO you as a
developer should analyze the input and decide what to do if the value
isn't of an expected type.

On strict you as a developer decide if casting is an accepted behavior,
like when dealing with database output which may return values as
string, or reading from config files, but you know the value is (int)
compatible, so the casting is safe. Besides, in the v0.5 STH RFC the
strict mode is optional.

I think both RFCs should join, dual mode coercive/strict :), but I
guess that will not be possible until Anthony convinces the coercive
camp how strict could be used to do better optimizations. Unless it
happens the other way around and it is proved with code/patches that the
same level of optimizations can be reached with coercive.

Anyway I just hope for scalar type hints, not just to improve code
reliability, but also to gain some performance out of it. In the end I
hope the best option is implemented, since this is a really impactful
feature for the future of the language.

10 years ago by Joe Watkins

Zeev,

I missed the initial replies to this, just had a quick read through (of
the kind you have first thing on a Monday morning).

Essentially the problem is this:

> My point is that we can do the very same optimizations with coercive
> types as well - basically, that there is no delta.

The problem is that an implementation that has to care about coercion,
one that has to care about nonsense input, has to generate considerably
more complex code, to manage that input or perform those coercions.

You may be able to perform some of the same optimizations, but the code
executed at runtime is going to be more complex to execute and to generate,
that is obviously true.

The JIT that Zend had needed to generate similarly complicated code, and it
was not faster than PHP in the real world, we are told.

I'm not sure what you are arguing about; there is no point in generating
any code that is not faster than Zend, and Zend just got a bunch faster.

I'm not sure if I have said the right words to make you see the other
point of view, if I haven't then I don't know them.

Cheers
Joe


10 years ago by Robert Stoll

Hey all,

tl;dr

Just one point which JIT/AOT people should consider when dealing with PHP. PHP is highly dynamic and there are enough use cases which make it impossible for a static analyser to infer types accurately without using a top type like mixed.
How would you deal with variable function calls, variable variables, reflection, dynamic includes etc.?

Your inferred types would simply be wrong without using mixed. Consider the following

function foo(int $a){}
$a = 1; //can be int for sure right?
$b = "a";
$$b = "h"; //oh no, your generated code would crash
foo($a);

Maybe I am wrong and there is a possibility, if so, please let me know, would be interesting to know.
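
A minimal runnable sketch of the same point, with the snippet wrapped in a
hypothetical function type_after_dynamic_write() so that the name used in the
variable variable arrives as data the analyser cannot see:

```php
<?php
// Hypothetical helper for illustration: $name plays the role of $b above
// and could come from user input, a database, and so on.
function type_after_dynamic_write(string $name): string
{
    $a = 1;          // looks like an int to a naive analyser
    $$name = "h";    // writes to whichever local variable $name refers to
    return gettype($a);
}

echo type_after_dynamic_write("a"), "\n";         // $a was overwritten
echo type_after_dynamic_write("unrelated"), "\n"; // $a was left alone
```

Whether $a is still an int after the write depends entirely on the runtime
value of $name.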

Cheers,
Robert

10 years ago by Anthony Ferrara

Robert,

Hey all,

tl;dr

Just one point which JIT/AOT people should consider when dealing with PHP. PHP is highly dynamic and there are enough use cases which makes it impossible for a static analyser to infer types accurately without using a top type like mixed.
How would you deal with variable function calls, variable variables, reflection, dynamic includes etc.

Your inferred types would simply be wrong without using mixed. Consider the following

function foo(int $a){}
$a = 1; //can be int for sure right?
$b = "a";
$$b = "h"; //oh no, your generated code would crash
foo($a);

Maybe I am wrong and there is a possibility, if so, please let me know, would be interesting to know.

This very specific example is easy to type. The reason is that we can
use constant propagation to know that $$b is really $a at compile
time. Hence we can reduce it to:

$a = "h";
foo($a);

And hence know at compile time that's an error.

This isn't the general case, but we can error in that case (from a
static analysis perspective at least) and say "this code is too
dynamic". In strict mode at least.

Anthony

10 years ago by Robert Stoll

Heya Anthony,

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 14:53
To: Robert Stoll
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)

Robert,

Hey all,

tl;dr

Just one point which JIT/AOT people should consider when dealing with PHP. PHP is highly dynamic and there are enough
use cases which makes it impossible for a static analyser to infer types accurately without using a top type like mixed.
How would you deal with variable function calls, variable variables, reflection, dynamic includes etc.

Your inferred types would simply be wrong without using mixed.
Consider the following

function foo(int $a){}
$a = 1; //can be int for sure right?
$b = "a";
$$b = "h"; //oh no, your generated code would crash
foo($a);

Maybe I am wrong and there is a possibility, if so, please let me know, would be interesting to know.

This very specific example is easy to type. The reason is that we can use constant propagation to know that $$b is really $a
at compile time. Hence we can reduce it to:

$a = "h";
foo($a);

[Robert Stoll]
Sure, "a" was just an example to illustrate the problem. I figured it would not be necessary to say that the value of $b can be completely unknown by the static analyser -> could come from user input, from a database, from unserialising code etc. (but probably that is what you meant with "this isn't the general case" below).

Assuming statically that $a is int or $b is string is erroneous in this context.

Another problem to illustrate that a top type or at least some form of union type is required:

function foo($x, $y, $z){
    $a = 1;
    if($x){
        $a = "1";
    }
    if($y > 10){
        $a = [];
    }
    if($z->foo() < 100){
        $a = new Exception();
    }
    echo $a;
    return $a;
}

How do you want to type $a without using a union type?

And hence know at compile time that's an error.

This isn't the general case, but we can error in that case (from a static analysis perspective at least) and say "this code is too
dynamic". In strict mode at least.

[Robert Stoll]
If you go and implement a more conservative type system than the actual dynamic type system of PHP well then... you can do whatever you want of course.
But if you do not want to support just a limited set of PHP, then you will need to include dynamic checks in many places. Or do you think that is not true?

Anthony

10 years ago by Anthony Ferrara

Robert,

[Robert Stoll]
Sure, "a" was just an example to illustrate the problem. I figured it would not be necessary to say that the value of $b can be completely unknown by the static analyser -> could come from user input, from a database, from unserialising code etc. (but probably that is what you meant with "this isn't the general case" below).

Assuming statically that $a is int or $b is string is erroneous in this context.

Another problem to illustrate that a top type or at least some form of union type is required:

function foo($x, $y, $z){
$a = 1;
if($x){
$a = "1";
}
If($y > 10){
$a = [];
}
If($z->foo() < 100){
$a = new Exception();
}
echo $a;
return $a;
}

How do you want to type $a without using a union type?

Actually, this case is reasonably easy to handle. There's a
representation called SSA (Static-Single-Assignment) that you move
code to prior to doing type analysis. Basically, at a really high
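level, it would rewrite the code to this:

function foo($x, $y, $z){
    $a = 1;
    if($x){
        $a1 = "1";
    }
    $a2 = Φ($a, $a1);
    if($y > 10){
        $a3 = [];
    }
    $a4 = Φ($a2, $a3);
    if($z->foo() < 100){
        $a5 = new Exception();
    }
    $a6 = Φ($a4, $a5);
    echo $a6;
    return $a6;
}

Where Φ is a function that chooses the value based on the branch of
the graph that entered it.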
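
As a rough sketch of what Φ means for type analysis (an illustration only,
not how Recki-CT implements it): each Φ node merges the type sets of its
incoming values. The function phi_type() and the string type names are
hypothetical.

```php
<?php
// Hypothetical sketch: a type is modelled as a list of type names, and a
// phi node yields the union of the types flowing in from each branch.
function phi_type(array ...$incoming): array
{
    return array_values(array_unique(array_merge(...$incoming)));
}

$a  = ['int'];                // $a  = 1
$a1 = ['string'];             // $a1 = "1"
$a2 = phi_type($a, $a1);      // $a2 = Φ($a, $a1)
$a3 = ['array'];              // $a3 = []
$a4 = phi_type($a2, $a3);     // $a4 = Φ($a2, $a3)
$a5 = ['Exception'];          // $a5 = new Exception()
$a6 = phi_type($a4, $a5);     // $a6 = Φ($a4, $a5)

echo implode('|', $a6), "\n"; // every type $a6 may hold at the echo
```

When a Φ node ends up with more than one member, the compiler needs either a
variant or a separate code path per member.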

There are a few ways to implement it in practice, one would be to
generate a variant. But another would be to generate different code
paths. Considering that $a5 will be the result if $z->foo() < 100 no
matter what the prior conditionals are, you could invert the code to
push that check first, making it:

function foo($x, $y, $z) {
    if ($z->foo() < 100) {
        $a = new Exception();
        echo $a;
        return $a;
    }
    if ($y > 10) {
        echo [];
        return [];
    }
    if ($x) {
        echo "1";
        return "1";
    }
    echo 1;
    return 1;
}

That transform can be done by the compiler, and hence never need you
to do anything. We still compiled without variants, and the analysis
job wasn't that difficult.

There will be cases of course where this won't work. And in those
cases we could either not compile, or generate a variant.

However, I would like to point out something. If you added a return
type, and ran that code in strict mode, it would error. A static
analyzer can pick up that error and tell you about it.

So really, we're not talking about valid strict code here (though the
same problem does exist inside strict bodies, and the same techniques
can be applied there).

For more info, check out:
https://github.com/google/recki-ct/blob/master/doc/5_phi_resolving.md

And hence know at compile time that's an error.

This isn't the general case, but we can error in that case (from a static analysis perspective at least) and say "this code is too
dynamic". In strict mode at least.

[Robert Stoll]
If you go and implement a more conservative type system than the actual dynamic type system of PHP well then... you can do whatever you want of course.
But if you do not just want to support just a limited set of PHP then you will need to include dynamic checks in many places. Or do you think that is not true?

I think with strict type declarations, the 'limitations' are far fewer
than you'd think. Yes, there will be cases (like variable variables,
etc) where valid strict code won't be analyzable. But I haven't seen
var-vars in the wild in a while. So I think my assertion is fair: the
majority of valid strict-typed code will be analyzable, whereas the
majority of coercive code won't be.

Anthony

10 years ago by Robert Stoll

Hey Anthony,

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 15:28
To: Robert Stoll
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints RFC)

Robert,

[Robert Stoll]
Sure, "a" was just an example to illustrate the problem. I figured it would not be necessary to say that the value of $b can
be completely unknown by the static analyser -> could come from user input, from a database, from unserialising code etc.
(but probably that is what you meant with "this isn't the general case" below).

Assuming statically that $a is int or $b is string is erroneous in this context.

Another problem to illustrate that a top type or at least some form of union type is required:

function foo($x, $y, $z){
$a = 1;
if($x){
$a = "1";
}
If($y > 10){
$a = [];
}
If($z->foo() < 100){
$a = new Exception();
}
echo $a;
return $a;
}

How do you want to type $a without using a union type?

Actually, this case is reasonably easy to handle. There's a representation called SSA (Static-Single-Assignment) that you
move code to prior to doing type analysis. Basically, at a really high level, it would rewrite the code to this:

function foo($x, $y, $z){
    $a = 1;
    if($x){
        $a1 = "1";
    }
    $a2 = Φ($a, $a1);
    if($y > 10){
        $a3 = [];
    }
    $a4 = Φ($a2, $a3);
    if($z->foo() < 100){
        $a5 = new Exception();
    }
    $a6 = Φ($a4, $a5);
    echo $a6;
    return $a6;
}

Where Φ is a function that chooses the value based on the branch of the graph that entered it.
There are a few ways to implement it in practice, one would be to generate a variant.

[Robert Stoll]
I am aware of SSA but it does not enable you to by-pass union types (sure in certain cases as below but that is it).
Maybe I got one of your emails wrong (or did not consider the context enough) but I got the impression that you think union types would not be necessary in strict mode. But they are, due to the fact that PHP supports data polymorphism (even for variables).
There are cases where SSA is not applicable at all. Consider the above code but use $this->a instead of $a. In the end $this->a will have the type mixed in your static analyser.

But I see that you are well aware of it. Recki-CT seems to be a nice project btw. :)

But another would be to generate
different code paths. Considering that $a5 will be the result if $z->foo() < 1000 no matter what the prior conditionals are,
you could invert the code to push that check first, making it:

function foo($x, $y, $z) {
if ($z->foo() < 1000) {
$a = new Exception();
echo $a;
return $a;
}
if ($y > 10) {
echo [];
return [];
}
if ($x) {
echo "1";
return "1";
}
echo 1;
return 1;
}

That transform can be done by the compiler, and hence never need you to do anything. We still compiled without variants,
and the analysis job wasn't that difficult.

There will be cases of course where this won't work. And in those cases we could either not compile, or generate a variant.

However, I would like to point out something. If you added a return type, and ran that code in strict mode, it would error. A
static analyzer can pick up that error and tell you about it.

[Robert Stoll]
How come? You mean because PHP does not offer the possibility of union types yet or why? Specifying a return type mixed would be perfectly legal here even in strict mode.

So really, we're not talking about valid strict code here (tho the same problem does exist inside strict bodies, and
techniques can be done here the same.

[Robert Stoll]
I get more and more the impression that you talk about strong typing here or a language which is statically typed instead of the strict mode as presented in your v0.5 RFC or the coercive RFC as in this discussion.
Or is this discussion about dynamic vs. static and I missed it?

For more info, check out:
https://github.com/google/recki-ct/blob/master/doc/5_phi_resolving.md

And hence know at compile time that's an error.

This isn't the general case, but we can error in that case (from a
static analysis perspective at least) and say "this code is too dynamic". In strict mode at least.

[Robert Stoll]
If you go and implement a more conservative type system than the actual dynamic type system of PHP well then... you
can do whatever you want of course.
But if you do not just want to support just a limited set of PHP then you will need to include dynamic checks in many
places. Or do you think that is not true?

I think with strict type declarations, the 'limitations' are far less than you'd think. Yes, there will be cases (like variable
variables,
etc) where valid strict code won't be analyzable. But I haven't seen var-vars in the wild in a while. So I think my assertion is
fair: the majority of valid strict-typed code will be analyzable. Where the majority of coercive won't be.

[Robert Stoll]
I agree with your first sentence but only in the case that your "strict type declarations" somehow include polymorphic declarations as well. Otherwise I would need to object vehemently. Consider the following:

$b = 1 + 2;
$a = [1] + [2];

If you would claim that + can only have one strict type (num -> num) for instance, then no! Strict-typed code in this sense is not able to analyse most valid strict-typed code.
But I guess you used an unfavourable wording and hence I do not go into more arguments here.

As for your second sentence. Analysable are both variants. The coercive one will just need more effort and computer power - maybe something which you consider to be not feasible. Fair enough, but the claim is wrong.
It is really not a question of strict-typed versus coercive. From a static analyser's point of view it can still predict the types equally well (as I said, with a lot more effort) - it only gets more complicated for the developer of the static analyser, since she has to take implicit conversions into account as well when resolving a function or an overload, has to propagate the implicit conversions forward, etc. It is more complicated but doable. Of course, the resulting type system is uglier (and more complex) than a type system for a strict-typed version and way uglier than a strongly and strictly typed version (e.g. Haskell's type system as a paragon).

I haven't seen variable variables in the wild for a while either. Yet, variable function calls are quite common and dynamic includes are part of almost all template systems I know.

So in the end it really matters what a static analyser wants to achieve (support all features or only a subset) and whether it wants to remain compatible with PHP without changing behaviour (even if only a subset is supported).
Your comparisons with Recki-CT are only helpful to a certain degree (they are very helpful, do not get me wrong). Yet, Recki-CT does not have the same behaviour as PHP. Consider the following:

function foo(int $x){}
$a = 1;
if($x){ //$x is always false
$a = 1 / $y;
}
echo $a;
foo($a);

Is translated according to the docs of Recki-CT to the following after SSA (annotated with types)

function foo(int $x){}
double $a = 1.0
if($x){ //
double $a1 = 1 / $y;
}
double $a2 = Φ($a1, $a);
echo $a2; //will no longer be 1 but 1.0 if the if branch was not taken, behaviour was changed by the compiler
foo($a2); //will result in an error of course
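
A small sketch of where the int-to-double widening described above becomes
observable in plain PHP. The variable names are hypothetical. Note that echo
alone prints the same text for 1 and 1.0, so the difference shows up in
type-sensitive operations rather than in the echoed output.

```php
<?php
$narrow  = 1;    // what the original program stores in $a
$widened = 1.0;  // what is stored if $a is typed as double throughout

// The types differ, and strict comparison sees it.
echo gettype($narrow), " vs ", gettype($widened), "\n";
var_dump($narrow === $widened);

// The string forms produced for echo are identical.
var_dump((string) $narrow === (string) $widened);
```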

Btw. how do you type echo in Recki-CT? I guess you use conversions here as well ;-) or are objects with __toString() not supported any longer?

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Joe Watkins [mailto:pthreads@pthreads.org]
Sent: Monday, February 23, 2015 10:03 AM
To: Jefferson Gonzalez
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,
I missed the initial replies to this, just had a quick read through
(of the kind you have first thing on a Monday morning).
Essentially the problem is this:

> My point is that we can do the very same optimizations with coercive
> types as well - basically, that there is no delta.

The problem is that an implementation that has to care about coercion,
one that has to care about nonsense input, has to generate considerably
more complex code, to manage that input or perform those coercions.

Not necessarily. If you can infer the type with confidence, you can do away
with coercion code altogether. If you can't, then you're not obligated to
generate optimized paths for every possible input - you can just defer to
the relevant coerce() function. It's very similar to what you'd be able to
do with strict, except instead of calling coerce(), it would run a runtime
type check that errors out on failure. The difference is that of
functionality, not performance.
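
A rough PHP-level sketch of that argument. Real generated code would be
machine code, and coerce_int_sketch() is a hypothetical stand-in for the
engine's coercion routine, approximated here with filter_var().

```php
<?php
// Hypothetical stand-in for the engine's coerce() routine.
function coerce_int_sketch($value): int
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result === false) {
        throw new TypeError('value cannot be coerced to int');
    }
    return $result;
}

// The optimized body both modes share once the argument is a known int.
function fast_path(int $n): int
{
    return $n * 2;
}

// Strict-style entry: a runtime type check that errors out on failure.
function strict_entry($value): int
{
    if (!is_int($value)) {
        throw new TypeError('int expected');
    }
    return fast_path($value);
}

// Coercive-style entry: the same check, but the slow branch defers to
// the coercion routine instead of erroring out immediately.
function coercive_entry($value): int
{
    if (!is_int($value)) {
        $value = coerce_int_sketch($value);
    }
    return fast_path($value);
}

var_dump(strict_entry(21), coercive_entry(21), coercive_entry("21"));
```

For an argument that is already an int, both entries perform one is_int()
check and then run the same fast path; they only diverge on inputs that the
strict entry would reject.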

You may be able to perform some of the same optimizations, but the code
executed at runtime is going to be more complex to execute and to
generate, that is obviously true.

Again, not necessarily. We shouldn't assume we'll generate machine code for
all the possible flows. The code generated would be very similar in both
strict and coercive cases.

The JIT Zend had needed to generate similarly complicated code and it
was not faster than PHP, in the real world, we are told.

It depends on what kind of real world we're talking about here. For number
crunching, it's super-fast with very tangible gains, and that's an
understatement. Not sure if you saw my email from yesterday, but it shows
25x (2500%, no mistake) gains on bench.php, with zero changes to the code,
and obviously, without type hints, strict or dynamic. With web apps we
tested, there wasn't a tangible difference, but that doesn't (yet) mean
much. We stopped most of the work on it when the PHPNG effort began. It
may or may not be possible to squeeze more performance using JIT out of
these apps, we won't know until we try again.

I'm not sure what you are arguing about, there is no point in
generating any code that is not faster than Zend, and Zend just got a
bunch faster.

JIT might end up being about making PHP viable for more CPU-bound use cases,
rather than speeding up existing Web workloads. But it's way too early to
conclude that.

Thanks,

Zeev

10 years ago by Joe Watkins

Zeev,

If you can infer the type with confidence, you can do away with coercion
code altogether.

Maybe I'm ignorant of something, but isn't having strictly typed parameters
the only way you can begin to infer the type with real confidence?

This sounds like the kind of strict mode we're talking about, where no
coercion is necessary because inference is so reliable given a good
starting place (function entry with strictly typed parameters).

If you can't, then you're not obligated to generate optimized paths for
every possible input - you can just defer to the relevant coerce() function.

If the parameters aren't strict but are typed then you need to travel
coercion code paths somewhere, optimized inline, calling an API function,
it makes no real difference.

I guess we are hedging our bets that having to travel those paths will suck
away so much performance, that it will make all of the effort required to
make any of this a reality seem, somehow ... wasted.

JIT might end up being about making PHP viable for more CPU-bound use
cases ...

This is probably quite realistic.

In case anyone is reading and thinks I'm using any of this to justify dual
mode, I'm not doing that. The original RFC justified it well enough, I just
happen to disagree with some of the assertions made in this thread and/or
the RFC.

Cheers
Joe

-----Original Message-----
From: Joe Watkins [mailto:pthreads@pthreads.org]
Sent: Monday, February 23, 2015 10:03 AM
To: Jefferson Gonzalez
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,
I missed the initial replies to this, just had a quick read through
(of the kind
you have first thing on a Monday morning).
Essentially the problem is this:

> My point is that we can do the very same optimizations with
coercive

types as well - basically, that there is no delta.

The problem is that an implementation that has to care about coercion,
one that has to care about nonsense input, has to generate considerably
more complex code, to manage that input or perform those coercions.

Not necessarily. If you can infer the type with confidence, you can do
away
with coercion code altogether. If you can't, then you're not obligated to
generate optimized paths for every possible input - you can just defer to
the relevant coerce() function. It's very similar to what you'd be able to
do with strict, except instead of calling coerce(), it would run a runtime
type check that errors out on failure. The difference is that of
functionality, not performance.

You may be able to perform some of the same optimizations, but the
code
executed at runtime is going to be more complex to execute and to
generate, that is obviously true.

Again, not necessarily. We shouldn't assume we'll generate machine code
for
all the possible flows. The code generated would be very similar in both
strict and coercive cases.

The JIT Zend had needed to generate similarly complicated code and it
was not faster than PHP, in the real world, we are told.

It depends on what kind of real world we're talking about here. For number
crunching, it's super-fast with very tangible gains, and that's an
understatement. Not sure if you saw my email from yesterday, but it shows
25x (2500%, no mistake) gains on bench.php, with zero changes to the code,
and obviously, without type hints, strict or dynamic. With web apps we
tested, there wasn't a tangible difference, but that doesn't (yet) mean
much. We stopped most of the work on it when the PHPNG effort began. It
may or may not be possible to squeeze more performance using JIT out of
these apps, we won't know until we try again.

I'm not sure what you are arguing about, there is no point in generating any
code that is not faster than Zend, and Zend just got a bunch faster.

JIT might end up being about making PHP viable for more CPU-bound use cases,
rather than speeding up existing Web workloads. But it's way too early to
conclude that.

Thanks,

Zeev

10 years ago by pajousek@gmail.com — view source

Zeev,

If you can infer the type with confidence, you can do away with coercion
code altogether.

Maybe I'm ignorant of something, but isn't the only way you can begin to
infer the type with real confidence is by having strict typed parameters ?

This sounds like the kind of strict mode we're talking about, where no
coercion is necessary because inference is so reliable given a good
starting place (function entry with strictly typed parameters).

If you can't, then you're not obligated to generate optimized paths for
every possible input - you can just defer to the relevant coerce() function.

If the parameters aren't strict but are typed then you need to travel
coercion code paths somewhere, optimized inline, calling an API function,
it makes no real difference.

I guess we are hedging our bets that having to travel those paths will suck
away so much performance, that it will make all of the effort required to
make any of this a reality seem, somehow ... wasted.

JIT might end up being about making PHP viable for more CPU-bound use
cases ...

This is probably quite realistic.

In case anyone is reading and thinks I'm using any of this to justify dual
mode, I'm not doing that. The original RFC justified it well enough, I just
happen to disagree with some of the assertions made in this thread and or
RFC.

Hello,

if I understand the issue correctly, even strongly typed
parameters won't help you infer types of variables with real
confidence?

Imagine this piece of code:
function foo(int $x) { $x = fnThatReturnsRandomScalarType(); return $x; }
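
A runnable variant of this example, as a sketch: the random helper is replaced by a deterministic stand-in (hypothetical name fnThatReturnsSomeScalar(), with an extra hypothetical $which parameter) so that the outcome can be checked.

```php
<?php
// Hypothetical, deterministic stand-in for fnThatReturnsRandomScalarType().
function fnThatReturnsSomeScalar(int $which)
{
    $values = [42, 4.2, "forty-two", true];
    return $values[$which % 4];
}

function foo(int $x, int $which)
{
    // The int hint only constrains $x at function entry; after this
    // assignment $x can hold any scalar type.
    $x = fnThatReturnsSomeScalar($which);
    return $x;
}

$types = [];
for ($which = 0; $which < 4; $which++) {
    $types[] = gettype(foo(1, $which));
}
// $types is now ['integer', 'double', 'string', 'boolean']
```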

Regards
Pavel Kouril

10 years ago by Zeev Suraski — view source

-----Original Message-----
From: Pavel Kouřil [mailto:pajousek@gmail.com]
Sent: Monday, February 23, 2015 11:49 AM
To: Joe Watkins
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

On Mon, Feb 23, 2015 at 10:28 AM, Joe Watkins pthreads@pthreads.org
wrote:

Zeev,

If you can infer the type with confidence, you can do away with coercion
code altogether.

Maybe I'm ignorant of something, but isn't the only way you can begin
to infer the type with real confidence is by having strict typed
parameters ?

This sounds like the kind of strict mode we're talking about, where no
coercion is necessary because inference is so reliable given a good
starting place (function entry with strictly typed parameters).

If you can't, then you're not obligated to generate optimized paths for
every possible input - you can just defer to the relevant coerce()
function.

If the parameters aren't strict but are typed then you need to travel
coercion code paths somewhere, optimized inline, calling an API
function, it makes no real difference.

I guess we are hedging our bets that having to travel those paths will
suck away so much performance, that it will make all of the effort
required to make any of this a reality seem, somehow ... wasted.

JIT might end up being about making PHP viable for more CPU-bound use
cases ...

This is probably quite realistic.

In case anyone is reading and thinks I'm using any of this to justify
dual mode, I'm not doing that. The original RFC justified it well
enough, I just happen to disagree with some of the assertions made in
this thread and or RFC.

Hello,

if I understand the issue correctly, even strongly typed parameters won't
help you infer types of variables with real confidence?

Imagine this piece of code:
function foo(int $x) { $x = fnThatReturnsRandomScalarType(); return $x; }

Absolutely correct.

On the flip side, you can very often infer type without any type hints at
all. In reality we'd have to rely on that quite heavily, since code is not
only about - or even mostly about - function arguments. There are local
variables, function arguments can be reassigned and change type (as you
illustrated), etc. For example, from bench.php:

function simplecall() {
    for ($i = 0; $i < 1000000; $i++)
        strlen("hallo");
}

You can infer with full confidence that $i is an int without having any type
hint information for it. This is actually a remarkably common case.
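
The claim can be sanity-checked at runtime with a small sketch. This is only a demonstration, not how a JIT would perform the inference; observed_counter_types() and its $limit parameter are hypothetical names, and the bound is kept small.

```php
<?php
// Runs a loop shaped like simplecall() and records every runtime type the
// counter takes. $limit is a hypothetical parameter, kept small for testing.
function observed_counter_types(int $limit): array
{
    $seen = [];
    for ($i = 0; $i < $limit; $i++) {
        $seen[gettype($i)] = true;
        strlen("hallo");
    }
    return array_keys($seen);
}

$types = observed_counter_types(1000); // only 'integer' is ever observed

// The inference leans on the loop bound: incrementing past PHP_INT_MAX
// would turn the counter into a float.
$n = PHP_INT_MAX;
$n++;
$overflowType = gettype($n); // 'double'
```

Because the constant bound 1000000 is far below PHP_INT_MAX, the overflow case cannot occur in simplecall(), which is what lets the counter be treated as an int throughout.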

Zeev

10 years ago by Zeev Suraski — view source

-----Original Message-----
From: Joe Watkins [mailto:pthreads@pthreads.org]
Sent: Monday, February 23, 2015 11:29 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,

If you can infer the type with confidence, you can do away with coercion
code altogether.

Maybe I'm ignorant of something, but isn't the only way you can begin to
infer the type with real confidence is by having strict typed parameters ?

Not at all. You can definitely infer lots of type information without any
type hints, strict or weak. bench.php wouldn't be 25x faster and Mandelbrot
would not be as fast as an equivalent C program if we couldn't infer types
without type hints. Take a look at it (Zend/bench.php) - much like you can,
as a human, figure out that $i is an integer, and that $recen is a float,
etc. - you can write code that will do that automatically (by taking a look
at the assigned values, at the operators which are used, etc.) In general.

If you haven't, please read my reply to Jefferson from yesterday:
marc.info/?l=php-internals&m=142463029832159&w=2
It illustrates the point as to why not only are strict types not necessary
to do type inference, they also don't provide you with additional data that
can help you generate better code assuming the same input source code.
The bets are off if you change your code and compare how the changed version
runs in the Strict mode and the unmodified one runs in Coercive. But if you
take the modified code and run it on both Strict and Coercive - you can
reach the same results (except for the obvious, intentional functionality
differences).

Now, there may be cases that a static analyzer - that assumes strict rules -
could prompt a developer to change their code (e.g. explicitly cast), and
it's possible that after the change - the code could be better optimized.
I'm not sure if these cases exist or not, but even if they do - after the
code change, Strict and Coercive runtime implementations would be able to
optimize the code equally well. It might actually make sense for a static
analyzer to optionally employ strict rules even if we go for the
coercive-only option - pointing the developer to where values in the code
clearly switch types due to a type hint - as some sort of a performance
tip. This would apply primarily in situations where you have values
changing types back and forth and perhaps, with some developer attention -
that could be avoided.

If you can't, then you're not obligated to generate optimized paths for
every possible input - you can just defer to the relevant coerce()
function.

If the parameters aren't strict but are typed then you need to travel
coercion code paths somewhere, optimized inline, calling an API function, it
makes no real difference.

Correct. But you're comparing a situation where in Strict - the code would
bail out and fail. Of course failure is faster than actually doing
something :) The question is what the developer wants to do semantically.
In most cases, with "32" (string) being passed to an integer type hinted
argument, he'd want it to convert to 32 (setting aside the question on
whether implicitly or explicitly). Which means that in strict - he'd have
to do something along the lines of an explicit cast - and similarly, it
could be optimized inline, call convert_to_long(), etc.
The root difference is that of functionality, not performance.
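
As an illustration of the "32" case, here is a sketch that assumes PHP 7.0 or later, where the declare(strict_types=1) directive is available. The function name takes_int() is hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical function with an integer type hint.
function takes_int(int $n): int
{
    return $n + 1;
}

// With strict_types=1 in the calling file, the string "32" is rejected.
try {
    takes_int("32");
    $outcome = "accepted";
} catch (TypeError $e) {
    $outcome = "rejected";
}

// To get the conversion, the caller has to ask for it explicitly.
$viaCast = takes_int((int) "32"); // 33
```

Either way a string-to-integer conversion runs somewhere if the caller wants "32" treated as 32. The sketch only shows where it is written, not how fast it is.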

Zeev

10 years ago by Anthony Ferrara — view source

Zeev,

Maybe I'm ignorant of something, but isn't the only way you can begin to
infer the type with real confidence is by having strict typed parameters ?

Not at all. You can definitely infer lots of type information without any
type hints, strict or weak. bench.php wouldn't be 25x faster and Mandelbrot
would not be as fast as an equivalent C program if we couldn't infer types
without type hints. Take a look at it (Zend/bench.php) - much like you can,
as a human, figure out that $i is an integer, and that $recen is a float,
etc. - you can write code that will do that automatically (by taking a look
at the assigned values, at the operators which are used, etc.) In general.

Yes, but mandel() from bench.php isn't generic code. It is a subset of
valid code. One that wouldn't need any changes to move to strict mode.

Additionally: "would not be as fast as an equivalent C program". I
compiled mandel() using C (via PECL). It wasn't 25x faster, it was
75734.13830x faster (3000x with -O0). So no, you're way off from C
speed.

If you haven't, please read my reply to Jefferson from yesterday:
marc.info/?l=php-internals&m=142463029832159&w=2
It illustrates the point as to why not only are strict types not necessary
to do type inference, they also don't provide you with additional data that
can help you generate better code assuming the same input source code.
The bets are off if you change your code and compare how the changed version
runs in the Strict mode and the unmodified one runs in Coercive. But if you
take the modified code and run it on both Strict and Coercive - you can
reach the same results (except for the obvious, intentional functionality
differences).

Again, please make it clear to people that you're talking about
special cases of userland code, where the rest of us are talking about
general cases.

Anthony

10 years ago by Zeev Suraski — view source

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 4:14 PM
To: Zeev Suraski
Cc: Joe Watkins; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,

Maybe I'm ignorant of something, but isn't the only way you can begin
to infer the type with real confidence is by having strict typed
parameters?

Not at all. You can definitely infer lots of type information without
any type hints, strict or weak. bench.php wouldn't be 25x faster and
Mandelbrot would not be as fast as an equivalent C program if we
couldn't infer types without type hints. Take a look at it
(Zend/bench.php) - much like you can, as a human, figure out that $i
is an integer, and that $recen is a float, etc. - you can write code
that will do that automatically (by taking a look at the assigned
values, at the operators which are used, etc.) In general.

Yes, but mandel() from bench.php isn't generic code. It is a subset of
valid code. One that wouldn't need any changes to move to strict mode.

I'm not sure what that means. mandel() is an example in the very same way
that foo() and bar() you presented were examples. It also happens to be a
real world example showing Joe that not only is it possible to do type
inference without strict typing (who asked " isn't the only way you can
begin to infer the type with real confidence is by having strict typed
parameters?") - but it's possible to do a great job at it, too.

Additionally: "would not be as fast as an equivalent C program". I
compiled mandel() using C (via PECL). It wasn't 25x faster, it was
75734.13830x faster (3000x with -O0). So no, you're way off from C speed.

Those were our results:
PHPNG-JIT (JIT=on)   0.011   (25 times faster than php-7)
gcc -O2 (4.9.2)      0.013
gcc -O0 (4.9.2)      0.022
PHP-7                0.281   (15 times faster than php-5.0)
PHP-5.6              0.379
PHP-5.5              0.383
PHP-5.4              0.406
PHP-5.3              0.855
PHP-5.2              1.096
PHP-5.1              1.217
PHP-4.4              4.209
PHP-5.0              4.434

Something is broken in your setup. BTW, I'm not claiming our JIT is faster
than gcc. It's at the margin of error level.

If you haven't, please read my reply to Jefferson from yesterday:
marc.info/?l=php-internals&m=142463029832159&w=2
It illustrates the point as to why not only are strict types not
necessary to do type inference, they also don't provide you with
additional data that can help you generate better code assuming the
same input source code.
The bets are off if you change your code and compare how the changed
version runs in the Strict mode and the unmodified one runs in
Coercive. But if you take the modified code and run it on both Strict
and Coercive - you can reach the same results (except for the obvious,
intentional functionality differences).

Again, please make it clear to people that you're talking about special
cases of userland code, where the rest of us are talking about general cases.

But I'm not. We are all talking about general cases. The code snippets in
the email I sent to Jefferson are as generic as they could possibly be,
allowing you to fill the code blocks with anything to your heart's content -
the outcome would not change.

We're in an endless repetitive ping pong here, so I'm going to summarize
this thread from my POV:

1. For a given input source file, there is zero difference in terms of
AOT/JIT between Strict mode and Coercive mode. In both cases, the level of
type inference you can do is identical - and cases where you can determine
the type of the arguments being passed conclusively can turn into a C-style
call instead of a PHP call. Cases where you cannot determine that would be
conceptually similar between Strict and Coercive, except Strict would result
in a failure in case it detects the wrong type at runtime, while Coercive
may succeed. Intended functional difference, no performance difference.
2. Since Strict rejects any wrongly typed value, in cases where you can
determine conclusively that the wrong type may make it into a certain
function, you can alert to that fact, prompting the developer to change it.
Depending on the nature of the change, it might result in the ability to
better optimize the code, although so far, there has been no concrete
example of that shown (i.e., an example where prompting the user about
potentially invalid code would prompt a change that would result in
the ability to generate faster code than the original code running under
coercive typing). I think these cases are rare but do exist - primarily in
situations where you have the same value repeatedly flipping back and forth
between types across function boundaries (e.g. a string converting to float
and back to string).
3. The difference between Strict and Coercive modes is exclusively that of
function, not performance. Whatever performance gains one may be able to
gain, if any, will result from code changes - not from choosing Strict
STH. The changed code would have identical results whether it's run on
Strict STH or Coercive STH.
4. As a direct result of #3, one could use the very same static analyzer
and emit the very same messages - except conversions which may succeed (like
string->int) may be flagged as warnings instead of errors. People who want
to minimize implicit conversions will be able to do so, to the exact same
level as your static analyzer could warn them if they were running in Strict
mode. That is, assuming there are any performance gains to be had from such
explicit conversions.

I think we've all spent enough time on that topic, and the academic
discussion has run its course. I find myself restating the same
principles again and again, and so far no counterproofs have been shown to
refute any of them. If you believe you can come up with any counterproofs to
any of the above points - code ones, not theoretical ones - please share
them. I'm going to focus on the actual RFC from this point onwards.

Thanks,

Zeev

10 years ago by Anthony Ferrara — view source

Zeev,

As this is going in circles, this will be my last post here (it seems
we feel the same in this regard):

Those were our results:
PHPNG-JIT (JIT=on)   0.011   (25 times faster than php-7)
gcc -O2 (4.9.2)      0.013
gcc -O0 (4.9.2)      0.022
PHP-7                0.281   (15 times faster than php-5.0)
PHP-5.6              0.379
PHP-5.5              0.383
PHP-5.4              0.406
PHP-5.3              0.855
PHP-5.2              1.096
PHP-5.1              1.217
PHP-4.4              4.209
PHP-5.0              4.434

Something broken in your setup. BTW, I'm not claiming our JIT is faster
than gcc. It's at the margin of error level.

I thought something was broken too, but it turns out it was just GCC's
optimizer doing what it was supposed to do: eliminating dead code
(meaning code that has no side effects and doesn't affect the
outcome).

We're in an endless repetitive ping pong here, so I'm going to summarize
this thread from my POV:

For a given input source file, there is zero difference in terms of
AOT/JIT between Strict mode and Coercive mode. In both cases, the level of
type inference you can do is identical - and cases where you can determine
the type of the arguments being passed conclusively can turn into a C-style
call instead of a PHP call. Cases where you cannot determine that would be
conceptually similar between Strict and Coercive, except Strict would result
in a failure in case it detects the wrong type at runtime, while Coercive
may succeed. Intended functional difference, no performance difference.

For a given arbitrary source file, yes you are correct.

For a given source code which will not produce errors at runtime, you
are not correct. Because coercive will let through things that a
static analyzer would error at (the code wouldn't be valid in strict
mode). Therefore, the level of type inference you can do is different.

You can say that for any correct strict file, you can run it in
coercive mode and have the same benefits. But the reverse is not true.
Valid coercive code is not necessarily valid strict code. So the
assumptions and analysis you can do on strict code doesn't apply in
the general case to coercive code.

Hence the point I've been trying to make the entire time.

Since Strict rejects any wrongly typed value, in cases where you can
determine conclusively that the wrong type may make it into a certain
function, you can alert to that fact, prompting the developer to change it.
Depending on the nature of the change, it might result in the ability to
better optimize the code, although so far, there has been no concrete
example of that shown (i.e., an example where prompting the user about a
potentially invalid code, that would prompt a change, that would result in
the ability to generate faster code than the original code running under
coercive typing). I think these cases are rare but do exist - primarily in
situations where you have the same value repeatedly flipping back and forth
between types across function boundaries (e.g. a string converting to float
and back to string).

Again, it comes down to analyzability. For an arbitrary valid
(non-error-producing) source file, strict mode will allow more
optimizations. Not because for a specific example it can do more, but
it can do so for a far greater percentage of the code in strict mode.

The difference between Strict and Coercive modes is exclusively that of
function, not performance. Whatever performance gains one may be able to
gain, if any, will result from code changes - not from him choosing Strict
STH. The changed code would have identical result whether they're run on
Strict STH or Coercive STH.

Again, for very specific examples, yes. For the general case of both, no.

As a direct result of #3, one could use the very same static analyzer
and emit the very same messages - except conversions which may succeed (like
string->int) may be flagged as warnings instead of errors. People who want
to minimize implicit conversions will be able to do so, to the exact same
level as your static analyzer could warn them if they were running in Strict
mode. That is, assuming there are any performance gains to be had from such
explicit conversions.

And the same can be done today without any type hints. For a very
significant subset of code.

With strict mode, the vast majority of valid code will be analyzable
(and hence optimizable). Without it, a much smaller subset can be.

I think we've all spent enough time on that topic, and the academic
discussion has run its course. I find myself restating to the same
principals again and again, and so far no counterproofs have been shown to
refute any of them. If you believe you come up with any counterproofs to
any of the above points - code ones, not theoretical ones - please share
them. I'm going to focus on the actual RFC from this point onwards.

I've shown what I believe are counterproofs. I've given example after
example to show where valid (never producing an error at runtime in
coercive mode) code is ambiguous to a static analyzer. And I've shown
that to make it unambiguous is akin to applying the restrictions that
strict mode enforces. Therefore, coercive is not the same as strict
mode in terms of analyzability.

But since we're going in circles, I would simply ask that you remove
any reference to the topics from your RFC. It's contentious enough as
it is, and your presentation of only one side is not only unfair, but
also misleading.

Anthony

10 years ago by Pierre Joye — view source

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 4:14 PM
To: Zeev Suraski
Cc: Joe Watkins; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,

Maybe I'm ignorant of something, but isn't the only way you can begin
to infer the type with real confidence is by having strict typed
parameters?

Not at all. You can definitely infer lots of type information without
any type hints, strict or weak. bench.php wouldn't be 25x faster and
Mandelbrot would not be as fast as an equivalent C program if we
couldn't infer types without type hints. Take a look at it
(Zend/bench.php) - much like you can, as a human, figure out that $i
is an integer, and that $recen is a float, etc. - you can write code
that will do that automatically (by taking a look at the assigned
values, at the operators which are used, etc.) In general.

Yes, but mandel() from bench.php isn't generic code. It is a subset of
valid code. One that wouldn't need any changes to move to strict mode.

I'm not sure what that means. mandel() is an example in the very same way
that foo() and bar() you presented were examples. It also happens to be a
real world example showing Joe that not only is it possible to do type
inference without strict typing (who asked " isn't the only way you can
begin to infer the type with real confidence is by having strict typed
parameters?") - but it's possible to do a great job at it, too.

Additionally: "would not be as fast as an equivalent C program". I
compiled mandel() using C (via PECL). It wasn't 25x faster, it was
75734.13830x faster (3000x with -O0). So no, you're way off from C speed.

Those were our results:
PHPNG-JIT (JIT=on)   0.011   (25 times faster than php-7)
gcc -O2 (4.9.2)      0.013
gcc -O0 (4.9.2)      0.022
PHP-7                0.281   (15 times faster than php-5.0)
PHP-5.6              0.379
PHP-5.5              0.383
PHP-5.4              0.406
PHP-5.3              0.855
PHP-5.2              1.096
PHP-5.1              1.217
PHP-4.4              4.209
PHP-5.0              4.434

Something broken in your setup. BTW, I'm not claiming our JIT is faster
than gcc. It's at the margin of error level.

I think without actually making the code available, this specific point is
just speculation, with all due respect to the work that has been done.

If the goal is to have that in PHP, please publish now, not when you think
it is what it should be.

Doing so will allow others to test, provide feedback and maybe have ideas to
solve one or another issue.

10 years ago by francois@php.net — view source

Hi,

From: Joe Watkins [mailto:pthreads@pthreads.org]

If you can infer the type with confidence, you can do away with coercion
code altogether.

Maybe I'm ignorant of something, but isn't the only way you can begin to
infer the type with real confidence is by having strict typed parameters ?

This sounds like the kind of strict mode we're talking about, where no
coercion is necessary because inference is so reliable given a good
starting place (function entry with strictly typed parameters).

'Weak' mode also provides full confidence on parameter zval type at function entry as they're systematically converted. If you statically analyze PHP code starting from this point, there's no difference. The only difference is when you build the reverse tree of possible zval types in the calling environment. But, in this case, strict types just allow to restrict possible types as, in many cases, you cannot have one-to-one matching.

I hope we all agree now that, in most cases, compilers (at least static ones) and code analyzers can provide smaller code and detect more potential issues in so-called strict-typing mode. What we don't agree on is the downside of the architecture needed to allow this. Making a personal decision is just putting the relative perception of both aspects in balance.

Let me be clear: if I didn't consider downsides more important than benefits, I would be all for enabling future compilers and static analyzers to be more performant. Anthony is an expert, the tools he wants to build are potentially interesting for the whole community, he knows his own needs better than anyone else. So, if I think this can be satisfied without making it worse for the rest of the world, why would I refuse? Sadism?

Unfortunately, I think we have, at least, two major downsides here:

- The need for dual-mode. Actually, if we absolutely had to authorize strict types, I would still prefer it being the only available mode. Some will claim it's FUD but others will agree that no syntax or mechanism to switch modes proposed so far was really satisfying. All were more or less unintuitive and confusing, let alone the proposal to shoehorn the concept by hacking the PHP syntax or possible ambiguities against the E_STRICT concept. Even per-file activation may seem attractive on a small codebase but would probably quickly become a nightmare on a large one. I even know people currently writing composer plugins to turn it on and off on a project basis, modifying every PHP source file to add/remove the declare directive, which proves a total misunderstanding of the concept, but also that proposed mechanisms don't fit so well with user needs. I wish good luck to the fooled users who will naively use such massive editing tools to improve the quality of their software... I have tried to explain from the beginning that, in most cases, especially in software architecture, 1 + 1 is not 2, and concatenating incompatible needs in a hybrid solution is not the same as merging them in a 'compromise' unified one. I read that the dual-mode RFC unifies people's needs. No, that's exactly the opposite.
- The risk of seeing people massively consider strict mode as a way to improve their code quality without understanding the implications, potentially leading to a disaster. As I already wrote, debugging is too often assimilated with a phase to suppress error messages, trusting the compiler/interpreter, and considering that each error message implies an underlying bug: "Look, the new PHP 7 strict mode is really fantastic. I ran it on my legacy software and it detected 50 new bugs. PHP 5 was really bad, to leave so many undetected!". Then, this would automatically lead to massive explicit casting, as it is also natural to think that constraining values enhances code quality, especially if it is confirmed by the suppression of an error. The point is: we must never, in any case, whatever the documentation we write to explain this, allow the possibility of false errors. Don't get me wrong, people are not dumber or more incompetent than I am. We would be fully responsible for giving them a trap, perfectly suited to fool them. In my opinion, introducing such a false feeling of quality and security is a terrible mistake.

Regards

François

10 years ago by Jefferson Gonzalez — view source

-----Original Message-----
From: Joe Watkins [mailto:pthreads@pthreads.org]
Sent: Monday, February 23, 2015 10:03 AM
To: Jefferson Gonzalez
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,
I missed the initial replies to this, just had a quick read through (of the
kind you have first thing on a Monday morning).
Essentially the problem is this:

> My point is that we can do the very same optimizations with coercive
types as well - basically, that there is no delta.

The problem is that an implementation that has to care about coercion,
one that has to care about nonsense input, has to generate considerably
more complex code, to manage that input or perform those coercions.

Not necessarily. If you can infer the type with confidence, you can do away
with coercion code altogether. If you can't, then you're not obligated to
generate optimized paths for every possible input - you can just defer to
the relevant coerce() function. It's very similar to what you'd be able to
do with strict, except instead of calling coerce(), it would run a runtime
type check that errors out on failure. The difference is that of
functionality, not performance.

In that case, since coercive is less strict, the code would need to be
duplicated in order to optimize. For example:

function calc (int $val1, int $val2){ ... }

//This call does not require coercive type checks
calc(1,6) -> direct call to optimized C function

//This call may require coercive checks
calc($val1, $val2) -> call to coercive implementation of the function

So in order to optimize, I guess 2 versions of a function would need to be
generated: optimized and coercive. That would increase the RAM required at
runtime, and even the CPU, in order to gain the performance benefits
presented for strict mode.
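
For illustration, the two entry points could be laid out as below. This is only a userland sketch under stated assumptions: calc_fast() and calc_coercive() are hypothetical names, the body of calc() is invented, and nothing in the thread establishes that a real JIT would be structured this way. It follows the earlier suggestion of deferring to a coerce() step rather than duplicating the whole function body.

```php
<?php
declare(strict_types=1);

// Hypothetical specialised body: assumes both arguments are already ints.
function calc_fast(int $val1, int $val2): int
{
    return $val1 * $val2 + 1;
}

// Hypothetical coercive entry point: validates and converts the arguments,
// then reuses the single specialised body.
function calc_coercive($val1, $val2): int
{
    $a = filter_var($val1, FILTER_VALIDATE_INT);
    $b = filter_var($val2, FILTER_VALIDATE_INT);
    if ($a === false || $b === false) {
        throw new TypeError('calc() expects integer-like arguments');
    }
    return calc_fast($a, $b);
}

$direct   = calc_fast(1, 6);         // types known at the call site
$viaGuard = calc_coercive("1", "6"); // types unknown, guard runs first
```

In this layout the extra cost of the coercive path is the thin wrapper, not a second copy of the body. Whether that holds for generated machine code is exactly what the thread disputes.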

10 years ago by Stanislav Malyshev — view source

Hi!

How is coercive much smarter? Basically what coercive would do is

It can accept 2.0 but not 2.5. Explicit cast is a sledgehammer - it
would convert both to 2.

How could casting (int) be such a dangerous thing? Let's take for example
this code:

echo (int) "whats cooking!";
echo intval("whats cooking");

Both statements print 0, so how is casting unsafe???

Casting by itself is not dangerous. What is dangerous is using casting
to work around the type system - since in this case it could hide an error
(such as passing string "whats cooking!" to a function requiring an integer).
Of course, you can say such errors are of no importance to you - in
which case you should never use typed parameters at all and you'll be
fine :) (mostly)
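
A short sketch of the difference between a cast and a check, using only built-in PHP behaviour (the (int) cast and filter_var() with FILTER_VALIDATE_INT). The variable names are illustrative.

```php
<?php
// Compare a blind cast with a validating conversion for a few inputs.
$inputs = ["32", "2.5", "whats cooking!"];

$report = [];
foreach ($inputs as $input) {
    $report[] = [
        $input,
        (int) $input,                            // always yields some integer
        filter_var($input, FILTER_VALIDATE_INT), // false when not an integer
    ];
}
// "32"             -> cast 32, validated 32
// "2.5"            -> cast 2,  validated false
// "whats cooking!" -> cast 0,  validated false
```

The cast gives an integer for every input, so a caller who casts to satisfy a type hint never sees the bad values. The validating form reports them.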

Stas Malyshev
smalyshev@gmail.com

10 years ago by Zeev Suraski — view source

Anthony,

I started writing this long response, but instead, I want to localize the
whole discussion to the one true root difference. Your position on that
difference is the basis for your entire case, and my position on this
argument is the base for my entire case.

There we go:

And note that this can only work with strict types since you can do the
necessary type inference and reconstruction (both forward from a function
call, and backwards before it).

Please do explain how strict type hints help you do inference that you
couldn't do with dynamic type hints. Ultimately, your whole argument hinges
on that, but you mention it in parentheses almost as an afterthought.
I claim the opposite - you cannot infer ANYTHING from Strict STH that you
cannot infer from Coercive STH. Consequently, everything you've shown, down
to the C-optimized version of strict_foo() can be implemented in the exact
same way for very_lax_foo(). Being able to optimize away the value
containers is not unique to languages with strict type hints. It's done in
JavaScript JIT engines, and it was done in our JIT POC.

With lax (weak, coercive) types, the ability to do type reconstruction
drops significantly. Because you can no longer do any backwards inference
from other function calls. Which means you can't prove if a type is stable
in most cases (won't change). Therefore, you'll always have to allocate a
ZVAL, and then the optimizations I showed above would stop working.

Again, using the scientific method I'm familiar with, that would be a
theory, and it would require proof. So far I haven't seen any proof, and I
believe I pretty much proved the opposite with my example.

Zeev

10 years ago by Anthony Ferrara

Zeev,

And note that this can only work with strict types since you can do the
necessary type inference and reconstruction (both forward from a function
call, and backwards before it).

Please do explain how strict type hints help you do inference that you
couldn't do with dynamic type hints. Ultimately, your whole argument hinges
on that, but you mention it in parentheses almost as an afterthought.
I claim the opposite - you cannot infer ANYTHING from Strict STH that you
cannot infer from Coercive STH. Consequently, everything you've shown, down
to the C-optimized version of strict_foo() can be implemented in the exact
same way for very_lax_foo(). Being able to optimize away the value
containers is not unique to languages with strict type hints. It's done in
JavaScript JIT engines, and it was done in our JIT POC.

I do here: http://news.php.net/php.internals/83504

I'll re-state the specific part in this mail:

<?php declare(strict_types=1);
function foo(int $int): int {
return $int + 1;
}

function bar(int $something): int {
$x = $something / 2;
return foo($x);
}

^^ In that case, without strict types, you'd have to generate code for
both integer and float paths. With strict types, this code is invalid.

At that point the developer has the choice to explicitly cast or put
in a floor() or one of a number of options.
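
The "both integer and float paths" point can be pictured with a small C sketch. This is an illustration under stated assumptions, not output of any real compiler: div_ints(), bar_as_written() and bar_after_fix() are hypothetical names, and division by zero is left out of scope.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef enum { NUM_LONG, NUM_DOUBLE } num_tag;
typedef struct {
    num_tag tag;
    union { long lval; double dval; } u;
} num;

/* Models PHP's `/` on two ints: an int when the division is exact,
 * otherwise a float. */
num div_ints(long a, long b) {
    num r;
    assert(b != 0);   /* division by zero is out of scope for this sketch */
    if (!(a == LONG_MIN && b == -1) && a % b == 0) {
        r.tag = NUM_LONG;
        r.u.lval = a / b;
    } else {
        r.tag = NUM_DOUBLE;
        r.u.dval = (double)a / (double)b;
    }
    return r;
}

long foo_native(long n) { return n + 1; }

/* bar() as written: $x = $something / 2 may be int or float, so the
 * caller has to carry a tag and branch on it before calling foo(). */
bool bar_as_written(long something, long *out) {
    num x = div_ints(something, 2);
    if (x.tag != NUM_LONG) {
        return false;   /* float path: a type error under strict types */
    }
    *out = foo_native(x.u.lval);
    return true;
}

/* bar() after an explicit integer conversion: one path, no tag. */
long bar_after_fix(long something) {
    return foo_native(something / 2);   /* truncates toward zero, like an (int) cast */
}
```

The second version needs no tagged value at all, which is the simplification being argued for; whether a coercive mode can reach comparable code is the point under dispute in this thread.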

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 1:35 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,

And note that this can only work with strict types since you can do
the necessary type inference and reconstruction (both forward from a
function call, and backwards before it).

Please do explain how strict type hints help you do inference that you
couldn't do with dynamic type hints. Ultimately, your whole argument
hinges on that, but you mention it in parentheses almost as an
afterthought.
I claim the opposite - you cannot infer ANYTHING from Strict STH that
you cannot infer from Coercive STH. Consequently, everything you've
shown, down to the C-optimized version of strict_foo() can be
implemented in the exact same way for very_lax_foo(). Being able to
optimize away the value containers is not unique to languages with
strict type hints. It's done in JavaScript JIT engines, and it was done
in our JIT POC.

I do here: http://news.php.net/php.internals/83504

I'll re-state the specific part in this mail:

<?php declare(strict_types=1);
function foo(int $int): int {
return $int + 1;
}

function bar(int $something): int {
$x = $something / 2;
return foo($x);
}

^^ In that case, without strict types, you'd have to generate code for
both integer and float paths. With strict types, this code is invalid.

Ok, but how does that support your case? That alludes to the functionality
difference between strict STH and dynamic STH, and perhaps your Static
Analysis argument.

How does it help you generate better code?

Suggesting that the nature of a type hint can help you determine what's the
value that's going to be passed to it is akin to saying that the size and
shape of a door can tell you something about the person or beast that's
standing on the other side. It just can't.

Let me illustrate it in a less colorful way.

Snippet 1:
... code that deals with $input ...
foo($input);

function foo(int $x)
{
...
}

Snippet 2:
... code that deals with $input ...
foo($input);

function foo(float $x)
{
...
}

Question:
What can you learn from the signatures of foo() in snippet 1 and 2 about the
type of $input? Does the fact I changed the function signature from snippet
1 to 2 somehow affect the type of $input? In what way?

If I understood you correctly, you're assuming that $input will also come
in through a strict type hint, which would tell you that it's an int and
therefore safe. But a coercive type hint will do the exact same job.

You can tell because you know the function foo expects an integer. So you
can infer that $x will have to have the type integer due to the future
requirement. Which means the expression "$something / 2" must also be an
integer. We know that's not the case, so we can raise an error here.

This is static analysis, not better code generation. And it boils down to a
functionality difference, not performance difference.

Zeev

10 years ago by Anthony Ferrara

Ze'ev,

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 1:35 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,

And note that this can only work with strict types since you can do
the necessary type inference and reconstruction (both forward from a
function call, and backwards before it).

Please do explain how strict type hints help you do inference that you
couldn't do with dynamic type hints. Ultimately, your whole argument
hinges on that, but you mention it in parentheses almost as an
afterthought.
I claim the opposite - you cannot infer ANYTHING from Strict STH that
you cannot infer from Coercive STH. Consequently, everything you've
shown, down to the C-optimized version of strict_foo() can be
implemented in the exact same way for very_lax_foo(). Being able to
optimize away the value containers is not unique to languages with
strict type hints. It's done in JavaScript JIT engines, and it was done
in our JIT POC.

I do here: http://news.php.net/php.internals/83504

I'll re-state the specific part in this mail:

<?php declare(strict_types=1);
function foo(int $int): int {
return $int + 1;
}

function bar(int $something): int {
$x = $something / 2;
return foo($x);
}

^^ In that case, without strict types, you'd have to generate code for
both integer and float paths. With strict types, this code is invalid.

Ok, but how does that support your case? That alludes to the functionality
difference between strict STH and dynamic STH, and perhaps your Static
Analysis argument.

How does it help you generate better code?

Because strict types make that an error case. So I can then tell the
user to fix it. Once they do (via cast, logic change, etc), I know the
types of every variable the entire way through. So I can generate
native code for both calls without using variants.

Suggesting that the nature of a type hint can help you determine what's the
value that's going to be passed to it is akin to saying that the size and
shape of a door can tell you something about the person or beast that's
standing on the other side. It just can't.

"It just can't" yet it's done all the time. There is working code in
the wild that does exactly that.

It doesn't tell you what's on the other side (which you seem to be
suggesting), but gives you the possibilities that won't cause an
error.

So then if you find a possibility from the other direction that isn't
in the set of stable possibilities, you can tell the user (because
that would be a runtime error). The division case in the example shows
that.

Let me illustrate it in a less colorful way.

Snippet 1:
... code that deals with $input ...
foo($input);

function foo(int $x)
{
...
}

Snippet 2:
... code that deals with $input ...
foo($input);

function foo(float $x)
{
...
}

Question:
What can you learn from the signatures of foo() in snippet 1 and 2 about the
type of $input? Does the fact I changed the function signature from snippet
1 to 2 somehow affect the type of $input? In what way?

With strict typing at the foo() call site, it tells you that $input
has to be an int or float (respectively between the snippets).

And as the static analyzer traces back, if it finds possibilities that
don't match (for example, if you assigned it directly from $_POST),
it's able to say that either the original assignment or the function
call is an error.

So yes, it does affect the stable-state types that $input can have.
And if we detect an error, we can tell the dev ahead of time about it.
And hence they can make the appropriate fix.

If I understood you correctly, you're assuming that $input will also come
in through a strict type hint, which would tell you that it's an int and
therefore safe. But a coercive type hint will do the exact same job.

No. I'm assuming that $input came from something that we can infer a
type set from. Which is basically anything in the language.

You can tell because you know the function foo expects an integer. So you
can infer that $x will have to have the type integer due to the future
requirement. Which means the expression "$something / 2" must also be an
integer. We know that's not the case, so we can raise an error here.

This is static analysis, not better code generation. And it boils down to a
functionality difference, not performance difference.

That static analysis enables better code generation. Which is
precisely what I said in an earlier post:
http://news.php.net/php.internals/83501 And I showed an example of the
better code generation.

I hope that makes my point a little clearer,

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 2:25 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Ze'ev,

It's Zeev, thanks :)

Because strict types make that an error case. So I can then tell the user
to fix it. Once they do (via cast, logic change, etc), I know the types of
every variable the entire way through. So I can generate native code for
both calls without using variants.

I think we are indeed getting somewhere, I hope.
If I understand correctly, effectively the flow you're talking about in your
example is this:

The developer tries to run the program.
It fails because the static analyzer detects float being fed to an int.
The user changes the code to convert the input to int.
You can now optimize the whole flow better, since you know for a fact
it's an int.

Did I describe that correctly?

With strict typing at the foo() call site, it tells you that $input has to
be an int or float (respectively between the snippets).

I'm not following.
Are you saying that because foo() expects an int or float respectively,
$input has to be int or float? What if $input is really a string? Or a
MySQL connection?
Or are you saying that there was a strict type hint in the function that
contains the call to foo(), so we know it's an int/float respectively? If
so, how would it be any different with a coercive type hint?

I hope that makes my point a little clearer,

It actually does, I hope. I think we are getting somewhere, but we're not
quite there yet.

Thanks,

Zeev

10 years ago by Anthony Ferrara

Zeev,

I think we are indeed getting somewhere, I hope.
If I understand correctly, effectively the flow you're talking about in your
example is this:

The developer tries to run the program.

It fails because the static analyzer detects float being fed to an int.

The user changes the code to convert the input to int.

You can now optimize the whole flow better, since you know for a fact
it's an int.

Did I describe that correctly?

Partially.

The static analysis and compilation would be pure AOT. So the errors
would be told to the user when they try to analyze the program, not
run it. Similar to HHVM's hh_client.

However, there could be a "runtime compiler" which compiles in PHP's
compile flow (leveraging opcache, etc). In that case, if the type
assertion isn't stable, the function wouldn't be compiled (the
external analyzer would error, here it just doesn't compile). Then the
code would be run under the Zend engine (and error when called).

With strict typing at the foo() call site, it tells you that $input has to
be an int or float (respectively between the snippets).

I'm not following.
Are you saying that because foo() expects an int or float respectively,
$input has to be int or float? What if $input is really a string? Or a
MySQL connection?

So think of it as a graph. When you start the type analysis, there's
one edge between $input and foo() with type mixed. Looking at foo's
argument, you can say that the type of that graph edge must be int.
Therefore it becomes an int. Then, when you look at $input, you see
that it can be other things, and therefore there are unstable states
which can error at runtime.

Or are you saying that there was a strict type hint in the function that
contains the call to foo(), so we know it's an int/float respectively? If
so, how would it be any different with a coercive type hint?

Not all data gets into a function from a parameter:

function bar() {
$x = $_POST['data'];
foo($x);
}

in that case, we know $x can only be a string or an array (unless we
find where that variable was written to in the program). So we know
for a fact that there's a type error, even though it wasn't a
parameter.

Going deeper, we can look at other cases:

function x() {
if (time() % 360 > 0) {
return 123;
}
}

function bar() {
$x = x();
foo($x);
}

In this case, we know that x() has two possible types: int/null. That
doesn't satisfy the valid possibilities for foo (int), hence there's a
possible type error.

The key difference is this: Forward analysis (typing $x by assignment)
can tell you valid modes for your program. Backward analysis
(determining $x's type by its usages) can tell you invalid modes for
your program. Combining them gives you more flexibility in
hard-to-infer/reconstruct situations.
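
One way to picture combining the two directions is to treat each as a set of types and compare them. The following C sketch assumes a bitmask encoding; type_set, check_call() and the verdict names are hypothetical and are not taken from Recki-CT or any other analyzer.

```c
#include <assert.h>

/* Hypothetical encoding: one bit per type the analyzer distinguishes. */
typedef unsigned type_set;
enum {
    TY_NULL   = 1u << 0,
    TY_INT    = 1u << 1,
    TY_FLOAT  = 1u << 2,
    TY_STRING = 1u << 3,
    TY_ARRAY  = 1u << 4
};

typedef enum { ALWAYS_OK, ALWAYS_ERROR, MAY_ERROR, UNKNOWN } verdict;

/* forward:  types the argument may hold, inferred from its assignments
 *           (0 means the analyzer could not infer anything).
 * accepted: types the callee's strict hint allows (the backward constraint). */
verdict check_call(type_set forward, type_set accepted) {
    assert(accepted != 0);              /* a hint accepts at least one type */
    if (forward == 0) {
        return UNKNOWN;                 /* defer to runtime */
    }
    if ((forward & ~accepted) == 0) {
        return ALWAYS_OK;               /* every possibility satisfies the hint */
    }
    if ((forward & accepted) == 0) {
        return ALWAYS_ERROR;            /* e.g. string|array passed to int */
    }
    return MAY_ERROR;                   /* e.g. int|null passed to int */
}
```

In this encoding the $_POST example corresponds to check_call(TY_STRING | TY_ARRAY, TY_INT), a certain error, and the x() example corresponds to check_call(TY_INT | TY_NULL, TY_INT), a possible error.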

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 3:02 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,

I think we are indeed getting somewhere, I hope.
If I understand correctly, effectively the flow you're talking about
in your example is this:

The developer tries to run the program.

It fails because the static analyzer detects float being fed to an
int.

The user changes the code to convert the input to int.

You can now optimize the whole flow better, since you know for a
fact it's an int.

Did I describe that correctly?

Partially.

The static analysis and compilation would be pure AOT. So the errors would
be told to the user when they try to analyze the program, not run it.
Similar to HHVM's hh_client.

How about that then:

The developer runs a static analyzer on the program.
It fails because the static analyzer detects float being fed to an int.
The user changes the code to convert the input to int.
You can now optimize the whole flow better, since you know for a fact
it's an int.

Is that an accurate flow?

However, there could be a "runtime compiler" which compiles in PHP's
compile flow (leveraging opcache, etc). In that case, if the type
assertion isn't stable, the function wouldn't be compiled (the external
analyzer would error, here it just doesn't compile). Then the code would
be run under the Zend engine (and error when called).

Got you. Is it fair to say that if we got to that case, it no longer
matters what type of type hints we have?

With strict typing at the foo() call site, it tells you that $input
has to be an int or float (respectively between the snippets).

I'm not following.
Are you saying that because foo() expects an int or float
respectively, $input has to be int or float? What if $input is really
a string? Or a MySQL connection?

So think of it as a graph. When you start the type analysis, there's one
edge between $input and foo() with type mixed. Looking at foo's argument,
you can say that the type of that graph edge must be int.
Therefore it becomes an int. Then, when you look at $input, you see that
it can be other things, and therefore there are unstable states which can
error at runtime.

So when you say it 'must be an int', what you mean is that you assume it
needs to be an int, and attempt to either prove that or refute that. Is
that correct?
If you manage to prove it - you can generate optimal code.
If you manage to refute that - the static analyzer will emit an error.
If you can't determine - you defer to runtime.

Is that correct?

For now only focusing on these two parts so that we can make some progress;
May come back to others later...

Thanks,

Zeev

10 years ago by Anthony Ferrara

Zeev,

Partially.

The static analysis and compilation would be pure AOT. So the errors would
be told to the user when they try to analyze the program, not run it.
Similar to HHVM's hh_client.

How about that then:

The developer runs a static analyzer on the program.

It fails because the static analyzer detects float being fed to an int.

The user changes the code to convert the input to int.

You can now optimize the whole flow better, since you know for a fact
it's an int.

Is that an accurate flow?

Yes. At least for what I was talking about in this thread.

However, there could be a "runtime compiler" which compiles in PHP's
compile flow (leveraging opcache, etc). In that case, if the type
assertion isn't stable, the function wouldn't be compiled (the external
analyzer would error, here it just doesn't compile). Then the code would
be run under the Zend engine (and error when called).

Got you. Is it fair to say that if we got to that case, it no longer
matters what type of type hints we have?

Once you get to the end, no. Recki-CT proves that.

The difference though is the journey. The static analyzer can reason
about far more code with strict types than it can without (due to the
limited number of possibilities presented at each call). So this
leaves the dilemma: compiled code that behaves slightly differently
(what Recki does) or whether it always behaves the same.

So think of it as a graph. When you start the type analysis, there's one
edge between $input and foo() with type mixed. Looking at foo's argument,
you can say that the type of that graph edge must be int.
Therefore it becomes an int. Then, when you look at $input, you see that
it can be other things, and therefore there are unstable states which can
error at runtime.

So when you say it 'must be an int', what you mean is that you assume it
needs to be an int, and attempt to either prove that or refute that. Is
that correct?
If you manage to prove it - you can generate optimal code.
If you manage to refute that - the static analyzer will emit an error.
If you can't determine - you defer to runtime.

Is that correct?

Basically yes.

Anthony

10 years ago by Stanislav Malyshev

Hi!

The difference though is the journey. The static analyzer can reason
about far more code with strict types than it can without (due to the
limited number of possibilities presented at each call). So this
leaves the dilemma: compiled code that behaves slightly differently
(what Recki does) or whether it always behaves the same.

Wait, so are you saying that the advantage of having strict typing in PHP
core is that some analyzer - which does not share code with PHP core,
AFAIU - if it interpreted PHP types in strict manner and provided
warnings where types it can statically deduce do not match and the
authors of the code agreed with its suggestions and rewrote their code
so that the analyzer would not complain, would in some cases result in
code that might be JIT-optimized more efficiently?

That is not a claim about strict typing in PHP core having any benefit
at all. I'm not sure even this claim is true (as adding (int) doesn't
actually improve performance - it just shifts around the place where the
conversion is done, and once conversion is done, you can do the same
optimizations as before) - but even if there's some situation where it
is true, I don't see how it makes a difference for PHP core (even in
the situation of "PHP core + JIT extension" or "non-Zend PHP runtime with
AOT/JIT").

--
Stas Malyshev
smalyshev@gmail.com

10 years ago by Anthony Ferrara

Stas,

Hi!

The difference though is the journey. The static analyzer can reason
about far more code with strict types than it can without (due to the
limited number of possibilities presented at each call). So this
leaves the dilemma: compiled code that behaves slightly differently
(what Recki does) or whether it always behaves the same.

Wait, so are you saying that the advantage of having strict typing in PHP
core is that some analyzer - which does not share code with PHP core,
AFAIU - if it interpreted PHP types in strict manner and provided
warnings where types it can statically deduce do not match and the
authors of the code agreed with its suggestions and rewrote their code
so that the analyzer would not complain, would in some cases result in
code that might be JIT-optimized more efficiently?

That is not a claim about strict typing in PHP core having any benefit
at all. I'm not sure even this claim is true (as adding (int) doesn't
actually improve performance - it just shifts around the place where the
conversion is done, and once conversion is done, you can do the same
optimizations as before) - but even if there's some situation where it
is true, I don't see how it makes a difference for PHP core (even in
the situation of "PHP core + JIT extension" or "non-Zend PHP runtime with
AOT/JIT").

Please don't twist my words. Look at everything I said, don't take one
statement from one very specific topic out of context as some sort of
proof that there are no benefits.

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 3:21 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Zeev,

Partially.

The static analysis and compilation would be pure AOT. So the errors
would be told to the user when they try to analyze the program, not run
it. Similar to HHVM's hh_client.

How about that then:

The developer runs a static analyzer on the program.

It fails because the static analyzer detects float being fed to an
int.

The user changes the code to convert the input to int.

You can now optimize the whole flow better, since you know for a
fact it's an int.

Is that an accurate flow?

Yes. At least for what I was talking about in this thread.

OK.

So the code after the fix would look like this:
<?php declare(strict_types=1);
function foo(int $int): int {
return $int + 1;
}

function bar(int $something): int {
$x = (int) ($something / 2); // (int) or whatever else makes it clear it's an int
return foo($x);
}
?>

Let me explain how this could play out with coercive type hints:
<?php
function foo(int $int): int {
return $int + 1;
}

function bar(int $something): int {
$x = $something / 2;
return foo($x);
}

We can all agree that determining the types of just about anything here is
ultra-easy, so easy you could do it with a static analyzer, as you
suggested. $int and $something are integers, while $x is either an integer
or a float. We also know that both foo() and bar() expect integers.

What's the optimal code we could generate here?
First, on the function body of foo(), we can clearly and easily translate
the whole into machine code, as we know we'll get a long and need to return
a long.
Moving to the caller scope in bar(), given we know $x is either a float or
an integer, we could either generate code that calls coerce_to_int($x), or
even some optimized machine code that checks zval.type and either uses the
lval or converts dval. This can be done in AOT, no need to wait for
runtime. Once we know for a fact we have an integer in our hands - we can
make the call directly to the optimized foo(), a C level call without the
overhead of a PHP function call.

If you look at the generated code, it's going to be remarkably similar
between the two cases. If the developer chooses to pick the casting route,
it will look almost identical - except it will be convert_to_long() that is
called instead of coerce_to_int(), the former being more aggressive than the
latter.
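
The call-site code described above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions: the two-case value type stands in for a zval that analysis has narrowed to int|float, and call_foo(), from_long() and from_double() are hypothetical names, not engine functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-case value: the analyzer has narrowed $x to int|float. */
typedef enum { V_LONG, V_DOUBLE } value_tag;
typedef struct {
    value_tag tag;
    union { long lval; double dval; } u;
} value;

value from_long(long n)     { value v; v.tag = V_LONG;   v.u.lval = n; return v; }
value from_double(double d) { value v; v.tag = V_DOUBLE; v.u.dval = d; return v; }

/* Native body of foo(int $int): int, working on raw machine integers. */
long foo_native(long n) { return n + 1; }

/* Call-site code for foo($x): check the tag, use lval or convert dval,
 * then make a direct C-level call. Returns false where coercion would fail. */
bool call_foo(value x, long *result) {
    long arg;
    assert(result != NULL);
    if (x.tag == V_LONG) {
        arg = x.u.lval;
    } else {
        double d = x.u.dval;
        if (!(d >= (double)LONG_MIN && d < -(double)LONG_MIN)) {
            return false;               /* out of range, or NaN */
        }
        arg = (long)d;
        if ((double)arg != d) {
            return false;               /* fractional part: 2.5 is rejected */
        }
    }
    *result = foo_native(arg);
    return true;
}
```

The integer case costs one tag comparison before the direct call; the float case adds a conversion and a check, which is the extra runtime work the two sides of this thread weigh differently.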

Can you see anything impossible or otherwise wrong with my description of
how the AOT compiler would work in this case, with coercive type hints? If
not, there are no performance benefits for the Strict typed version after
the user alters his code to behave similarly to what coercive type hints
would bring.

Based on our Twitter discussion, I think I may have not made my position
clear regarding where our differences are. I'm not claiming that you can't
do the optimizations you say you can do. Not at all. My point is that we
can do the very same optimizations with coercive types as well - basically,
that there is no delta.

However, there could be a "runtime compiler" which compiles in PHP's
compile flow (leveraging opcache, etc). In that case, if the type
assertion isn't stable, the function wouldn't be compiled (the
external analyzer would error, here it just doesn't compile). Then
the code would be run under the Zend engine (and error when called).

Got you. Is it fair to say that if we got to that case, it no longer
matters what type of type hints we have?

Once you get to the end, no. Recki-CT proves that.

Do you mean that the statement is unfair or that it no longer matters? If
it's the former, can you elaborate as to why?

The difference though is the journey. The static analyzer can reason about
far more code with strict types than it can without (due to the limited
number of possibilities presented at each call). So this leaves the
dilemma: compiled code that behaves slightly differently (what Recki does)
or whether it always behaves the same.

So think of it as a graph. When you start the type analysis, there's
one edge between $input and foo() with type mixed. Looking at foo's
argument, you can say that the type of that graph edge must be int.
Therefore it becomes an int. Then, when you look at $input, you see
that it can be other things, and therefore there are unstable states
which can error at runtime.

So when you say it 'must be an int', what you mean is that you assume
it needs to be an int, and attempt to either prove that or refute
that. Is that correct?
If you manage to prove it - you can generate optimal code.
If you manage to refute that - the static analyzer will emit an error.
If you can't determine - you defer to runtime.

Is that correct?

Basically yes.

Let me describe here too how it may look with coercive hints. Instead of
beginning with the assertion that it must be an int, we make no guess as to
what it may be (*). We would use the very same methods you would use to
prove or refute that it's an int, to determine whether it's an int. Our
ability to deduce that it's an int is going to be identical to your ability
to prove that it's an int. If we see that it comes from an int type hint,
from an int typed function, etc. - we'd be able to generate the same ultra
optimized C-level call. If we manage to deduce that it may be an int or a
float, we can still create an ultra-optimized calling code that would deal
with just these two cases, or call coerce_to_int(). If we deduce that it's
a type that cannot be converted to an int (e.g. array or resource) - we can
emit a compile-time error. And if we have no idea what it is, we emit a
regular function call. Going back to that (*) from earlier, even if we're
unable to deduce what it is, we can actually assume/hope that it'll be an
integer and if it is - pass it on directly to the C implementation with a C
level function call; And if not, go with the regular function call.

The machine code you're left with is pretty much equivalent in case we
reached the conclusion that the variable is an integer (which would be
roughly in the same cases you're able to prove that it is). The
difference would be that it allows for the non-integer types to be accepted
according to the coercion rules, which is a functional difference, not
performance difference.
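
The "assume it's an integer and fall back otherwise" idea can be sketched as a guarded fast path. The C below is an illustration only: call_foo_speculative() and foo_regular_call() are hypothetical names, and the fallback is a stub standing in for the engine's regular function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { V_LONG, V_STRING } value_tag;
typedef struct {
    value_tag tag;
    union { long lval; const char *sval; } u;
} value;

int slow_calls = 0;   /* counts how often the fallback path runs */

long foo_native(long n) { return n + 1; }

/* Stand-in for the regular (interpreted) call path. A real engine would
 * apply its full coercion rules here; this sketch just rejects non-ints. */
bool foo_regular_call(value x, long *result) {
    slow_calls++;
    if (x.tag != V_LONG) {
        return false;
    }
    *result = foo_native(x.u.lval);
    return true;
}

/* Nothing was proven about $input, so guess "int" and guard the guess. */
bool call_foo_speculative(value x, long *result) {
    assert(result != NULL);
    if (x.tag == V_LONG) {
        *result = foo_native(x.u.lval);   /* fast path: direct C-level call */
        return true;
    }
    return foo_regular_call(x, result);   /* guess was wrong: regular call */
}
```

When the guess holds, the only overhead over the proven case is the tag comparison; when it fails, behaviour is whatever the regular call path defines.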

Again, I'm not at all saying you can't do the optimizations you're saying
you're going to do or are already doing. In a way I'm saying the opposite - of
course you can do them. We can do them as well with coercive type hints.

Thanks,

Zeev

10 years ago by Anthony Ferrara

Zeev,

So the code after the fix would look like this:
<?php declare(strict_types=1);
function foo(int $int): int {
return $int + 1;
}

function bar(int $something): int {
$x = (int) ($something / 2); // (int) or whatever else makes it clear it's an int
return foo($x);
}
?>

Let me explain how this could play out with coercive type hints:
<?php
function foo(int $int): int {
return $int + 1;
}

function bar(int $something): int {
$x = $something / 2;
return foo($x);
}

We can all agree that determining the types of just about anything here is
ultra-easy, so easy you could do it with a static analyzer, as you
suggested. $int and $something are integers, while $x is either an integer
or a float. We also know that both foo() and bar() expect integers.

What's the optimal code we could generate here?
First, on the function body of foo(), we can clearly and easily translate
the whole into machine code, as we know we'll get a long and need to return
a long.
Moving to the caller scope in bar(), given we know $x is either a float or
an integer, we could either generate code that calls coerce_to_int($x), or
even some optimized machine code that checks zval.type and either uses the
lval or converts dval. This can be done in AOT, no need to wait for
runtime. Once we know for a fact we have an integer in our hands - we can
make the call directly to the optimized foo(), a C level call without the
overhead of a PHP function call.

Well, yes and no.

In this simple example, you could generate the division as float
division, then check for a fractional part to determine if it's an int.

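long bar(long something) {
    /* 2.0 forces float division; with "something / 2" the integer
       division would truncate before the check could see a fraction */
    double x = something / 2.0;
    if (x != (double)(long)x) {
        raise_error();
    }
    return foo((long) x);
}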

You're still doubling the number of CPU ops and adding at least one
branch at runtime, but not a massive difference.

However in general you'd have to use something like div_function and
use a union type of some sort. You mention this (about checking
zval.type at runtime). My goal would be to avoid using unions at all
(and hence no zval). Because that drastically simplifies both compiler
and code generator design.

Especially for a JIT compiler (local, not tracing) simplified design
generally translates to significantly faster runtime. Compare LLVM to
libjit: 50x difference in compile time.

If you look at the generated code, it's going to be remarkably similar
between the two cases. If the developer chooses to pick the casting route,
it will look almost identical - except it will be convert_to_long() that is
called instead of coerce_to_int(), the former being more aggressive than the
latter.

I wouldn't even bother with that, I'd just use a C cast (well, the ASM
equivalent). Saves function calls, zval representation, etc.

Can you see anything impossible or otherwise wrong with my description of
how the AOT compiler would work in this case, with coercive type hints? If
not, there are no performance benefits for the Strict typed version after
the user alters his code to behave similarly to what coercive type hints
would bring.

It's very much not about impossible. It's about complexity. Strict
code is easier to reason about, it's easier to analyze and it's easier
to code-generate because of the reduced amount that you need to
support. And we're not talking about making users change their code
drastically. We're talking about -in many cases- minor tweaks.

Minor tweaks that would need to be done with your proposal as well. So
if we're going to require users change their code, why not make it
opt-in and give them the predictability that we can?

Got you. Is it fair to say that if we got to that case, it no longer
matters what type of type hints we have?

Once you get to the end, no. Recki-CT proves that.

Do you mean that the statement is unfair or that it no longer matters? If
it's the former, can you elaborate as to why?

No, I meant that Recki proves what you said (once you get to a stable
type analysis of even untyped code, it doesn't matter whether the hints
exist or not).

So when you say it 'must be an int', what you mean is that you assume
it needs to be an int, and attempt to either prove that or refute
that. Is that correct?
If you manage to prove it - you can generate optimal code.
If you manage to refute that - the static analyzer will emit an error.
If you can't determine - you defer to runtime.

Is that correct?

Basically yes.

Let me describe here too how it may look with coercive hints. Instead of
beginning with the assertion that it must be an int, we make no guess as to
what it may be (*). We would use the very same methods you would use to
prove or refute that it's an int, to determine whether it's an int. Our
ability to deduce that it's an int is going to be identical to your ability
to prove that it's an int. If we see that it comes from an int type hint,
from an int typed function, etc. - we'd be able to generate the same ultra
optimized C-level call. If we manage to deduce that it may be an int or a
float, we can still create an ultra-optimized calling code that would deal
with just these two cases, or call coerce_to_int(). If we deduce that it's
a type that cannot be converted to an int (e.g. array or resource) - we can
emit a compile-time error. And if we have no idea what it is, we emit a
regular function call. Going back to that (*) from earlier, even if we're
unable to deduce what it is, we can actually assume/hope that it'll be an
integer and if it is - pass it on directly to the C implementation with a C
level function call; And if not, go with the regular function call.
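
The case analysis above, next to the strict prove/refute/defer variant it is contrasted with, can be sketched as a small planning function. Everything here is hypothetical: the MAY_* bitmask, the call_plan names and plan_int_param() are illustration only and are not taken from any real analyzer. Treating strings as "possibly coercible" is an assumption about the coercion rules.

```c
#include <assert.h>

/* Hypothetical bitmask of the types the analyzer thinks a value may have. */
enum { MAY_INT = 1, MAY_FLOAT = 2, MAY_STRING = 4, MAY_ARRAY = 8, MAY_UNKNOWN = 16 };

typedef enum {
    PLAN_DIRECT_CALL,    /* proven int: native C-level call, no zval */
    PLAN_GUARDED_CALL,   /* check the type at runtime, then call or coerce */
    PLAN_REGULAR_CALL,   /* nothing known: ordinary engine function call */
    PLAN_COMPILE_ERROR   /* reported by the analyzer ahead of time */
} call_plan;

typedef enum { MODE_STRICT, MODE_COERCIVE } hint_mode;

/* Decide how to compile a call whose parameter is hinted as int. */
static call_plan plan_int_param(unsigned may_be, hint_mode mode) {
    assert(may_be != 0);
    if (may_be & MAY_UNKNOWN) {
        return PLAN_REGULAR_CALL;        /* can't determine: defer to runtime */
    }
    if (may_be == MAY_INT) {
        return PLAN_DIRECT_CALL;         /* identical in both modes */
    }
    if (mode == MODE_STRICT) {
        return PLAN_COMPILE_ERROR;       /* any possible non-int is flagged */
    }
    /* Coercive: an error only if no possible type could ever be coerced. */
    if (!(may_be & (MAY_INT | MAY_FLOAT | MAY_STRING))) {
        return PLAN_COMPILE_ERROR;
    }
    return PLAN_GUARDED_CALL;
}
```

In this sketch the two modes agree on the proven-int and unknown cases and differ only on mixed sets such as int-or-float, which is the case both sides of the thread keep returning to.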

The machine code you're left with is pretty much equivalent in case we
reached the conclusion that the variable is an integer (which would be
roughly in the same cases you're able to prove that it is). The
difference would be that it allows for the non-integer types to be accepted
according to the coercion rules, which is a functional difference, not
performance difference.

Well, the end result is pretty much equivalent. But only pretty much.
In the example above, the few CPU ops and extra branch will very
likely slow down the code significantly (more than a factor of 2).
It's an extremely small difference, the literal definition of a
micro-optimization (strict would be 2 or 3 operations depending on
platform - div and cast(maybe) and push vs coercive's additional
branch). But when talking about code generation and compilers, these
differences may be non-trivial.

Again, not saying this is major enough to be concerned about, but it's
not identical. There are small differences.

The big difference though is the complexity of the analyzer and
compiler (time, resources, programmer skill) to build one for the
coercive case. With strict types, it's drastically easier (simply fewer
cases to be concerned about).

Anthony

10 years ago by Zeev Suraski — view source

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Monday, February 23, 2015 4:38 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] JIT (was RE: [PHP-DEV] Coercive Scalar Type Hints
RFC)

Well, yes and no.

In this simple example, you could generate the division as float division,
then check the mantissa to determine if it's an int.

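    long bar(long something) {
        /* Divide as a double; in C, something / 2 would be integer division. */
        double x = something / 2.0;
        /* A fractional part means the result cannot be passed as an int. */
        if (x != (double)(long)x) {
            raise_error();
        }
        return foo((long) x);
    }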

You're still doubling the number of CPU ops and adding at least one branch
at runtime, but not a massive difference.

To be honest, I missed an important part in the semantics of the sample
code, the fact that the result of the division in bar() is sent to function
with an integer type hint, which means it may fail with Coercive STH just as
it would with Strict STH (in retrospect I now understand you alluded to that in
your replies to Stas, but was too tired to realize that). That means that
we could conduct identical static analysis, alerting the developer to the
exact same possible type mismatch in both Coercive STH and Strict STH. I
actually fail now to see how the process would be any different at all
between the two modes. This particular code requires changes in order to
work in all cases - semantic changes - probably either explicit casting to
int or - more likely - changing the type hints to float. In either of these
cases, we'd now have fully known types for the entire flow and could
optimize it to machine code equally well and equally easily, and with the
same number of resulting CPU ops.

However in general you'd have to use something like div_function and use a
union type of some sort. You mention this (about checking zval.type at
runtime). My goal would be to avoid using unions at all (and hence no zval).
Because that drastically simplifies both compiler and code generator design.

That would be our goal as well. And in most cases, the success ratio will
be the same between coercive and strict implementations. In that snippet
(which again, I misanalysed) - a static analyzer will be able to alert to
the same issue, prompting the developer to fix it (equally in both Coercive
and Strict). With the two probable fixes - an explicit cast to int, or the
more likely change of type hints across the sample from int to float - we'd
be able to conduct the same compile-time analysis and generate optimal,
ZVAL-free code in both Coercive and Strict STH.

It's very much not about impossible.

I'm happy we have that clearly stated, as based on emails here and
elsewhere, it wasn't clear to a lot of people beforehand. Given identical
input source code, in all the cases where it's possible to generate C-level
calls with Strict, it will also be possible to generate C-level calls with
Coercive.

By the way, that misperception that it's possible to do optimizations with
Strict that aren't possible with Dynamic/Coercive doesn't have to do just
with assertions that were stated here or elsewhere. I've heard people
assuming that many years ago, effectively guessing that's the case without
having any meaningful understanding about execution engines or JIT, deducing
that strict type hints would somehow turn PHP into C in terms of our ability
to optimize.

It's about complexity. Strict code is easier to reason about, it's easier to
analyze and it's easier to code-generate because of the reduced amount that
you need to support. And we're not talking about making users change their
code drastically. We're talking about -in many cases- minor tweaks.

The explicit casting risk has been beaten to death so I won't dive into it
yet again.

I think it boils down to strict STH accepting fewer inputs, which a static
analyzer can sometimes pick up, thereby prompting developers to be more
explicit in their choices of types - in turn providing more compile-time
type information and more restrictive at that - thereby making the code
easier for AOT to work on. It still holds that given the same
explicitly-typed code, AOT/JIT can do an identical job between Strict STH
and Coercive STH. The difference is that the Static Analyzer would be able
to alert you to some more rejections - because there would be more
rejections with Strict than there would be with Coercive - rejections that
would prompt you to change your code. I still think that the cases where a
static analyzer can provide more insight in Strict vs. Coercive are
relatively rare. Our ability to infer the type in compile time is identical
between both; In cases where we can clearly know with absolute confidence
that the type we have is the type the function wants - we can optimize that
into a C call - in both cases. If we can't infer the type in compile-time,
then the code we'd generate would be the same in both cases. The main
difference is in cases where we can infer possible types, which would be
rejected in Strict but accepted in Coercive. With Strict, you could simply
warn about it, pushing the developer in the direction of changing his code
(potentially making it . With Coercive, you could generate optimized code
for the likely case, and catch-all code for the less likely cases.

Minor tweaks that would need to be done with your proposal as well. So if
we're going to require users change their code, why not make it opt-in and
give them the predictability that we can?

That's off topic for the JIT discussion. I explained why I think having two
modes would have negative implications in the RFC.

Let me describe here too how it may look with coercive hints. Instead
of beginning with the assertion that it must be an int, we make no
guess as to what it may be (*). We would use the very same methods you
would use to prove or refute that it's an int, to determine whether
it's an int. Our ability to deduce that it's an int is going to be
identical to your ability to prove that it's an int. If we see that
it comes from an int type hint, from an int typed function, etc. -
we'd be able to generate the same ultra optimized C-level call. If we
manage to deduce that it may be an int or a float, we can still create
an ultra-optimized calling code that would deal with just these two
cases, or call coerce_to_int(). If we deduce that it's a type that
cannot be converted to an int (e.g. array or resource) - we can
emit a compile-time error. And if we have no idea what it is, we emit
a regular function call. Going back to that (*) from earlier, even if
we're unable to deduce what it is, we can actually assume/hope that
it'll be an integer and if it is - pass it on directly to the C
implementation with a C level function call; And if not, go with the
regular function call.

The machine code you're left with is pretty much equivalent in case we
reached the conclusion that the variable is an integer (which would be
roughly in the same cases you're able to prove that it is). The
difference would be that it allows for the non-integer types to be
accepted according to the coercion rules, which is a functional
difference, not performance difference.

Well, the end result is pretty much equivalent. But only pretty much.
In the example above, the few CPU ops and extra branch will very likely
slow down the code significantly (more than a factor of 2).

If we manage to conclude that the value is an integer - the code would not
only be pretty much identical, but completely identical.
If we manage to conclude that the value is either an integer or a float
(which I don't believe is a very common scenario, pretty unique to the
division operator) - then in both cases a static analyzer can alert the
developer his code is potentially unsafe given certain inputs. If the
developer decides to change his code to an explicit cast - we're back to the
first scenario. If not - then the generated code would still be similar
between Strict and Coercive, with the difference being Strict flat out
rejecting floats, while Coercive performing some check on them to see if
they can be converted with no data loss. In both Strict and Coercive, the
ZVAL structure will have to stick around, you'd have to check the type and
perform different actions depending on it. It's true that if the value is
the result of the division operator, then pretty much by definition a float
fed to the hint would always fail to convert to an int without data loss,
but that's really, really a very specialized property of the division
operator.

It boils down to what semantics the developer is after, not whether Strict
can generate more efficient code. In both cases, the static analyzer can
alert us to the same issue; Whether we can emit more efficient int-only
code would depend on whether the developer changes his code so that the
input can clearly be inferred as an int during compile-time.

Again, not saying this is major enough to be concerned about, but it's not
identical. There are small differences.

I agree, but they stem from the difference in functionality, not because we
can optimize Strict code better. Given identical input source code, Strict
and Coercive can be optimized to exactly the same code.

The big difference though is the complexity of the analyzer and compiler
(time, resources, programmer skill) to build one for the coercive case.
With strict types, it's drastically easier (simply fewer cases to be
concerned about).

There doesn't have to be any difference in the complexity of the analyzer,
not in terms of developing it and not in terms of the time or runtime
resources it would consume. Cases where you can infer the type with
confidence would be identical between Strict and Coercive. For cases where
we don't know anything - we won't be able to optimize much in either case.
For the few (IMHO uncommon) cases where we know the type could be one of
several, we're not obligated to optimize every possible branch. We could
opt to not optimize it at all and treat it like 'unknown', or optimize the
path for the most likely type, while keeping the rest unoptimized.
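
A rough sketch of the "optimize the most likely type, keep the rest unoptimized" option, assuming a hypothetical val struct in place of a zval. foo_native() and foo_generic() are placeholders, and the two counters exist only to make the chosen path visible; a real engine would run its regular call and coercion logic in the slow path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { V_LONG, V_DOUBLE, V_OTHER } val_type;
typedef struct { val_type type; long lval; double dval; } val;

/* Counters that make the chosen path observable in this sketch. */
static int fast_path_calls = 0;
static int slow_path_calls = 0;

/* Placeholder for the natively compiled foo(int). */
static long foo_native(long n) { return n * 2; }

/* Placeholder for the unoptimized route. This stub only accepts
 * small whole-number doubles and rejects everything else. */
static bool foo_generic(val v, long *out) {
    slow_path_calls++;
    if (v.type == V_DOUBLE && v.dval > -1e9 && v.dval < 1e9
            && (double)(long)v.dval == v.dval) {
        *out = foo_native((long)v.dval);
        return true;
    }
    return false;
}

/* Speculate that the argument is an integer; fall back if it is not. */
static bool call_foo_speculative(val v, long *out) {
    assert(out != NULL);
    if (v.type == V_LONG) {          /* guard for the likely type */
        fast_path_calls++;
        *out = foo_native(v.lval);
        return true;
    }
    return foo_generic(v, out);      /* everything else stays unoptimized */
}
```

The guard costs one comparison on the fast path; all other types pay the full price of the generic route, which is the trade-off described above.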

Another thought. If we end up going in the direction of the Dual Mode RFC,
we'd probably want to create JIT/AOT for both modes, given both would be
used extensively, and that the differences between the two don't justify a
position where we say JIT/AOT cannot be done for dynamic/coercive. Having
to develop two modes for JIT is not a happy thought.

Zeev

10 years ago by Anthony Ferrara — view source

Zeev,

You're still doubling the number of CPU ops and adding at least one branch
at runtime, but not a massive difference.

To be honest, I missed an important part in the semantics of the sample
code, the fact that the result of the division in bar() is sent to function
with an integer type hint, which means it may fail with Coercive STH just as
it would with Strict STH (in retrospect I now understand you alluded to that in
your replies to Stas, but was too tired to realize that). That means that
we could conduct identical static analysis, alerting the developer to the
exact same possible type mismatch in both Coercive STH and Strict STH. I
actually fail now to see how the process would be any different at all
between the two modes. This particular code requires changes in order to
work in all cases - semantic changes - probably either explicit casting to
int or - more likely - changing the type hints to float. In either of these
cases, we'd now have fully known types for the entire flow and could
optimize it to machine code equally well and equally easily, and with the
same number of resulting CPU ops.

With coercive types the analyzer/compiler would be forced to give up
on 100% valid code (not even "bad practice", but completely valid).
With strict types, the few cases it would need to give up will be
overly dynamic and hence risky with strict types.

So while, yes, you can compile a subset of coercive types using
the methods discussed here, you can compile a far greater percentage
of valid strict code. As in the opposite relationship (most valid
coercive code won't be compilable, where most valid strict code will).

And that's where the huge difference is.

However in general you'd have to use something like div_function and use a
union type of some sort. You mention this (about checking zval.type at
runtime). My goal would be to avoid using unions at all (and hence no zval).
Because that drastically simplifies both compiler and code generator design.

That would be our goal as well. And in most cases, the success ratio will
be the same between coercive and strict implementations. In that snippet

I don't think so. In the cases where it would be the same, coercive gives
literally no benefit over strict types. So at that point why not go
all the way and use strict types in the first place?

The cases where it makes a difference (a lot more than I think you're
counting) are going to be where 100% valid coercive code is too
dynamic to reason about (given our simple example above required
modification).

(which again, I misanalysed) - a static analyzer will be able to alert to
the same issue, prompting the developer to fix it (equally in both Coercive
and Strict). With the two probable fixes - an explicit cast to int, or the
more likely change of type hints across the sample from int to float - we'd
be able to conduct the same compile-time analysis and generate optimal,
ZVAL-free code in both Coercive and Strict STH.

Again, in trivial cases with 2-3 operations and simple types, yes. In
non-trivial cases, or in cases that you explicitly support, that's
going to be a lot harder if not impossible:

    function foo(string $bar): int {
        return $bar;
    }

In strict types, that's always an error. In coercive types, it
depends on what you pass in. If you pass "30", then everyone's happy.
If then, down the road, "30 dogs" gets passed in, boom.
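
To make the value-dependence concrete, here is a small C sketch of a string-to-int check whose outcome depends on the string's contents rather than its type. string_to_int_lossless() is a hypothetical name, and "the whole string must parse as a base-10 integer" is an assumed simplification of the coercion rules, not the RFC's exact definition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Accept a string only if all of it parses as a base-10 integer that
 * fits in a long. The type is always "string"; only the value decides. */
static bool string_to_int_lossless(const char *s, long *out) {
    assert(s != NULL && out != NULL);
    char *end;
    errno = 0;
    long n = strtol(s, &end, 10);
    if (end == s) {
        return false;                /* no digits at all */
    }
    if (*end != '\0') {
        return false;                /* trailing data, e.g. " dogs" */
    }
    if (errno == ERANGE) {
        return false;                /* does not fit in a long */
    }
    *out = n;
    return true;
}
```

With this check, "30" is accepted and "30 dogs" is rejected, and no amount of type information about the argument tells a compiler which of the two it will see.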

If we want to say that the static analyzer can complain about 100%
valid (and encouraged, if we buy the statements you make in your RFC)
code, then great. But then that means we're no longer analyzing PHP,
but a subset of it.

It's very much not about impossible.

I'm happy we have that clearly stated, as based on emails here and
elsewhere, it wasn't clear to a lot of people beforehand. Given identical
input source code, in all the cases where it's possible to generate C-level
calls with Strict, it will also be possible to generate C-level calls with
Coercive.

It's possible today without types, see Recki-CT. However it's only
possible with a strict subset of coercive code (perhaps 5% of possible
valid code).

With strict types, it's possible on perhaps 95% of valid code.

It's about complexity. Strict code is easier to reason about, it's easier to
analyze and it's easier to code-generate because of the reduced amount that
you need to support. And we're not talking about making users change their
code drastically. We're talking about -in many cases- minor tweaks.

The explicit casting risk has been beaten to death so I won't dive into it
yet again.

I think it boils down to strict STH accepting fewer inputs, which a static
analyzer can sometimes pick up, thereby prompting developers to be more
explicit in their choices of types - in turn providing more compile-time
type information and more restrictive at that - thereby making the code
easier for AOT to work on. It still holds that given the same
explicitly-typed code, AOT/JIT can do an identical job between Strict STH
and Coercive STH. The difference is that the Static Analyzer would be able
to alert you to some more rejections - because there would be more
rejections with Strict than there would be with Coercive - rejections that
would prompt you to change your code. I still think that the cases where a
static analyzer can provide more insight in Strict vs. Coercive are
relatively rare. Our ability to infer the type in compile time is identical
between both; In cases where we can clearly know with absolute confidence
that the type we have is the type the function wants - we can optimize that
into a C call - in both cases. If we can't infer the type in compile-time,
then the code we'd generate would be the same in both cases. The main
difference is in cases where we can infer possible types, which would be
rejected in Strict but accepted in Coercive. With Strict, you could simply
warn about it, pushing the developer in the direction of changing his code
(potentially making it . With Coercive, you could generate optimized code
for the likely case, and catch-all code for the less likely cases.

I think you're being generous here in terms of how much code is going
to be analyzable. That's one of the points of strict typing, that if
it's not analyzable it's not valid code.

Minor tweaks that would need to be done with your proposal as well. So if
we're going to require users change their code, why not make it opt-in and
give them the predictability that we can?

That's off topic for the JIT discussion. I explained why I think having two
modes would have negative implications in the RFC.

How is it off topic? I think it's incredibly important. Because you're
claiming that you can do the exact same thing with coercive as we can
do with strict. But that's only true if you change code in coercive
mode. Only if you use a subset of valid coercive code. But with strict
you get that for free. So if users are going to be having to modify
their code anyway, what benefit does coercive give them over strict?
They are going to opt-in anyway in either case.

Let me describe here too how it may look with coercive hints. Instead
of beginning with the assertion that it must be an int, we make no
guess as to what it may be (*). We would use the very same methods you
would use to prove or refute that it's an int, to determine whether
it's an int. Our ability to deduce that it's an int is going to be
identical to your ability to prove that it's an int. If we see that
it comes from an int type hint, from an int typed function, etc. -
we'd be able to generate the same ultra optimized C-level call. If we
manage to deduce that it may be an int or a float, we can still create
an ultra-optimized calling code that would deal with just these two
cases, or call coerce_to_int(). If we deduce that it's a type that
cannot be converted to an int (e.g. array or resource) - we can
emit a compile-time error. And if we have no idea what it is, we emit
a regular function call. Going back to that (*) from earlier, even if
we're unable to deduce what it is, we can actually assume/hope that
it'll be an integer and if it is - pass it on directly to the C
implementation with a C level function call; And if not, go with the
regular function call.

The machine code you're left with is pretty much equivalent in case we
reached the conclusion that the variable is an integer (which would be
roughly in the same cases you're able to prove that it is). The
difference would be that it allows for the non-integer types to be
accepted according to the coercion rules, which is a functional
difference, not performance difference.

Well, the end result is pretty much equivalent. But only pretty much.
In the example above, the few CPU ops and extra branch will very likely
slow down the code significantly (more than a factor of 2).

If we manage to conclude that the value is an integer - the code would not
only be pretty much identical, but completely identical.
If we manage to conclude that the value is either an integer or a float
(which I don't believe is a very common scenario, pretty unique to the
division operator) - then in both cases a static analyzer can alert the
developer his code is potentially unsafe given certain inputs. If the
developer decides to change his code to an explicit cast - we're back to the
first scenario. If not - then the generated code would still be similar
between Strict and Coercive, with the difference being Strict flat out
rejecting floats, while Coercive performing some check on them to see if
they can be converted with no data loss. In both Strict and Coercive, the
ZVAL structure will have to stick around, you'd have to check the type and
perform different actions depending on it. It's true that if the value is
the result of the division operator, then pretty much by definition a float
fed to the hint would always fail to convert to an int without data loss,
but that's really, really a very specialized property of the division
operator.

I'm talking about generic cases, and you're talking about special
cases. And in the generic cases you can't conclude the value is an
integer in coercive mode. But you can in strict mode.

Yes, for a very small subset of code strict provides no benefit over
coercive in terms of static analysis ability. But it's only for that
subset. For the rest of the code, strict provides significant benefits.

It boils down to what semantics the developer is after, not whether Strict
can generate more efficient code. In both cases, the static analyzer can
alert us to the same issue; Whether we can emit more efficient int-only
code would depend on whether the developer changes his code so that the
input can clearly be inferred as an int during compile-time.

Again, not saying this is major enough to be concerned about, but it's not
identical. There are small differences.

I agree, but they stem from the difference in functionality, not because we
can optimize Strict code better. Given identical input source code, Strict
and Coercive can be optimized to exactly the same code.

Yes, identical input. But we're not talking about identical input.
We're talking about the general case.

And if you really believe that in the general case you can analyze
coercive code better than strict code, there's no real point
continuing this discussion as there's no basis in reality.

Anthony

10 years ago by Lester Caine — view source

And as the static analyzer traces back, if it finds possibilities that
don't match (for example, if you assigned it directly from $_POST),
it's able to say that either the original assignment or the function
call is an error.

Why would using an integer I've passed in a URL be a 'fault'? All of the
data navigation functions pass their state via the URL and one simply
protects against hackers by filtering the state to a default value if it
does not return the correct integer data.

-- Lester Caine - G8HFL

Even for AOT, I don't see any advantage for strict typing on the same
code. The only difference is that strict AOT compiler would reject some
code and some of that code may be non-JIT-friendly. On accepted code,
again, I see no difference.

--
Lester Caine - G8HFL