Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:83247
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: error (pb1.pair.com: domain garfieldtech.com from 66.111.4.28 cause and error)
Message-ID: <54E66569.8000709@garfieldtech.com>
Date: Thu, 19 Feb 2015 17:36:25 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: internals@lists.php.net
References: <011801d04a07$83ab1c00$8b015400$@php.net>	<CABDNdAFQsbo-A-gijPCwYDOf3ts-yKpP0tAL--WUaLU26GB-AQ@mail.gmail.com>	<016f01d04a3a$e9183220$bb489660$@php.net>	<CAESVnVoM7YbmXA8hTG1O66j+6YX3Xx2Q3uj_F=TPjHA4ws=8JA@mail.gmail.com>	<d8031ce52e9965c789887e8f7b687a4e@mail.gmail.com>	<CAESVnVpA+M6iv_zbDmKTEF8Q1zKjwnbi1ucc=UyuUyRr+X+U3w@mail.gmail.com>	<022801d04ab1$4a0c47d0$de24d770$@php.net>	<1913e09d7f52541901d8574d2080a63f@mail.gmail.com>	<CAAyV7nFD=BQ5dBYiKzhbOKTomKmWvruTnb=R_n9mYB-Lz1P9pg@mail.gmail.com>	<7a5d96b34b98ec1f3ee17be7fa6a1e81@mail.gmail.com>	<CAPhkiZxpYbaPN1370_HFTVnBA=Jg-rU8DrB2P+2dmRcJfGWPtw@mail.gmail.com>	<2CBDEB67-3DE3-437D-9AF3-0E6A92027244@zend.com> <CAPhkiZwcD3h42=9eak4uP9n2GBE=v0LVP6Ncq9Oz7L91dJ5y5w@mail.gmail.com> <4cc0c81c7199a452534bb8edcdb19914@mail.gmail.com> <54E589F6.9030002@garfieldtech.com> <d62afd01c97a2d00466ac88579fa861c@mail.gmail.com>
In-Reply-To: <d62afd01c97a2d00466ac88579fa861c@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [PHP-DEV] Reviving scalar type hints
From: larry@garfieldtech.com (Larry Garfield)

On 02/19/2015 04:13 AM, Zeev Suraski wrote:
>> -----Original Message-----
>> From: Larry Garfield [mailto:larry@garfieldtech.com]
>> Sent: Thursday, February 19, 2015 9:00 AM
>> To: internals@lists.php.net
>> Subject: Re: [PHP-DEV] Reviving scalar type hints
>>
>> On 02/17/2015 01:30 PM, Zeev Suraski wrote:
>>>> Yes, I already know that.
>> At this point, if I could rephrase the "camps" a bit I see two different
>> sets of
>> priorities:
>>
>> 1) PHP should do what seems "obviously safe" to do, to make life easiest
>> for
>> developers.  That is, it's patently obvious that "32" and 32 are
>> equivalent, so
>> don't make developers worry about the distinction because to them there
>> isn't one.  This is an entirely reasonable position.
>>
>> 2) PHP would benefit hugely from static analysis tools and compile-time
>> type-based optimizations, but those are only possible with code that is
>> strongly typed.  Currently such tools do not really exist, but with
>> compile-
>> time-knowable information could be written and even incorporated into
>> future versions of PHP without API breaks.  (I think Anthony demonstrated
>> earlier examples of function calls no longer being slow, for instance, if
>> the
>> type juggling could be removed at compile
>> time.)  This is an entirely reasonable position.
> Larry,
>
> There's actually very little difference between coercive type hinting and
> strict type hinting in terms of performance.  If you read what both Dmitry
> and Anthony said, it should be clear that the vast majority of gains can be
> had even without any sort of type hinting at all - and as Stas pointed out,
> JavaScript has some mind blowing JIT optimizations without any explicit type
> info at all.
>
> Moreover, I think it's easy to lose the forest for the trees here, by
> focusing on a very narrow piece of code - without looking at the bigger
> picture.
>
> Ultimately, if you have a piece of data that you want to pass from a caller
> to a callee, it could be under one of three labels:
> 1.  A piece of data the callee can use as-is.
> 2.  A piece of data the callee can use after conversion (be it explicit or
> implicit).
> 3.  A piece of data the callee cannot/shouldn't use.
>
> When comparing strict and coercive type hints, there's no difference between
> them in terms of #1;  There's a subtle difference with #3 - but only in the
> error situation.  In other words, for coercive type hints, it would just
> take a bit more time before they fail, because they have to conduct a few
> more checks.  However, that's an error situation anyway, which is either
> already going to bail out, or go through error handling code - which would
> be very slow anyway.
>
> So focusing on #2, in a practical real world situation - the difference is
> actually a lot more subtle than people might think if they only zoom in on
> the area around parameter passing.  The bigger picture is, what would the
> code author - the one making the call - want to do, semantically?  In other
> words, if you have "32" coming from a database or whatnot, are you likely to
> want an API that accepts an int to be able to use that?  I think the answer
> is almost always yes.  So practically, what will happen with strict typing
> is that you'd explicitly cast it to int, while with coercive typing - you'd
> rely on the language to do it for you.  Arguably, very little difference
> between the two in terms of performance.  Note that it's possible people
> will be able to come up with various edge cases where strict typing might
> somehow alert you to a situation that may push you to change your code in a
> way it might end up being slightly faster.  But those will be edge cases and
> should be taken in the context - in the vast majority of code patterns,
> there's zero difference between the two approaches in terms of performance.
>
> In terms of functionality, however, there's actually a substantial
> difference between the two - explicit casting is a lot more aggressive than
> the coercion rules we're thinking about for coercive type hints.  It'll
> happily and silently coerce "Apple" into 0, "100 dogs" into 100, and 3.1415
> into 3.
>
> Now, diving back to future potential AOT/JIT, it's simply not true that
> there's any gain at all from strict typing - or at least, neither Dmitry
> (who wrote a full JIT compiler for PHP that runs Mandelbrot as fast as gcc
> does) nor me were able to understand them.  Anthony spoke about being able
> to completely eliminate the zval container and all associated checks, so
> that in certain situations you'd be able to map a PHP integer all the way
> down to a C (or asm) integer.  That can certainly be done, but it has
> nothing to do with strict vs. coercive type hints.  Here's why:
>
> 1. At this point I think it's clear to everyone that inside the called
> function, there's zero difference between strict and coercive typing (or
> even the weak typing we were talking about earlier).  They're 100%
> guaranteed to receive what they asked, either because values were coerced or
> blocked from even making it into the function.
> 2. On the outside calling code - if you can conduct the level of type
> inference that would enable you to safely compile a PHP integer into a
> machine code integer, by all means - do it;   While at it, generate slightly
> different function calling code that would bypass zval type checks
> altogether, and provide that function with the integer it wanted.
>
> Note that in his JIT POC, Dmitry managed to conduct a lot of this without
> any type hinting *at all*, so while type hints (be they
> strict/coercive/weak) make this job a bit easier - they're hardly required;
> Nor do they solve the bigger challenging problem - which is type inference
> in the various functions' code bodies themselves - since we don't have
> variable declarations or strong typing in PHP.
>
>> Naturally those two positions are mutually exclusive; if the compiler has
>> to
>> allow for "32" to be converted to 32 at runtime, it can't optimize the
>> opcodes by removing the code that would do that conversion!
>>
>> In essence, opt-in-strict becomes an opt-in "compiler, be pedantic so you
>> can
>> make my code faster" flag.  More carrot than stick, since people can
>> control
>> when they opt-in to fancier compiler optimizations at the cost of some DX,
>> but only in some cases.
> I hope what I said above illustrates why it's a misperception - and I think
> it's a widely spread one.  If your data source has the wrong type, and you
> still want to use it - you'd have to convert it.  The cost would be similar
> whether it's done automatically by the language for you, or done manually
> through an explicit cast - the latter being significantly more likely to
> hide bugs.  If people are in favor of strict typing because they think it
> can help generate faster code - they should understand it's a misperception
> and focus on the functionality instead!
>
>> I started this email planning to ask Anthony how flexible strict checking
>> could
>> get without losing the benefits of it, but I think I've just convinced
>> myself the
>> answer is "not very".  Which then leaves only the question of internal
>> functions that Rasmus raised, which... it looks like is discussed in later
>> emails
>> so I will try to catch up on those. :-)
> I hope I can convince you back :)
> Given that there are no substantial performance gains for strict typing vs.
> coercive typing, again, no performance gains from strict vs. coercive
> typing, we're really talking about functionality here.
>
> I actually think the strict camp has *a lot* to gain from the single, fairly
> strict but not as strict as zval.type comparison.  Most notably - the vast
> majority of use cases that were brought up by strict typing proponents, such
> as rejecting lossy conversions ("100 dogs" -> 100, 37.7 -> 37, etc.) and
> rejecting 'inventive' conversions (like bool->anything) - will not only be
> supported, but they would be the *default*, and actually only available
> behavior.  That is compared with the currently proposed RFC, where strict
> typing would have to be explicitly enabled.  I also think that avoiding the
> proliferation of explicit casts - that is bound to happen by people
> adjusting their code to be strict compliant in a hurry - is a big gain for
> many strict typing proponents.
>
> It's true that there may be certain use cases that coercive type hints may make
> more difficult - such as static analysis (I'm not entirely sure why that is,
> but I never dived into that) - but that in itself isn't a good enough
> reason, IMHO, to introduce a second, separate mode that deals with scalars
> in such a different way than the rest of PHP.
>
> Obviously, I think 'weak' campers have a lot to gain too - by making
> sensible conversions work fine as expected, without having to resort to
> explicit casts.
> And everyone stands to gain from having just one mode, instead of two.
> The coercive typing approach would require each camp to give up a bit of
> their 'ideology', but it also gives both schools of thought *most* of what
> they want, including the key tenets for each camp (rejecting non-sensible
> conversions - always, allowing sensible ones - always).  I believe that's
> what makes it a good compromise, a better one than the currently proposed
> RFC.
>
> Thanks!
>
> Zeev

Thank you for the detailed reply, Zeev.

I am not a language engineer myself, so I can't speak to how or if 
full-static would be more performant.  I am mostly relying on the 
statements of others such as Anthony that it would be the case and trying 
to summarize/rephrase the camps in terms of the desired benefit (DX and 
performance/correctness) rather than the implementation ("weak" vs 
"strong").  If it's possible to mostly have our cake and eat it too, I'm 
all for that.  Anthony and Stas are discussing the details of that in 
the (now-misnamed) spin-off thread and much of it is sadly over my head.

Anthony, can you expand here at all about the practical benefits of 
strong-typing for variable passing for the compiler?  That seems to be 
the main point of contention: Whether or not there are real, practical 
benefits to be had in the compiler of knowing that a call will be in 
"strict mode".  (If there are, then the split-mode makes sense.  If there 
are not, then there's little benefit to it.)

Either way, I agree 100% with Zeev that we can/should tighten up the 
coercion logic.  In 16 years of writing PHP I have never once had a 
situation where using "99 red balloons" in a context that wants an 
integer wasn't a bug.
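
To make the casting behaviour discussed above concrete, here is a
minimal sketch.  The explicit (int) casts are plain PHP.  The helper
name to_int_or_null() is purely hypothetical (it is not part of any RFC
in this thread); it uses filter_var() with FILTER_VALIDATE_INT only to
show one way a lossy string could be rejected instead of truncated.

```php
<?php
// Explicit (int) casts silently accept lossy input.
$cases = ['32', 'Apple', '100 dogs', '99 red balloons', 3.1415];
$results = [];
foreach ($cases as $value) {
    $results[] = (int) $value;
}
// $results is now [32, 0, 100, 99, 3]

// Hypothetical helper, for illustration only: return the integer when
// the whole string is an integer literal, or null otherwise.
function to_int_or_null($value)
{
    $checked = filter_var($value, FILTER_VALIDATE_INT);
    return $checked === false ? null : $checked;
}

var_dump(to_int_or_null('32'));              // int(32)
var_dump(to_int_or_null('99 red balloons')); // NULL
```

The cast turns "99 red balloons" into 99 without complaint, while the
validating helper refuses it.  That is roughly the gap between explicit
casting and the tighter coercion rules being argued for, though the
exact rules would of course be whatever the RFC ends up specifying.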