Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:83469
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.215.44 as permitted sender)
MIME-Version: 1.0
References: <7ef509ef10bb345c792f9d259c7a3fbb@mail.gmail.com>
 <CAAyV7nHq_-9jyRcwJmiAxthmEo3u3MDXcyOC09LV2DETABzCoA@mail.gmail.com>
 <8250289916f5128b5bc1a114428d374e@mail.gmail.com> <CAAyV7nHAicJCQU87YkhZ700a=k4MRqVJtMV-mG4F0MQsn82OfQ@mail.gmail.com>
Date: Sun, 22 Feb 2015 13:00:06 +0000
Message-ID: <CACQesk5cqzWD4jGSsv7b1ZoiinL7vyC5ZSZ2nemTMuKfUwqk2Q@mail.gmail.com>
To: Anthony Ferrara <ircmaxell@gmail.com>, Zeev Suraski <zeev@zend.com>
Cc: PHP internals <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=001a11c3686e6739bf050facdd1c
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
From: colder@php.net (Etienne Kneuss)

--001a11c3686e6739bf050facdd1c
Content-Type: text/plain; charset=UTF-8

On Sat Feb 21 2015 at 21:08:39 Anthony Ferrara <ircmaxell@gmail.com> wrote:

> Zeev,
>
> I won't nit-pick every point, but there are a few I think need to be
> clarified.
>
> >> > Proponents of Dynamic STH bring up consistency with the rest of the
> >> > language, including some fundamental type-juggling aspects that have
> >> > been key tenets of PHP since its inception. Strict STH, in their view,
> >> > is inconsistent with these tenets.
> >>
> >> Dynamic STH is apparently consistent with the rest of the language's
> >> treatment of scalar types. It's inconsistent with the rest of the
> >> language's treatment of parameters.
> >
> > Not in the way Andrea proposed it, IIRC.  She opted to go for consistency
> > with internal functions.  Either way, at the risk of being shot for talking
> > about spiritual things, Dynamic STH is consistent with the dynamic spirit
> > of PHP, even if there are some discrepancies between its rule-set and the
> > implicit typing rules that govern expressions.  Note that in this RFC I'm
> > actually suggesting a possible way forward that will align *all* aspects
> > of PHP, including implicit casting - and have them all governed by a
> > single set of rules.
>
> The point I was making up to there is that we currently have 2 type
> systems: user-land object and ZPP-scalar. So in any given function you
> have 2 type systems interacting. The current ZPP scalar type is
> dynamic, and user-land object static.
>
> With the proposal here, you'd unify user-land scalars to behave as
> ZPP scalars. So you'd have two type systems in any given function:
> scalar and object (which behave differently).
>
> My proposal gives you the same two by default (scalar and object) and
> a strict switch to collapse them into a single, unified type system.
>
> This is even more apparent with the int-float acceptance, because we
> can mentally model Float as an object that extends Int. Then it makes
> perfect sense why you'd accept ints where you see floats, but not the
> opposite.
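For illustration, a minimal sketch of the int/float asymmetry described above. It assumes the per-file strict switch is spelled declare(strict_types=1), as in the dual-mode RFC and as it later shipped in PHP 7.0; the function names are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical functions illustrating the "Float extends Int" mental model
// under strict typing: an int may be passed where a float is expected...
function takesFloat(float $x): float
{
    return $x; // the int argument arrives widened to float
}

// ...but a float is never accepted where an int is expected, even 5.0.
function takesIntOnly(int $x): int
{
    return $x;
}

var_dump(takesFloat(5)); // float(5)

try {
    takesIntOnly(5.0);
} catch (TypeError $e) {
    echo "TypeError: float rejected for int parameter\n";
}
```

Widening int to float loses nothing for typical values, while narrowing float to int can truncate, which is why only the first direction is implicit.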
>
> >> However there's an important point to make here: a lot of best practice
> >> has been pushing against the way PHP treats scalar types in certain
> >> cases. Specifically around == vs === and using strict comparison mode in
> >> in_array, etc.
> >
> > I think you're correct on comparisons, but not so much on the rest.
> > Dynamic use of scalars in expressions is still exceptionally common in
> > PHP code. Even with comparisons, == is still very common - and you'd use
> > == vs. === depending on what you need.
> >
> >> So while it appears consistent with the rest of PHP, it only does so if
> >> you ignore a large part of both the language and the way it's commonly
> >> used.
> >
> > Let's agree to disagree.  That's one thing we can always agree on!  :)
>
> I'm talking about the object system. I don't think you're disagreeing
> that it's static. Hence coercive scalars are consistent only if you
> look at half of the type system. That was the point I was making there.
>
> >> 3. "Just Do It but give users an option to not" - This has the problems
> >> that E_DEPRECATED has, but it also gets us back to having fundamental
> >> code behavior controlled by an INI setting, which for a very long time
> >> this community has generally seen as a bad thing (especially for
> >> portability and code re-use).
> >
> > I do too, and I was upfront about their cons, not just pros.  And yet,
> > they all bring us to a much better outcome within a relatively short
> > period of time (in the lifetime of a language) than the Dual Mode will.
>
> Let's agree to disagree that an ini setting will be better than a
> per-file setting.
>
> In fact, I personally think this is a major enough issue that I
> will vote no for this reason alone (type behavior depending on
> an ini setting in any way, shape, or form).
>
> >> > Further, the two sets can cause the same functions to behave
> >> > differently depending on where they're being called
> >>
> >> I think that's misleading. The functions will always behave the same.
> >> The difference is how you get data into the function. The behavior
> >> difference is in your code, not the end function.
> >
> > I'll be happy to get a suggestion from you on how to reword that.
> > Ultimately, from the layman user's point of view, she'd be calling foo()
> > from one place and have it accept her arguments, and foo() from another
> > place and have it reject the very same arguments.
>
> Let me think on it and I will come up with something.
>
> >> With strict mode, you'd have to embed a cast (smart or explicit) to
> >> convert to an integer at the point the data comes in.
> >
> > First, I'm not aware of smart/safe casts being available or proposed at
> > this point.
> > Secondly, why at the point the data comes in?  That would be ideal for
> > static analyzers, but it's probably a lot more common that it will be
> > done at the first point in time where it gets rejected.
>
> By "smart cast" I was referring to a function which checked
> is_numeric(). Not a new language construct.
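A minimal sketch of such a "smart cast" as a plain user-land function, assuming only that it gates on is_numeric() and otherwise behaves like an explicit (int) cast; the name smart_int() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical "smart cast" helper: a plain function (not a language
// construct) that only converts values passing is_numeric(), and fails
// loudly otherwise instead of silently producing 0 like (int) "abc" does.
function smart_int($value)
{
    if (!is_numeric($value)) {
        throw new InvalidArgumentException(
            'Expected a numeric value, got ' . var_export($value, true)
        );
    }
    // Note: like an explicit (int) cast, this still truncates "1.5" to 1.
    return (int) $value;
}

var_dump(smart_int("42")); // int(42)
```

Whether such a helper should also reject fractional input is a policy choice; this sketch keeps the explicit-cast truncation so the only difference from (int) is the is_numeric() gate.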
>
> > I have a hard time connecting to the 'power' approach.  I think
> > developers want their code to work, with minimal effort, and be secure.
> > Coercive scalar type hints will do an excellent job at that.  Strict type
> > hints will be more work, are bound to trigger a lot of "Oh come on"
> > responses, and as a special bonus - proliferate the use of explicit
> > casts.  Let me top that - you'd have developers who think they're
> > security conscious, because they're using strict mode - with code that's
> > full of explicit casts.
>
> I agree we should have users avoid explicit casts. That's why the
> dual-mode proposal exists. If users don't want to control their types,
> they should use the default mode. And everything works fine.
>
> If they know what they want, then the explicit cast becomes a
> documenting piece of information that "this is supposed to happen".
> Ex:
>
> function takesInt(int $a) {}
>
> function foo(float $b) {
>     return takesInt($b);
> }
>
> In weak mode, that "just works". But is it supposed to just work? You
> have no idea. The next developer who comes will look at it and ask "is
> that supposed to truncate, or was that an oversight?" and have no
> idea. But in strict mode, placing an explicit cast before $b shows the
> next developer who comes there "the truncation was intentional".
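A minimal sketch of the same example written for strict mode, assuming the per-file switch is spelled declare(strict_types=1) as in the dual-mode RFC (later shipped in PHP 7.0). The return types are added here for illustration only.

```php
<?php
declare(strict_types=1);

function takesInt(int $a): int
{
    return $a;
}

function foo(float $b): int
{
    // The explicit (int) cast documents that truncation is intended;
    // without it, strict mode rejects the float with a TypeError.
    return takesInt((int) $b);
}

echo foo(3.7), "\n"; // prints 3
```

In weak mode the cast would be unnecessary, which is exactly why its presence carries no information there.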
>
> >> > Static Analysis. It is the position of several Strict STH proponents
> >> > that Strict STH can help static analysis in certain cases. For the
> >> > same reasons mentioned above about JIT, we don't believe that is the
> >> > case
> >>
> >> This is patently false.
> >
> > It's actually patently true.  We don't believe that is the case.  QED.
>
> To understand why "we don't believe" can be false, let's make an
> analogy: I can say that I don't believe in gravity. That doesn't mean
> the opinion can't be patently false just because it was stated as an
> opinion (or rather, the statement of belief is true, but what is
> believed is false)...
>
> >
> >> Keep not believing it all you want, but *static analysis*
> >> requires statically looking at code. Which means you have no value
> >> information. So static analysis can't possibly happen in cases where you
> >> need to know about value information (because it's not there). Yes, at
> >> function entry you know the types. But static analysis isn't about
> >> analyzing a single function (in fact, that's the least interesting
> >> case). It's more about analyzing a series of functions, a function call
> >> graph. And in that case strict typing (based only on type) does make a
> >> big difference.
> >
> > I think it's fair to say that while we were unable to convince you
> > there's no tangible extra value in Strict STH compared to any other kind
> > of STH that guarantees the type of value a function will get, you were
> > also unable to convince Dmitry, Stas or myself - all of whom
> > independently discussed it with you.  Again, despite that, I'm not saying
> > that you're "patently wrong", just that I don't believe you're right.
>
> I've built a static analyzer that's public. I've talked to people who
> build them for a living. I don't claim to be an expert in them (far
> from it), but what I've seen and learned is that what you're talking
> about here either isn't possible (yet) or is difficult enough to be
> impractical (in terms of computing resources necessary).
>
> You can disagree with me all you want. You don't even need to convince
> me. All you need to do is disprove me. Show me a static analyzer for a
> sufficiently dynamic language (Scalar PHP or full JS - not ASM.js -
> would work) and I'll happily apologize and retract the comment. But so
> far all I've seen are people saying it's possible even in the presence of
> arguments to the contrary (why it's not possible).
>
>
There have been several attempts. For JS:
http://users-cs.au.dk/simonhj/tajs2009.pdf
For similar techniques applied to PHP (quite outdated, though):
https://github.com/colder/phantm

You are right that the lack of static information about types is one of
the main issues. Recovering the types typically has a huge performance
cost, or is unreliable.

But seriously, time is getting wasted on this argument; it's actually a
no-brainer: more static information helps tools that rely on static
information. Yes. Absolutely. 100%.

The question is rather: how much weight should we give to (potential/future)
external tools when developing language features?

--001a11c3686e6739bf050facdd1c--