Newsgroups: php.internals
Date: Sat, 21 Feb 2015 13:12:26 -0500
From: ircmaxell@gmail.com (Anthony Ferrara)
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC

Zeev,

First off, thanks for putting forward a proposal. I look forward to a patch that can be experimented with.

There are a few concerns that I have about the proposal, however:

> Proponents of Strict STH cite numerous advantages, primarily around code safety/security. In their view, the conversion rules proposed by Dynamic STH can easily allow ‘garbage’ input to be silently converted into arguments that the callee will accept – but that may, in many cases, hide difficult-to-find bugs or otherwise result in unexpected behavior.

I think that partially misstates the concern. It's less about "garbage input" and more about unpredictable behavior. With dynamic typing, you can't look at code and know that it will not produce an error. That's one of the big advantages of strict typing that many people want. In reality the reasons are complex, varied and important to each person.

> Proponents of Dynamic STH bring up consistency with the rest of the language, including some fundamental type-juggling aspects that have been key tenets of PHP since its inception. Strict STH, in their view, is inconsistent with these tenets.

Dynamic STH is apparently consistent with the rest of the language's treatment of scalar types. It's inconsistent with the rest of the language's treatment of parameters. However, there's an important point to make here: a lot of best practice has been pushing against the way PHP treats scalar types in certain cases. Specifically around == vs === and using strict comparison mode in in_array(), etc.
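
A minimal PHP sketch of the == vs === and in_array() behavior referred to above. The sample values are arbitrary. Strings like these are numeric strings, which loose comparison converts to numbers before comparing them:

```php
<?php
// Loose comparison (==) compares two numeric strings as numbers, so two
// different strings can be "equal". Strict comparison (===) also requires
// the same type and, for strings, identical contents.
var_dump("1e3" == "1000");   // bool(true): both evaluate to the number 1000
var_dump("1e3" === "1000");  // bool(false): same type, different contents

// in_array() uses loose comparison unless its third ($strict) argument is true.
$allowed = ["1000"];
var_dump(in_array("1e3", $allowed));        // bool(true)
var_dump(in_array("1e3", $allowed, true));  // bool(false)
```
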
So while it appears consistent with the rest of PHP, it only does so if you ignore a large part of both the language and the way it's commonly used. In reality, the only thing PHP's type system is consistent about is being inconsistent.

In the "Changes To Internal Functions" section, I think all three options are significantly flawed:

1. "Just Do It" - This is problematic because a very large chunk of code that worked in 5.x will suddenly stop working in 7.0. This will likely create a Python 2/3 issue, as it would require a LOT of code to be changed to make it compatible.

2. "Emit E_DEPRECATED" - This is problematic because raising errors (even suppressed ones) is not cheap. Raising one for a non-trivial percentage of all native function calls could have a MASSIVE performance impact on code designed for 5.x. Without a patch to test, it can't really be quantified, but it would be a shame to lose the performance gains made in 7 because we're triggering hundreds, thousands or tens of thousands of errors in a single application run...

3. "Just Do It but give users an option to not" - This has the same problems as E_DEPRECATED, but it also gets us back to having fundamental code behavior controlled by an INI setting, which this community has long seen as a bad thing (especially for portability and code re-use).

Moving along,

> Further, the two sets can cause the same functions to behave differently depending on where they're being called

I think that's misleading. The functions will always behave the same. The difference is how you get data into the function. The behavior difference is in your code, not in the function being called.

> For example, a “32” (string) value coming back from an integer column in a database table, would not be accepted as valid input for a function expecting an integer.

There's an important point to consider here.
You're relying on information outside of the program to determine program correctness. Saying "coming back from an integer column" requires concrete knowledge that the program itself can't possibly have. What happens when some DBA changes the column type to a string type? The data will keep working for a while, then suddenly break without warning when a non-integer value comes in, because the information about the value comes from outside.

With strict mode, you'd have to embed a cast (smart or explicit) to convert to an integer at the point the data comes in. So semantic information about the value is placed right at the point of entry (forcing the code to be more explicit and clear).

Additionally, with the dual-mode proposal, DB interactions can be in weak mode and have exactly the behavior you're describing here. That gives the user the choice, rather than making assumptions.

> Strict zval.type based STH effectively eliminates this behavior, moving the burden of worrying about type conversion to the user.

Correct. And you say that as if it's a bad thing. Being explicit about type conversions isn't what you'd do in a 10-line script where you can work out the types just by thinking about it. But on large-scale systems, exposing type conversions to the user gives you the power to actually understand the codebase when you can't fit the whole thing in your head at once. So what you cite here as a disadvantage, many consider an advantage.

> Performance

I find it funny how the non-strict crowd keeps bringing up performance...

> It is our position that there is no difference at all between strict and coercive typing in terms of potential future AOT/JIT development - none at all

So really what you're saying is that you publicly disagree with a statement I made on the side, one I said should not impact the RFC or the vote in any way, and which is not part of my RFC at all. Yet it's brought up again.

> Static Analysis.
> It is the position of several Strict STH proponents that Strict STH can help static analysis in certain cases. For the same reasons mentioned above about JIT, we don't believe that is the case

This is patently false. Keep not believing it all you want, but *static analysis* requires statically looking at code, which means you have no value information. So static analysis can't possibly handle cases where you need to know about values (because that information isn't there). Yes, at function entry you know the types. But static analysis isn't about analyzing a single function (in fact, that's the least interesting case). It's about analyzing a series of functions: a function call graph. And in that case strict typing (based only on type) does make a big difference.

In short, I think the problems around the handling of internal functions are significant enough to cause major concern about this proposal.

Thanks,

Anthony

On Sat, Feb 21, 2015 at 12:22 PM, Zeev Suraski wrote:
> All,
>
> I’ve been working with François and several other people from internals@ and the PHP community to create a single-mode Scalar Type Hints proposal.
>
> I think it’s the RFC is a bit premature and could benefit from a bit more time, but given the time pressure, as well as the fact that a not fully compatible subset of that RFC was published and has people already discussing it, it made the most sense to publish it sooner rather than later.
>
> The RFC is available here:
>
> wiki.php.net/rfc/coercive_sth
>
> Comments welcome!
>
> Zeev