Newsgroups: php.internals
Date: Thu, 26 Feb 2015 19:48:35 +0200
Message-ID: <3d639901ae85227b219e7ee59b3140fe@mail.gmail.com>
To: Theodore Brown
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] A different user perspective on scalar type declarations
From: zeev@zend.com (Zeev Suraski)

> -----Original Message-----
> From: Theodore Brown [mailto:theodorejb@outlook.com]
> Sent: Thursday, February 26, 2015 5:29 PM
> To: internals@lists.php.net
> Subject: [PHP-DEV] A different user perspective on scalar type declarations
>
> I am a full-time PHP developer responsible for maintaining several large
> enterprise applications, as well as a number of libraries and personal apps.
> I have been following the scalar type proposals quite closely, as along with
> return type declarations, scalar types have the potential to reduce errors,
> simplify API documentation, and improve static code analysis.
>
> I am in favor of Anthony's Scalar Type Declarations RFC, for two simple
> reasons:
>
> 1. It doesn't change the behavior of existing weak types.
>
> PHP has long had an emphasis on backwards compatibility, and I'm worried
> that those not in favor of strict types are treating backwards compatibility
> more recklessly than they otherwise would in their fervor to avoid two ways
> of handling scalar types. In my experience dealing with large enterprise apps,
> however, there are hundreds of places where code relies on GET/POST
> parameters being automatically trimmed when passed to a function
> expecting an integer.
> The current coercive proposal would deprecate this and later make it an
> error.
> To avoid these notices/errors when upgrading, developers may take the
> "easy"
> route of casting any input passed to a function expecting an int or float.
> This is the same "too strict may lead to too lax" problem pointed out by the
> coercive RFC itself. There's a reason that integer handling was actually
> *relaxed* back in PHP 5.1 (see
> http://php.net/manual/en/migration51.integer-parameters.php).
> Why suddenly make the default more strict again?
>
> I am not against tightening up some of the default weak conversions (e.g. to
> not allow "99 bugs" for an int type), but in my opinion this should be done
> very carefully, and separately from any scalar type declaration proposal.
> Major changes to the casting rules have the potential to seriously harm PHP
> 7 adoption, especially in enterprises with large amounts of legacy code. The
> Scalar Type Declarations v0.5 RFC has the advantage here because it "just
> works" when type hints are added to existing code in the default weak mode.

You may have a point there. As Francois said, he was in favor of allowing
leading and trailing spaces. I'll definitely reconsider.

Would love to hear any additional feedback you may have about the
conversion rules! My goal is to balance the 'Just works' aspect with the
strict aspect, and still be able to put it into one rule-set, because I
believe this has some inherent advantages.

> 2. Strict types are important in some cases.
>
> When it comes to authentication and financial calculations (a couple of areas
> I routinely deal with) it is extremely important that errors are caught and
> fixed early in the development process. In financial or security-sensitive
> code, I would *want* any value with the wrong type (even a string like "26")
> to be flagged as an error when passed to a function expecting an integer.

I agree completely; however, use cases like this are a lot less common
than the situations where you do want sensible coercion to be allowed. Not
introducing language constructs to support strict typing doesn't mean I
think it's never useful.
I think it's at the level where it's better to leave it up to (very very
simple) custom code, in the form of if (!is_int($foo)) errorout();, as
opposed to introducing a whole 2nd mode into the language, with the
cognitive burden it brings.

When I read Anthony's comment about the random number generators a couple
of days ago:

"I think the case you have to look at here is the target audience. Are you
looking to be all things to all users? Or are you attempting to be an
opinionated tool to help the 99%. Along with password_hash, I think this
random library serves the 99%."

I couldn't help but think the very same could be said about strict type
hints (paraphrasing it myself, "I think the case we have to look at here is
the target audience. Are we looking to be all things to all users? Or are
we attempting to be an opinionated tool to help the 99%. With coercive
types I think we serve the 99%." - whether it's 99% or 95% or 90% is
negotiable - but it doesn't change the takeaway, I think).

Now, the same can't be said when we use weak types. Weak type hints are
completely useless for developers who want strict type hints, as their
behavior is completely off from what they expect, and they'd never use
them. But with the newly proposed coercive type hints - the gap narrows
radically. The most common real world use cases strict campers brought up
in the past as problematic with weak types - are gone. We're still left
with some useful use cases for strict, but not at the level where it makes
sense to add language-level support, especially in the form of dual mode,
with all its downsides.

> The option for type-based (rather than value-based) validation is equally
> important when it comes to return types. Unless I have missed something,
> the "Coercive Types for Function Arguments" RFC currently doesn't deal with
> return types at all (they aren't mentioned in the RFC). Would it handle scalar
> return types the same way as it does function arguments?
> If I declare a function to return an int, and I return a string instead
> (even if the string is numeric), there are many cases where it would be an
> unintentional error.
> And if it errors depending on the value, rather than the type, it often
> wouldn't be possible to catch the problem statically.

We'll update the RFC to explicitly mention return. Yes, return values will
be validated using the same coercive rules as function arguments -
similarly to how they're dealt with in the v0.5 RFC.

> Here's a simple example of the advantage offered by strict types and static
> analysis in the Scalar Type Declarations v0.5 RFC:
>
> declare(strict_types=1);
>
> function getCustomerName(int $customerId): string {
>     // look up customer name from database and return
> }
>
> function getInvoiceByCustomer(int $customerId): Invoice {
>     // retrieve invoice data and return object
> }
>
> $id = filter_input(INPUT_GET, 'customer_id', FILTER_VALIDATE_INT);
>
> if ($id === false) {
>     echo 'Customer ID must be an integer';
> } else {
>     $customer = getCustomerName($id);
>     $invoice = getInvoiceByCustomer($customer);
>     // display invoice
> }
>
> Strict types + static analysis can tell you that this will fail (because it's based
> purely on types, and a string is being passed to a function expecting an
> integer). Coercive typing cannot statically tell you that it will fail, because it
> doesn't know whether the string passed to `getInvoiceByCustomer` is
> acceptable as an integer without also knowing its value.

Correct. But a static analyzer can tell you it MAY fail, just as easily as
a static analyzer for strict types can tell you it will fail.

Now, which is better is up for debate. Personally I think the coercive
"MAY fail" warning is better, or at the very least just as good.
If, in fact, the string you're passing is really a numeric string (which
if I'm reading your code correctly, it probably is), then in the strict
case, seeing the error in the static analyzer - or at runtime - you're
likely to resort to explicit casting. Explicit casting that may hide data
loss if - for whatever reason - what you get (in some error situation or
unexpected flow) ends up being a non-numeric string. In the coercive case -
seeing the warning in the static analyzer - you're likely to take a look at
it and verify that it's indeed getting the right value, but you'd keep it
as-is, and let the language do a better job at converting the string to an
int than an explicit cast would. This will actually result in more robust
code that, in the unexpected event that a non-numeric string is received in
the future - would reject it, instead of happily accepting it silently.

> Conceptually, the optional
> strict mode proposed in Anthony's RFC is not very different from == vs. ===,
> or `in_array` with the $strict argument set to true. And I certainly am glad
> that PHP offers these options!

Happy you like it :) But === is very different from strict mode. When we
added it, it allowed you to do something that was just not possible to do
before - and that was actually a perfect fit for a fairly common use case
(being able to differentiate between NULL and false and 0 in return values,
for instance). The same cannot be said about strict type hints. They can
be done easily today (using is_int() and friends), and - with the presence
of coercive type hints - they're not nearly as commonly needed as ===.

> community into separate camps, I would say "It's too late!" The community
> has already been split over this issue for years.

Splitting isn't a binary thing. Of course, there are already lots of
different camps in the PHP community - procedural vs. OO, frameworks vs.
lean, etc.
This would add *additional* fragmentation - as it doesn't cleanly map onto
any of the existing splits.

Thanks for the feedback! It took me a while to answer this. I'm
definitely leaning towards accepting leading and trailing whitespace for
numeric strings now :)

Zeev
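
A minimal sketch of the cast-versus-check trade-off discussed above, for readers who want to see it in code. The three helper names are hypothetical and come from neither RFC, and filter_var() with FILTER_VALIDATE_INT is used only as a rough stand-in for value-based (coercive) validation, not as the RFC's actual rule set.

```php
<?php
// Hypothetical helpers for illustration only; not part of either RFC.

// Explicit cast: never fails, so a non-numeric string silently becomes a number.
function viaCast($value): int
{
    return (int) $value;
}

// Type-based check, in the spirit of the is_int() guard mentioned in the message.
function viaStrictCheck($value): int
{
    if (!is_int($value)) {
        throw new InvalidArgumentException('expected an int');
    }
    return $value;
}

// Value-based check: an assumed approximation of coercive validation.
// Accepts integer-like values and rejects everything else.
function viaValueCheck($value): int
{
    $int = filter_var($value, FILTER_VALIDATE_INT);
    if ($int === false) {
        throw new InvalidArgumentException('not an integer value');
    }
    return $int;
}
```

Under these assumptions, viaCast('abc') returns 0 without complaint, viaStrictCheck('26') throws, and viaValueCheck() accepts '26' while throwing on 'abc'.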