Newsgroups: php.internals
Date: Mon, 23 Feb 2015 10:01:14 -0600
To: internals@lists.php.net
Subject: User perspective on STH
From: matthew@zend.com ("Matthew Weier O'Phinney")

I'm writing this as an author and maintainer of a framework and many
libraries.

Caveat, for those who aren't already aware: I work for Zend, and report to
Zeev. If you feel that makes my points biased, please feel free to stop
reading, but I do think my points on STH bear some consideration.

I've been following the STH proposals off and on. I voted for Andrea's
proposal, and, behind the scenes, defended it to Zeev. After a lot of
consideration, and as primarily a _consumer_ and _user_ of the language, I'm
no longer convinced that a dual-mode proposal makes sense. I worry that it
will lead to:

- A split within the PHP community, consisting of those that do not use
  typehints, those who do use typehints, and those who use strict.
- Poor programming practices and performance degradation by those who adopt
  strict, due to poor usage of type casting.

Let me explain.

The big problem currently is that the engine behavior around casting can lead
to data loss quickly. As has been demonstrated elsewhere:

    $value = (int) '100 dogs'; // 100  - non-numeric trailing values are trimmed
    $value = (int) 'dog100';   // 0    - leading non-numeric values -> 0 ...
    $value = (int) '-100';     // -100 - ... unless indicating sign.
    $value = (int) ' 100';     // 100  - space is trimmed; data loss!
    $value = (int) ' 100 ';    // 100  - space is trimmed; data loss!
    $value = (int) '100.0';    // 100  - probably correct, but loss of precision
    $value = (int) '100.7';    // 100  - precision and data loss!
    $value = (int) 100.7;      // 100  - precision and data loss!
    $value = (int) 0x1A;       // 26   - hex
    $value = (int) '0x1A';     // 0    - shouldn't this be 26? why is this different?
    $value = (int) true;       // 1    - should this be cast?
    $value = (int) false;      // 0    - should this be cast?
    $value = (int) null;       // 0    - should this be cast?

Today, without scalar type hints, we end up writing code that has to first
validate that we have something we can use, and then cast it. This can often
be done with ext/filter, but it's horribly verbose:

    $value = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
    );
    if (false === $value) {
        // throw an exception?
    }

Many people skip the validation step entirely for the more succinct:

    $value = (int) $value;

And this is where problems occur, because this is when data loss occurs.

What I've observed in my 15+ years of using PHP is that people _don't_
validate; they either blindly accept data and assume it's of the correct type,
or they blindly cast it without validation because writing that validation
code is boring, verbose, and repetitive (I'm guilty of this myself!). Yes, you
can offload that to libraries, but why introduce a new dependency in something
as simple as a value object?

The promise of STH is that the values will be properly coerced, so that if I
write a function that expects an integer, but pass it something like '100' or
'0x1A', it will be cast for me, but something that is not an integer and
cannot be safely cast without data loss will be rejected, and an error can
bubble up my stack or into my logs.

Both the Dual-Mode and the new Coercive typehints RFCs provide this.

The Dual-Mode, however, can potentially take us back to the same code we have
today when strict mode is enabled.

Now, you may argue that you won't need to cast the value in the first place,
because STH! But what if the value you received is from a database? Or from a
web request you've made? Chances are, the data is in a string, but the _value_
may be of another type.
With weak/coercive mode, you just pass the data as-is, but with strict
enabled, your choices are to either cast blindly, or to do the same
validation/casting as before:

    $value = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
    );
    if (false === $value) {
        // throw an exception?
    }

Interestingly, this adds overhead to your application (more function calls),
and makes it harder to read and to maintain. Ironically, I foresee "strict" as
being a new "badge of honor" for many in the language ("my code works under
strict mode!"), despite these factors.

If I don't enable strict mode on my code, and somebody else turns on strict
when calling my code, there's the possibility of new errors if I do not
perform validation or casting on such values. This means that the de facto
standard will likely be to code to strict (I can already envision the flood of
PRs against OSS projects for these issues).

You can say, "But, Static Analysis!" all you want, but that doesn't lead to me
writing less code to accomplish the same thing; it just gives me a tool to
check the correctness of my code. (Yes, this _is_ important. But we also have
a ton of tooling around those concerns already, even if they aren't proper
static analyzers.)

From a developer experience standpoint, I find myself scratching my head: what
are we gaining with STH if we have a strict mode? I'm still writing exactly
the same code I am today to validate and/or cast my scalars before passing
them to functions and methods if I want to be strict.

The new coercive RFC offers much more promise to me as a consumer/user of the
language. The primary benefit I see is that it provides a path forward towards
better casting logic in the language, which will ensure that, in the future,
this:

    $value = (int) $value;

will operate properly, and raise errors when data loss may occur.
It means that immediately, if I start using STH, I can be assured that _if_ my
code runs, I have values of the correct type, as they've been coerced safely.
The lack of a strict mode means I can drop that defensive validation/casting
code safely.

My point is: I'm sick of writing code like this:

    /**
     * @param int $code
     * @param string $reason
     */
    public function setStatus($code, $reason = null)
    {
        $code = filter_var(
            $code,
            FILTER_VALIDATE_INT,
            FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
        );
        if (false === $code) {
            throw new InvalidArgumentException(
                'Code must be an integer'
            );
        }
        if (null !== $reason && ! is_string($reason)) {
            throw new InvalidArgumentException(
                'Reason must be null or a string'
            );
        }

        $this->code = $code;
        $this->reason = $reason;
    }

I want to be able to write this:

    public function setStatus(int $code, string $reason = null)
    {
        $this->code = $code;
        $this->reason = $reason;
    }

and _not_ push the burden on consumers to validate/cast their values.

This is what I want from STH, no more, no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits
of strict mode, I'm concerned about the schism it may create in the PHP
library ecosystem, and that many of the benefits of the coercive portion of
that RFC will be lost when working with data from unknown data sources.

If you've read thus far, thank you for your consideration. I'll stop bugging
you now.

-- 
Matthew Weier O'Phinney
Principal Engineer
Project Lead, Zend Framework and Apigility
matthew@zend.com
http://framework.zend.com
http://apigility.org
PGP key: http://framework.zend.com/zf-matthew-pgp-key.asc