Newsgroups: php.internals
Xref: news.php.net php.internals:83584
Date: Mon, 23 Feb 2015 10:19:38 -0600
To: "Matthew Weier O'Phinney"
Cc: PHP Internals
Subject: Re: [PHP-DEV] User perspective on STH
From: pencap@gmail.com (Mike Willbanks)

Hello,

> I'm writing this as an author and maintainer of a framework and many
> libraries.
> Caveat, for those who aren't already aware: I work for Zend, and report to
> Zeev.
> If you feel that will make my points partial, please feel free to stop
> reading, but I do think my points on STH bear some consideration.
>
> I've been following the STH proposals off and on. I voted for Andrea's
> proposal, and, behind the scenes, defended it to Zeev. On a lot of
> consideration, and as primarily a _consumer_ and _user_ of the language,
> I'm no longer convinced that a dual-mode proposal makes sense. I worry
> that it will lead to:
>
> - A split within the PHP community, consisting of those that do not use
>   typehints, those who do use typehints, and those who use strict.
> - Poor programming practices and performance degradation by those who
>   adopt strict, due to poor usage of type casting.
>
> Let me explain.
>
> The big problem currently is that the engine behavior around casting can
> lead to data loss quickly. As has been demonstrated elsewhere:
>
>     $value = (int) '100 dogs'; // 100  - non-numeric trailing values are trimmed
>     $value = (int) 'dog100';   // 0    - non-numeric leading values -> 0 ...
>     $value = (int) '-100';     // -100 - ... unless indicating sign.
>     $value = (int) ' 100';     // 100  - space is trimmed; data loss!
>     $value = (int) ' 100 ';    // 100  - space is trimmed; data loss!
>     $value = (int) '100.0';    // 100  - probably correct, but loss of precision
>     $value = (int) '100.7';    // 100  - precision and data loss!
>     $value = (int) 100.7;      // 100  - precision and data loss!
>     $value = (int) 0x1A;       // 26   - hex
>     $value = (int) '0x1A';     // 0    - shouldn't this be 26? Why is this different?
>     $value = (int) true;       // 1    - should this be cast?
>     $value = (int) false;      // 0    - should this be cast?
>     $value = (int) null;       // 0    - should this be cast?

I do think booleans should still be able to be cast from a user-land
perspective. Oftentimes a database does not deal with boolean values, and
the quickest way to convert them into what the database needs is to cast
to an integer. However, it's not like $value = ($value) ? 1 : 0 would be
much worse.

>
> Today, without scalar type hints, we end up writing code that has to
> first validate that we have something we can use, and then cast it. This
> can often be done with ext/filter, but it's horribly verbose:
>
>     $value = filter_var(
>         $value,
>         FILTER_VALIDATE_INT,
>         FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>     );
>     if (false === $value) {
>         // throw an exception?
>     }
>
> Many people skip the validation step entirely for the more succinct:
>
>     $value = (int) $value;
>
> And this is where problems occur, because this is when data loss occurs.
>
> What I've observed in my 15+ years of using PHP is that people _don't_
> validate; they either blindly accept data and assume it's of the correct
> type, or they blindly cast it without validation because writing that
> validation code is boring, verbose, and repetitive (I'm guilty of this
> myself!). Yes, you can offload that to libraries, but why introduce a new
> dependency in something as simple as a value object?
>
> The promise of STH is that the values will be properly coerced, so that
> if I write a function that expects an integer, but pass it something like
> '100' or '0x1A', it will be cast for me, while something that is not an
> integer and cannot be safely cast without data loss will be rejected, and
> an error can bubble up my stack or into my logs.
>
> Both the Dual-Mode and the new Coercive typehints RFCs provide this.
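For illustration only: the validate-then-cast step described above can at least be kept in one place. The following is a minimal sketch, not part of either RFC; the helper name to_int_or_throw() is hypothetical, and it simply reuses the filter_var() call and flags from the quoted example, throwing an InvalidArgumentException instead of silently truncating the way a bare (int) cast does:

```php
<?php
/**
 * Hypothetical helper: validate that $value represents an integer
 * (decimal, octal, or hex notation, per the quoted filter_var() flags)
 * and return it as an int; throw instead of silently losing data.
 */
function to_int_or_throw($value, string $name = 'value'): int
{
    $result = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
    );
    if (false === $result) {
        throw new InvalidArgumentException(
            sprintf('%s must be an integer', $name)
        );
    }
    return $result;
}

var_dump(to_int_or_throw('100'));   // int(100)
var_dump(to_int_or_throw('0x1A'));  // int(26), unlike (int) '0x1A'

// Input that a bare (int) cast would truncate to 100 is rejected here.
try {
    to_int_or_throw('100 dogs');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // value must be an integer
}
```

This still costs a function call per value, which is the overhead the thread is arguing about; it only removes the repetition, not the need to validate.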
>
> The Dual-Mode, however, can potentially take us back to the same code we
> have today when strict mode is enabled.
>
> Now, you may argue that you won't need to cast the value in the first
> place, because STH! But what if the value you received is from a
> database? Or from a web request you've made? Chances are, the data is in
> a string, but the _value_ may be of another type. With weak/coercive
> mode, you just pass the data as-is, but with strict enabled, your choices
> are to either cast blindly, or to do the same validation/casting as
> before:
>
>     $value = filter_var(
>         $value,
>         FILTER_VALIDATE_INT,
>         FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>     );
>     if (false === $value) {
>         // throw an exception?
>     }
>
> Interestingly, this adds overhead to your application (more function
> calls), and makes it harder to read and to maintain. Ironically, I
> foresee "strict" as being a new "badge of honor" for many in the language
> ("my code works under strict mode!"), despite these factors.

This has been my largest concern with dual mode, and something I can
completely see happening.

>
> If I don't enable strict mode on my code, and somebody else turns on
> strict when calling my code, there's the possibility of new errors if I
> do not perform validation or casting on such values. This means that the
> de facto standard will likely be to code to strict (I can already
> envision the flood of PRs against OSS projects for these issues).
>
> You can say, "But, Static Analysis!" all you want, but that doesn't lead
> to me writing less code to accomplish the same thing; it just gives me a
> tool to check the correctness of my code. (Yes, this _is_ important. But
> we also have a ton of tooling around those concerns already, even if they
> aren't proper static analyzers.)
>
> From a developer experience factor, I find myself scratching my head:
> what are we gaining with STH if we have a strict mode?
> I'm still writing exactly the same code I am today to validate and/or
> cast my scalars before passing them to functions and methods if I want
> to be strict.
>
> The new coercive RFC offers much more promise to me as a consumer/user
> of the language. The primary benefit I see is that it provides a path
> forward towards better casting logic in the language, which will ensure
> that, in the future, this:
>
>     $value = (int) $value;
>
> will operate properly, and raise errors when data loss may occur. It
> means that immediately, if I start using STH, I can be assured that _if_
> my code runs, I have values of the correct type, as they've been coerced
> safely. The lack of a strict mode means I can drop that defensive
> validation/casting code safely.
>
> My point is: I'm sick of writing code like this:
>
>     /**
>      * @param int $code
>      * @param string $reason
>      */
>     public function setStatus($code, $reason = null)
>     {
>         $code = filter_var(
>             $code,
>             FILTER_VALIDATE_INT,
>             FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>         );
>         if (false === $code) {
>             throw new InvalidArgumentException(
>                 'Code must be an integer'
>             );
>         }
>         if (null !== $reason && ! is_string($reason)) {
>             throw new InvalidArgumentException(
>                 'Reason must be null or a string'
>             );
>         }
>
>         $this->code = $code;
>         $this->reason = $reason;
>     }
>
> I want to be able to write this:
>
>     public function setStatus(int $code, string $reason = null)
>     {
>         $this->code = $code;
>         $this->reason = $reason;
>     }
>
> and _not_ push the burden on consumers to validate/cast their values.
>
> This is what I want from STH, no more no less: sane casting rules, and
> the ability to code to scalar types safely. While I can see some of the
> benefits of
While I can see some of the > benefits of > strict mode, I'm concerned about the schism it may create in the PHP > library > ecosystem, and that many of the benefits of the coercive portion of that > RFC > will be lost when working with data from unknown data sources. This is exactly what I am looking for as well. It provides me a far quicker means for writing out libraries and pushing more of the logic handling to the consumer. In addition, since I am generally consuming, it allows me also to handle things as I see fit and no longer need to worry if the library author decided to use namespaced exceptions, SPL exceptions or a general exception and cleans up the code from a standpoint of an end user as far as expectations. --001a113f90fcd3d5c8050fc3c424--