From: ceceldada@gmail.com (Marcel Araujo)
To: "Matthew Weier O'Phinney"
Cc: PHP internals
Date: Mon, 23 Feb 2015 13:09:48 -0300
Subject: Re: [PHP-DEV] User perspective on STH

This is the first answer that makes sense for the community's needs.

On 23/02/2015 13:01, "Matthew Weier O'Phinney" wrote:
> I'm writing this as an author and maintainer of a framework and many
> libraries. Caveat, for those who aren't already aware: I work for Zend,
> and report to Zeev. If you feel that makes my points biased, please
> feel free to stop reading, but I do think my points on STH bear some
> consideration.
>
> I've been following the STH proposals off and on. I voted for Andrea's
> proposal and, behind the scenes, defended it to Zeev. After a lot of
> consideration, and as primarily a _consumer_ and _user_ of the
> language, I'm no longer convinced that a dual-mode proposal makes
> sense. I worry that it will lead to:
>
> - A split within the PHP community, consisting of those who do not use
>   typehints, those who do use typehints, and those who use strict.
> - Poor programming practices and performance degradation by those who
>   adopt strict, due to poor usage of type casting.
>
> Let me explain.
>
> The big problem currently is that the engine behavior around casting
> can lead to data loss quickly. As has been demonstrated elsewhere:
>
>     $value = (int) '100 dogs'; // 100  - non-numeric trailing values are trimmed
>     $value = (int) 'dog100';   // 0    - leading non-numeric values -> 0 ...
>     $value = (int) '-100';     // -100 - ... unless indicating sign.
>     $value = (int) ' 100';     // 100  - space is trimmed; data loss!
>     $value = (int) ' 100 ';    // 100  - space is trimmed; data loss!
>     $value = (int) '100.0';    // 100  - probably correct, but loss of precision
>     $value = (int) '100.7';    // 100  - precision and data loss!
>     $value = (int) 100.7;      // 100  - precision and data loss!
>     $value = (int) 0x1A;       // 26   - hex
>     $value = (int) '0x1A';     // 0    - shouldn't this be 26? why is this different?
>     $value = (int) true;       // 1    - should this be cast?
>     $value = (int) false;      // 0    - should this be cast?
>     $value = (int) null;       // 0    - should this be cast?
>
> Today, without scalar type hints, we end up writing code that has to
> first validate that we have something we can use, and then cast it.
> This can often be done with ext/filter, but it's horribly verbose:
>
>     $value = filter_var(
>         $value,
>         FILTER_VALIDATE_INT,
>         FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>     );
>     if (false === $value) {
>         // throw an exception?
>     }
>
> Many people skip the validation step entirely for the more succinct:
>
>     $value = (int) $value;
>
> And this is where problems occur, because this is when data loss
> occurs.
>
> What I've observed in my 15+ years of using PHP is that people _don't_
> validate; they either blindly accept data and assume it's of the
> correct type, or they blindly cast it without validation because
> writing that validation code is boring, verbose, and repetitive (I'm
> guilty of this myself!). Yes, you can offload that to libraries, but
> why introduce a new dependency in something as simple as a value
> object?
>
> The promise of STH is that the values will be properly coerced, so that
> if I write a function that expects an integer, but pass it something
> like '100' or '0x1A', it will be cast for me, but something that is not
> an integer and cannot be safely cast without data loss will be
> rejected, and an error can bubble up my stack or into my logs.
>
> Both the Dual-Mode and the new Coercive typehints RFCs provide this.
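
To make the promise quoted above concrete, here is a minimal userland sketch of "coerce if lossless, otherwise reject", built on the same filter_var() call used in the message. It assumes PHP 7 or later for the return type. coerceInt is a hypothetical helper name, not something either RFC defines, and the engine-level rules the RFCs propose may differ in detail from what ext/filter accepts.

```php
<?php

/**
 * Hypothetical helper (an assumption for illustration, not part of any RFC):
 * returns an int when the input is an integer literal that ext/filter
 * accepts, and throws instead of silently truncating.
 */
function coerceInt($value): int
{
    $result = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
    );

    // filter_var() returns false on failure; compare strictly so that a
    // valid 0 is not mistaken for a failure.
    if (false === $result) {
        throw new InvalidArgumentException(
            'Value cannot be converted to an integer without data loss'
        );
    }

    return $result;
}

var_dump(coerceInt('100'));  // int(100)
var_dump(coerceInt('0x1A')); // int(26)

try {
    coerceInt('100 dogs');   // (int) '100 dogs' would quietly yield 100
} catch (InvalidArgumentException $e) {
    echo "rejected: ", $e->getMessage(), "\n";
}
```

The point of the sketch is only the shape of the behavior: a blind (int) cast never fails, while this helper fails loudly on the inputs where the cast would lose data.
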
>
> The Dual-Mode, however, can potentially take us back to the same code
> we have today when strict mode is enabled.
>
> Now, you may argue that you won't need to cast the value in the first
> place, because STH! But what if the value you received is from a
> database? Or from a web request you've made? Chances are, the data is
> in a string, but the _value_ may be of another type. With weak/coercive
> mode, you just pass the data as-is, but with strict enabled, your
> choices are to either cast blindly, or to do the same
> validation/casting as before:
>
>     $value = filter_var(
>         $value,
>         FILTER_VALIDATE_INT,
>         FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>     );
>     if (false === $value) {
>         // throw an exception?
>     }
>
> Interestingly, this adds overhead to your application (more function
> calls), and makes it harder to read and to maintain. Ironically, I
> foresee "strict" as being a new "badge of honor" for many in the
> language ("my code works under strict mode!"), despite these factors.
>
> If I don't enable strict mode on my code, and somebody else turns on
> strict when calling my code, there's the possibility of new errors if I
> do not perform validation or casting on such values. This means that
> the de facto standard will likely be to code to strict (I can already
> envision the flood of PRs against OSS projects for these issues).
>
> You can say, "But, Static Analysis!" all you want, but that doesn't
> lead to me writing less code to accomplish the same thing; it just
> gives me a tool to check the correctness of my code. (Yes, this _is_
> important. But we also have a ton of tooling around those concerns
> already, even if they aren't proper static analyzers.)
>
> From a developer experience standpoint, I find myself scratching my
> head: what are we gaining with STH if we have a strict mode?
> I'm still writing exactly the same code I am today to validate and/or
> cast my scalars before passing them to functions and methods if I want
> to be strict.
>
> The new coercive RFC offers much more promise to me as a consumer/user
> of the language. The primary benefit I see is that it provides a path
> forward towards better casting logic in the language, which will ensure
> that, in the future, this:
>
>     $value = (int) $value;
>
> will operate properly, and raise errors when data loss may occur. It
> means that immediately, if I start using STH, I can be assured that
> _if_ my code runs, I have values of the correct type, as they've been
> coerced safely. The lack of a strict mode means I can drop that
> defensive validation/casting code safely.
>
> My point is: I'm sick of writing code like this:
>
>     /**
>      * @param int $code
>      * @param string|null $reason
>      */
>     public function setStatus($code, $reason = null)
>     {
>         $code = filter_var(
>             $code,
>             FILTER_VALIDATE_INT,
>             FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>         );
>         if (false === $code) {
>             throw new InvalidArgumentException(
>                 'Code must be an integer'
>             );
>         }
>         if (null !== $reason && ! is_string($reason)) {
>             throw new InvalidArgumentException(
>                 'Reason must be null or a string'
>             );
>         }
>
>         $this->code = $code;
>         $this->reason = $reason;
>     }
>
> I want to be able to write this:
>
>     public function setStatus(int $code, string $reason = null)
>     {
>         $this->code = $code;
>         $this->reason = $reason;
>     }
>
> and _not_ push the burden on consumers to validate/cast their values.
>
> This is what I want from STH, no more no less: sane casting rules, and
> the ability to code to scalar types safely.
> While I can see some of the benefits of strict mode, I'm concerned
> about the schism it may create in the PHP library ecosystem, and that
> many of the benefits of the coercive portion of that RFC will be lost
> when working with data from unknown data sources.
>
> If you've read thus far, thank you for your consideration. I'll stop
> bugging you now.
>
> --
> Matthew Weier O'Phinney
> Principal Engineer
> Project Lead, Zend Framework and Apigility
> matthew@zend.com
> http://framework.zend.com
> http://apigility.org
> PGP key: http://framework.zend.com/zf-matthew-pgp-key.asc
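
For reference, the short form of setStatus() asked for above can be sketched with scalar type declarations as they behave in released PHP (7.1 or later assumed here) when strict_types is not declared. StatusHolder and getCode() are hypothetical names added for illustration, and the coercion rules shown are those of released PHP in its default mode, which may not match either RFC under discussion in every detail.

```php
<?php

// Hypothetical class for illustration; no declare(strict_types=1), so
// calls from this file use the default coercive mode.
final class StatusHolder
{
    private $code;
    private $reason;

    public function setStatus(int $code, ?string $reason = null): void
    {
        $this->code = $code;
        $this->reason = $reason;
    }

    public function getCode(): int
    {
        return $this->code;
    }
}

$status = new StatusHolder();

// A numeric string, e.g. from a database row, is coerced to int.
$status->setStatus('404', 'Not Found');
var_dump($status->getCode()); // int(404)

// A non-numeric string is rejected with a TypeError instead of
// silently becoming 0, as (int) 'dog100' would.
try {
    $status->setStatus('dog100');
} catch (TypeError $e) {
    echo "rejected: non-numeric string\n";
}
```

The caller passes the string as received, and no validation code appears in either the method body or the call site.
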