Newsgroups: php.internals
Date: Mon, 23 Feb 2015 10:21:41 -0800
Subject: Re: [PHP-DEV] User perspective on STH
From: pierre.php@gmail.com (Pierre Joye)
To: "Matthew Weier O'Phinney"
Cc: PHP internals

Hi Matthew,

On Mon, Feb 23, 2015 at 8:01 AM, Matthew Weier O'Phinney wrote:
> I'm writing this as an author and maintainer of a framework and many libraries.
> Caveat, for those who aren't already aware: I work for Zend, and report to Zeev.
> If you feel that will make my points impartial, please feel free to stop
> reading, but I do think my points on STH bear some consideration.
>
> I've been following the STH proposals off and on. I voted for Andrea's proposal,
> and, behind the scenes, defended it to Zeev. On a lot of consideration, and as
> primarily a _consumer_ and _user_ of the language, I'm no longer convinced that
> a dual-mode proposal makes sense. I worry that it will lead to:
>
> - A split within the PHP community, consisting of those that do not use
>   typehints, those who do use typehints, and those who use strict.
> - Poor programming practices and performance degradation by those who adopt
>   strict, due to poor usage of type casting.
>
> Let me explain.
>
> The big problem currently is that the engine behavior around casting can lead to
> data loss quickly. As has been demonstrated elsewhere:
>
>     $value = (int) '100 dogs'; // 100  - non-numeric trailing values are trimmed
>     $value = (int) 'dog100';   // 0    - non-numeric values leading values -> 0 ...
>     $value = (int) '-100';     // -100 - ... unless indicating sign.
>     $value = (int) ' 100';     // 100  - space is trimmed; data loss!
>     $value = (int) ' 100 ';    // 100  - space is trimmed; data loss!
>     $value = (int) '100.0';    // 100  - probably correct, but loss of precision
>     $value = (int) '100.7';    // 100  - precision and data loss!
>     $value = (int) 100.7;      // 100  - precision and data loss!
>     $value = (int) 0x1A;       // 26   - hex
>     $value = (int) '0x1A';     // 0    - shouldn't this be 26? why is this different?
>     $value = (int) true;       // 1    - should this be cast?
>     $value = (int) false;      // 0    - should this be cast?
>     $value = (int) null;       // 0    - should this be cast?
>
> Today, without scalar type hints, we end up writing code that has to first
> validate that we have something we can use, and then cast it.

What does that have to do with the strict RFC? If you do not enable it in
your code or files, nothing changes from what you have today. I repeat,
nothing. That holds even if your library (with no strict mode) is used from
files or code with strict mode enabled.

On the other hand, if we change the way casting is done, in general and
globally, I wish anyone good luck actually validating their apps. Why?
Because I am relatively confident that most apps out there have no way to
test these changes with real input data, and I very much doubt that their
unit tests or behavior tests cover these cases.

> This can often be
> done with ext/filter, but it's horribly verbose:
>
>     $value = filter_var(
>         $value,
>         FILTER_VALIDATE_INT,
>         FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>     );
>     if (false === $value) {
>         // throw an exception?
>     }
>
> Many people skip the validation step entirely for the more succinct:
>
>     $value = (int) $value;

You lost me here. Input filtering is one thing; argument handling is
another. Yes, they may look similar, but really they are two different
beasts. Or am I missing your point?

> And this is where problems occur, because this is when data loss occurs.
>
> What I've observed in my 15+ years of using PHP is that people _don't_ validate;
> they either blindly accept data and assume it's of the correct type, or they
> blindly cast it without validation because writing that validation code is
> boring, verbose, and repetitive (I'm guilty of this myself!). Yes, you can
> offload that to libraries, but why introduce a new dependency in something as
> simple as a value object?

Right, and having to remember what the casting rules are is even more
painful and boring. We barely know all of them by heart, and I surprised
myself with a couple of them while reading the internals list (fixing
inconsistencies or old weird behavior, for example). I am convinced that
changing them now is not going to help anyone; on the contrary. And this is
what the weak typing RFC proposes.

> The promise of STH is that the values will be properly coerced, so that if I
> write a function that expects an integer, but pass it something like '100' or
> '0x1A', it will be cast for me -- but something that is not an integer and cannot
> be safely cast without data loss will be rejected, and an error can bubble up my
> stack or into my logs.

There is no "properly" in casting. It is almost entirely a set of arbitrary
choices. The boolean one, for example, is just random to me. By the way,
this is why I like being able to have a strict mode if I wish: I do not want
arbitrary rules, especially ones I won't ever remember.

As for "users do not validate input values in their code, for functions or
methods": well, that is an education problem. It is just like in the core,
where we always have bugs because we do not validate ranges, offsets, etc.
And you know what? C is strict. Nothing changes here. Some functions and
methods do need to validate their inputs; that does not make strictness any
less good or any worse. It simply removes the magic casting part and makes
crystal clear what will happen when invalid types are used.

> Both the Dual-Mode and the new Coercive typehints RFCs provide this.
>
> The Dual-Mode, however, can potentially take us back to the same code we have
> today when strict mode is enabled.

Either a comma is missing or I cannot even remotely understand where strict
mode would take us back to. There is no such thing now. And lazy users will
remain lazy; how the arguments are handled won't magically change them.

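To make the difference between blind casting and validate-then-convert
concrete, here is a minimal sketch. The helper name to_int_or_fail() is
hypothetical and belongs to neither RFC; the return type declaration assumes
PHP 7 or later.

```php
<?php
// Hypothetical helper (not from either RFC): validate first, then convert.
function to_int_or_fail($value): int
{
    // FILTER_VALIDATE_INT returns false when the input is not a clean integer.
    $result = filter_var($value, FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);
    if ($result === false) {
        throw new InvalidArgumentException(
            'Not an integer: ' . var_export($value, true)
        );
    }
    return $result;
}

$inputs = ['100', '100 dogs', 'dog100', '100.7', '0x1A'];
$report = [];
foreach ($inputs as $input) {
    // A plain cast never fails; it silently keeps whatever it can parse.
    $cast = (int) $input;
    try {
        $validated = to_int_or_fail($input);
    } catch (InvalidArgumentException $e) {
        $validated = 'rejected';
    }
    // Each entry pairs the cast result with the validated result.
    $report[$input] = [$cast, $validated];
}

var_export($report);
```

The cast turns '100 dogs' and '100.7' into 100 without complaint, while the
validating helper rejects them; '0x1A' goes the other way, giving 0 under the
cast but 26 with the hex flag set.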
> Now, you may argue that you won't need to cast the value in the first place,
> because STH! But what if the value you received is from a database? or from a
> web request you've made? Chances are, the data is in a string, but the _value_
> may be of another type. With weak/coercive mode, you just pass the data as-is,
> but with strict enabled, your choices are to either cast blindly, or to do the
> same validation/casting as before:
>
>     $value = filter_var(
>         $value,
>         FILTER_VALIDATE_INT,
>         FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>     );
>     if (false === $value) {
>         // throw an exception?
>     }
>
> Interestingly, this adds overhead to your application (more function calls), and
> makes it harder to read and to maintain. Ironically, I foresee "strict" as being
> a new "badge of honor" for many in the language ("my code works under strict
> mode!"), despite these factors.

See previous comment.

> If I don't enable strict mode on my code, and somebody else turns on strict when
> calling my code,

You totally misunderstand the RFC. Whether strict mode is enabled or not in
the caller's code does not affect the code in your library or files in any
way. I repeat: you, and only you, control your file or code; the caller does
not and cannot control the mode used in your code or file. Please understand
that.

> You can say, "But, Static Analysis!" all you want, but that doesn't lead to me
> writing less code to accomplish the same thing; it just gives me a tool to check
> the correctness of my code. (Yes, this _is_ important. But we also have a ton of
> tooling around those concerns already, even if they aren't proper static
> analyzers.)

I totally agree. Hypothetical new tools or features that may appear because
one RFC or the other is chosen are totally irrelevant to this discussion.
The same goes for performance: Dmitry mentioned again that the strict dual
mode will be slower, but it is not slower in any significant way (read:
below the measurement error margin).

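The per-file point above can be sketched as follows. This is only an
illustration under stated assumptions: it uses the declare(strict_types=1)
syntax as it later shipped in PHP 7.0, and the file names and functions
(lib.php, caller.php, lib_double(), lib_parse_and_double(), caller_run()) are
hypothetical. The script writes two small files to a temporary directory so
that it stays self-contained.

```php
<?php
// Hypothetical demo of per-file strictness; all names here are assumptions.
$dir = sys_get_temp_dir() . '/sth_demo_' . uniqid();
mkdir($dir);

// "Library" file: no strict declaration, so calls made inside it stay weak.
file_put_contents($dir . '/lib.php', <<<'CODE'
<?php
function lib_double(int $n): int
{
    return $n * 2;
}

function lib_parse_and_double(string $raw): int
{
    // Call site in a non-strict file: the numeric string is coerced to int.
    return lib_double($raw);
}
CODE
);

// "Caller" file: strict mode governs only the calls written in this file.
file_put_contents($dir . '/caller.php', <<<'CODE'
<?php
declare(strict_types=1);

function caller_run(): array
{
    $results = [];
    // A string passed to a string parameter is fine; the coercion happens
    // later, inside the library, under the library's own (weak) mode.
    $results['via_library'] = lib_parse_and_double('21');
    try {
        // Strict call site: a string is not accepted for an int parameter.
        lib_double('21');
        $results['direct'] = 'accepted';
    } catch (TypeError $e) {
        $results['direct'] = 'TypeError';
    }
    return $results;
}
CODE
);

require $dir . '/lib.php';
require $dir . '/caller.php';

$results = caller_run();
var_export($results);

// Clean up the temporary files.
unlink($dir . '/lib.php');
unlink($dir . '/caller.php');
rmdir($dir);
```

The strict caller cannot change how the library treats its own internal
calls; it only changes how the caller's own call sites are checked.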
> From a developer experience factor, I find myself scratching my head: what are
> we gaining with STH if we have a strict mode? I'm still writing exactly the same
> code I am today to validate and/or cast my scalars before passing them to
> functions and methods if I want to be strict.

Fair enough. But let others who do see benefits (see my numerous comments,
or those from others) use it. On the other hand, I let you imagine what
happens if we change the casting rules now. Good luck.

> The new coercive RFC offers much more promise to me as a consumer/user of the
> language. The primary benefit I see is that it provides a path forward towards
> better casting logic in the language, which will ensure that -- in the future --
> this:
>
>     $value = (int) $value;
>
> will operate properly, and raise errors when data loss may occur. It means that
> immediately, if I start using STH, I can be assured that _if_ my code runs, I
> have values of the correct type, as they've been coerced safely. The lack of a
> strict mode means I can drop that defensive validation/casting code safely.

Oh, I agree, cleaning up the casting rules is totally necessary. But let's
forget about adoption, OK? I am totally ignoring the INI setting for it, as
it is such a bad idea that I have no words to explain why...

> My point is: I'm sick of writing code like this:
>
>     /**
>      * @param int $code
>      * @param string $reason
>      */
>     public function setStatus($code, $reason = null)
>     {
>         $code = filter_var(
>             $value,
>             FILTER_VALIDATE_INT,
>             FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>         );

I suppose you are not sick of writing bugs and just made a typo:
s/$value/$code/, right? Besides this being like killing a fly with a tank
for this kind of validation, it is mainly due to the magic casting being
inconsistent. Both weak (though it creates potential BC problems on the
caller side) and strict solve this exact kind of validation.

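As a rough sketch of what the quoted setStatus() could look like once the
type check moves into the signature: this assumes scalar type declarations as
they later shipped in PHP 7 (nullable and void types need PHP 7.1), and the
Response class and getStatusCode() method are hypothetical names added only
for illustration. The file has no strict declaration, so its calls run in
weak (coercive) mode.

```php
<?php
// Hypothetical class; only setStatus() echoes the quoted example.
final class Response
{
    private $code;
    private $reason;

    public function setStatus(int $code, ?string $reason = null): void
    {
        // The type check is done by the engine before this body runs.
        // A range check is a business rule that no scalar type covers.
        if ($code < 100 || $code > 599) {
            throw new InvalidArgumentException("Invalid status code: $code");
        }
        $this->code = $code;
        $this->reason = $reason;
    }

    public function getStatusCode(): int
    {
        return $this->code;
    }
}

$response = new Response();

// Weak mode: a numeric string is coerced to int.
$response->setStatus('404');
$coerced = $response->getStatusCode();

// A non-numeric string is rejected by the engine with a TypeError.
try {
    $response->setStatus('not a number');
    $typeResult = 'accepted';
} catch (TypeError $e) {
    $typeResult = 'TypeError';
}

// A well-typed but out-of-range value still needs the hand-written check.
try {
    $response->setStatus(99);
    $rangeResult = 'accepted';
} catch (InvalidArgumentException $e) {
    $rangeResult = 'out of range';
}
```

Under a strict declaration in the calling file, the first call with '404'
would raise a TypeError as well; the range check is needed in either mode.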
What neither of the RFCs will change is the business-logic validation, such
as ranges and the like. These kinds of validation could easily be solved
using annotations (which is why we need them alongside scalar type hinting
as well).

> I want to be able to write this:

Cheers,
--
Pierre

@pierrejoye | http://www.libgd.org