Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:83597
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.216.175 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CACouGvnY3rDDcpfZAv7pkv0ZgLTwUA_4g6m3_R3yDRScncbFrA@mail.gmail.com>
References: <CACouGvnY3rDDcpfZAv7pkv0ZgLTwUA_4g6m3_R3yDRScncbFrA@mail.gmail.com>
Date: Mon, 23 Feb 2015 10:21:41 -0800
Message-ID: <CAEZPtU7TjBtSdMar6agkFK-VpRWwks8QZHHP0ZAZTwx5_27gmA@mail.gmail.com>
To: "Matthew Weier O'Phinney" <matthew@zend.com>
Cc: PHP internals <internals@lists.php.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [PHP-DEV] User perspective on STH
From: pierre.php@gmail.com (Pierre Joye)

hi Matthew,

On Mon, Feb 23, 2015 at 8:01 AM, Matthew Weier O'Phinney
<matthew@zend.com> wrote:
> I'm writing this as an author and maintainer of a framework and many libr=
aries.
> Caveat, for those who aren't already aware: I work for Zend, and report t=
o Zeev.
> If you feel that will make my points impartial, please feel free to stop
> reading, but I do think my points on STH bear some consideration.
>
> I've been following the STH proposals off and on. I voted for Andrea's pr=
oposal,
> and, behind the scenes, defended it to Zeev. On a lot of consideration, a=
nd as
> primarily a _consumer_ and _user_ of the language, I'm no longer convince=
d that
> a dual-mode proposal makes sense. I worry that it will lead to:
>
> - A split within the PHP community, consisting of those that do not use
>   typehints, those who do use typehints, and those who use strict.
> - Poor programming practices and performance degradation by those who ado=
pt
>   strict, due to poor usage of type casting.
>
> Let me explain.
>
> The big problem currently is that the engine behavior around casting can =
lead to
> data loss quickly. As has been demonstrated elsewhere:
>
>     $value =3D (int) '100 dogs'; // 100  - non-numeric trailing values ar=
e trimmed
>     $value =3D (int) 'dog100';   // 0    - non-numeric values leading
> values -> 0 ...
>     $value =3D (int) '-100';     // -100 - ... unless indicating sign.
>     $value =3D (int) ' 100';     // 100  - space is trimmed; data loss!
>     $value =3D (int) ' 100 ';    // 100  - space is trimmed; data loss!
>     $value =3D (int) '100.0';    // 100  - probably correct, but loss of =
precision
>     $value =3D (int) '100.7';    // 100  - precision and data loss!
>     $value =3D (int) 100.7;      // 100  - precision and data loss!
>     $value =3D (int) 0x1A;       // 26   - hex
>     $value =3D (int) '0x1A';     // 0    - shouldn't this be 26? why is
> this different?
>     $value =3D (int) true;       // 1    - should this be cast?
>     $value =3D (int) false;      // 0    - should this be cast?
>     $value =3D (int) null;       // 0    - should this be cast?
>
> Today, without scalar type hints, we end up writing code that has to firs=
t
> validate that we have something we can use, and then cast it.

What does that have to do with the strict RFC? If you do not enable
it, in your code or files, nothing will change to what you have today.
I repeat, nothing. Even if your library (with no strict mode) is used
from files/codes with strict mode enabled.

On the other hands, if we change the way casting is done, in general
and globally, I wish anyone good luck to actually validate their apps.
Why? Because I am relatively confident that most of the apps out there
have no way to actually test these changes with real input data and I
very much doubts their respective unit tests, or behavior tests do
cover these cases.

> This can often be
> done with ext/filter, but it's horribly verbose:
>     $value =3D filter_var(
>         $value,
>         FILTER_VALIDATE_INT,
>         FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>     );
>     if (false =3D=3D=3D $value) {
>         // throw an exception?
>     }
>
> Many people skip the validation step entirely for the more succinct:
>
>     $value =3D (int) $value;

You lost me here. Input filtering is one thing. Arguments management
another. Yes, they may look similar but really they are two different
beasts. Or am I missing your point?


> And this is where problems occur, because this is when data loss occurs.
>
> What I've observed in my 15+ years of using PHP is that people _don't_ va=
lidate;
> they either blindly accept data and assume it's of the correct type, or t=
hey
> blindly cast it without validation because writing that validation code i=
s
> boring, verbose, and repetitive (I'm guilty of this myself!). Yes, you ca=
n
> offload that to libraries, but why introduce a new dependency in somethin=
g as
> simple as a value object?

Right, and manage to remember what the casting rules are is even more
painful and boring, we barely know by heart all of them and I
surprised myself about a couple of them while reading the internals
list (like fixing inconsistencies or old weird behavior). I am
convinced that changing them now is not going to help anyone, in
contrary. And this is what the weak typing RFC proposes.

> The promise of STH is that the values will be properly coerced, so that i=
f I
> write a function that expects an integer, but pass it something like '100=
' or
> '0x1A', it will be cast for me =E2=80=94 but something that is not an int=
eger and cannot
> be safely cast without data loss will be rejected, and an error can bubbl=
e up my
> stack or into my logs.

There is no "Properly" in casting. It is almost only some arbitrary
choices. The boolean one for example is just random to me.

By the way, this is why I do like to be able to have a strict mode if
I wish to: I do not want arbitrary rules, especially if I won't ever
remember them.

On the "users do not validate input values in their code, for
functions or methods, well, it is an education problem. Just like in
the core, we always bugs because we do not validate ranges, offset,
etc. And know what? C is strict. Nothing change here. Some functions
and methods do need validations of the inputs, that does not make
strictness less good or more worst. It simply removes the magic
casting part and make crystal clear what will happen when invalid
types are used..

> Both the Dual-Mode and the new Coercive typehints RFCs provide this.
>
> The Dual-Mode, however, can potentially take us back to the same code we =
have
> today when strict mode is enabled.

Either some coma is missing or I cannot remotely understand where
strict mode will take us back. There is not such thing now. And lazy
users will remain lazy, how the arguments are handled won't change
them magically.

> Now, you may argue that you won't need to cast the value in the first pla=
ce,
> because STH! But what if the value you received is from a database? or fr=
om a
> web request you've made? Chances are, the data is in a string, but the _v=
alue_
> may be of another type. With weak/coercive mode, you just pass the data a=
s-is,
> but with strict enabled, your choices are to either cast blindly, or to d=
o the
> same validation/casting as before:
>
>     $value =3D filter_var(
>         $value,
>         FILTER_VALIDATE_INT,
>         FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>     );
>     if (false =3D=3D=3D $value) {
>         // throw an exception?
>     }
>
> Interestingly, this adds overhead to your application (more function call=
s), and
> makes it harder to read and to maintain. Ironically, I foresee "strict" a=
s being
> a new "badge of honor" for many in the language ("my code works under str=
ict
> mode!"), despite these factors.

See previous comment.

> If I don't enable strict mode on my code, and somebody else turns on stri=
ct when
> calling my code,

You totally misunderstand the RFC. Whether strict mode is enabled or
not in the caller code does not affect in any way the code in your
library/files. I repeat: You control, and only you!, your file/code,
the caller does not and cannot control the mode used in your
code/file. Please understand that.

> You can say, "But, Static Analysis!" all you want, but that doesn't lead =
to me
> writing less code to accomplish the same thing; it just gives me a tool t=
o check
> the correctness of my code. (Yes, this _is_ important. But we also have a=
 ton of
> tooling around those concerns already, even if they aren't proper static
> analyzers.)

I totally agree. Hypothetical new tools or features because one or the
other RFC is chosen is totally irrelevant to this  discussion.

Performance as well, as Dmitry mentioned again that strict dual mode
will be slower, it is not in any significant way (read: below measures
error margins).

> From a developer experience factor, I find myself scratching my head: wha=
t are
> we gaining with STH if we have a strict mode? I'm still writing exactly t=
he same
> code I am today to validate and/or cast my scalars before passing them to
> functions and methods if I want to be strict.

Fair enough. But let other who do see benefits (see my numerous
comments, or from other) use it. On the other, I let you imagine if we
change the casting rules now, good luck.

> The new coercive RFC offers much more promise to me as a consumer/user of=
 the
> language. The primary benefit I see is that it provides a path forward to=
wards
> better casting logic in the language, which will ensure that =E2=80=94 in=
 the future =E2=80=94
> this:
>
>     $value =3D (int) $value;
>
> will operate properly, and raise errors when data loss may occur. It mean=
s that
> immediately, if I start using STH, I can be assured that _if_ my code run=
s, I
> have values of the correct type, as they've been coerced safely. The lack=
 of a
> strict mode means I can drop that defensive validation/casting code safel=
y.

Oh, I agree, clean the casting rules is totally necessary. But let
forget about adoption, ok?

I ignore totally the INI settings about it as it is such a bad idea
than I have no word to explain why....

> My point is: I'm sick of writing code like this:
>
>     /**
>      * @param int $code
>      * @param string $reason
>      */
>     public function setStatus($code, $reason =3D null)
>     {
>         $code =3D filter_var(
>             $value,
>             FILTER_VALIDATE_INT,
>             FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
>         );

I  suppose you are not sick to write bugs but made a type, s, $value,
$code, right?

Besides the killing a fly with a tank to do this kind of validations
like this, this is mainly due to the magic casting being inconsistent.
Both weaks (but creates potential BC problems on the caller side) and
strict solve this exact kind of validations.

What none of the RFC will change is the business logic related
validations, like ranges and the likes. These kind of validations
could be easily solve using annotations (and why this is what we need
next to scalar type hinting as well).

> I want to be able to write this:


Cheers,
--=20
Pierre

@pierrejoye | http://www.libgd.org