I'm writing this as an author and maintainer of a framework and many libraries.
Caveat, for those who aren't already aware: I work for Zend, and report to Zeev.
If you feel that will make my points biased, please feel free to stop
reading, but I do think my points on STH bear some consideration.
I've been following the STH proposals off and on. I voted for Andrea's proposal,
and, behind the scenes, defended it to Zeev. After a lot of consideration, and as
primarily a consumer and user of the language, I'm no longer convinced that
a dual-mode proposal makes sense. I worry that it will lead to:
- A split within the PHP community, consisting of those that do not use
  typehints, those who do use typehints, and those who use strict.
- Poor programming practices and performance degradation by those who adopt
  strict, due to poor usage of type casting.
Let me explain.
The big problem currently is that the engine behavior around casting can lead to
data loss quickly. As has been demonstrated elsewhere:
$value = (int) '100 dogs'; // 100 - non-numeric trailing values are trimmed
$value = (int) 'dog100'; // 0 - non-numeric leading values -> 0 ...
$value = (int) '-100'; // -100 - ... unless indicating sign.
$value = (int) ' 100'; // 100 - space is trimmed; data loss!
$value = (int) ' 100 '; // 100 - space is trimmed; data loss!
$value = (int) '100.0'; // 100 - probably correct, but loss of precision
$value = (int) '100.7'; // 100 - precision and data loss!
$value = (int) 100.7; // 100 - precision and data loss!
$value = (int) 0x1A; // 26 - hex
$value = (int) '0x1A'; // 0 - shouldn't this be 26? why is this different?
$value = (int) true; // 1 - should this be cast?
$value = (int) false; // 0 - should this be cast?
$value = (int) null; // 0 - should this be cast?
Today, without scalar type hints, we end up writing code that has to first
validate that we have something we can use, and then cast it. This can often be
done with ext/filter, but it's horribly verbose:
$value = filter_var(
    $value,
    FILTER_VALIDATE_INT,
    FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
);
if (false === $value) {
    // throw an exception?
}
Many people skip the validation step entirely for the more succinct:
$value = (int) $value;
And this is where problems occur, because this is when data loss occurs.
What I've observed in my 15+ years of using PHP is that people don't validate;
they either blindly accept data and assume it's of the correct type, or they
blindly cast it without validation because writing that validation code is
boring, verbose, and repetitive (I'm guilty of this myself!). Yes, you can
offload that to libraries, but why introduce a new dependency in something as
simple as a value object?
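To make the contrast concrete, here is a minimal sketch that runs a blind cast and the ext/filter validation side by side on the same inputs. The sample strings and the $results variable are illustrative assumptions, not taken from any real code base.

```php
<?php
// Sample inputs as they might arrive from a request or a database.
$inputs = ['100', '100 dogs', 'dog100', '100.7', '0x1A'];

$results = [];
foreach ($inputs as $input) {
    $results[$input] = [
        // A blind cast never fails; it may silently discard data.
        'cast' => (int) $input,
        // Validation returns an int on success and false on failure.
        'validated' => filter_var(
            $input,
            FILTER_VALIDATE_INT,
            FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
        ),
    ];
}

foreach ($results as $input => $row) {
    echo var_export((string) $input, true),
        ' => cast: ', var_export($row['cast'], true),
        ', validated: ', var_export($row['validated'], true), "\n";
}
```

The cast turns '100 dogs' into 100 and '0x1A' into 0 without complaint, while the validated path rejects the former and reads the latter as 26.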
The promise of STH is that the values will be properly coerced, so that if I
write a function that expects an integer, but pass it something like '100' or
'0x1A', it will be cast for me — but something that is not an integer and cannot
be safely cast without data loss will be rejected, and an error can bubble up my
stack or into my logs.
Both the Dual-Mode and the new Coercive typehints RFCs provide this.
The Dual-Mode, however, can potentially take us back to the same code we have
today when strict mode is enabled.
Now, you may argue that you won't need to cast the value in the first place,
because STH! But what if the value you received is from a database? or from a
web request you've made? Chances are, the data is in a string, but the value
may be of another type. With weak/coercive mode, you just pass the data as-is,
but with strict enabled, your choices are to either cast blindly, or to do the
same validation/casting as before:
$value = filter_var(
    $value,
    FILTER_VALIDATE_INT,
    FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
);
if (false === $value) {
    // throw an exception?
}
Interestingly, this adds overhead to your application (more function calls), and
makes it harder to read and to maintain. Ironically, I foresee "strict" as being
a new "badge of honor" for many in the language ("my code works under strict
mode!"), despite these factors.
If I don't enable strict mode on my code, and somebody else turns on strict when
calling my code, there's the possibility of new errors if I do not perform
validation or casting on such values. This means that the de facto standard will
likely be to code to strict (I can already envision the flood of PRs against OSS
projects for these issues).
You can say, "But, Static Analysis!" all you want, but that doesn't lead to me
writing less code to accomplish the same thing; it just gives me a tool to check
the correctness of my code. (Yes, this is important. But we also have a ton of
tooling around those concerns already, even if they aren't proper static
analyzers.)
From a developer-experience standpoint, I find myself scratching my head: what are
we gaining with STH if we have a strict mode? I'm still writing exactly the same
code I am today to validate and/or cast my scalars before passing them to
functions and methods if I want to be strict.
The new coercive RFC offers much more promise to me as a consumer/user of the
language. The primary benefit I see is that it provides a path forward towards
better casting logic in the language, which will ensure that — in the future —
this:
$value = (int) $value;
will operate properly, and raise errors when data loss may occur. It means that
immediately, if I start using STH, I can be assured that if my code runs, I
have values of the correct type, as they've been coerced safely. The lack of a
strict mode means I can drop that defensive validation/casting code safely.
My point is: I'm sick of writing code like this:
/**
 * @param int $code
 * @param string $reason
 */
public function setStatus($code, $reason = null)
{
    $code = filter_var(
        $code,
        FILTER_VALIDATE_INT,
        FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
    );
    if (false === $code) {
        throw new InvalidArgumentException(
            'Code must be an integer'
        );
    }
    if (null !== $reason && ! is_string($reason)) {
        throw new InvalidArgumentException(
            'Reason must be null or a string'
        );
    }
    $this->code = $code;
    $this->reason = $reason;
}
I want to be able to write this:
public function setStatus(int $code, string $reason = null)
{
    $this->code = $code;
    $this->reason = $reason;
}
and not push the burden on consumers to validate/cast their values.
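A minimal sketch of the behaviour being asked for, assuming a PHP version that supports scalar parameter types (7.1 or later for the ?string syntax) and a calling file that has not enabled strict mode. The class name StatusHolder and the $rejected flag are hypothetical, added only for illustration.

```php
<?php
// Hypothetical value object; only the setStatus() signature comes from the text.
class StatusHolder
{
    public $code;
    public $reason;

    public function setStatus(int $code, ?string $reason = null)
    {
        $this->code = $code;
        $this->reason = $reason;
    }
}

$status = new StatusHolder();

// A numeric string, as it might arrive from a database or HTTP request,
// is coerced to an integer by the engine.
$status->setStatus('404', 'Not Found');
var_dump($status->code); // int(404)

// A non-numeric string is rejected with a TypeError rather than silently
// becoming 0, which is what (int) 'apple' would produce.
$rejected = false;
try {
    $status->setStatus('apple');
} catch (TypeError $e) {
    $rejected = true;
}
var_dump($rejected); // bool(true)
```

The consumer passes the raw string through, and the validation that previously lived in the method body is handled at the call boundary.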
This is what I want from STH, no more, no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits of
strict mode, I'm concerned about the schism it may create in the PHP library
ecosystem, and that many of the benefits of the coercive portion of that RFC
will be lost when working with data from unknown data sources.
If you've read thus far, thank you for your consideration. I'll stop bugging you
now.
--
Matthew Weier O'Phinney
Principal Engineer
Project Lead, Zend Framework and Apigility
matthew@zend.com
http://framework.zend.com
http://apigility.org
PGP key: http://framework.zend.com/zf-matthew-pgp-key.asc
This is the first answer that makes sense for the community's needs.
On 23/02/2015 13:01, "Matthew Weier O'Phinney" matthew@zend.com wrote:
<snip>
Hello,
<snip>
$value = (int) true; // 1 - should this be cast?
$value = (int) false; // 0 - should this be cast?
$value = (int) null; // 0 - should this be cast?
I do think booleans should still be able to be cast from a user-land
perspective. Often times a database does not deal with boolean values and
the quickest way to convert them into what the database needs is to cast to
an integer.
However, it's not like $value = ($value) ? 1 : 0 would be much worse.
<snip>
Interestingly, this adds overhead to your application (more function calls), and
makes it harder to read and to maintain. Ironically, I foresee "strict" as being
a new "badge of honor" for many in the language ("my code works under strict
mode!"), despite these factors.
This has been my largest concern of dual mode and something that I
completely see happening.
<snip>
This is what I want from STH, no more no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits of
strict mode, I'm concerned about the schism it may create in the PHP library
ecosystem, and that many of the benefits of the coercive portion of that RFC
will be lost when working with data from unknown data sources.
This is exactly what I am looking for as well. It provides me a far
quicker means for writing out libraries and pushing more of the logic
handling to the consumer. In addition, since I am generally consuming, it
allows me also to handle things as I see fit and no longer need to worry if
the library author decided to use namespaced exceptions, SPL exceptions or
a general exception and cleans up the code from a standpoint of an end user
as far as expectations.
Matt,
<snip>
Many people skip the validation step entirely for the more succinct:
$value = (int) $value;
And this is where problems occur, because this is when data loss occurs.
And what about other languages that have exactly this behavior? Such
as Go/Hack/Haskell/etc. Do you see casts everywhere? No. You see them
where it needs to be explicit. Otherwise, people just write using the
correct types.
And it also hand-waves over the fact that the same problem exists with
coercive types. You're going to get the error anyway if you try to
pass "apple" to an int parameter. So if someone was going to cast with
strict, they will cast with coercive.
The difference is that strict tells you ahead of time there's an error,
whereas coercive tells you at runtime, where your app may blow up while
in prod. Perhaps what you want, perhaps not.
If I don't enable strict mode on my code, and somebody else turns on strict when
calling my code, there's the possibility of new errors if I do not perform
validation or casting on such values. This means that the de facto standard will
likely be to code to strict (I can already envision the flood of PRs against OSS
projects for these issues).
Incorrect. The only person that can turn on strict mode is you, the
author. Now someone can install your library, and edit it to turn on
strict mode (add the declares at the top of the file). But that's very
different from what strict proposes. And that's a problem you have
already today (how many bug reports do you get for "I modified XYZ
class and now it doesn't work").
However, with 2/3 of the options presented in the coercive RFC, you'll
have an INI setting that changes the behavior of your code for you
(the other 1/3 is potentially a significant BC break). How is that
better than a per-file switch? Something you as a library developer
have no control over...
My point is: I'm sick of writing code like this:
<snip>
I want to be able to write this:
public function setStatus(int $code, string $reason = null)
{
    $this->code = $code;
    $this->reason = $reason;
}
and not push the burden on consumers to validate/cast their values.
Again, you're completely misunderstanding the dual-mode proposal. Even
if you declared that code in strict mode, the determination of how the
call is made is up to the caller. Not the callee.
So in the exact example you showed, even if you declared strict, I
could call ->setStatus("10", new ObjectImplementingToString()); from
my non-strict code and it will work fine. In fact, it's designed
to work that way.
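The caller-determines rule described above can be sketched as follows. This assumes the per-file declare(strict_types=1) semantics from the dual-mode RFC, as implemented in PHP 7.0 and later; the class name StrictStatus and the temp-file approach are hypothetical, used only to get two files into one self-contained script.

```php
<?php
// Hypothetical class file that opts in to strict mode, written to a temp
// path so this sketch is self-contained.
$file = sys_get_temp_dir() . '/sth_strict_status_' . uniqid() . '.php';
file_put_contents($file, <<<'STRICT'
<?php
declare(strict_types=1);

class StrictStatus
{
    public $code;

    public function setStatus(int $code)
    {
        $this->code = $code;
    }
}
STRICT
);
require $file;
unlink($file);

// This calling file does not declare strict_types, so the call is made in
// the default coercive mode even though the callee's file is strict.
$status = new StrictStatus();
$status->setStatus('10');
var_dump($status->code); // int(10)
```

The strict declaration in the class file governs only the calls that file makes, not the calls made into it.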
This is what I want from STH, no more no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits of
strict mode, I'm concerned about the schism it may create in the PHP library
ecosystem, and that many of the benefits of the coercive portion of that RFC
will be lost when working with data from unknown data sources.
Considering the strict mode is file-local, it's not all or nothing.
It's up to the author writing code to determine how to handle the
calls (s)he will make.
Anthony
And what about other languages that have exactly this behavior? Such
as Go/Hack/Haskell/etc. Do you see casts everywhere? No. You see them
where it needs to be explicit. Otherwise, people just write using the
correct types.
And it also hand-waves over the fact that the same problem exists with
coercive types. You're going to get the error anyway if you try to
pass "apple" to an int parameter. So if someone was going to cast with
strict, they will cast with coercive.
True. But you're also hand-waving over a point I brought up: many,
many input sources for PHP return strings: HTTP calls, database calls,
etc. With coercive mode, I can pass these values on to other function
calls without a problem; with strict mode, I cannot; I MUST cast
first.
If I don't enable strict mode on my code, and somebody else turns on strict when
calling my code, there's the possibility of new errors if I do not perform
validation or casting on such values. This means that the de facto standard will
likely be to code to strict (I can already envision the flood of PRs against OSS
projects for these issues).
Incorrect. The only person that can turn on strict mode is you, the
author. Now someone can install your library, and edit it to turn on
strict mode (add the declares at the top of the file). But that's very
different from what strict proposes.
Okay, I'm confused then.
Let's consider this scenario:
I have a library. It does not declare strict. It does make calls
to either a web service or a database. Let's get even more specific:
the code does not define any scalar type hints, but accepts a
callable.
function execute(callable $callback)
{
    // fetch some data from a web service or database,
    // gather item1 and item 2 from it,
    // and pass the data on to the callback, which was passed to us.
    // Assume that we know that item1 is an int-like value, and
    // item2 is a string-like value.
    $callback($item1, $item2);
}
You, as a consumer, declare your script in strict mode, and make a
call to my own code.
declare(strict_types=1);
// somehow import the above function
$callback = function (int $item1, string $item2) {
    // do something with the items...
};
execute($callback);
How does that operate?
https://wiki.php.net/rfc/scalar_type_hints_v5 indicates that the
caller determines strict mode, but I'm unclear what the scope of that
is: does it bubble down the entire stack? or does strict only apply to
the specific calls made (i.e., before it reaches the function/method
declared in the other file)? What happens when $callback is executed,
and $item1 is '10'? Is it interpreted strictly, or weakly? Where is
the boundary for where strict happens, exactly?
If strict only applies to the execute() invocation, and doesn't apply
to $callback or the calls made to the web service or database, then I
can retract my statements; however, if the strict applies all the way
down the stack from the caller, I can see definite issues. That's what
I'm worried about.
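One way to probe where the boundary lies is a two-file experiment. The sketch below assumes the caller-determines rule from the RFC as it was later implemented in PHP 7.0 and newer; the file names library.php and consumer.php, the sample values '10' and 'created', and the temp directory are all hypothetical.

```php
<?php
// Write two hypothetical files to a temp directory so the sketch is
// self-contained.
$dir = sys_get_temp_dir() . '/sth_demo_' . uniqid();
mkdir($dir);

// library.php: no strict_types declaration, no scalar type hints.
file_put_contents("$dir/library.php", <<<'LIB'
<?php
function execute(callable $callback)
{
    // Stand-ins for string values fetched from a database or web service.
    return $callback('10', 'created');
}
LIB
);

// consumer.php: strict mode, and a callback with scalar type hints.
file_put_contents("$dir/consumer.php", <<<'CONSUMER'
<?php
declare(strict_types=1);
require __DIR__ . '/library.php';

$callback = function (int $item1, string $item2) {
    return [$item1, $item2];
};

// The closure is invoked from library.php, which is not strict,
// so '10' is coerced to int 10.
$viaLibrary = execute($callback);

// The same closure invoked directly from this strict file rejects '10'.
try {
    $callback('10', 'created');
    $directCallRejected = false;
} catch (TypeError $e) {
    $directCallRejected = true;
}

return [$viaLibrary, $directCallRejected];
CONSUMER
);

$result = require "$dir/consumer.php";
$viaLibrary = $result[0];
$directCallRejected = $result[1];

// Clean up the temporary files.
unlink("$dir/consumer.php");
unlink("$dir/library.php");
rmdir($dir);

var_dump($viaLibrary, $directCallRejected);
```

Under that assumption, strictness does not travel down the stack: each call is checked according to the file in which the call itself is written.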
However, with 2/3 of the options presented in the coercive RFC, you'll
have an INI setting that changes the behavior of your code for you
(the other 1/3 is potentially a significant BC break). How is that
better than a per-file switch? Something you as a library developer
have no control over...
Personally, I'd prefer no INI switch, but I also recognize the BC
problems with that RFC. I want to note now, I'm not saying I support
either RFC specifically; my concern is with the dual-mode aspect of
the STH v0.5 (and predecessors).
This is what I want from STH, no more no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits of
strict mode, I'm concerned about the schism it may create in the PHP library
ecosystem, and that many of the benefits of the coercive portion of that RFC
will be lost when working with data from unknown data sources.
Considering the strict mode is file-local, it's not all or nothing.
It's up to the author writing code to determine how to handle the
calls (s)he will make.
And, as noted, that's the part I need clarification on: is it really
local only to calls made directly in that file, or does strict follow
all the way down the chain?
Finally, there's the other aspect of type casting coercion from the
competing RFC, https://wiki.php.net/rfc/coercive_sth. The tables in
there make a lot of sense to me, as do the eventual ramifications on
language consistency. If dual-mode is really restricted only to the
direct calls made in the given file, and does not travel all the way
down the callstack, the ideal STH proposal, to me, would be combining
the aspects of the second proposal with regards to type coercion with
the dual-mode.
--
Matthew Weier O'Phinney
On Mon, Feb 23, 2015 at 6:21 PM, Matthew Weier O'Phinney matthew@zend.com
wrote:
On Mon, Feb 23, 2015 at 10:21 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:
And what about other languages that have exactly this behavior? Such
as Go/Hack/Haskell/etc. Do you see casts everywhere? No. You see them
where it needs to be explicit. Otherwise, people just write using the
correct types.
And it also hand-waves over the fact that the same problem exists with
coercive types. You're going to get the error anyway if you try to
pass "apple" to an int parameter. So if someone was going to cast with
strict, they will cast with coercive.
True. But you're also hand-waving over a point I brought up: many,
many input sources for PHP return strings: HTTP calls, database calls,
etc. With coercive mode, I can pass these values on to other function
calls without a problem; with strict mode, I cannot; I MUST cast
first.
If I don't enable strict mode on my code, and somebody else turns on strict when
calling my code, there's the possibility of new errors if I do not perform
validation or casting on such values. This means that the de facto standard will
likely be to code to strict (I can already envision the flood of PRs against OSS
projects for these issues).
Incorrect. The only person that can turn on strict mode is you, the
author. Now someone can install your library, and edit it to turn on
strict mode (add the declares at the top of the file). But that's very
different from what strict proposes.
Okay, I'm confused then.
Let's consider this scenario:
I have a library. It does not declare strict. It does make calls
to either a web service or a database. Let's get even more specific:
the code does not define any scalar type hints, but accepts a
callable.
function execute(callable $callback)
{
    // fetch some data from a web service or database,
    // gather item1 and item 2 from it,
    // and pass the data on to the callback, which was passed to us.
    // Assume that we know that item1 is an int-like value, and
    // item2 is a string-like value.
    $callback($item1, $item2);
}
You, as a consumer, declare your script in strict mode, and make a
call to my own code.
declare(strict_types=1);
// somehow import the above function
$callback = function (int $item1, string $item2) {
    // do something with the items...
};
execute($callback);
How does that operate?
This may give an error, but this is totally the fault of the
strict_types=1 user.
Why does he define a typehint for a closure that is passed to non
typehinted code?
It's a bug he has introduced himself.
This is the only example you can construct to support your view, but it's
conceptually flawed.
A bug on the side of the strict developer.
https://wiki.php.net/rfc/scalar_type_hints_v5 indicates that the
caller determines strict mode, but I'm unclear what the scope of that
is: does it bubble down the entire stack? or does strict only apply to
the specific calls made (i.e., before it reaches the function/method
declared in the other file)? What happens when $callback is executed,
and $item1 is '10'? Is it interpreted strictly, or weakly? Where is
the boundary for where strict happens, exactly?
If strict only applies to the execute() invocation, and doesn't apply
to $callback or the calls made to the web service or database, then I
can retract my statements; however, if the strict applies all the way
down the stack from the caller, I can see definite issues. That's what
I'm worried about.
However, with 2/3 of the options presented in the coercive RFC, you'll
have an INI setting that changes the behavior of your code for you
(the other 1/3 is potentially a significant BC break). How is that
better than a per-file switch? Something you as a library developer
have no control over...
Personally, I'd prefer no INI switch, but I also recognize the BC
problems with that RFC. I want to note now, I'm not saying I support
either RFC specifically; my concern is with the dual-mode aspect of
the STH v0.5 (and predecessors).
This is what I want from STH, no more no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits of
strict mode, I'm concerned about the schism it may create in the PHP library
ecosystem, and that many of the benefits of the coercive portion of that RFC
will be lost when working with data from unknown data sources.
Considering the strict mode is file-local, it's not all or nothing.
It's up to the author writing code to determine how to handle the
calls (s)he will make.
And, as noted, that's the part I need clarification on: is it really
local only to calls made directly in that file, or does strict follow
all the way down the chain?
Finally, there's the other aspect of type casting coercion from the
competing RFC, https://wiki.php.net/rfc/coercive_sth. The tables in
there make a lot of sense to me, as do the eventual ramifications on
language consistency. If dual-mode is really restricted only to the
direct calls made in the given file, and does not travel all the way
down the callstack, the ideal STH proposal, to me, would be combining
the aspects of the second proposal with regards to type coercion with
the dual-mode.
Matt,
And what about other languages that have exactly this behavior? Such
as Go/Hack/Haskell/etc. Do you see casts everywhere? No. You see them
where it needs to be explicit. Otherwise, people just write using the
correct types.
And it also hand-waves over the fact that the same problem exists with
coercive types. You're going to get the error anyway if you try to
pass "apple" to an int parameter. So if someone was going to cast with
strict, they will cast with coercive.
True. But you're also hand-waving over a point I brought up: many,
many input sources for PHP return strings: HTTP calls, database calls,
etc. With coercive mode, I can pass these values on to other function
calls without a problem; with strict mode, I cannot; I MUST cast
first.
Correct. Just like every other language that I mentioned.
If I don't enable strict mode on my code, and somebody else turns on strict when
calling my code, there's the possibility of new errors if I do not perform
validation or casting on such values. This means that the de facto standard will
likely be to code to strict (I can already envision the flood of PRs against OSS
projects for these issues).
Incorrect. The only person that can turn on strict mode is you, the
author. Now someone can install your library, and edit it to turn on
strict mode (add the declares at the top of the file). But that's very
different from what strict proposes.
Okay, I'm confused then.
Let's consider this scenario:
I have a library. It does not declare strict. It does make calls
to either a web service or a database. Let's get even more specific:
the code does not define any scalar type hints, but accepts a
callable.
function execute(callable $callback)
{
    // fetch some data from a web service or database,
    // gather item1 and item 2 from it,
    // and pass the data on to the callback, which was passed to us.
    // Assume that we know that item1 is an int-like value, and
    // item2 is a string-like value.
    $callback($item1, $item2);
}
You, as a consumer, declare your script in strict mode, and make a
call to my own code.
declare(strict_types=1);
// somehow import the above function
$callback = function (int $item1, string $item2) {
    // do something with the items...
};
execute($callback);
How does that operate?
https://wiki.php.net/rfc/scalar_type_hints_v5 indicates that the
caller determines strict mode, but I'm unclear what the scope of that
is: does it bubble down the entire stack? or does strict only apply to
the specific calls made (i.e., before it reaches the function/method
declared in the other file)? What happens when $callback is executed,
and $item1 is '10'? Is it interpreted strictly, or weakly? Where is
the boundary for where strict happens, exactly?
If strict only applies to the execute() invocation, and doesn't apply
to $callback or the calls made to the web service or database, then I
can retract my statements; however, if the strict applies all the way
down the stack from the caller, I can see definite issues. That's what
I'm worried about.
It only applies to the execute call (which is no different from 5.x,
since there's no scalar types). But it does not apply to the
$callback($item1, $item2) call because the location of the call was
in a non-strict file. So the code you mentioned works 100%.
There is one gotcha here, which is that you can do the following:
<?php declare(strict_types=1);
function doStuff(callable $cb): int {
$cb("test", "this");
}
?>
<?php // separate file
$cb = function(int $abc, string $def): int {};
doStuff($cb);
?>
In that case, you'll get a type error inside of doStuff, because the
callback's call is incorrect. This is completely logical if you think
about it because you gave a type mismatch. This highlights a need for
typing of callables: function doStuff(callable(int, string):int $cb)
or something like that.
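A minimal sketch of that gotcha, assuming the strict-mode semantics that later shipped in PHP 7.0. The names `callWithStrings` and `$outcome` are made up for illustration. The point is that the call to the callback is written in a strict file, so string arguments are rejected for an `int` parameter rather than coerced:

```php
<?php
declare(strict_types=1);

// The call to $cb() is written in this strict file, so its arguments are
// checked strictly, regardless of where the callback itself was defined.
function callWithStrings(callable $cb): string
{
    try {
        $cb('10', 'ten'); // '10' is a string, the callback wants an int
        return 'accepted';
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

$cb = function (int $abc, string $def) {
    return $abc;
};

$outcome = callWithStrings($cb);
```

Removing the `declare` line would let '10' be coerced to 10, which is the weak-mode behavior discussed elsewhere in this thread.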
However, with 2/3 of the options presented in the coercive RFC, you'll
have an INI setting that changes the behavior of your code for you
(the other 1/3 is potentially a significant BC break). How is that
better than a per-file switch? Something you as a library developer
have no control over...
Personally, I'd prefer no INI switch, but I also recognize the BC
problems with that RFC. I want to note now, I'm not saying I support
either RFC specifically; my concern is with the dual-mode aspect of
the STH v0.5 (and predecessors).
This is what I want from STH, no more no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits of
strict mode, I'm concerned about the schism it may create in the PHP library
ecosystem, and that many of the benefits of the coercive portion of that RFC
will be lost when working with data from unknown data sources.
Considering the strict mode is file-local, it's not all or nothing.
It's up to the author writing code to determine how to handle the
calls (s)he will make.
And, as noted, that's the part I need clarification on: is it really
local only to calls made directly in that file, or does strict follow
all the way down the chain?
No, it's local only to the call. The mode is specifically file-specific.
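To make the file-local rule concrete, here is a rough sketch, again assuming the semantics that shipped in PHP 7.0. It writes a non-strict "library" file and a strict "consumer" file into a temporary directory (the file names and the `$dir` and `$result` variables are hypothetical) and shows that the callback, although defined in the strict file, receives a coerced integer because it is called from the non-strict one:

```php
<?php
// Hypothetical file layout, written to a temp dir so the sketch is self-contained.
$dir = sys_get_temp_dir() . '/sth_demo_' . uniqid();
mkdir($dir);

// library.php has no declare(), so calls made from this file use weak mode.
file_put_contents($dir . '/library.php', <<<'PHP'
<?php
function execute(callable $callback)
{
    // Pretend these came from a database: both values are strings.
    return $callback('10', 'ten');
}
PHP
);

// consumer.php is strict, but that only governs calls written in this file.
file_put_contents($dir . '/consumer.php', <<<'PHP'
<?php
declare(strict_types=1);
require __DIR__ . '/library.php';

$callback = function (int $item1, string $item2) {
    return gettype($item1);
};

return execute($callback);
PHP
);

// The callback is invoked from library.php, so '10' is coerced to int 10.
$result = require $dir . '/consumer.php';

// Clean up the temporary files.
unlink($dir . '/library.php');
unlink($dir . '/consumer.php');
rmdir($dir);
```

Here `$result` holds the type name reported inside the callback; no TypeError is raised even though the consumer file is strict.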
Finally, there's the other aspect of type casting coercion from the
competing RFC, https://wiki.php.net/rfc/coercive_sth. The tables in
there make a lot of sense to me, as do the eventual ramifications on
language consistency. If dual-mode is really restricted only to the
direct calls made in the given file, and does not travel all the way
down the callstack, the ideal STH proposal, to me, would be combining
the aspects of the second proposal with regards to type coercion with
the dual-mode.
If you mean making the "weak" mode more strict in the dual-mode, then
absolutely. We can and should do it. But we need to do it in a way
that maintains as much BC as possible (small, sane changes over time).
For example: erroring on "32 apples". That's sane, but is still a BC
break. Hence we need to tread slowly through the options and not just
push a change like that to the basic ZPP behavior through.
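As an illustration of the "32 apples" case: today's cast silently keeps the leading digits, whereas a stricter rule would reject the string. The sketch below uses `filter_var()` with `FILTER_VALIDATE_INT` as a userland stand-in for such a rule. It is not what either RFC specifies, and the helper name `strictIntFromString` is hypothetical:

```php
<?php
// Hypothetical helper: accept a string only if it is an integer literal.
function strictIntFromString(string $value): int
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result === false) {
        throw new InvalidArgumentException(
            sprintf('"%s" is not an integer string', $value)
        );
    }
    return $result;
}

// Current cast behavior: trailing garbage is dropped without any error.
$lossy = (int) '32 apples';

// Stricter behavior: the same input is rejected.
try {
    strictIntFromString('32 apples');
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

Code that currently relies on the lossy result would start failing under the stricter rule, which is the BC concern raised above.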
Anthony
hi Matthew,
On Mon, Feb 23, 2015 at 8:01 AM, Matthew Weier O'Phinney
matthew@zend.com wrote:
I'm writing this as an author and maintainer of a framework and many libraries.
Caveat, for those who aren't already aware: I work for Zend, and report to Zeev.
If you feel that will make my points impartial, please feel free to stop
reading, but I do think my points on STH bear some consideration.
I've been following the STH proposals off and on. I voted for Andrea's proposal,
and, behind the scenes, defended it to Zeev. On a lot of consideration, and as
primarily a consumer and user of the language, I'm no longer convinced that
a dual-mode proposal makes sense. I worry that it will lead to:
- A split within the PHP community, consisting of those that do not use
typehints, those who do use typehints, and those who use strict.
- Poor programming practices and performance degradation by those who adopt
strict, due to poor usage of type casting.
Let me explain.
The big problem currently is that the engine behavior around casting can lead to
data loss quickly. As has been demonstrated elsewhere:
$value = (int) '100 dogs'; // 100 - non-numeric trailing values are trimmed
$value = (int) 'dog100'; // 0 - non-numeric values leading values -> 0 ...
$value = (int) '-100'; // -100 - ... unless indicating sign.
$value = (int) ' 100'; // 100 - space is trimmed; data loss!
$value = (int) ' 100 '; // 100 - space is trimmed; data loss!
$value = (int) '100.0'; // 100 - probably correct, but loss of precision
$value = (int) '100.7'; // 100 - precision and data loss!
$value = (int) 100.7; // 100 - precision and data loss!
$value = (int) 0x1A; // 26 - hex
$value = (int) '0x1A'; // 0 - shouldn't this be 26? why is this different?
$value = (int) true; // 1 - should this be cast?
$value = (int) false; // 0 - should this be cast?
$value = (int) null; // 0 - should this be cast?
Today, without scalar type hints, we end up writing code that has to first
validate that we have something we can use, and then cast it.
What does that have to do with the strict RFC? If you do not enable
it, in your code or files, nothing will change to what you have today.
I repeat, nothing. Even if your library (with no strict mode) is used
from files/codes with strict mode enabled.
On the other hand, if we change the way casting is done, in general
and globally, I wish anyone good luck to actually validate their apps.
Why? Because I am relatively confident that most of the apps out there
have no way to actually test these changes with real input data, and I
very much doubt that their respective unit tests or behavior tests
cover these cases.
This can often be
done with ext/filter, but it's horribly verbose:
$value = filter_var(
$value,
FILTER_VALIDATE_INT,
FILTER_FLAG_ALLOW_OCTAL
|FILTER_FLAG_ALLOW_HEX
);
if (false === $value) {
// throw an exception?
}
Many people skip the validation step entirely for the more succinct:
$value = (int) $value;
You lost me here. Input filtering is one thing. Arguments management
another. Yes, they may look similar but really they are two different
beasts. Or am I missing your point?
And this is where problems occur, because this is when data loss occurs.
What I've observed in my 15+ years of using PHP is that people don't validate;
they either blindly accept data and assume it's of the correct type, or they
blindly cast it without validation because writing that validation code is
boring, verbose, and repetitive (I'm guilty of this myself!). Yes, you can
offload that to libraries, but why introduce a new dependency in something as
simple as a value object?
Right, and managing to remember what the casting rules are is even more
painful and boring. We barely know all of them by heart, and I
surprised myself about a couple of them while reading the internals
list (like fixing inconsistencies or old weird behavior). I am
convinced that changing them now is not going to help anyone, on the
contrary. And this is what the weak typing RFC proposes.
The promise of STH is that the values will be properly coerced, so that if I
write a function that expects an integer, but pass it something like '100' or
'0x1A', it will be cast for me — but something that is not an integer and cannot
be safely cast without data loss will be rejected, and an error can bubble up my
stack or into my logs.
There is no "Properly" in casting. It is almost only some arbitrary
choices. The boolean one for example is just random to me.
By the way, this is why I do like to be able to have a strict mode if
I wish to: I do not want arbitrary rules, especially if I won't ever
remember them.
On "users do not validate input values in their code" for
functions or methods: well, it is an education problem. Just like in
the core, we always have bugs because we do not validate ranges, offsets,
etc. And you know what? C is strict. Nothing changes here. Some functions
and methods do need validation of their inputs; that does not make
strictness less good or worse. It simply removes the magic
casting part and makes crystal clear what will happen when invalid
types are used.
Both the Dual-Mode and the new Coercive typehints RFCs provide this.
The Dual-Mode, however, can potentially take us back to the same code we have
today when strict mode is enabled.
Either some comma is missing or I cannot remotely understand where
strict mode will take us back. There is no such thing now. And lazy
users will remain lazy; how the arguments are handled won't change
them magically.
Now, you may argue that you won't need to cast the value in the first place,
because STH! But what if the value you received is from a database? or from a
web request you've made? Chances are, the data is in a string, but the value
may be of another type. With weak/coercive mode, you just pass the data as-is,
but with strict enabled, your choices are to either cast blindly, or to do the
same validation/casting as before:
$value = filter_var(
    $value,
    FILTER_VALIDATE_INT,
    FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
);
if (false === $value) {
    // throw an exception?
}
Interestingly, this adds overhead to your application (more function calls), and
makes it harder to read and to maintain. Ironically, I foresee "strict" as being
a new "badge of honor" for many in the language ("my code works under strict
mode!"), despite these factors.
See previous comment.
If I don't enable strict mode on my code, and somebody else turns on strict when
calling my code,
You totally misunderstand the RFC. Whether strict mode is enabled or
not in the caller code does not affect in any way the code in your
library/files. I repeat: You control, and only you!, your file/code,
the caller does not and cannot control the mode used in your
code/file. Please understand that.
You can say, "But, Static Analysis!" all you want, but that doesn't lead to me
writing less code to accomplish the same thing; it just gives me a tool to check
the correctness of my code. (Yes, this is important. But we also have a ton of
tooling around those concerns already, even if they aren't proper static
analyzers.)
I totally agree. Hypothetical new tools or features because one or the
other RFC is chosen are totally irrelevant to this discussion.
The same goes for performance: Dmitry mentioned again that strict dual mode
will be slower, but not in any significant way (read: below measurement
error margins).
From a developer experience factor, I find myself scratching my head: what are
we gaining with STH if we have a strict mode? I'm still writing exactly the same
code I am today to validate and/or cast my scalars before passing them to
functions and methods if I want to be strict.
Fair enough. But let others who do see benefits (see my numerous
comments, or those from others) use it. On the other hand, I let you
imagine what happens if we change the casting rules now: good luck.
The new coercive RFC offers much more promise to me as a consumer/user of the
language. The primary benefit I see is that it provides a path forward towards
better casting logic in the language, which will ensure that — in the future —
this:
$value = (int) $value;
will operate properly, and raise errors when data loss may occur. It means that
immediately, if I start using STH, I can be assured that if my code runs, I
have values of the correct type, as they've been coerced safely. The lack of a
strict mode means I can drop that defensive validation/casting code safely.
Oh, I agree, cleaning up the casting rules is totally necessary. But let's
forget about adoption, ok?
I totally ignore the INI settings about it, as it is such a bad idea
that I have no words to explain why....
My point is: I'm sick of writing code like this:
/**
 * @param int $code
 * @param string $reason
 */
public function setStatus($code, $reason = null)
{
    $code = filter_var(
        $value,
        FILTER_VALIDATE_INT,
        FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
    );
I suppose you are not sick of writing bugs, but you made a typo: s/$value/$code/,
right?
Besides this being killing a fly with a tank for this kind of validation,
this is mainly due to the magic casting being inconsistent.
Both weak (but it creates potential BC problems on the caller side) and
strict solve this exact kind of validation.
What none of the RFCs will change is the business-logic-related
validation, like ranges and the like. These kinds of validations
could easily be solved using annotations (which is why they are what we need
next to scalar type hinting as well).
I want to be able to write this:
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
Hi Matthew,
Today, without scalar type hints, we end up writing code that has to first
validate that we have something we can use, and then cast it. This can often be
done with ext/filter, but it's horribly verbose:
$value = filter_var(
    $value,
    FILTER_VALIDATE_INT,
    FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
);
if (false === $value) {
    // throw an exception?
}
Many people skip the validation step entirely for the more succinct:
$value = (int) $value;
And this is where problems occur, because this is when data loss occurs.
I would usually do:
is_int($value) or ctype_digit($value) and is_numeric($value) where
appropriate if I expect a later cast from a string. There are shorter
options that predate filter_var().
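A short sketch of that shorter style of check, with a hypothetical helper name `intOrNull`. It accepts real integers and all-digit strings and returns null for everything else. Note the assumptions: `ctype_digit()` needs the ctype extension, it rejects signed strings such as '-100', and very long digit strings are not range-checked here:

```php
<?php
// Hypothetical helper: validate first, cast only when the cast is lossless.
function intOrNull($value)
{
    if (is_int($value)) {
        return $value;
    }
    // Guard with is_string(): ctype_digit() is only meaningful for strings.
    if (is_string($value) && ctype_digit($value)) {
        return (int) $value;
    }
    return null;
}

$fromDb  = intOrNull('100');      // all digits, safe to cast
$garbage = intOrNull('100 dogs'); // trailing text, rejected
$float   = intOrNull('100.7');    // not an integer literal, rejected
```

Whether null, false, or an exception is the right failure signal is a design choice; null is used here only to keep the sketch short.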
What I've observed in my 15+ years of using PHP is that people don't validate;
they either blindly accept data and assume it's of the correct type, or they
blindly cast it without validation because writing that validation code is
boring, verbose, and repetitive (I'm guilty of this myself!). Yes, you can
offload that to libraries, but why introduce a new dependency in something as
simple as a value object?
I think we should be very clear that lazy programming, of the bad
sort, should not in any way impede change.
Now, you may argue that you won't need to cast the value in the first place,
because STH! But what if the value you received is from a database? or from a
web request you've made? Chances are, the data is in a string, but the value
may be of another type. With weak/coercive mode, you just pass the data as-is,
but with strict enabled, your choices are to either cast blindly, or to do the
same validation/casting as before:
$value = filter_var(
    $value,
    FILTER_VALIDATE_INT,
    FILTER_FLAG_ALLOW_OCTAL | FILTER_FLAG_ALLOW_HEX
);
if (false === $value) {
    // throw an exception?
}
Interestingly, this adds overhead to your application (more function calls), and
makes it harder to read and to maintain. Ironically, I foresee "strict" as being
a new "badge of honor" for many in the language ("my code works under strict
mode!"), despite these factors.
Pretty much all other languages face the exact same issue. Not all of
them are dynamically typed. I really do not see the string->int issue
as an issue for either RFC. You will have a string. It may need to be
an integer. It will be cast to an int if needed.
string->intent->scan->cast||error. Will it add overhead to your
application? I can't really say. Will you be writing these bitesize
casting functions? Chances are there'll be a package or core PHP
functions for that (perhaps, maybe) or your preferred ORM will do it
behind the scenes - once everyone has time to digest whatever passes.
At present we're at the "lick it" taste-test stage.
You can say, "But, Static Analysis!" all you want, but that doesn't lead to me
writing less code to accomplish the same thing; it just gives me a tool to check
the correctness of my code. (Yes, this is important. But we also have a ton of
tooling around those concerns already, even if they aren't proper static
analyzers.)
We do? Not to put words in your mouth, this is all me, a lot of the
tools people associate with testing (like the obvious one) require a
Human touch. Humans are lazy. I just wrote V2 of a tool to test the
testers. Many of these tools are not substitutes for type checking,
etc. They are complementary. Adding more code correctness checking
would generally be to the good, though we can all debate to what
degree.
From a developer experience factor, I find myself scratching my head: what are
we gaining with STH if we have a strict mode? I'm still writing exactly the same
code I am today to validate and/or cast my scalars before passing them to
functions and methods if I want to be strict.
The new coercive RFC offers much more promise to me as a consumer/user of the
language. The primary benefit I see is that it provides a path forward towards
better casting logic in the language, which will ensure that — in the future —
this:
$value = (int) $value;
will operate properly, and raise errors when data loss may occur. It means that
immediately, if I start using STH, I can be assured that if my code runs, I
have values of the correct type, as they've been coerced safely. The lack of a
strict mode means I can drop that defensive validation/casting code safely.
And the reply would be variations including "but was that the user's
intent?". Yes, you have the type you wanted. Was it the type the
caller intended you to have or did they do something in error that
just silently passed unnoticed?
<snip reason="the horror">
and _not_ push the burden on consumers to validate/cast their values.
My point is: I'm sick of writing code like this:
And again, we're going to have to consider that burden. We're all
still thinking in terms of PHP5 and not considering how to be user
friendly, and the good type of lazy, under a stricter regime. Where
did the user get the status code from? Why did THAT not return an
integer? Don't we have those return types ready to rock? If this were
a response of some sort, why didn't they simply use an integer? Why
are we writing multiline blocks of code (everywhere and anywhere) and
not doing "extract method" followed by filling out the annoying
Packagist webhook for a git repo?
This is what I want from STH, no more no less: sane casting rules, and the
ability to code to scalar types safely. While I can see some of the benefits of
strict mode, I'm concerned about the schism it may create in the PHP library
ecosystem, and that many of the benefits of the coercive portion of that RFC
will be lost when working with data from unknown data sources.
If you've read thus far, thank you for your consideration. I'll stop bugging you
now.
But you responded later! :)
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com