Hello Internals,
After the first discussion about this topic
(https://externals.io/message/117608) I have created a preliminary
implementation and an RFC for making implicit boolean type coercions
more strict:
https://wiki.php.net/rfc/stricter_implicit_boolean_coercions
With this email I'm starting the two week discussion period. I welcome
any feedback on it and hope to further iron out the implementation if
needed. I mainly chose the route of introducing a deprecation notice
because it is in line with other RFCs that have similar goals (like the
Deprecate implicit non-integer-compatible float to int conversions RFC),
and it is fairly non-intrusive.
Best regards,
Andreas
Hi Andreas,
Has any case study been done already about how it will affect existing
codebases?
Regards,
Kamil
Hello Kamil,
I suspect this is very different depending on the codebase. My main
reason for introducing this deprecation notice is to start highlighting
possible problems in codebases where nobody even suspected them before.
In one application recently I actually had the string "false" (coming
from a form transmission) being passed to a boolean argument and leading
to true, which definitely was unintended.
I have tried coming up with reasons why these deprecation notices could
be harmful or unnecessary, but I think anytime they occur there is a
legitimate reason to check the code (and make it more explicit). If an
integer 5 is passed to a boolean type, how is that not suspicious? Yet I
do think this is a deprecation notice that will be far less common than
some in recent history (like "non-integer-compatible float to int
conversions" or "passing null to non-nullable parameters of built-in
functions"), because there are far less boolean parameters in the
built-in functions of PHP compared to string/int, and I'd wager bool is
also less used in codebases compared to string/int/float.
If you or anybody else has a recommendation on how to measure how this
would affect codebases and what information in that regard would be
useful, I am open to look into that more. The impact should be rather
low in general because it is only a deprecation notice, and I think a
lot of codebases are getting used to dealing with deprecations more with
the last two PHP versions.
Best regards,
Andreas
Hi Andreas,
Has any case study been done already about how it will affect existing
codebases?Regards,
Kamil
https://wiki.php.net/rfc/stricter_implicit_boolean_coercions
Has any case study been done already about how it will affect existing codebases?
The last time this happened there were no checks:
https://wiki.php.net/rfc/deprecate_null_to_scalar_internal_arg
https://externals.io/message/112327
And unfortunately the result was not good for that one :-(
But, at least with this RFC, I can see how some coercions can be problematic, like the string "false" being cast to true.
With any of these changes, I just wonder what the costs/benefits are, and if there are any edge cases / oddities.
Craig
Hello Internals,
After the first discussion about this topic
(https://externals.io/message/117608) I have created a preliminary
implementation and an RFC for making implicit boolean type coercions
more strict:https://wiki.php.net/rfc/stricter_implicit_boolean_coercions
With this email I'm starting the two week discussion period. I welcome
any feedback on it and hope to further iron out the implementation if
needed. I mainly chose the route of introducing a deprecation notice
because it is in line with other RFCs that have similar goals (like the
Deprecate implicit non-integer-compatible float to int conversions RFC),
and it is fairly non-intrusive.Best regards,
Andreas
Hi Andreas,
Thanks! Am I right that it only affects type declarations, and the
"Unaffected PHP Functionality" section could also mention implicit
boolean evaluation in if
, ternary conditional (?:) and logical
operators (!, &&, ||, and, or, xor)?
Regards,
--
Guilliam Xavier
Thanks! Am I right that it only affects type declarations, and the
"Unaffected PHP Functionality" section could also mention implicit
boolean evaluation inif
, ternary conditional (?:) and logical
operators (!, &&, ||, and, or, xor)?
Yes, that is correct and I just added a note about that in the RFC -
thanks for the suggestion! I also added a part about how to avoid the
deprecation notice, which should also make it clearer / add some context.
Hello Internals,
After the first discussion about this topic
(https://externals.io/message/117608) I have created a preliminary
implementation and an RFC for making implicit boolean type coercions
more strict:https://wiki.php.net/rfc/stricter_implicit_boolean_coercions
Thanks for the RFC. I think it's overall a good idea, especially for
cases like "false" => true, and arguably for ints >1/<0, but my gut
feeling is the string=>bool deprecation will lead to a lot of pain.
I definitely see this being done in many places where you expect any
string value to be true and an empty string to be false (without
consideration for the "false" case which is more common in config style
use cases).
I don't know what you could do to improve this without making the RFC
useless though. I wish these things could be tried out in alpha or even
up to early RC builds so we could hopefully get community feedback in
before deciding whether it's worth the pain inflicted or not.
Best,
Jordi
--
Jordi Boggiano
@seldaek - https://seld.be
Thanks for the RFC. I think it's overall a good idea, especially for
cases like "false" => true, and arguably for ints >1/<0, but my gut
feeling is the string=>bool deprecation will lead to a lot of pain.I definitely see this being done in many places where you expect any
string value to be true and an empty string to be false (without
consideration for the "false" case which is more common in config
style use cases).I don't know what you could do to improve this without making the RFC
useless though. I wish these things could be tried out in alpha or
even up to early RC builds so we could hopefully get community
feedback in before deciding whether it's worth the pain inflicted or not.
For what its worth, I already found some bugs in the php-src tests where
a function definition was probably changed at some point and a string
was passed to a bool argument which was obviously not intended, or where
a weird value was used for a boolean argument without any apparent
reason. These are the type of unintended coercions I am trying to bring
to light with the RFC.
Using any string as true is also already a bit dangerous: "0" is a
special case string that results to false, which adds a big wrinkle to
the assumption "non-empty string is true".
My main arguments why I think this will not lead to too much unnecessary
pain:
- Each individual case is easy to fix, the easiest (but also least
useful) would be to loosly compare a value to true ($value == true)
instead of directly giving the value to a typed bool - bool arguments for internal functions are usually optional, less
numerous and are much more likely to be set by a constant expression
than a variable - deprecation notices are easy to ignore, and the "disadvantage" of
the high number of deprecation notices with 8.0 and 8.1 should be
that most tooling and codebases have gotten more used to dealing
with them without too much panic
I also added these points to the RFC, because I think there is some
resentment built up for deprecation notices by now, which I can understand.
Hello Internals,
After the first discussion about this topic
(https://externals.io/message/117608) I have created a preliminary
implementation and an RFC for making implicit boolean type coercions
more strict:https://wiki.php.net/rfc/stricter_implicit_boolean_coercions
With this email I'm starting the two week discussion period. I welcome
any feedback on it and hope to further iron out the implementation if
needed. I mainly chose the route of introducing a deprecation notice
because it is in line with other RFCs that have similar goals (like the
Deprecate implicit non-integer-compatible float to int conversions RFC),
and it is fairly non-intrusive.
This RFC worries me as, in my opinion, it makes PHP's behaviour more
surprising and inconsistent, not less.
It also raises the cognitive complexity for developers by yet another level.
- It introduces a new interpretation of boolean type coercion, which is
only applied against type declarations. - This new interpretation is not consistent with "normal" boolean type
coercion, not consistent with strict types (no coercion) and also not
consistent with what is considered a valid boolean value by the Filter
extension.
While I agree that the current behaviour of PHP can hide bugs, I fear
that even more bugs will be introduced when PHP contains yet a third
form of coercion to boolean and the type coercion used is dependent on
the context.
I see enough bugs on a daily basis which are/were introduced because
people don't know the type coercion rules well enough. Adding yet
another contextual layer to type coercion, makes things MORE complex,
not less.
I fear this will only lead to more bugs, not less (because people new to
PHP will learn the "type declaration" based boolean coercion rules and
assume they apply everywhere).
I also fear that for code bases which do not (yet) use scalar type
declarations, this will be one more argument not to introduce scalar
type declarations (while they should).
I'd say that for this RFC to be acceptable it would need to apply to all
implicit type coercions to boolean. However, the BC-break that would
cause and the fall-out of this for non-greenfields codebases is just too
huge, which, to me, makes this RFC a very strong no-no.
All in all, I largely agree with the flaws in this proposal as
previously pointed out by Christian Schneider in the preliminary
discussion. And I don't see those concerns addressed in the RFC (other
than making it more explicit what the actual intended change is).
Smile,
Juliette
I also fear that for code bases which do not (yet) use scalar type declarations, this will be one more argument not to introduce scalar type declarations (while they should).
I'd say that for this RFC to be acceptable it would need to apply to all implicit type coercions to boolean. However, the BC-break that would cause and the fall-out of this for non-greenfields codebases is just too huge, which, to me, makes this RFC a very strong no-no.
Forgive my ignorance, but when scalar types were introduced, I initially assumed type coercion would use the same rules as strval($var)
, (string) $val
, 'a' . $val
, intval($val)
, etc. I now appreciate that isn't the case, but I never understood why the following functions were not equivalent:
function my_function1($s, $i, $f, $b) {
$s = strval($s);
$i = intval($i);
$f = floatval($f);
$b = boolval($b);
var_dump($s, $i, $f, $b);
}
function my_function2(string $s, int $i, float $f, bool $b) {
var_dump($s, $i, $f, $b);
}
Where the second function, using scalar type declarations, has the advantage of being much shorter, and clearer for Static Analysis, IDE's, documentation, etc... if only it was as easy to add to existing projects.
This isn't to say boolval()
and other ways of changing type to a boolean couldn't throw a type error with unrecognised strings (e.g. "false"), and doing so might make sense (although I do share Juliette's concerns about the BC-breaks).
Craig
This RFC worries me as, in my opinion, it makes PHP's behaviour more
surprising and inconsistent, not less.It also raises the cognitive complexity for developers by yet another
level.
- It introduces a new interpretation of boolean type coercion, which
is only applied against type declarations.- This new interpretation is not consistent with "normal" boolean
type coercion, not consistent with strict types (no coercion) and also
not consistent with what is considered a valid boolean value by the
Filter extension.
I am not sure what you mean by "normal" boolean type coercion and how it
would be inconsistent with those. The RFC does not change the behavior
of boolean type coercions, it just points out possibly surprising coercions.
The filter extension has its very own interpretation of how to validate
a boolean which is completely incompatible with how PHP does implicit
type coercions already. Yet people using the filter extension are not
affected by this RFC at all, they have already chosen their way to
convert a value to boolean. But comparing the filter extension with the
implicit type conversion does show how dangerous it can be to for
example switch from FILTER_VALIDATE_BOOLEAN
(with
FILTER_NULL_ON_FAILURE) to the implicit boolean conversions from PHP:
with the filter extension "false", "off" and "no" are converted to
false, while all these values are silently converted to true by implicit
boolean conversion by PHP. People switching from the filter extension to
the built-in types of PHP have another way of shooting themselves in the
foot, and it is easy to miss the change in behavior.
While I agree that the current behaviour of PHP can hide bugs, I fear
that even more bugs will be introduced when PHP contains yet a third
form of coercion to boolean and the type coercion used is dependent on
the context.
Assuming from the rest of your response, I am assuming you mean
"truthiness" comparisons, like in if statements and other expressions.
Those are already different from coercions to typed booleans and have
different semantics. In an if statement you can check for truthiness
with any type, while a typed boolean only accepts scalar values. I am
seeing an advantage of further separating those use cases, as a
misconception seems to exist that a check for truthiness is just a
conversion to boolean. Having clearer semantics would actually lower
cognitive complexity from my perspective - knowing that only 7 values
are considered sensible for a typed boolean and getting a notice for any
other values makes it a lot clearer how it behaves and when I as a
developer will get warned.
I fear this will only lead to more bugs, not less (because people new
to PHP will learn the "type declaration" based boolean coercion rules
and assume they apply everywhere).
But they would actually apply anywhere, as far as I know: My suggested
allowed values for typed booleans can be used in if statements and lead
to the exact same behavior.
I also fear that for code bases which do not (yet) use scalar type
declarations, this will be one more argument not to introduce scalar
type declarations (while they should).
My suggestion is far less impactful for "legacy" codebases compared to
the other scalar type checking - an "int" parameter only accepting valid
numbers and leading to a TypeError otherwise is a much bigger hurdle
from my perspective than introducing a deprecation notice for a suspect
boolean type coercion. I cannot really believe that "failed" not being
auto-coerced to true will be the straw that breaks the camels back in
legacy codebases. And again: It is just a deprecation notice, I even
amended the RFC to say that this being elevated to a TypeError will have
to be decided after the impact of the deprecation notices become clear,
not now. My goal is not necessarily to be as strict as possible but
rather point out to developers where something might be going wrong.
I'd say that for this RFC to be acceptable it would need to apply to
all implicit type coercions to boolean. However, the BC-break that
would cause and the fall-out of this for non-greenfields codebases is
just too huge, which, to me, makes this RFC a very strong no-no.
I will amend the RFC to show how typed booleans already have their own
type coercions that are not the same as the truthiness checks in if or
similar expressions - thanks for pointing that out and showing that
there is some confusion there that might benefit from a clearer
explanation. Changing how truthiness is evaluated in PHP makes no sense
to me though, and I don't think such a drastic step is necessary to get
a lot of benefits.
If you have further feedback, I would be happy to hear it. So far I have
not gotten much feedback, and most of it was going towards being
positive, so being able to get to know the other perspective would help
to address it in the RFC and improve it overall.
All in all, I largely agree with the flaws in this proposal as
previously pointed out by Christian Schneider in the preliminary
discussion. And I don't see those concerns addressed in the RFC (other
than making it more explicit what the actual intended change is).
I amended the RFC and tried to address and include all of your points,
as well as made it clearer what the intention of the RFC is. Adding the
behavior of the filter extension was a sensible addition, I didn't even
think of its special behavior before you mentioned it, and going into
more detail about the other boolean coercions in PHP was also something
that needed a more explicit section in the RFC, as it has been brought
up by different people in different contexts already and I seem to have
prematurely dismissed its importance so far.
If you still think other concerns are not addressed I would be happy to
know about them.
Hello Internals,
After the first discussion about this topic
(https://externals.io/message/117608) I have created a preliminary
implementation and an RFC for making implicit boolean type coercions
more strict:https://wiki.php.net/rfc/stricter_implicit_boolean_coercions
With this email I'm starting the two week discussion period. I welcome
any feedback on it and hope to further iron out the implementation if
needed. I mainly chose the route of introducing a deprecation notice
because it is in line with other RFCs that have similar goals (like the
Deprecate implicit non-integer-compatible float to int conversions RFC),
and it is fairly non-intrusive.Best regards,
Andreas
I'll echo Juliette's comments.
I don't like this RFC as it introduces special coercion semantics for
boolean only in a function (+ typed properties (well at least I hope it
impacts typed properties)) context.
The obvious other context is the logical one with conditional statements
and/or boolean operators (&&, ||, etc.) where it is pretty typical to do if
($string) {} and use $string as a boolean value.
Having this be true in some contexts and false in others, depending on the
content of the string is far from ideal.
However, implicit coercion to bool can also arise when doing comparisons
where one of the operands is null or bool.
In this case which semantics should be used?
The proposed behaviour is in stark contrast to the one I brought forward
with the implicit float to int coersions.
As this deprecation occurs in all instances (moduli explicit casts).
This is also the reason why we stopped pursuing the deprecating implicit
bool to string RFC as using a boolean value within string
interpolation/echoing is somewhat common and would not reduce the cognitive
burden as one would need to remember yet another case about implicit
coercions.
Even the Coercive types for Function arguments RFC [1] which went further
in differentiating function scalar types declaration and what values would
be accepted did not include such a restriction for bool type declaration.
Where it does propose something somewhat sensible by deprecating float to
bool coercions altogether as comparing equality for floats is far from
ideal due to approximation errors.
Can this prevent bugs, for sure, but within the function context we already
have a tool to deal with these sorts of issues with strict_types.
Best regards,
George P. Banyard
I don't like this RFC as it introduces special coercion semantics for
boolean only in a function (+ typed properties (well at least I hope it
impacts typed properties)) context.The obvious other context is the logical one with conditional statements
and/or boolean operators (&&, ||, etc.) where it is pretty typical to do if
($string) {} and use $string as a boolean value.
Having this be true in some contexts and false in others, depending on the
content of the string is far from ideal.
However, implicit coercion to bool can also arise when doing comparisons
where one of the operands is null or bool.
In this case which semantics should be used?
The problem is that coercion to a typed boolean is not the same as
checking an expression for truthiness. It is somewhat related, but when
there is a coercion to a typed boolean only scalar values can be
coerced, which makes a big difference. Something like
if ([''])
is basically the same as
if ([''] == true)
which is why I call it checking for truthiness. Yet
$obj->booleanProperty = [''];
will lead to a type error. So there is a clear difference there already.
I also think the expectation is a different one - something like
if ($string)
is, as far as I have seen, most often used to check if a string it not
empty. Yet if you do
$obj->booleanProperty = $string;
you then seem to specifically want to end up with a boolean type, not do
a fuzzy check for truthiness. To me that is a very different situation
where I actually find the current behavior surprising, in that is does
not matter at all what scalar value I give a typed boolean, it will be
happy with anything, and if it was a long string that was converted to
boolean, you do possibly lose information. That can easily happen if
someone changes a property from a string to boolean, but still assigns
it a string somewhere.
As soon as you use operators like && or || the intention becomes clearer
again:
$obj->booleanProperty = $string && $string2;
There you are not passing the string directly to a boolean property, you
are checking an expression, and you can even do
$obj->booleanProperty = $array && $array2; // cannot pass an array to a
boolean, but you can check an array (or two) for truthiness / not being
empty
Can this prevent bugs, for sure, but within the function context we already
have a tool to deal with these sorts of issues with strict_types.
But could you not argue the same for integers too? Why when you pass a
string to an int parameter does it check if it is numeric, and not
auto-coerce "hello" to 0? That is different behavior too, and one can
use strict_types or explicit coercions instead. My argument is not that
the proposal in this RFC is somewhat elegant or solves deep-rooted
issues in the language. Similar to the numeric checks for int parameters
I am interested in this for purely practical reasons. If somewhere in
your application the integer -376 is passed to a boolean property (and
coerced to true), would you not rather know about it? Is it not likely a
bug, or at least an oversight? Would it not be easier on developers to
know that only 7 scalar values are considered "unambigous" when passing
to a typed boolean, and that they would be informed if they pass
something else?
I already rewrote the passage about raising this to a TypeError, where I
state that this should be determined later when there is sufficient
experience with these deprecation notices, and for me that is not really
urgent - if this just remains an unintrusive deprecation notice for the
next 10 years that would be fine with me. But I am convinced this will
be a useful notice in many applications, where bugs can be found that
otherwise would have stayed hidden for a very long time.
I appreciate your feedback, and if I can improve anything about the RFC
I am open to ideas. I think your RFCs for the type system have made PHP
a much better language, and you seem to think about the big picture of
the types in PHP, which is important for the language. This RFC might
seem like a niche specific change and it does not affect much of PHP
itself, but to me that could be what makes it a useful addition - if it
mainly reveals bugs in applications, couldn't that be enough of a reason
in favor of it?
I have created a preliminary
implementation and an RFC for making implicit boolean type coercions
more strict:https://wiki.php.net/rfc/stricter_implicit_boolean_coercions
With this email I'm starting the two week discussion period.
"When discussion ends, and a minimum period of two weeks has passed"
Fyi, the two weeks is a minimum, and almost certainly not enough time
for subtle BC breaking RFCs like this.
I appreciate your feedback, and if I can improve anything about the RFC
I think you should really explicitly list what the changes are in a
single table rather than in sentences.
Would you want to know when a value like -375, “false” or NaN
is given to a typed boolean (and coerced to true) in a codebase?
Yes.
Which is why I always use both PHPStan and Psalm to detect those types
of things where those types of bugs are important.
Also, strict mode is great.
In one application recently I actually had the string "false" (coming
from a form transmission) being passed to a boolean argument and leading
to true, which definitely was unintended.
But "false" is a perfectly sensible thing to pass as a string in an
API (as HTTP is a string based protocol). As is 'faux' if your API is
used by French speaking programmers.
You need an layer of code that converts from strings to the precise
types your API needs, rather than just passing values straight
through.
if it mainly reveals bugs in applications, couldn't that
be enough of a reason in favor of it?
Although this change would probably detect some bugs, it's not at all
obvious that the amount of pain would be worth it.
A lot of applications out there aren't being constantly developed.
Instead they are in maintenance mode, where there isn't a programmer
dedicated to constantly work on it. There would be lots of function
calls to check, and some of them would need code to be modified to
maintain the existing behaviour. And all of that would seem to be
subtle work, prone to mistakes.
cheers
Dan
Ack
In one application recently I actually had the string "false" (coming
from a form transmission) being passed to a boolean argument and leading
to true, which definitely was unintended.But "false" is a perfectly sensible thing to pass as a string in an
API (as HTTP is a string based protocol). As is 'faux' if your API is
used by French speaking programmers.You need an layer of code that converts from strings to the precise
types your API needs, rather than just passing values straight
through.
In my opinion, this is a good point in the RFC's favour, as the depreciation warning
will show up in exactly those places where this conversion layer was
erroneously missing.
"When discussion ends, and a minimum period of two weeks has passed"
Fyi, the two weeks is a minimum, and almost certainly not enough time
for subtle BC breaking RFCs like this.
I explicitely stated that to make it clear that this should be
considered a discussion period. I am not trying to circumvent anything,
just make it clear that now is the time to discuss this. And I am ready
to fully discuss this, as hopefully I have shown so far.
I am also not sure what you mean with "subtle BC breaking RFCs". How is
this RFC subtly BC breaking? It introduces a deprecation notice, that is
it - for one that is not subtle, it is explicit, and it does not break
BC except for introducing said notice. Is that not the very minimal BC
break you can achieve to highlight a possible problem in code?
I think you should really explicitly list what the changes are in a
single table rather than in sentences.
What do you think is unclear, and what would you include in such a
table? In the Proposal section I listed all values which are considered
allowed, and I tried to include more examples there too to highlight the
outcome. The behavior is not changed by this RFC.
Would you want to know when a value like -375, “false” or NaN
is given to a typed boolean (and coerced to true) in a codebase?
Yes.Which is why I always use both PHPStan and Psalm to detect those types
of things where those types of bugs are important.Also, strict mode is great.
Even when using PHPStan and Psalm you can encounter such a value, for
example coming from a form, or an API. Using strict mode is a
possibility, but also a big hammer - so how do you coerce a value coming
from a form in strict mode? With an explicit coercion like (bool) or
boolval()
? Because there you might also be losing information
unexpectedly - I do have some ideas to enable implicit coercions in
strict mode, because you might not want to convert "failed" from an API
to true by using explicit coercions, in my applications I would prefer
to at least notice or even throw an exception in such cases, and the
programming language streamlining this could be helpful.
In one application recently I actually had the string "false" (coming
from a form transmission) being passed to a boolean argument and leading
to true, which definitely was unintended.
But "false" is a perfectly sensible thing to pass as a string in an
API (as HTTP is a string based protocol). As is 'faux' if your API is
used by French speaking programmers.You need an layer of code that converts from strings to the precise
types your API needs, rather than just passing values straight
through.
That sounds good in theory, but it is the exact thing that is so hard in
practice, when dealing with forms, APIs, different programming
languages, different human languages, different input formats, changing
code, etc. I am not against writing great code and checking all values
all the time, but I do think this is not the reality.
Although this change would probably detect some bugs, it's not at all
obvious that the amount of pain would be worth it.A lot of applications out there aren't being constantly developed.
Instead they are in maintenance mode, where there isn't a programmer
dedicated to constantly work on it. There would be lots of function
calls to check, and some of them would need code to be modified to
maintain the existing behaviour. And all of that would seem to be
subtle work, prone to mistakes.
Earlier you argued that there is no need to detect -375, "false" or NaN
being coerced to a boolean type, because one can use strict mode and
static analyzers, now you argue that legacy applications would have too
much work generated by this RFC. But I don't quite understand why: This
RFC introduces deprecation notices, not TypeErrors. It does not force
legacy applications to change anything, it just points out boolean
coercions that seem dodgy. If these deprecation notices provide little
value in the upcoming years, there is no need to promote them to a
TypeError, they could even be removed again - but I do think they will
provide value and often point out bugs, so much so that a TypeError will
seem reasonable at some point in the future, but there really is no
hurry in escalating it. For now just being able to notice these boolean
coercions would make a huge difference.
I have amended the RFC on
https://wiki.php.net/rfc/stricter_implicit_boolean_coercions to address
the feedback I got so far, I also added an overview of scalar type
coercions to give a better sense of how these changes fit in with
current type coercion behavior, and I added a Future Scope section with
my plans to at some point do a follow-up RFC to make the implicit scalar
type coercion behavior available to PHP developers explicitly and why
that might make sense.
Does anyone have any more feedback or questions on this, or something
that is not considered? If not, I would start voting on Sunday (5th of
June), but I am happy to extend the discussion time if needed.