Hello Internals,
Implicit type coercions (when not using strict_types) have become
increasingly less lossy/surprising in PHP, especially coercions to
integer and float, where you get a TypeError if you pass a non-numeric
string to an integer parameter, and a deprecation notice if you pass a
float(-string) with a fractional part to an integer parameter. The big
exception so far is coercions to boolean, where you can provide any
scalar value and never get an error or a notice.
Any non-empty string (except "0") is converted to true and any non-zero
integer or float is converted to true. From my perspective this can
easily lead to hidden bugs, for example when passing the wrong variable
to a boolean argument or boolean property. Passing a string like "hello"
as a boolean is probably a bug, just like passing the number 854 or the
float 0.1. "on" and "off" and "true" and "false" all lead to a boolean
true, as examples of strings that could be used in applications and
might not all be meant as a value of true.
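To make the current behaviour concrete, here is a minimal sketch, assuming PHP 8 in coercive typing mode (no strict_types declaration). The functions takes_bool() and takes_int() are made up for illustration only.

```php
<?php
// Coercive typing mode: this file deliberately has no declare(strict_types=1).

function takes_bool(bool $flag): bool { return $flag; }
function takes_int(int $number): int { return $number; }

// Every one of these scalars is silently accepted and becomes true.
foreach (['hello', 'on', 'off', 'true', 'false', 854, 0.1] as $value) {
    var_dump(takes_bool($value)); // bool(true) each time
}

// The string "0" is one of the few values that becomes false.
var_dump(takes_bool('0')); // bool(false)

// By contrast, an int parameter rejects a non-numeric string outright.
try {
    takes_int('hello');
} catch (TypeError $e) {
    echo 'TypeError: ', $e->getMessage(), "\n";
}
```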
I have not found any past proposals or discussions to change boolean
coercions, so I would like to find out how the thoughts on internals are
to change this, or if there are any reasons not to change this that I
have not thought of. Only allowing the following values would make sense
from my perspective:
'1' => true
1 => true
1.0 => true
'' => false
'0' => false
0 => false
0.0 => false
I can also see a case for allowing the strings 'true' and 'false', and
changing 'false' to be coerced to false, but that would be a BC break. I
am not sure if that is worthwhile.
Anything else would emit either a notice or a warning as a first step
(to be determined). My main goal would be to make these
probably-not-boolean usages more visible in codebases. Depending on the
feedback here I would create an RFC and try to do an implementation (to
then discuss it in more detail), so as of now this is mostly about
getting some basic feedback on such a change, and if someone else has
had any similar thoughts/plans.
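As a rough userland illustration of the idea (not an implementation proposal), the rule could be sketched like this, assuming PHP 8.0+. The function name coerce_to_bool_proposed() and the message text are invented for this sketch, and E_USER_NOTICE is an arbitrary pick since notice vs. warning is still to be determined.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): mimics the proposed rule for
 * scalar-to-bool coercion in userland.
 */
function coerce_to_bool_proposed(bool|int|float|string $value): bool
{
    // The seven values from the list above, compared strictly by type and value.
    $allowed = ['1', 1, 1.0, '', '0', 0, 0.0];

    if (!is_bool($value) && !in_array($value, $allowed, true)) {
        // Any other scalar still converts, but the conversion becomes visible.
        trigger_error(
            sprintf('Implicit conversion from %s to bool loses information', var_export($value, true)),
            E_USER_NOTICE
        );
    }

    // The resulting boolean is the same as with today's rules.
    return (bool) $value;
}

var_dump(coerce_to_bool_proposed('0'));     // bool(false), no notice
var_dump(coerce_to_bool_proposed('false')); // bool(true), with a notice
```

The return value stays what it is today; only the visibility of the questionable conversions changes, which matches the "first step" described above.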
Thanks for any feedback!
Andreas Leathley
Any non-empty string (except "0") is converted to true and any non-zero
integer or float is converted to true.
If we could get rid of "0" being false, that alone would be a huge
benefit for the language in the long run.
I know why it exists, but I don't think it should. In fact it never
should have existed in the first place.
Any non-empty string (except "0") is converted to true and any non-zero
integer or float is converted to true.
If we could get rid of "0" being false, that alone would be a huge
benefit for the language in the long run.
I know why it exists, but I don't think it should. In fact it never
should have existed in the first place.
For me, highlighting all the places where a possibly unintended
conversion to true is happening would make "0" a lot less bad. "0.0"
being silently true while "0" and 0.0 are false seems a bit awkward.
On 26.04.2022 at 11:54, Andreas Leathley a.leathley@gmx.net wrote:
I have not found any past proposals or discussions to change boolean
coercions, so I would like to find out how the thoughts on internals are
to change this, or if there are any reasons not to change this that I
have not thought of.
There are two big reasons:
- BC: Checking for the truthiness of a value is very common and would require a lot of code changes.
- Some of us like the conciseness of "if ($foo) ...", see below
I can also see a case for allowing the strings 'true' and 'false', and
changing 'false' to be coerced to false, but that would be a BC break. I
am not sure if that is worthwhile.
I'm definitely against adding more special cases like 'false'.
Side-note: Removing something like '0' => false is also a BC break, not just adding 'false'.
Anything else would emit either a notice or a warning as a first step
(to be determined). My main goal would be to make these
probably-not-boolean usages more visible in codebases. Depending on the
feedback here I would create an RFC and try to do an implementation (to
then discuss it in more detail), so as of now this is mostly about
getting some basic feedback on such a change, and if someone else has
had any similar thoughts/plans.
One of the big issues I have with this (as well as undefined variables not being allowed in if ($foo)) is that the replacement constructs are clunky:
if ($foo) => if (!empty($foo))
For me this is quite a regression in readability but then again that's a matter of taste.
And would !empty($foo) even work in your world or how would empty() be defined?
Regards,
- Chris
There are two big reasons:
- BC: Checking for the truthiness of a value is very common and would require a lot of code changes.
- Some of us like the conciseness of "if ($foo) ...", see below
That would not be my target - in an if expression a lot more values are
allowed anyway (arrays, objects, etc.), so this is not about determining
truthiness, but actual conversions to a bool type, like a bool
parameter, a bool return type or a bool property type. The inspiration
for this is the "Deprecate implicit non-integer compatible float to int
conversions" (https://wiki.php.net/rfc/implicit-float-int-deprecate),
minus the mathematical expressions that were considered there. It is
about avoiding probably-unintended values being coerced to boolean and
therefore losing information / adding unnoticed bugs to a codebase.
I'm definitely against adding more special cases like 'false'.
Side-note: Removing something like '0' => false is also a BC break, not just adding 'false'.
I am not suggesting removing the coercion from the string '0' to false
or changing anything about that.
One of the big issues I have with this (as well as undefined variables
not being allowed in if ($foo)) is that the replacement constructs are
clunky:
if ($foo) => if (!empty($foo))
For me this is quite a regression in readability but then again that's a matter of taste.
And would !empty($foo) even work in your world or how would empty() be defined?
empty is also not something I would consider covering, as there you also
do not need a conversion to a bool type.
On 26.04.2022 at 15:16, Andreas Leathley a.leathley@gmx.net wrote:
There are two big reasons:
- BC: Checking for the truthiness of a value is very common and would require a lot of code changes.
- Some of us like the conciseness of "if ($foo) ...", see below
That would not be my target - in an if expression a lot more values are
allowed anyway (arrays, objects, etc.), so this is not about determining
truthiness, but actual conversions to a bool type, like a bool
parameter, a bool return type or a bool property type.
I see, so as long as there are no bool type hints for function parameters everything would be the same.
This would lead to a minor asymmetry for
$preserve = "yes";
if ($preserve) # Silently working, true
array_slice($array, $offset, preserve_keys: $preserve); # Not working any more
I assume your solution would be to add an explicit cast to bool? i.e. something along the lines of
array_slice($array, $offset, preserve_keys: (bool)$preserve); # Explicit cast to silence implicit conversion
I'm a bit worried about having to keep two different convert-to-bool rule sets in mind (implicit vs. explicit) and about the additional casts.
Regards,
- Chris
I see, so as long as there are no bool type hints for function parameters everything would be the same.
This would lead to a minor asymmetry for
$preserve = "yes";
if ($preserve) # Silently working, true
array_slice($array, $offset, preserve_keys: $preserve); # Not working any more
I assume your solution would be to add an explicit cast to bool? i.e. something along the lines of
array_slice($array, $offset, preserve_keys: (bool)$preserve); # Explicit cast to silence implicit conversion
I'm a bit worried about having to keep two different convert-to-bool rule sets in mind (implicit vs. explicit) and about the additional casts.
There are already two different convert-to-bool rule sets for
array|object|resource|null, where they will be accepted in if but
rejected when passed to a function taking a bool param:
$x = (object) ['a'=> 1];
if ($x) {
    echo "if\n";
}
function takes_bool(bool $param) {}
takes_bool($x); // throws TypeError
Adding string|float to that list doesn't seem to be that big of a difference here.
On 26.04.2022 at 11:54, Andreas Leathley a.leathley@gmx.net wrote:
I have not found any past proposals or discussions to change boolean
coercions, so I would like to find out how the thoughts on internals are
to change this, or if there are any reasons not to change this that I
have not thought of.
I was actually thinking about this the other day, in the context of
adding new cast functions which reject more values than our current
explicit casts.
For integers, I can think of several levels of casts / type checks that
you could theoretically define:
- Plain type checking, no cast or coercion
- Non-lossy casts, e.g. (string)(int)'42' === '42'
- Unambiguous casts, e.g. (int)'42.0', (int)' 42 '
- Best-guess casts, e.g. (int)'99 red balloons' === 99, (int)'1e3' === 1000
- Zero failures, e.g. (int)'hello' === 0
When I started thinking about booleans, it was much harder to define
what makes sense. I've certainly seen new programmers confused that
(bool)'false' is true, and having it error would usually be more
helpful; but (bool)'on' being true is useful when dealing with HTML
forms, for instance.
And would !empty($foo) even work in your world or how would empty() be defined?
That's an interesting question; but I think it would be OK for empty()
to match with an explicit boolean cast, and both continue to accept
all values, while implicit casts become more strict. This is how
integers already work - (int)'hello' === 0, but passing 'hello' to an
int parameter is an error regardless of mode, as is 'hello' + 1
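A short sketch of that split between explicit and implicit conversions for integers, assuming PHP 8.0 or later. takes_int() is a made-up function, and the last line shows that an explicit bool cast and empty() already agree with each other today.

```php
<?php
// Assumes PHP 8.0 or later; takes_int() is a made-up function for illustration.
function takes_int(int $number): int { return $number; }

$input = 'hello';

// Explicit cast: never fails, a non-numeric string simply becomes 0.
var_dump((int) $input); // int(0)

// Implicit coercion to an int parameter: rejected with a TypeError.
try {
    takes_int($input);
} catch (TypeError $e) {
    echo "parameter: TypeError\n";
}

// Arithmetic on a non-numeric string: also a TypeError.
try {
    $sum = $input + 1;
} catch (TypeError $e) {
    echo "arithmetic: TypeError\n";
}

// Explicit bool cast and empty() accept everything and agree with each other.
var_dump((bool) $input, !empty($input)); // bool(true) bool(true)
```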
Regards,
--
Rowan Tommins
[IMSoP]
I was actually thinking about this the other day, in the context of
adding new cast functions which reject more values than our current
explicit casts.
This is also something I am interested in - having functions which do
the same as implicit type casts, so something like
"coerce_to_int('hello')" would lead to a TypeError like it would when
passing it to an int parameter, and maybe
"is_coerceable_to_int('hello')" which would return false, so it would be
possible to check values the same way as PHP does internally. The
current explicit cast functions are quite heavy-handed and not great for
every situation - when parsing a CSV or getting data from a database I
would rather coerce values where an error could occur if it doesn't make
sense than to explicitly cast and not even notice if the value made no
sense at all.
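Something close to this can already be approximated in userland. The following is only a sketch, assuming PHP 8.0+ and a file without strict_types; coerce_to_int() and is_coerceable_to_int() are the hypothetical names from above, not existing functions.

```php
<?php
// No declare(strict_types=1) here on purpose: the sketch relies on the
// coercive-mode rules being applied to the typed closure parameter.

function coerce_to_int(mixed $value): int
{
    // Passing $value through an int-typed parameter lets the engine apply
    // its own implicit-coercion rules and throw a TypeError on failure.
    return (static fn (int $coerced): int => $coerced)($value);
}

function is_coerceable_to_int(mixed $value): bool
{
    try {
        coerce_to_int($value);
        return true;
    } catch (TypeError) {
        return false;
    }
}

var_dump(coerce_to_int('42'));           // int(42)
var_dump(is_coerceable_to_int('hello')); // bool(false)
```

One limitation of this approach: values that the engine accepts with only a deprecation notice would still count as coerceable here, and the check itself may emit that notice.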
When I started thinking about booleans, it was much harder to define
what makes sense. I've certainly seen new programmers confused that
(bool)'false' is true, and having it error would usually be more
helpful; but (bool)'on' being true is useful when dealing with HTML
forms, for instance.
'on' is only true by "accident" though, because it is a non-empty
string, not because of its meaning, and then it is likely that the value
'off' could also be added at some point - which also would be true. One
of the big reasons for starting this discussion is HTML forms,
as that is where people might add values and not know what that leads to
in PHP. At these application boundaries it would be helpful to get a
clear message what value was converted to true (but might have lost
information by doing that) and what other value to use which is
considered more clear.
'on' is only true by "accident" though, because it is a non-empty
string, not because of its meaning, and then it is likely that the value
'off' could also be added at some point - which also would be true.
The reason I gave that particular example is that it's the default
submission value for an HTML checkbox when checked; if it's not checked,
it has no value at all (not even an empty string), so in that particular
context there is no corresponding "off".
I think it falls into the same category as something like '1e3' being
considered numeric - occasionally useful, but probably not worth the
potential confusion of a special case.
Regards,
--
Rowan Tommins
[IMSoP]
'on' is only true by "accident" though, because it is a non-empty
string, not because of its meaning, and then it is likely that the value
'off' could also be added at some point - which also would be true.
The reason I gave that particular example is that it's the default
submission value for an HTML checkbox when checked; if it's not
checked, it has no value at all (not even an empty string), so in that
particular context there is no corresponding "off".
Interesting, I didn't know the default value of a checkbox is "on" if no
value is specified. That might make it another sensible value to accept
for implicit bool conversion, even though I am not sure how many
checkboxes are used without setting an explicit value, but it could be
considered an established value, especially with how commonplace HTML
forms and checkboxes are in PHP applications.
This is also something I am interested in - having functions which do
the same as implicit type casts, so something like
"coerce_to_int('hello')" would lead to a TypeError like it would when
passing it to an int parameter, and maybe
"is_coerceable_to_int('hello')" which would return false, so it would be
possible to check values the same way as PHP does internally. The
current explicit cast functions are quite heavy-handed and not great for
every situation - when parsing a CSV or getting data from a database I
would rather coerce values where an error could occur if it doesn't make
sense than to explicitly cast and not even notice if the value made no
sense at all.
Yep, that's pretty much exactly where I was going. My current thinking
is to have at least one of (deliberately long straw man names):
type_coerce_or_return_default(string $type, mixed $value, mixed $default): mixed
type_coerce_or_throw_error(string $type, mixed $value): mixed
type_can_be_coerced(string $type, mixed $value): bool
Accepting the type as a string is ugly, but means we don't need three
functions for every type, and can easily support nullable types, union
types, etc. Crucially, no special syntax means it's possible to write
polyfills that compile in old PHP versions.
For simple types, you can mostly get away with one function:
$float_or_error = type_coerce_or_return_null('float', $var) ?? throw new TypeError;
$is_valid = type_coerce_or_return_null('float', $var) !== null;
But that's no use when null is a valid return type, so you need a
hand-picked default and a bunch of extra boilerplate:
if ( $nullable_float = type_coerce_or_return_default('?float', $var,
false) === false ) { throw new TypeError; }
$is_valid = type_coerce_or_return_default('null|float|bool', $var,
'invalid value') !== 'invalid value';
The main point I'm stuck on at the moment is exactly how strict these
should be, since we already have more than one set of coercion rules for
each type, as Mel Dafert pointed out elsewhere on this thread. On the
one hand, they should probably match with at least some existing rules;
on the other, it would be weird to introduce them then immediately
deprecate some behaviour because we've decided to make the language
stricter elsewhere.
Regards,
--
Rowan Tommins
[IMSoP]
'on' is only true by "accident" though, because it is a non-empty
string, not because of its meaning, and then it is likely that the value
'off' could also be added at some point - which also would be true.
The reason I gave that particular example is that it's the default
submission value for an HTML checkbox when checked; if it's not checked,
it has no value at all (not even an empty string), so in that particular
context there is no corresponding "off".
That's why you must test it with isset($_POST['checkbox_name']), not with
(bool)$_POST['checkbox_name'] or an implicit conversion.
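A minimal sketch of that check, using plain arrays in place of $_POST so it runs anywhere. The field name "newsletter" and the function wants_newsletter() are invented for the example.

```php
<?php
// Simulated request bodies; a real script would read $_POST instead.
$checked_submission   = ['email' => 'a@example.org', 'newsletter' => 'on'];
$unchecked_submission = ['email' => 'a@example.org'];

function wants_newsletter(array $post): bool
{
    // Presence of the key, not its value, says whether the box was checked.
    return isset($post['newsletter']);
}

var_dump(wants_newsletter($checked_submission));   // bool(true)
var_dump(wants_newsletter($unchecked_submission)); // bool(false)
```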
if ( $nullable_float = type_coerce_or_return_default('?float', $var,
false) === false ) { throw new TypeError; }
Missing parentheses around the assignment (VS the comparison)!
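The precedence issue can be shown in isolation. This is only a sketch, assuming PHP 8.0+; coerce_or_default() is a made-up stand-in for the straw-man function, with a deliberately simplistic body.

```php
<?php
// Hypothetical stand-in for the straw-man function discussed above.
function coerce_or_default(mixed $value, mixed $default): mixed
{
    return is_numeric($value) ? (float) $value : $default;
}

// Without parentheses, === binds tighter than =, so the comparison result
// is what gets assigned.
if ($a = coerce_or_default('1.5', false) === false) {
    // not reached
}
var_dump($a); // bool(false)

// With parentheses, the coerced value is assigned first, then compared.
if (($b = coerce_or_default('1.5', false)) === false) {
    // not reached
}
var_dump($b); // float(1.5)
```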
Only allowing the following values would make sense
from my perspective:
'1' => true
1 => true
1.0 => true
'' => false
'0' => false
0 => false
0.0 => false
Seems a reasonable compromise (between BC and bugs-protection).
Regards,
--
Guilliam Xavier
On Tue, Apr 26, 2022 at 12:54 PM Andreas Leathley a.leathley@gmx.net
wrote:
Hello Internals,
Implicit type coercions (when not using strict_types) have become
increasingly less lossy/surprising in PHP, especially coercions to
integer and float, where you get a TypeError if you pass a non-numeric
string to an integer parameter, and a deprecation notice if you pass a
float(-string) with a fractional part to an integer parameter. The big
exception so far is coercions to boolean, where you can provide any
scalar value and never get an error or a notice.
Any non-empty string (except "0") is converted to true and any non-zero
integer or float is converted to true. From my perspective this can
easily lead to hidden bugs, for example when passing the wrong variable
to a boolean argument or boolean property. Passing a string like "hello"
as a boolean is probably a bug, just like passing the number 854 or the
float 0.1. "on" and "off" and "true" and "false" all lead to a boolean
true, as examples of strings that could be used in applications and
might not all be meant as a value of true.
I have not found any past proposals or discussions to change boolean
coercions, so I would like to find out how the thoughts on internals are
to change this, or if there are any reasons not to change this that I
have not thought of. Only allowing the following values would make sense
from my perspective:
'1' => true
1 => true
1.0 => true
'' => false
'0' => false
0 => false
0.0 => false
I can also see a case for allowing the strings 'true' and 'false', and
changing 'false' to be coerced to false, but that would be a BC break. I
am not sure if that is worthwhile.
Anything else would emit either a notice or a warning as a first step
(to be determined). My main goal would be to make these
probably-not-boolean usages more visible in codebases. Depending on the
feedback here I would create an RFC and try to do an implementation (to
then discuss it in more detail), so as of now this is mostly about
getting some basic feedback on such a change, and if someone else has
had any similar thoughts/plans.
I thought about coercion on parameters, return value and property types a
few weeks ago as well, both to and from bool:
In my opinion, only int should be coerced:
- bool to int: false to 0 and true to 1
- int to bool: 0 to false and anything else to true
That's because sometimes boolean values are stored numerically as 0 or 1.
What can be coerced, but might be good to remove as well:
- integer numeric string to bool: '0' to false and anything else to true,
with the validity of an integer numeric string being judged the same way as
when coercing string to int (the empty string is not valid).
What should not be coerced:
- bool to string: false to '' and true to '1'
- string to bool: '' to false and anything else to true
- bool to float: false to 0.0 and true to 1.0
- float to bool: 0.0 to false and anything else to true
To me, all of these look like possible bugs and an explicit conversion
should be used if correctly intended.
Other types already raise errors to and from bool:
null|object|resource|array.
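For reference, this is a sketch of what the coercions listed above do today, assuming PHP 8 in coercive mode (no strict_types). The as_*() helpers are made up for the example.

```php
<?php
// Coercive mode (no strict_types); helper names are made up.
function as_int(int $v): int { return $v; }
function as_float(float $v): float { return $v; }
function as_string(string $v): string { return $v; }
function as_bool(bool $v): bool { return $v; }

// bool to int / float / string
var_dump(as_int(true));     // int(1)
var_dump(as_float(true));   // float(1)
var_dump(as_string(true));  // string(1) "1"
var_dump(as_string(false)); // string(0) ""

// string / float to bool
var_dump(as_bool('abc'));   // bool(true)
var_dump(as_bool(0.5));     // bool(true)

// Non-scalar types are already rejected.
try {
    as_bool([]);
} catch (TypeError $e) {
    echo "array: TypeError\n";
}
```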
In terms of an RFC, it would be pretty controversial as there are a lot of BC
breaks.
"bool to string" and "bool to float" are probably the clearest cases.
I have no plans to initiate an RFC in the near future, just sharing my
thoughts about it mostly.
Regards,
Alex