[RFC] Strict operators directive

6 years ago by Arnold Daniels

Hi all,

I would like to open the discussion for RFC: "Strict operators directive".

This RFC proposes a new directive 'strict_operators'. When enabled, operators may cast operands to the expected type, but must comply with the following rules:

Typecasting is not based on the type of the other operand
Typecasting is not based on the value of any of the operands
Operators will throw a TypeError for unsupported types

Reasoning: the current rules for type casting done by operators are inconsistent and complex, which can lead to surprising results where a statement seemingly contradicts itself.
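Here is a minimal illustration of such a contradiction. It is an editorial example, not taken from the RFC, and it relies on my understanding of string comparison in PHP 7.x and 8.x. When both operands are strings, PHP compares them as numbers only if both look numeric; otherwise it compares them byte by byte. All three comparisons below evaluate to true, so `<` is not transitive.

```php
<?php
// Every comparison is between two strings. PHP decides per pair whether to
// compare numerically (both operands numeric) or byte-wise (otherwise).
$ab = "10" < "9a";   // "9a" is not numeric: byte-wise, '1' < '9'
$bc = "9a" < "9e0";  // "9a" is not numeric: byte-wise, 'a' < 'e'
$ca = "9e0" < "10";  // both numeric: 9.0 < 10

var_dump($ab, $bc, $ca); // bool(true) three times: a cycle
```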

Using a directive means that backwards compatibility is guaranteed.

https://wiki.php.net/rfc/strict_operators

Yours,
Arnold Daniels

Arnold Daniels - Chat @ Spike [1mzl6]

6 years ago by Guilliam Xavier

Hello, thanks for the impressive work...
I have just one question: why disallow ~ for strings?
(e.g. currently ~"\x00\x01\x02" gives "\xFF\xFE\xFD")
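For reference, here is a quick sketch of the current behavior described here. On a string operand, `~` inverts every byte. On an integer, it is the ordinary bitwise NOT. bin2hex() is used only to make the bytes printable.

```php
<?php
// Bitwise NOT applied to a string operates byte by byte.
$s = ~"\x00\x01\x02";
echo bin2hex($s), "\n"; // fffefd

// Applied to an int, it is the usual two's-complement NOT.
echo ~5, "\n"; // -6
```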

--
Guilliam Xavier

6 years ago by Arnold Daniels

On Tue, Jun 25, 2019 at 7:56 PM Guilliam Xavier guilliam.xavier@gmail.com
wrote:

Hello, thanks for the impressive work...
I have just one interrogation: why disallow ~ for strings?
(e.g. currently ~"\x00\x01\x02" gives "\xFF\xFE\xFD")

--
Guilliam Xavier

Using ~ for strings should be allowed. I fixed it in the RFC.

Well spotted.

Arnold

6 years ago by Benjamin Morel

Impressive work indeed, this would be a perfect addition to strict_types
that would remove a lot of WTFs while preserving BC with older code.

Please note that the formatting of the RFC is broken after the Bitwise
Operators section.

Ben

6 years ago by Joe Watkins

Evening,

There doesn't seem to be a patch or implementation.

Aside from the proposed semantics, which I can't really read because the
document is malformed, the most important questions for me are: How is this
going to work? Can it be done without significant complexity in the
compiler or VM?

Without an implementation I can't really consider the ideas proposed,
because they are just ideas without proof that they are reasonably
implementable.

While you can technically move forward with an RFC without implementation,
in this case the implementation should inform our decision at vote time.

Cheers
Joe

6 years ago by Christian Schneider

On 25.06.2019 at 15:09, Arnold Daniels arnold.adaniels.nl@gmail.com wrote:

This RFC proposes a new directive 'strict_operators'. When enabled, operators may cast operands to the expected type, but must comply to;

Typecasting is not based on the type of the other operand

Typecasting is not based on the value of any of the operands

Operators will throw a TypeError for unsupported types

While I understand that some people don't like the way PHP does type conversions, I think this proposal creates a much bigger element of surprise when copying PHP code from one place to another than all the .ini settings ever did.

It basically creates two languages in one and I won't be able to determine what
$a == 42
exactly does without having to look at the header of the file.

I'm inclined to say that if you want to make PHP a new language with a new core type concept then you should fork it and call it something else to avoid confusion.

Chris

6 years ago by Arnold Daniels

Fixed the formatting. Sorry about that. :-s

I really want to have a discussion prior to writing any code, to make sure
there is consensus on what should be implemented. However, I will create a
patch prior to voting.

The implementation I have in mind is:

add a flag to CG(active_op_array)->fn_flags (similar to
strict_types) *;
split the function get_binary_op into get_binary_op_standard and a new
function get_binary_op_strict, where get_binary_op calls either based
on the op flag **;
add new functions for strict operators to zend_operators.c.

* https://github.com/php/php-src/blob/065559828022b37e88fc8eae4194efafea1b1506/Zend/zend_compile.c#L5127
** https://github.com/php/php-src/blob/e18c60cd8dfed02311ebb3d11e3543d9a99c7c2a/Zend/zend_opcode.c#L1023

As a proof of concept, I've created a test where the strict_types directive
affects the == and != operators, making them do an 'identical' and a
'not identical' operation, respectively.
https://github.com/jasny/php-src/compare/PHP-7.4...jasny:strict_types-affect-operators-test
(To test, build the branch and run
https://gist.github.com/jasny/eacd187c949459b70d8f8f0818411f0a )

I've added this information to the RFC. Any suggestions or remarks on the
way to implement this are appreciated.

6 years ago by Claude Pache

On 26 June 2019 at 08:50, Christian Schneider cschneid@cschneid.com wrote:

While I understand that some people don't like the way PHP does type conversions I think this proposal creates a much bigger element of surprise when copying PHP code from one place to another than all the .ini-settings ever did.

It basically creates two languages in one and I won't be able to determine what
$a == 42
exactly does without having to look at the header of the file.

I'm inclined to say that if you want to make PHP a new language with a new core type concept then you should fork it and call it something else to avoid confusion.

Chris

Indeed. The directive may make operators more strict in what they accept, but it should avoid changing the semantics. Concretely, we must have either:

"120" > "99.9"; // true

or:

"120" > "99.9"; // TypeError

Anything else will bring confusion.

—Claude

6 years ago by Benjamin Morel

"120" > "99.9"; // TypeError
Anything else will bring confusion.

Not sure about this, you can do it the JS way: if both operands are
strings, then it behaves like strcmp():

"23" > "4"; // false
"23" > "221"; // true

I'm not saying that we should do it, but this would not be confusing to me
at all.
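Here is a small sketch that contrasts the two readings of these examples, and of "120" > "99.9" above. `str_gt()` is a hypothetical helper, not a PHP built-in. It models the strcmp()-based rule described here. The native `>` compares two numeric strings as numbers.

```php
<?php
// Hypothetical helper (not a PHP built-in): byte-wise "greater than",
// following the sign convention of strcmp().
function str_gt(string $a, string $b): bool
{
    return strcmp($a, $b) > 0;
}

// Native PHP: both operands are numeric strings, so they compare as numbers.
var_dump("23" > "4");      // bool(true)
var_dump("23" > "221");    // bool(false)
var_dump("120" > "99.9");  // bool(true)

// strcmp()-based: compared byte by byte, like JavaScript string comparison.
var_dump(str_gt("23", "4"));      // bool(false)
var_dump(str_gt("23", "221"));    // bool(true)
var_dump(str_gt("120", "99.9"));  // bool(false)
```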

Ben

6 years ago by Christian Schneider

On 26.06.2019 at 11:09, Benjamin Morel benjamin.morel@gmail.com wrote:

"120" > "99.9"; // TypeError
Anything else will bring confusion.

Not sure about this, you can do it the JS way: if both operands are
strings, then it behaves like strcmp():

"23" > "4"; // false
"23" > "221"; // true

I'm not saying that we should do it, but this would not be confusing to me
at all.

With the proposed change both
"23" > "4" === true # Current behaviour
and
"23" > "4" === false # New strict behaviour
could be the case depending on a declaration somewhere else in the source code.

That's the confusion Claude and I were talking about: You cannot be sure what a very simple line of code does.

Chris

6 years ago by Benjamin Morel

(...) could be the case depending on a declaration somewhere else in the
source code.
That's the confusion Claude and I were talking about: You cannot be sure
what a very simple line of code does.

Oh, I see. You mean that only replacing some of the current results with
TypeErrors would be acceptable; returning a different value would not.
This makes a lot of sense, but once again prevents the language from slowly
moving towards something different (and better), leaving it stuck in its
legacy forever.

I'm starting to believe that a joint effort to fork PHP is the only way out
:(

Ben

6 years ago by Claude Pache

On 26 June 2019 at 11:36, Benjamin Morel benjamin.morel@gmail.com wrote:

(...) could be the case depending on a declaration somewhere else in the
source code.
That's the confusion Claude and I were talking about: You cannot be sure
what a very simple line of code does.

Oh, I see. You mean that only replacing some of the current results with
TypeErrors would be acceptable; returning a different value would not.
This makes a lot of sense, but once again prevents the language from slowly
moving towards something different (and better), leaving it stuck in its
legacy forever.

I'm starting to believe that a joint effort to fork PHP is the only way out
:(

Ben

It would be something “different”, but not necessarily “better”.

Programmers may intentionally rely on the current semantics when comparing numeric strings, e.g. in the following cases:

values that are grabbed from a database using a driver that returns only strings (or nulls);
values that are read from $_POST and that ultimately stem from some HTML <input type="number"> element.

It was certainly a fundamental design error to have both implicit type conversion and operators that did different things based on the type of their operands. That leads to the infamous "1" + 1 == 11 problem in JavaScript, or the "3" < "24" problem in PHP. That could have been avoided in two ways:

either by forbidding implicit conversion;
or by using different operators for different types (as does Perl).

Now, returning to the case of the comparison operators like < or ==. Instead of killing implicit conversion and redefining the meaning of those operators in cases that are not just edge cases, it may be preferable to use the other approach:

in some strict mode, reserve <, == etc. for numeric comparison, and throw a TypeError if one of the operands is not numeric;
If we deem it worthwhile, define new operators for string comparison. (Although I’m not really sure it is worthwhile: we have strcmp() and === for byte-to-byte comparison, and the Collator class for alphabetical sorting that actually works in languages not restricted to unaccented Latin characters.)
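Here is a minimal sketch of the numeric-only comparison proposed above. It is written as a userland function because no such mode exists in PHP. `numeric_lt()` is hypothetical. I am also assuming that "numeric" includes numeric strings, such as values from a database driver or $_POST, so those are accepted as operands.

```php
<?php
// Hypothetical helper: "<" restricted to numeric operands.
// Ints, floats and numeric strings are accepted; anything else throws.
function numeric_lt($a, $b): bool
{
    foreach ([$a, $b] as $operand) {
        if (!is_numeric($operand)) {
            throw new TypeError(
                sprintf('Expected a numeric operand, got %s', gettype($operand))
            );
        }
    }
    // Both are numeric: "+ 0" turns numeric strings into int or float.
    return ($a + 0) < ($b + 0);
}

var_dump(numeric_lt("3", "24"));  // bool(true): compared as numbers

try {
    numeric_lt("abc", "24");
} catch (TypeError $e) {
    echo $e->getMessage(), "\n";  // Expected a numeric operand, got string
}
```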

—Claude

6 years ago by Rowan Collins

On Wed, 26 Jun 2019 at 10:36, Benjamin Morel benjamin.morel@gmail.com
wrote:

Oh, I see. You mean that only replacing some of the current results with
TypeErrors would be acceptable; returning a different value would not.
This makes a lot of sense, but once again prevents the language from slowly
moving towards something different (and better), leaving it stuck in its
legacy forever.

If we're talking about combining operator overloading and type juggling in
the way that JS does it, I would definitely debate whether that's "better".
It leads to the weird circular situation where to know what an operator
means, you have to look at the types; but to know how the types will be
interpreted, you need to know what the operator means.

Perl is a notable contrast: the types of operands are deduced based on the
operator, but there are different operators to force them to different
types. So 23 < 4 and "23" < "4" are both numeric comparisons, so return
false; but 23 lt 4 and "23" lt "4" do string comparisons, and return
true. That way the user's intent is clear, but you don't have to manually
cast values or remember how different combinations will be interpreted.

I'm starting to believe that a joint effort to fork PHP is the only way out

If what you want is a fork of PHP with stronger typing, then take a look at
Hack https://hacklang.org/

Regards,

Rowan Collins
[IMSoP]

6 years ago by Alain D D Williams

Perl is a notable contrast: the types of operands are deduced based on the
operator, but there are different operators to force them to different
types. So 23 < 4 and "23" < "4" are both numeric comparisons, so return
false; but 23 lt 4 and "23" lt "4" do string comparisons, and return
true. That way the user's intent is clear, but you don't have to manually
cast values or remember how different combinations will be interpreted.

IMHO the Perl way is better: the different operators mean that I will get what I
want and don't need to worry about an accidental type juggle; it is also
(presumably) faster, as the runtime does not need to look at a string, decide
whether it could be a number, and maybe change what it does.

The big problem is backwards compatibility, so new operators would be needed:

string compare: lt, gt, etc. Not much of a problem.

numeric compare: #< #> would be nice were it not that # means comment.

--
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256 https://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: https://www.phcomp.co.uk/contact.php
#include <std_disclaimer.h>

6 years ago by Benjamin Morel

in some strict mode, reserve <, == etc. for numeric comparison, and
throw a TypeError if one of the operands is not numeric;

If we deem it worthwhile, define new operators for string comparison.
(Although I’m not really sure it is worthwhile: we have strcmp() and === for
byte-to-byte comparison, and the Collator class for alphabetical sorting
that actually works in languages not restricted to unaccented Latin
characters.)

It's true that string comparison (sorting) is a much harder problem that
cannot be solved without additional knowledge of the encoding of the
string; so I agree that it might be better to just throw a TypeError when
comparing strings, and leave the user with an operator that only works on
numbers, and explicitly use dedicated functions when comparing strings.

This makes sense for "<", "<=", ">", ">=", but what about "==" and "!="?

Currently, "11" == "11.0"; what would this yield under the new proposal?

leave it as is: return true in this case => contradicts the whole purpose
of the new proposal
throw a TypeError when performing the above comparison => not acceptable
either, I guess; every language allows == and != on strings, so forcing the use of
strict comparison operators is a bit weird here.
change the semantics to return false when both operands are strings, and
don't match => not acceptable to you as you cannot know what a line of code
does without checking the header

What would you suggest here?

Ben

6 years ago by Rowan Collins

On Wed, 26 Jun 2019 at 12:46, Benjamin Morel benjamin.morel@gmail.com
wrote:

This makes sense for "<", "<=", ">", ">=", but what about "==" and "!="?

Currently, "11" == "11.0"; what would this yield under the new proposal?

leave it as is: return true in this case => contradicts the whole purpose
of the new proposal

throw a TypeError when performing the above comparison => not acceptable
either, I guess; every language allows == and != on strings, so forcing the use of
strict comparison operators is a bit weird here.

change the semantics to return false when both operands are strings, and
don't match => not acceptable to you as you cannot know what a line of code
does without checking the header

Given that we already have === and !==, could the strict mode simply throw
an error for any use of the non-strict == and != versions?

declare(strict_operators=1);
var_dump( "11" == "11.0" ); # TypeError: "Cannot use non-strict equality operator in strict operator mode."
var_dump( "11" === "11.0"); # bool(false)

I'm not sure whether I like the idea or not, but I thought I'd throw it out
there as a possibility.

Regards,

Rowan Collins
[IMSoP]

6 years ago by Benjamin Morel

Given that we already have === and !==, could the strict mode simply throw
an error for any use of the non-strict == and != versions?
declare(strict_operators=1);
var_dump( "11" == "11.0" ); # TypeError: "Cannot use non-strict equality operator in strict operator mode."
var_dump( "11" === "11.0"); # bool(false)
I'm not sure whether I like the idea or not, but I thought I'd throw it out
there as a possibility.

That's definitely a possibility, that I'm sure a lot of people will dislike.

I personally don't have a strong opinion about it.

Ben

6 years ago by Arnold Daniels

On Wed, Jun 26, 2019 at 1:18 PM Alain D D Williams addw@phcomp.co.uk
wrote:

IMHO the Perl way is better: the different operators mean that I will get what I
want and don't need to worry about an accidental type juggle; it is also
(presumably) faster, as the runtime does not need to look at a string, decide
whether it could be a number, and maybe change what it does.

The big problem is backwards compatibility, so new operators would be
needed:

string compare: lt, gt, etc, not much of a problem

numeric compare: #< #> would be nice were it not that # means comment.

Note that using a directive means there is no inherent backward
compatibility issue. We're talking about copy/pasted code snippets only.

Solving the issues presented, maintaining BC, without the use of a
directive would require the addition of multiple type-specific operators.
String compare, numeric compare, array compare, etc, etc. PHP code would
become unrecognizable. I'm not a fan of that alternative.

Arnold

6 years ago by Arnold Daniels

On Wed, Jun 26, 2019 at 12:39 PM Claude Pache claude.pache@gmail.com
wrote:

It would be something “different”, but not necessarily “better”.

Programmers may intentionally rely on the current semantics when comparing
numeric strings, e.g. in the following cases:

values that are grabbed from a database using a driver that returns only
strings (or nulls);

values that are read from $_POST and that ultimately stem from some
HTML <input type="number"> element.

It was certainly a fundamental design error to have both implicit type
conversion and operators that did different things based on the type of
their operands. That leads to the infamous "1" + 1 == 11 problem in
JavaScript, or the "3" < "24" problem in PHP. That could have been
avoided in two ways:

either by forbidding implicit conversion;

or by using different operators for different types (as does Perl).

Now, returning to the case of the comparison operators like < or ==.
Instead of killing implicit conversion and redefining the meaning of those
operators in cases that are not just edge cases, it may be preferable to
use the other approach:

in some strict mode, reserve <, == etc. for numeric comparison, and
throw a TypeError if one of the operands is not numeric;

If we deem it worthwhile, define new operators for string comparison.
(Although I’m not really sure it is worthwhile: we have strcmp() and === for
byte-to-byte comparison, and the Collator class for alphabetical sorting
that actually works in languages not restricted to unaccented Latin
characters.)

—Claude

Forbidding implicit type conversion completely is taking it too far. Some
operators like string concatenation (.) can perform conversions just fine.

The issue at hand is limited to operators that are affected by the value
(not only the type) of the operands. Specifically:

When using numeric strings with relational operators. This includes
statements like "16" == "016".
When comparing two arrays, eg [null] == [0] and [0] == ["foo"], or
comparing two objects.
In a switch statement.
--
Whether a switch is or isn't affected by strict_operators should be
determined via a secondary vote.
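Here is a short sketch of the first and last cases as they behave in current PHP. Two distinct numeric strings compare equal with ==, and switch matches its cases using the same loose comparison. `describe()` is a made-up function used only for the demonstration.

```php
<?php
// Two different strings compare equal because both are numeric.
var_dump("16" == "016");   // bool(true): compared as 16 == 16
var_dump("16" === "016");  // bool(false): the strings differ

// Made-up example function: switch compares with ==, not ===.
function describe(string $value): string
{
    switch ($value) {
        case "10":
            return "matched ten";
        default:
            return "no match";
    }
}

echo describe("1e1"), "\n";  // matched ten ("1e1" == "10")
echo describe("ten"), "\n";  // no match
```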
Concerning == and != with arrays and objects: there is currently
a range of differences when compared to the effect of === and !==. To
what extent is the typecasting intended? Some cases like [0] == [false]
can be common. As such, a widening primitive conversion from bool to int
might be a good idea (in general). Beyond that, allowing cases like
[[]] == [false] would undermine the purpose of this RFC, as it allows
seemingly self-contradicting statements to evaluate to true, like
$a == $b && $a == $c && $b != $c
with
$a = [false];
$b = [0];
$c = [[]];
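The self-contradiction above can be checked directly in current PHP. Loose array equality compares the values under matching keys with ==. A bool compared with an array uses the array's truthiness, while an int is never loosely equal to an array. Together, these rules make all three conditions hold at once.

```php
<?php
$a = [false];
$b = [0];
$c = [[]];

var_dump($a == $b);  // bool(true): false == 0
var_dump($a == $c);  // bool(true): false == [], the empty array is falsy
var_dump($b != $c);  // bool(true): 0 == [] is false
var_dump($a == $b && $a == $c && $b != $c);  // bool(true)
```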
The strict_types directive already requires you to cast raw data from
$_GET/$_POST or a database. If the directive disallowed strings, arrays,
and objects as operands for relational operators (throwing a TypeError),
explicit casting would still be required. The difference is that when you
forget to do that, or copy the code from a code base where this isn't
required, you'd always get a TypeError rather than a different result.

I don't think a `TypeError` should be thrown based on the value of an
operand, only based on its type. Also, implicitly casting strings to
numbers, but not casting other types (like arrays), only makes the
logic of operators more complex and inconsistent. Disallowing the relational
operators `==`, `!=`, `<`, `>`, `<=`, `>=` and `<=>` for strings
altogether, requiring the use of a function, is an option. However, IMHO
this is killing a fly with a cannon, as the problem is limited to
"copy/pasted code for comparing numeric strings from a source file that
doesn't use strict_operators to a file that does use it".

So, should a directive, declared at the top of the file, affect how the
code in that file is executed? AFAICS, yes; that's exactly what it's for:
> The declare construct is used to set execution directives for a block of code.
(https://www.php.net/declare)

6 years ago by Dik Takken

Hello,

Thanks a lot for your work on this RFC, it looks like a nice way to
allow the language to gradually move forward.

As pointed out by others, the ==, ===, != and !== operators are a bit
problematic. A possible solution could be to leave them out of the RFC.
The reason to do so is that the choice between strict or non-strict
comparison is already possible by choosing the appropriate operator. In
my view, explicitly using == instead of === is either intentional or a
bug. If it is intentional, the author consciously chose to be
non-strict. The strictness declaration would then only affect operators
for which no strict variant exists or where the operator is implicit
(switch statement).

As for changing the behavior of in_array() and friends: I would love the
idea of not having to use the strict argument everywhere anymore.
However, changing behavior of functions that are not in the same file
that has the strictness declaration seems inconsistent. The scope of the
declaration would not be well defined anymore. There may be other means
to fix this annoyance, like introducing a strict variant of in_array().
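Here is a small sketch of the in_array() annoyance in current PHP. Without the third `$strict` argument, in_array() compares with ==, so a different string that is numerically equal is reported as present. `in_array_strict()` is a hypothetical wrapper that illustrates the "strict variant" idea; it is not an existing function.

```php
<?php
$allowed = ["10", "20"];

// Default mode compares with ==, and "1e1" == "10" (both numeric: 10.0 == 10).
var_dump(in_array("1e1", $allowed));        // bool(true)

// With $strict = true, in_array() compares with === instead.
var_dump(in_array("1e1", $allowed, true));  // bool(false)

// Hypothetical strict variant: simply fixes the third argument to true.
function in_array_strict($needle, array $haystack): bool
{
    return in_array($needle, $haystack, true);
}

var_dump(in_array_strict("10", $allowed));  // bool(true)
```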

Regarding the switch statement: While it is not an operator, one could
argue that it is a case of implicit use of an operator.

Regards,
Dik Takken

6 years ago by Arnold Daniels

On Wed, Jun 26, 2019 at 1:46 PM Benjamin Morel benjamin.morel@gmail.com
wrote:

It's true that string comparison (sorting) is a much harder problem that
cannot be solved without additional knowledge of the encoding of the
string; so I agree that it might be better to just throw a TypeError when
comparing strings, and leave the user with an operator that only works on
numbers, and explicitly use dedicated functions when comparing strings.

This makes sense for "<", "<=", ">", ">=", but what about "==" and "!="?

Currently, "11" == "11.0"; what would this yield under the new proposal?

leave it as is: return true in this case => contradicts the whole purpose
of the new proposal

throw a TypeError when performing the above comparison => not acceptable
either, I guess; every language allows == and != on strings, so forcing the use of
strict comparison operators is a bit weird here.

change the semantics to return false when both operands are strings, and
don't match => not acceptable to you as you cannot know what a line of code
does without checking the header

What would you suggest here?

Ben

PHP considers a string as a simple byte array. I want to stress that any
discussion about character sets or collations is beyond the scope of this
RFC.

The directive only affects the result of comparing two numeric strings and
non-numeric strings. As such, the RFC assumes the current result of
comparing non-numeric strings to be 100% correct. To those who disagree
with this assumption; please create a separate RFC to discuss this topic
and do not take it into consideration in regards to the strict_operators
RFC.

The RFC is modeled after strict_types, so to quote part of its
motivation "... this RFC proposes a fourth approach: per-file strict or
weak type-checking. This has the following advantages: People can choose
the type checking model that suits them best, which means this approach
should hopefully placate both the strict and weak type checking camps. ..."
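Because the RFC is modeled after strict_types, here is a minimal sketch of how that existing per-file directive behaves. With strict_types=1, a call made from this file to a function with an int parameter no longer coerces the string "5", and a TypeError is thrown instead. `double_it()` is just an example function. The exact TypeError message varies between PHP versions, so it is not shown.

```php
<?php
declare(strict_types=1);

// Example function with a scalar type declaration.
function double_it(int $n): int
{
    return $n * 2;
}

var_dump(double_it(5));  // int(10)

try {
    // With strict_types=1 in the calling file, "5" is not coerced to int.
    double_it("5");
} catch (TypeError $e) {
    echo "TypeError: ", $e->getMessage(), "\n";
}
```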

Take into consideration that the use of strict_operators is optional.
Those who are inclined to use it consider the current behavior of implicit
type casting to be problematic. As such, I imagine that this group does not
(want to) use code that exploits this behavior. Those who do not find the
current behavior problematic will typically not use the directive and thus
are unaffected by it. Disallowing all relational operators for strings is
too radical and primarily caters towards those who aren't inclined to use
the directive in the first place. In short, it's a compromise that makes
nobody happy.

The RFC will take the following stance: the directive is catering towards
those that find implicit casting by relational operators on two operands of
the same type, purely based on the value of those operands, very
undesirable. For the audience that's inclined to use the directive, any
issues that come from copy/pasting code that exploits this behavior are
considered acceptable and should be solved.

I've added two discussion points to the RFC based on the discussed concerns.

Arnold

6 years ago by Arnold Daniels

Hi Dik,

Thanks for taking the time to review this RFC.

Hello,

Thanks a lot for your work on this RFC, it looks like a nice way to
allow the language to gradually move forward.

As pointed out by others, the ==, ===, != and !== operators are a bit
problematic. A possible solution could be to leave them out of the RFC.
The reason to do so is that the choice between strict or non-strict
comparison is already possible by choosing the appropriate operator. In
my view, explicitly using == instead of === is either intentional or a
bug. If it is intentional, the author consciously chose to be
non-strict. The strictness declaration would then only affect operators
for which no strict variant exists or where the operator is implicit
(switch statement).

I would argue the following: the explicit use of the strict_operators
directive is intentional, meaning that the author consciously chose to be
strict and does not expect some operators to still be non-strict. The issues
pointed out apply to all comparison operators. Ignoring == and != in the RFC
creates an inconsistency, while not properly addressing those concerns.

As for changing the behavior of in_array() and friends: I would love the
idea of not having to use the strict argument everywhere anymore.
However, changing behavior of functions that are not in the same file
that has the strictness declaration seems inconsistent. The scope of the
declaration would not be well defined anymore. There may be other means
to fix this annoyance, like introducing a strict variant of in_array().

:+1:

Regarding the switch statement: While it is not an operator, one could
argue that it is a case of implicit use of an operator.

I agree. It's even defined as an operator internally. Still, I'll put this
up as a secondary vote.

Regards,
Dik Takken

6 years ago by Dik Takken

I would argue the following: the explicit use of the strict_operators
directive is intentional, meaning that the author consciously chose to be
strict and does not expect some operators to still be non-strict. The issues
pointed out apply to all comparison operators. Ignoring == and != in the RFC
creates an inconsistency, while not properly addressing those concerns.

Yes, I guess you're right that treating all operators in a strict way,
with a simpler set of rules, is more consistent.

Concerning the issue with copying existing code into a file that uses
a stricter interpretation of the code: I think this should be regarded as
performing an upgrade of the code that is being copied. No problem in my
view.

In the section about widening the scope you address the type juggling
that happens on array access, like $array[12.34]. One could argue that
accessing an array item by key is implicit use of the == operator, just
like a switch statement is. I would love to see it included in the main
proposal instead of proposing it as part of a different directive. The
change in behavior could be similar to what is proposed for the switch
statement: Array keys are compared using the === operator.

Regards,
Dik Takken

6 years ago by Nikita Popov

On Tue, Jun 25, 2019 at 3:10 PM Arnold Daniels arnold.adaniels.nl@gmail.com
wrote:

Hi all,

I would like to open the discussion for RFC: "Strict operators directive".

This RFC proposes a new directive 'strict_operators'. When enabled,
operators may cast operands to the expected type, but must comply with the
following rules:

Typecasting is not based on the type of the other operand

Typecasting is not based on the value of any of the operands

Operators will throw a TypeError for unsupported types

Reasoning: The current rules for type casting done by operators are
inconsistent and complex, which can lead to surprising results where a
statement seemingly contradicts itself.

Using a directive means that backwards compatibility is guaranteed.

https://wiki.php.net/rfc/strict_operators

Hi Arnold,

I like the idea behind this RFC. This is a good way to avoid unfortunate
legacy behavior without breaking BC. Here are some more detailed thoughts:

I think to be really useful, this additionally needs
https://wiki.php.net/rfc/namespace_scoped_declares or some variation
thereof. Being able to say "this whole library uses strict operators" is
much more useful than specifying this in every file (and possibly missing
it somewhere and thus getting the wrong semantics). I will try to get a new
version of this RFC based on directories rather than namespaces into PHP 8.

The sentence "In this case, we're passing an int to a function that
accepts float. The parameter is converted (widened) to float." should
probably not be referring to functions and parameters.

"To compare two numeric strings as numbers, they need to be cast to
floats." This may lose precision for integers. It is better to cast to
numbers (int or float), with the canonical way being +$x. But I guess
that won't work under strict_operators. Maybe we should have a (number)
cast (it already exists internally...)

This has already been mentioned by others: Having $str1 < $str2 perform a
strcmp() style comparison under strict_operators is surprising. I think
that overall the use of lexicographical string comparisons is quite rare
and should be performed using an explicit strcmp() call. More likely than
not, writing $str1 < $str2 is a bug and should generate a TypeError. Of
course, equality comparisons like $str1 == $str2 should still work, similar
to the distinction you make for arrays.

If I understand correctly, under this RFC "foo" == 0 will throw a
TypeError, but ["foo"] == [0] will return false. Generally the behavior of
the recursive comparison here is that it's the same as strict == but all
errors become not-equal instead. Correct? I'm not sure how I feel about
this, as it seems to introduce one more set of semantics next to the weak
==, strict == and === semantics there already are.

I also find it somewhat odd that you can't write something like "$obj !=
null" anymore, only "$obj !== null".

I think the "solution" to the last three points is a) only support
numbers in relational operators (<,<=,>,>=,<=>) and throw TypeErrors
otherwise (maybe modulo provisions for object overloading) and b) allow
comparing any types in == and !=, without throwing a TypeError. The
question "Are 42 and 'foobar' equal?" has a pretty clear answer: "No they
aren't", so there is no need to make this a TypeError (while the question
"Is 42 larger than 'foobar'?" has no good answer.) I believe doing
something like this would roughly match how Python 3 works. (Edit: I see
now that this is mentioned in the FAQ, but I think it would be good to
reconsider this. It would solve most of my problems with this proposal.)

String increment seems like a pretty niche use case, and I believe that
many people find the overflow behavior quite surprising. I think it may be
better to forbid string increment under strict_operators.

A similar argument can be made for the use of &, | and ^ on strings.
While I have some personal fondness for these, in practical terms these are
rarely used and may be indicative of a bug. I think both for string
increment and string and/or/xor it may be better to expose these as
functions so their use is more explicit.

Regards,
Nikita

6 years ago by Arnold Daniels

Hi Nikita,

Thanks for your feedback.

I'll fix the textual errors you mentioned.

"To compare two numeric strings as numbers, they need to be cast to
floats." This may lose precision for integers. It is better to cast to
numbers (int or float), with the canonical way being +$x. But I guess
that won't work under strict_operators. Maybe we should have a (number)
cast (it already exists internally...)

Good point. While in most cases you know if you're working with floats or
integers, adding a way to cast to either an int or float would be nice.
Maybe preferably through a function like numberval($x) or simply
number($x), so the (type) syntax is reserved for actual types. That
would be an RFC on its own though.
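To illustrate the precision point with current PHP (a sketch that assumes a 64-bit build, where these strings fit in an int): two integer strings just above 2**53 collapse to the same float, while unary plus keeps them as ints.

```php
<?php
// Two integer strings that differ only in the last digit (2**53 + 1 vs 2**53).
$a = "9007199254740993";
$b = "9007199254740992";

// Casting to float rounds both to the same double, so they compare equal.
var_dump((float) $a == (float) $b); // bool(true)

// Unary plus turns integer strings into ints, so no precision is lost.
var_dump(+$a == +$b);               // bool(false)
var_dump(+$a);                      // int(9007199254740993)
```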

This has already been mentioned by others: Having $str1 < $str2 perform
a strcmp() style comparison under strict_operators is surprising. I think
that overall the use of lexicographical string comparisons is quite rare
and should be performed using an explicit strcmp() call. More likely than
not, writing $str1 < $str2 is a bug and should generate a TypeError. Of
course, equality comparisons like $str1 == $str2 should still work, similar
to the distinction you make for arrays.

Ok, fair. I'll change it so <,<=,>,>=,<=> comparison on a string throws a
TypeError, similar to arrays, resources, and objects.
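For context, this is how current (non-strict) PHP orders strings: when both operands are numeric, < compares them as numbers, whereas strcmp() always compares bytes. The same pair of strings can therefore order differently depending on which one you use.

```php
<?php
// Both operands are numeric strings, so < compares them as numbers: 10 < 9.
var_dump("10" < "9");            // bool(false)

// strcmp() compares byte by byte: "1" sorts before "9", so the result is negative.
var_dump(strcmp("10", "9") < 0); // bool(true)

// Non-numeric strings are compared lexicographically by < as well.
var_dump("apple" < "banana");    // bool(true)
```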

If I understand correctly, under this RFC "foo" == 0 will throw a
TypeError, but ["foo"] == [0] will return false. Generally the behavior of
the recursive comparison here is that it's the same as strict == but all
errors become not-equal instead. Correct? I'm not sure how I feel about
this, as it seems to introduce one more set of semantics next to the weak
==, strict == and === semantics there already are.

The syntax would be $a == $b (or $a == [0]), where $a and $b are a
string/int in one case and both an array in the other case. In the second
case, we can't throw a TypeError as both operands are of the same type.

I also find it somewhat odd that you can't write something like "$obj !=
null" anymore, only "$obj !== null".

To check against null, it's better to use !==. For objects (and resources),
using != null is OK, but for other types it currently isn't. For
example, [] == null gives true.
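For reference, a few loose comparisons against null in current PHP, illustrating why != null is only a reliable check for values that are always truthy, such as objects.

```php
<?php
// Loosely, null equals the empty or zero value of other types.
var_dump([] == null);             // bool(true)
var_dump(0 == null);              // bool(true)
var_dump("" == null);             // bool(true)

// Identity comparison only matches null itself.
var_dump([] === null);            // bool(false)

// A plain object is always truthy, so != null behaves as expected.
var_dump(new stdClass() != null); // bool(true)
```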

I think the "solution" to the last three points is a) only support
numbers in relational operators (<,<=,>,>=,<=>) and throw TypeErrors
otherwise (maybe modulo provisions for object overloading) and b) allow
comparing any types in == and !=, without throwing a TypeError. The
question "Are 42 and 'foobar' equal?" has a pretty clear answer: "No they
aren't", so there is no need to make this a TypeError (while the question
"Is 42 larger than 'foobar'?" has no good answer.) I believe doing
something like this would roughly match how Python 3 works. (Edit: I see
now that this is mentioned in the FAQ, but I think it would be good to
reconsider this. It would solve most of my problems with this proposal.)

Besides the argument in the FAQ, having == and != do a type check means
there are a lot more cases where the behavior silently changes, rather than
a TypeError being thrown. Currently "foobar" == 0 returns true, but this
would make it return false. So would 1 == true, "0" == 0 and "0" == false.
To keep the cases where the behavior changes to a minimum, it's better to
throw TypeErrors for == and !=.
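The comparisons Arnold lists can be checked directly. Each is true in current PHP, so a type-checking == (where different types are never equal) would flip them to false without any error. ("foobar" == 0 is left out because it is true on PHP 7 but false since PHP 8.0, which changed string-to-number comparison.)

```php
<?php
// All of these are currently true under loose comparison.
var_dump(1 == true);    // bool(true)
var_dump("0" == 0);     // bool(true)
var_dump("0" == false); // bool(true)

// With a type check, each pair would be "not equal", as === already reports.
var_dump(1 === true);   // bool(false)
```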

String increment seems like a pretty niche use case, and I believe that
many people find the overflow behavior quite surprising. I think it may be
better to forbid string increment under strict_operators.

Ok
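For reference, the string increment overflow behavior under discussion, in current PHP. The incremented() wrapper is only a hypothetical helper for this demo.

```php
<?php
// Hypothetical helper: apply ++ to a copy of $s and return the result.
function incremented(string $s): string
{
    $s++;
    return $s;
}

var_dump(incremented('a'));  // string(1) "b"
var_dump(incremented('a9')); // string(2) "b0"  (the 9 wraps to 0 and carries)
var_dump(incremented('z'));  // string(2) "aa"  (overflow adds a character)
var_dump(incremented('Zz')); // string(3) "AAa"
```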

A similar argument can be made for the use of &, | and ^ on strings.
While I have some personal fondness for these, in practical terms these are
rarely used and may be indicative of a bug. I think both for string
increment and string and/or/xor it may be better to expose these as
functions so their use is more explicit.

These operators make it very easy to work with binary data as strings in
PHP. In other languages you have to work with byte arrays, which is a major
pain. They're also very intuitive: "wow" & "xul" is the same as
chr(ord('w') & ord('x')) . chr(ord('o') & ord('u')) . chr(ord('w') & ord('l')).
I think these should stay.
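A quick check of Arnold's example in current PHP: & on two strings is applied byte by byte, so both spellings give the same result.

```php
<?php
$a = "wow";
$b = "xul";

// The same AND, spelled out one byte at a time.
$bytewise = chr(ord('w') & ord('x'))
          . chr(ord('o') & ord('u'))
          . chr(ord('w') & ord('l'));

var_dump($a & $b);                 // string(3) "ped"
var_dump(($a & $b) === $bytewise); // bool(true)
```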

Arnold

6 years ago by Marc

Hi,

I also find it somewhat odd that you can't write something like "$obj !=
null" anymore, only "$obj !== null".

To check against null, it's better to use !==. For objects (and resources),
using != null is OK, but for other types it currently isn't. For
example, [] == null gives true.

I would argue that two operands should not be considered equal if they are
of different types (except for int/float).

That means 0 == "" and 0 == "0" would both be false, but 0 == 0 and
0 == 0.0 would be true.

Note: In my opinion, the int/float check should also make sure that there
is no data loss when comparing a very big integer to a float. In that case
they should not be equal.
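A minimal userland sketch of the comparison Marc describes. Both functions are hypothetical (not part of the RFC or of PHP), and comparing values of the same type with === is purely an assumption of the sketch, since the proposal doesn't say how those should be handled.

```php
<?php
// Hypothetical helper: true only if the float represents exactly this int.
function int_float_equal(int $i, float $f): bool
{
    // NaN, infinities and fractional values can never equal an int.
    if (!is_finite($f) || floor($f) !== $f) {
        return false;
    }
    // Outside [-2**63, 2**63) no int can match; this also keeps the
    // (int) cast below well-defined.
    $limit = -(float) PHP_INT_MIN; // 2**63 as a float
    if ($f < -$limit || $f >= $limit) {
        return false;
    }
    return (int) $f === $i;
}

// Hypothetical equality: different types are never equal, except int/float.
function typed_equals($a, $b): bool
{
    if (is_int($a) && is_float($b)) {
        return int_float_equal($a, $b);
    }
    if (is_float($a) && is_int($b)) {
        return int_float_equal($b, $a);
    }
    if (gettype($a) !== gettype($b)) {
        return false;
    }
    // Same type: fall back to identity (an assumption of this sketch).
    return $a === $b;
}

var_dump(typed_equals(0, ""));  // bool(false)
var_dump(typed_equals(0, "0")); // bool(false)
var_dump(typed_equals(0, 0.0)); // bool(true)

// Today's == converts the int to float, which hides the difference.
var_dump(PHP_INT_MAX == (float) PHP_INT_MAX);             // bool(true)
var_dump(typed_equals(PHP_INT_MAX, (float) PHP_INT_MAX)); // bool(false)
```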

I think the "solution" to the last three points is a) only support
numbers in relational operators (<,<=,>,>=,<=>) and throw TypeErrors
otherwise (maybe modulo provisions for object overloading) and b) allow
comparing any types in == and !=, without throwing a TypeError. The
question "Are 42 and 'foobar' equal?" has a pretty clear answer: "No they
aren't", so there is no need to make this a TypeError (while the question
"Is 42 larger than 'foobar'?" has no good answer.) I believe doing
something like this would roughly match how Python 3 works. (Edit: I see
now that this is mentioned in the FAQ, but I think it would be good to
reconsider this. It would solve most of my problems with this proposal.)

Besides the argument in the FAQ, having == and != do a type check means
there are a lot more cases where the behavior silently changes, rather than
a TypeError being thrown. Currently "foobar" == 0 returns true, but this
would make it return false. So would 1 == true, "0" == 0 and "0" == false.
To keep the cases where the behavior changes to a minimum, it's better to
throw TypeErrors for == and !=.

I think this goes hand in hand with the empty() check: empty("0") is true,
but empty("00") is false.

I couldn't find how/if this will change with this RFC.
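For reference, the empty() behavior Marc mentions, in current PHP: "0" is the only non-empty string that counts as empty, so "00" and "0.0" do not.

```php
<?php
var_dump(empty("0"));   // bool(true)
var_dump(empty("00"));  // bool(false)
var_dump(empty(""));    // bool(true)

// The same rule drives a bool cast: "0.0" is a truthy string.
var_dump((bool) "0.0"); // bool(true)
```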

String increment seems like a pretty niche use case, and I believe that
many people find the overflow behavior quite surprising. I think it may be
better to forbid string increment under strict_operators.

Ok

A similar argument can be made for the use of &, | and ^ on strings.
While I have some personal fondness for these, in practical terms these are
rarely used and may be indicative of a bug. I think both for string
increment and string and/or/xor it may be better to expose these as
functions so their use is more explicit.

These operators make it very easy to work with binary data as strings in
PHP. In other languages you have to work with byte arrays, which is a major
pain. They're also very intuitive: "wow" & "xul" is the same as
chr(ord('w') & ord('x')) . chr(ord('o') & ord('u')) . chr(ord('w') & ord('l')).
I think these should stay.

I agree here. Even if working with binary strings isn't very common in PHP
web development, I actually use these for bitsets.
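A minimal sketch of a string-backed bitset of the kind Marc mentions, using only | and & on strings. bitset_with() and bitset_has() are hypothetical helper names, and the bit layout (bit n stored in byte intdiv(n, 8)) is an assumption.

```php
<?php
// Hypothetical helper: return $set with bit $bit switched on.
function bitset_with(string $set, int $bit): string
{
    $byte = intdiv($bit, 8);
    // Mask with a single bit set, e.g. bit 10 => "\x00\x04".
    $mask = str_repeat("\0", $byte) . chr(1 << ($bit % 8));
    // String | works byte by byte; the result has the longer operand's length.
    return $set | $mask;
}

// Hypothetical helper: test whether bit $bit is set.
function bitset_has(string $set, int $bit): bool
{
    $byte = intdiv($bit, 8);
    return $byte < strlen($set)
        && (ord($set[$byte]) & (1 << ($bit % 8))) !== 0;
}

$a = bitset_with(bitset_with('', 1), 10); // bits {1, 10}
$b = bitset_with('', 10);                 // bits {10}

$both = $a & $b;   // intersection: {10}
$either = $a | $b; // union: {1, 10}

var_dump(bitset_has($both, 10));  // bool(true)
var_dump(bitset_has($both, 1));   // bool(false)
var_dump(bitset_has($either, 1)); // bool(true)
```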

One note, though: bit shifting currently does not work with binary strings
(it tries to cast the binary string to an integer), and even if it did
shift the binary string, >> is designed to keep the first bit as a
positive/negative flag, which of course does not make sense for binary
strings.

In my opinion the bit shifting operators should accept and work well
with binary strings. I don't see a reason why it's performing a type
cast here.
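A minimal sketch of the kind of shift Marc describes, written as a hypothetical userland function (binary_shift_left() is not an existing PHP function). It assumes big-endian byte order (the first byte is the most significant), keeps the string length fixed, and fills vacated bits with zeros, so no sign bit is involved.

```php
<?php
// Hypothetical helper: logical left shift of a big-endian binary string.
// Bits shifted past the first byte are dropped; vacated bits become 0.
function binary_shift_left(string $bytes, int $bits): string
{
    if ($bits < 0) {
        throw new InvalidArgumentException('Shift amount must be non-negative');
    }
    $len = strlen($bytes);
    $byteShift = intdiv($bits, 8);
    $bitShift = $bits % 8;
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $src = $i + $byteShift;
        $hi = $src < $len ? ord($bytes[$src]) : 0;
        $lo = $src + 1 < $len ? ord($bytes[$src + 1]) : 0;
        // Shift the current byte and pull in the top bits of the next one.
        $out .= chr((($hi << $bitShift) | ($lo >> (8 - $bitShift))) & 0xFF);
    }
    return $out;
}

var_dump(bin2hex(binary_shift_left("\x01\x80", 1))); // string(4) "0300"
```

A right shift would mirror this, pulling bits in from the previous byte instead.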

Marc

6 years ago by Nikita Popov

On Wed, Jul 10, 2019 at 11:38 PM Arnold Daniels <
arnold.adaniels.nl@gmail.com> wrote:

Hi Nikita,

Thanks for your feedback.

I'll fix the textual errors you mentioned.

"To compare two numeric strings as numbers, they need to be cast to
floats." This may lose precision for integers. It is better to cast to
numbers (int or float), with the canonical way being +$x. But I guess
that won't work under strict_operators. Maybe we should have a (number)
cast (it already exists internally...)

Good point. While in most cases you know if you're working with floats or
integers, adding a way to cast to either an int or float would be nice.
Maybe preferably through a function like numberval($x) or simply
number($x), so the (type) syntax is reserved for actual types. That
would be an RFC on its own though.

This has already been mentioned by others: Having $str1 < $str2 perform
a strcmp() style comparison under strict_operators is surprising. I think
that overall the use of lexicographical string comparisons is quite rare
and should be performed using an explicit strcmp() call. More likely than
not, writing $str1 < $str2 is a bug and should generate a TypeError. Of
course, equality comparisons like $str1 == $str2 should still work, similar
to the distinction you make for arrays.

Ok, fair. I'll change it so <,<=,>,>=,<=> comparison on a string throws a
TypeError, similar to arrays, resources, and objects.

If I understand correctly, under this RFC "foo" == 0 will throw a
TypeError, but ["foo"] == [0] will return false. Generally the behavior of
the recursive comparison here is that it's the same as strict == but all
errors become not-equal instead. Correct? I'm not sure how I feel about
this, as it seems to introduce one more set of semantics next to the weak
==, strict == and === semantics there already are.

The syntax would be $a == $b (or $a == [0]), where $a and $b are a
string/int in one case and both an array in the other case. In the second
case, we can't throw a TypeError as both operands are of the same type.

You can, if you treat == comparisons recursively, which is how I personally
would expect them to work. There shouldn't be a difference in behavior
between $a == $b and [$a] == [$b] -- either both should throw TypeError if
$a and $b have different types, or neither should.
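A small check of how current PHP already behaves here: both loose and identity comparison are applied element by element to arrays, so wrapping the operands does not change the result.

```php
<?php
$a = "1";
$b = 1;

// Loose comparison: the same answer with or without the array wrapper.
var_dump($a == $b);      // bool(true)
var_dump([$a] == [$b]);  // bool(true)

// Identity comparison: likewise consistent.
var_dump($a === $b);     // bool(false)
var_dump([$a] === [$b]); // bool(false)
```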

I also find it somewhat odd that you can't write something like "$obj !=
null" anymore, only "$obj !== null".

To check against null, it's better to use !==. For objects (and resources),
using != null is OK, but for other types it currently isn't. For
example, [] == null gives true.

I think the "solution" to the last three points is a) only support
numbers in relational operators (<,<=,>,>=,<=>) and throw TypeErrors
otherwise (maybe modulo provisions for object overloading) and b) allow
comparing any types in == and !=, without throwing a TypeError. The
question "Are 42 and 'foobar' equal?" has a pretty clear answer: "No they
aren't", so there is no need to make this a TypeError (while the question
"Is 42 larger than 'foobar'?" has no good answer.) I believe doing
something like this would roughly match how Python 3 works. (Edit: I see
now that this is mentioned in the FAQ, but I think it would be good to
reconsider this. It would solve most of my problems with this proposal.)

Besides the argument in the FAQ, having == and != do a type check means
there are a lot more cases where the behavior silently changes, rather than
a TypeError being thrown. Currently "foobar" == 0 returns true, but this
would make it return false. So would 1 == true, "0" == 0 and "0" == false.
To keep the cases where the behavior changes to a minimum, it's better to
throw TypeErrors for == and !=.

Yes, I can certainly see how that would be an issue. I think both variants
have pretty major issues and there may not be a good solution here. In that
case, we might want to consider banning == and != entirely under
strict_operators mode and force the use of === and !== which will have the
usual well-defined semantics.

I think that would cause the least overall confusion and wouldn't even be a
particularly large loss in terms of functionality. I think the only really
useful behavior of == (that is not available with ===) under this proposal
was the non-identity object comparison.

Nikita
