Hello internals,
This discussion will use my previous RFC as the starting point for
conversation: https://wiki.php.net/rfc/user_defined_operator_overloads
There has been discussion on list recently about revisiting the topic of
operator overloads after the previous effort which I proposed was declined.
There are a variety of reasons, I think, this is being discussed, both on
list and off list.
- As time has gone on, more people have come forward with use cases. Often
  they are use cases that have been mentioned before, but it has become more
  clear that these use cases are more common than was suggested previously.
- Several voters, contributors, and participants have had more time (years
  now) to investigate and research some of the related issues, which
  naturally leads to changes in opinion or perspective.
- PHP has considered and been receptive toward several RFCs since my
  original proposal which update the style of PHP in ways which are congruent
  with the KIND of language that has operator overloads.
I mentioned recently that I would not participate in another operator
overload RFC unless I felt that the views of internals had become more
receptive to the topic, and after some discussion with several people
off-list, I feel that it is at least worth discussing for the next version.
Operator overloads have come up as a missing feature in several discussions
on list since the previous proposal was declined. This includes:
- [RFC] [Discussion] Support object type in BCMath [1]
- Native decimal scalar support and object types in BcMath [2]
- Custom object equality [3]
- pipes, scalar objects and on? [4]
- [RFC][Discussion] Object can be declared falsifiable [5]
The request to support comparison operators (>, >=, ==, !=, <=, <, <=>) has
come up more frequently, but particularly in discussion around linear
algebra, arbitrary precision mathematics, and dimensional numbers (such as
currency or time), the rest of the operators have also come up.
Typically, these use cases are themselves very niche, but the capabilities
operator overloads enable would be much more widely used. From discussion
on list, it seems likely that very few libraries would need to implement
operator overloads, but the libraries that do would be well used and thus
MANY devs would be consumers of operator overloads.
I want to discuss what changes to the previous proposal people would be
seeking, and why. The most contentious design choice of the previous
proposal was undoubtedly the operator keyword and the decision to make
operator overload implementations distinct from normal magic methods. For
some of the voters who voted yes on the previous RFC, this was a "killer
feature" of the proposal, while for some of the voters who voted no it was
the primary reason they were against the feature.
There are also several technical and tangentially related items that are
being worked on that would be necessary for operator overloads (and were
originally included in my implementation of the previous RFC). This
includes:
- Adding a new opcode for LARGER and LARGER_OR_EQUAL so that operand
  position can be preserved during ALL comparisons.
- Updating ZEND_UNCOMPARABLE such that it has a value other than -1, 0, or
  1 which are typically reserved during an ordering comparison.
- Allowing values to be equatable without also being orderable (such as
  with matrices, or complex numbers).
These changes could and should be provided independent of operator
overloads. Gina has been working on a separate RFC which would cover all
three of these issues. You can view the work-in-progress on that RFC here:
https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md
I hope to start off this discussion productively and work towards improving
the previous proposal into something that voters are willing to pass. To do
that, I think these are the things that need to be discussed in this thread:
- Should the next version of this RFC use the operator keyword, or
  should that approach be abandoned for something more familiar? Why do you
  feel that way?
- Should the capability to overload comparison operators be provided in
  the same RFC, or would it be better to separate that into its own RFC? Why
  do you feel that way?
- Do you feel there were any glaring design weaknesses in the previous RFC
  that should be addressed before it is re-proposed?
- Do you feel that there is ANY design, version, or implementation of
  operator overloads possible that you would support and be in favor of,
  regardless of whether it matches the approach taken previously? If so, can
  you describe any of the core ideas you feel are most important?
Jordan
I voted in favor of the RFC last time around, and assuming an essentially similar RFC is submitted again will most likely vote in favor again. I do believe this is a useful "surgical" feature; not typically used, but when used, very valuable.
- Should the next version of this RFC use the operator keyword, or
should that approach be abandoned for something more familiar? Why do
you feel that way?
IIRC, the main argument against operator was that it was a new keyword people would need to learn, and tools would need to adapt to understand. That is... a curious argument, to me, as it applies to nearly every new language feature that gets added. We just voted in property hooks and asymmetric visibility. Both introduce new syntax for both users and tooling to adapt to. Both passed by substantial margins.
For me, the main argument in favor of the operator keyword is that it allows us to sidestep a particular challenge: Is the method to override the + operator named __plus() or __add()? Do those words even mean the right thing? Should * be named __times(), __multiply(), or __dotproduct()? Depending on what type of data you're working on, any of those could be accurate, or completely wrong and misleading. Instead, writing operator +(...) means I know precisely which symbol I'm defining. As a side benefit, we also don't have to think about what visibility means on an operator, which is just kinda weird.
As an extreme example, Python's popular PathLib uses / as a concatenation operator, because it "looks like" a path. A PHP implementation of the same could be something like this:
class Path
{
    private array $parts;

    public function __construct(string $path) {
        $this->parts = array_filter(explode('/', $path));
    }

    public function __toString() {
        return implode('/', $this->parts);
    }

    // And then one of these:
    public function __divide(...) { ... }
    // or
    operator /(...) { ... }
}
One of those is horribly misleading about what's going on. The other is extremely descriptive.
Now, opponents of operator overloading, or just the operator keyword, would argue that the above is exactly what they want to avoid. For me, that's exactly what I want to enable. At some level, it's just a basic philosophical difference.
Another argument I recall is that the operator syntax makes it "natural" and tempting to extend to user-defined operators, so you could define your own operator+-*&() for an object to do god knows what. I fully agree, there is a risk to doing that. However, that is not what this RFC was, and I presume will be, suggesting: The available operators are a built-in fixed list. If we want to add some new operator that only makes sense for objects (such as a bind operator, something I'd love), that would be its own RFC that we could argue about. And those who don't want user-definable operators can readily vote against any future proposal to do so, while still giving us the benefit of the known operators.
I will also note that, in my research into collections in other languages, providing operator overloads for collections is extremely common, and in practice I find it very ergonomic. We will probably want collections as built in classes for performance anyway (whether using a custom syntax or generics), but I note that as another datapoint where operator overloads make a great deal of sense, but their "standard arithmetic names" would be very misleading. (They often use the boolean & and | operators, too, which results in some really nice and very readable code.)
- Should the capability to overload comparison operators be provided
in the same RFC, or would it be better to separate that into its own
RFC? Why do you feel that way?
My sense, which I've written about before, is that there are different "sets" of operators that cluster together. IIRC, I said something like:
A. Comparisons. (<=>, ==, etc.)
B. Arithmetic operators (+, -, *, /).
C. Everything else (concat, etc.)
I personally believe that's the order of importance. If we wanted to just dip our toes into operator overloading, we could do just set A for now. I'm fine with all three groups being approved, as I have uses for all of them. I could see having A be the main vote, and B and C being secondary votes on the same RFC.
- Do you feel there were any glaring design weaknesses in the previous
RFC that should be addressed before it is re-proposed?
Only minor things, I think. "OperandPosition" is a very long name to type all the time, even if I can't come up with something better. The operand ordering is kinda weird, but given how much discussion went into it last time I don't have any alternative that isn't either worse or more weird, or both.
The "multiply by -1 for <=>" bit I don't fully understand the point of. The RFC tries to explain, but I don't quite grok it.
For reflection, I'd be inclined to make ReflectionOperator extends ReflectionMethod, rather than an isOperator() method. That is largely stylistic, I suppose, as I am generally a fan of making the type system do the work for us whenever possible.
- Do you feel that there is ANY design, version, or implementation of
operator overloads possible that you would support and be in favor of,
regardless of whether it matches the approach taken previously? If so,
can you describe any of the core ideas you feel are most important?
I was fairly happy with the previous version, so proposing that as-is would have my vote. I would probably oppose including arbitrary symbol overloading at this time.
To me, the most important factors are:
- It's type-safe, and leverages the type system to "make invalid states unrepresentable" as much as possible. (I'd put the rules around <=> into this category.)
- It allows me to opt-in piecemeal to just those operators that make sense.
- The performance overhead compared to using a method is minimal.
- It is future-compatible with further language evolution, to the extent possible. (The operator keyword helps here.)
I'd love to see this brought up again, and hope there is sufficient interest to do so.
--Larry Garfield
On Sun, Sep 15, 2024 at 9:12 PM Larry Garfield larry@garfieldtech.com
wrote:
The "multiply by -1 for <=>" bit I don't fully understand the point of.
The RFC tries to explain, but I don't quite grok it.
I will perhaps respond with more detail to the rest of your message later,
but I wanted to address this specifically, because I also feel that the
original RFC I wrote didn't explain that well. The situation this bit was
referring to is as follows:
You have an object Foo that implements an overload for the <=> operator.
The proposed signature for <=> was simply:
operator <=>($other): int
This operator did not have an OperandPosition argument. The reason for
this was to prevent developers from creating situations where 5 > $foo is
true and 5 < $foo is true. Instead, internally it did the same sort of
reordering that the engine currently does. It calls the implementation the
developer defined, and then checks if the object that the implementation
was called from was on the right side of the operator. If it was, then it
multiplies the result of the user defined overload by -1. Multiplying the
result of the overload ONLY when the overload is called for the right side
is equivalent to flipping the order. So 5 > $foo is multiplied by -1 and
then evaluated as if it were $foo < 5. This is an edge case, but it was
an important one in my mind.
It would be entirely unnecessary if we allowed the <=> overload to know
what position it was in, but that would enable lots of developer mistakes
in my mind for no real gain. Instead, developers should just implement the
overload as if the object assumes it will always be called from the left
side of the comparison.
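As a rough illustration of the sign flip described above, here is a minimal sketch in today's PHP. It is only a model under stated assumptions: the Money class, its compareTo() method (standing in for an operator <=> overload) and the spaceship() helper (standing in for the engine-side step) are all hypothetical names, not part of the RFC or its implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical class; compareTo() stands in for `operator <=>($other): int`.
final class Money
{
    public function __construct(public readonly int $cents) {}

    // Written as if $this is always the LEFT operand of the comparison.
    public function compareTo(mixed $other): int
    {
        $otherCents = $other instanceof self ? $other->cents : (int) $other;
        return $this->cents <=> $otherCents;
    }
}

// Hypothetical model of the engine-side step: call the overload, then
// multiply by -1 only when the overloading object was the right operand.
function spaceship(mixed $left, mixed $right): int
{
    if ($left instanceof Money) {
        return $left->compareTo($right);
    }
    if ($right instanceof Money) {
        return -1 * $right->compareTo($left);
    }
    return $left <=> $right;
}

$foo = new Money(300);
var_dump(spaceship(5, $foo) > 0); // models 5 > $foo
var_dump(spaceship(5, $foo) < 0); // models 5 < $foo
```

Because the overload never learns which side it was on, the two results cannot both be true, which is the property the paragraph above is protecting.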
Jordan
The reason for this was to prevent developers from creating situations where
5 > $foo is true and 5 < $foo is true.
Just to point out: currently, PHP already does nonsensical comparisons:
Granted, it is 'technically' correct that ($a <= $b || $b <= $a) === false; but this really should be an error IMHO instead of a non-logical result.
— Rob
Yes, the default comparisons for objects are a little strange. This should
be helped by Gina's RFC which I mentioned in my original email. The main
issue is that at the moment Equatable and Orderable are inseparable within
the PHP engine.
Jordan
OK, that makes a lot more sense. Reading through the text of the RFC, I didn't catch that it only multiplied by -1 if the comparing object was on the right. The implementation is fine, but if you do go for a second round, making that a bit clearer would be helpful.
--Larry Garfield
On Sun, Sep 15, 2024 at 12:52 AM Jordan LeDoux jordan.ledoux@gmail.com
wrote:
These changes could and should be provided independent of operator
overloads. Gina has been working on a separate RFC which would cover all
three of these issues. You can view the work-in-progress on that RFC here:
https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md
Unrelated topic, sorry for the spam.
I just wanted to point out that interface default methods will play nicely
with the mentioned interfaces: Equatable and Comparable:
interface Equatable {
    public function equals(mixed $other): bool;
}
interface Comparable extends Equatable {
    public function compare(mixed $other): int;
    public function equals(mixed $other): bool {
        return $this->compare($other) === 0;
    }
}
So that it signals a clear intent of: "what is comparable is also
equatable, and this is the default implementation for it."
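Interfaces in current PHP releases cannot carry method bodies, so the snippet above relies on a feature the language does not have today. A rough approximation that runs now is to pair the interface with a trait. This is only a sketch: the trait name EqualsViaCompare and the Version class are hypothetical and exist purely for illustration.

```php
<?php
declare(strict_types=1);

interface Equatable
{
    public function equals(mixed $other): bool;
}

interface Comparable extends Equatable
{
    public function compare(mixed $other): int;
}

// Hypothetical trait supplying the "default" equals() in terms of compare().
trait EqualsViaCompare
{
    public function equals(mixed $other): bool
    {
        return $this->compare($other) === 0;
    }
}

// Hypothetical example class: comparable, and therefore equatable.
final class Version implements Comparable
{
    use EqualsViaCompare;

    public function __construct(private string $number) {}

    public function compare(mixed $other): int
    {
        if (!$other instanceof self) {
            throw new TypeError('Version can only be compared to Version');
        }
        return version_compare($this->number, $other->number);
    }
}

$a = new Version('8.3.0');
$b = new Version('8.4.0');
var_dump($a->compare($b));                  // negative: $a sorts before $b
var_dump($a->equals(new Version('8.3.0'))); // equality derived from compare()
```

The trait keeps the intent (comparable implies equatable) but every implementing class has to remember to use it, which is the gap default methods would close.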
Alex
This discussion will use my previous RFC as the starting point for conversation: https://wiki.php.net/rfc/user_defined_operator_overloads
There has been discussion on list recently about revisiting the topic of operator overloads after the previous effort which I proposed was declined. There are a variety of reasons, I think, this is being discussed, both on list and off list.
On behalf of all struggling PHP developers who would like to implement
patterns like Value Objects, with custom equality criteria;
understanding that this is going to be read by quite an amount of
people, I still would like to express my, and perhaps others',
emotional state:
Please make it happen guys 😭🙏!!111
I also agree that == comparisons should be prioritized if only a
subset of operators is to be implemented at once. The arithmetic is
also useful for stuff like GMP, but the niche in PHP is smaller for
that use case.
Regards,
Illia / someniatko
On behalf of all struggling PHP developers who would like to implement
patterns like Value Objects, with custom equality criterias
I seriously doubt anyone is struggling without this, unless you care to
provide proof to the contrary. I think this is "nice to have" at best, and
in that regard, probably disproportionate to the effort required to support
it.
Cheers,
Bilge
Perhaps. I would like to point out that (somewhat to my surprise) the PHP
reddit thread about my original RFC which was declined had about 2/3
community approval in the straw poll:
https://www.reddit.com/poll/rv11fc
I do not disagree that operator overloads as a feature is a specific tool
for a specific problem, and people can ALWAYS make method calls instead of
using operators. But it seems clear to me that there are a great many
developers who feel the feature will help them write more
understandable/maintainable/capable code. The people "struggling" without
this are people trying to develop extremely technical and niche libraries
(like myself). It is a "nice to have" for most people, but I do not believe
that diminishes the number of developers I've talked with before and after
my previous RFC who were wanting this feature.
Jordan
I want to discuss what changes to the previous proposal people would
be seeking, and why. The most contentious design choice of the
previous proposal was undoubtedly the operator keyword and the
decision to make operator overload implementations distinct from
normal magic methods. For some of the voters who voted yes on the
previous RFC, this was a "killer feature" of the proposal, while for
some of the voters who voted no it was the primary reason they were
against the feature.
I am still generally in favour, just like I was on the previous
iteration. And yes, I would say having the "operator" keyword was a
"killer feature" for me.
I hope to start off this discussion productively and work towards
improving the previous proposal into something that voters are willing
to pass. To do that, I think these are the things that need to be
discussed in this thread:
- Should the next version of this RFC use the operator keyword, or
should that approach be abandoned for something more familiar? Why do
you feel that way?
Yes. Making it clear what happens is useful.
- Should the capability to overload comparison operators be provided
in the same RFC, or would it be better to separate that into its own
RFC? Why do you feel that way?
I'm not too worried, but usually smaller RFCs have a larger chance of
being accepted.
cheers,
Derick
- Should the next version of this RFC use the operator keyword, or
  should that approach be abandoned for something more familiar? Why do
  you feel that way?
- Should the capability to overload comparison operators be provided
  in the same RFC, or would it be better to separate that into its own
  RFC? Why do you feel that way?
- Do you feel there were any glaring design weaknesses in the
  previous RFC that should be addressed before it is re-proposed?
I think there are two fundamental decisions which inform a lot of the
rest of the design:
- Are we over-riding operators or operations? That is, is the user
  saying "this is what happens when you put a + symbol between two Foo
  objects", or "this is what happens when you add two Foo objects together"?
- How do we despatch a binary operator to one of its operands? That is,
  given $a + $b, where $a and $b are objects of different classes, how do
  we choose which implementation to run?
One extreme is the "operators are just methods with funny names"
approach: $a + $b is just sugar for $a->operator+($b); $a can do
whatever it likes, but if it doesn't implement the operator, an error
happens. There's no need to indicate reversed operands, because an
implementation on $b is never called.
This is simple to implement, and great for users who want to build
concise DSLs; but that degree of freedom is often unpopular.
Towards the other end on question 1, you have defined operations with
expected semantics, return types, relationships between operators, etc.
The previous RFC actually went down this route for comparisons, defining
a single "operator <=>" that actually overloaded all the comparison
operators at once.
I think if we're going down that route, a name like "__compare" or
"interface Comparable { function compare(...) }" makes more sense -
you're not actually saying "this is what happens if you type a
spaceship", you're saying "here's how to compare two objects".
On question 2, there are a few different possibilities.
Despatch based on type:
a) Binary operators are defined globally on specific type pairs, and the
"best" overload chosen from all those currently loaded
b) Slightly more restricted: they are defined as static methods, and the
best overload chosen from the union of those defined on classes A and B
(this is how C# works)
c) Operator overloads are only possible between a class and a
scalar/non-object, or a class and one of its ancestors; the
implementation on the most specific class is used (e.g. if B extends A,
B's implementation will be used)
All of these can be written in a way that guarantees consistency ($a +
$b will always call the same as $b + $a). Both (a) and (b) would be
quite alien to PHP, which doesn't otherwise have multiple despatch, but
(c) is quite tempting as a conservative approach.
Despatch by trial and error:
d) Each class can only define one overload for an operator, but can
specify which types it accepts; if the definition on type A does not
accept instances of B, the definition on type B is attempted
e) Operator overloads all accept "mixed", but the definition on A can
dynamically return a value which causes the definition on B to be
attempted (this is how Python works)
f) Instead of returning a special value, allow throwing a special
exception; can be combined with option (d) by having the system catch
any TypeError
g) As in the previous proposal, the implementation on class B is only
called if no implementation on class A exists
Each of these can be combined with a special case to always prefer
sub-classes; e.g if B extends A, then (new A) + (new B) should call the
implementation on B first, even though it's on the RHS. (I spotted this
in the Python docs, and it seems very sensible.)
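To make the trial-and-error family more concrete, here is a minimal sketch of option (e) using ordinary methods in today's PHP. Everything in it is an assumption made for illustration: the Dispatch enum plays the role of Python's NotImplemented, add() stands in for an overloaded +, its bool $reversed flag is a hypothetical way of telling the method it was tried as the right-hand operand, and plus() models what the engine would do for $a + $b.

```php
<?php
declare(strict_types=1);

// Hypothetical sentinel: "this implementation cannot handle these operands".
enum Dispatch
{
    case NotHandled;
}

// Hypothetical value class; add() stands in for an overloaded + operator.
final class Meters
{
    public function __construct(public readonly float $value) {}

    // $reversed is true when $this was the RIGHT operand. Addition here is
    // commutative, so the flag is accepted but not needed.
    public function add(mixed $other, bool $reversed): self|Dispatch
    {
        if ($other instanceof self) {
            return new self($this->value + $other->value);
        }
        if (is_int($other) || is_float($other)) {
            return new self($this->value + $other);
        }
        return Dispatch::NotHandled;
    }
}

// Hypothetical model of the engine: try the left operand, then the right.
function plus(mixed $a, mixed $b): mixed
{
    if (is_object($a) && method_exists($a, 'add')) {
        $result = $a->add($b, false);
        if ($result !== Dispatch::NotHandled) {
            return $result;
        }
    }
    if (is_object($b) && method_exists($b, 'add')) {
        $result = $b->add($a, true);
        if ($result !== Dispatch::NotHandled) {
            return $result;
        }
    }
    throw new TypeError('Unsupported operand types for +');
}

echo plus(2, new Meters(3.5))->value, PHP_EOL; // right operand handles it
```

Option (g) from the list differs only in that the second attempt happens when the left operand has no implementation at all, not when it declines.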
Finally, a very quick note on the OperandPosition enum: I think just a
"bool $isReversed" would be fine - the "natural" expansion of "$a+$b" is
"$a->operator+($b, false)"; the "fallback" is "$b->operator+($a, true)"
Regards,
--
Rowan Tommins
[IMSoP]
On Tue, Sep 17, 2024 at 1:18 AM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
- Should the next version of this RFC use the operator keyword, or
  should that approach be abandoned for something more familiar? Why do
  you feel that way?
- Should the capability to overload comparison operators be provided
  in the same RFC, or would it be better to separate that into its own
  RFC? Why do you feel that way?
- Do you feel there were any glaring design weaknesses in the
  previous RFC that should be addressed before it is re-proposed?
I think there are two fundamental decisions which inform a lot of the
rest of the design:
- Are we over-riding operators or operations? That is, is the user
saying "this is what happens when you put a + symbol between two Foo
objects", or "this is what happens when you add two Foo objects together"?
If we allow developers to define arbitrary code which is executed as a
result of an operator, we will always end up allowing the first one.
- How do we despatch a binary operator to one of its operands? That is,
given $a + $b, where $a and $b are objects of different classes, how do
we choose which implementation to run?
This is something not many other people have been interested in so far, but
interestingly there is a lot of prior art on this question in other
languages! :)
The best approach, from what I have seen and developer usage in other
languages, is somewhat complicated to follow, but I will do my best to make
sure it is understandable to anyone who happens to be following this thread
on internals.
The approach I plan to use for this question has a name: Polymorphic
Handler Resolution. The overload that is executed will be decided by the
following series of decisions:
- Are both of the operands objects? If not, use the overload on the one
  that is. (NOTE: if neither are objects, the new code will be bypassed
  entirely, so I do not need to handle this case)
- If they are both objects, are they both instances of the same class? If
  they are, use the overload of the one on the left.
- If they are not objects of the same class, is one of them a direct
  descendant of the other? If so, use the overload of the descendant.
- If neither of them are direct descendants of the other, use the overload
  of the object on the left. Does it produce a type error because it does not
  accept objects of the type in the other position? Return the error and
  abort instead of re-trying by using the overload on the right.
This results from what it means to extend a class. Suppose you have a
class Foo and a class Bar that extends Foo. If both Foo and Bar
implement an overload, that means Bar inherited an overload. It is either
the same as the overload from Foo, in which case it shouldn't matter
which is executed, or it has been updated with even more specific logic
which is aware of the extra context that Bar provides, in which case we
want to execute the updated implementation.
So the implementation on the left would almost always be executed, unless
the implementation on the right comes from a class that is a direct
descendant of the class on the left.
For example, where Bar extends Foo, both of these would execute the
overload defined on Bar:
Foo + Bar
Bar + Foo
In practice, you would very rarely (if ever) use two classes from entirely
different class inheritance hierarchies in the same overload. That would
closely tie the two classes together in a way that most developers try to
avoid, because the implementation would need to be aware of how to handle
the classes it accepts as an argument.
The exception to this that I can imagine is something like a container,
that maybe does not care what class the other object is because it doesn't
mutate it, only store it.
But for virtually every real-world use case, executing the overload for the
child class regardless of its position would be preferred, because
overloads will tend to be confined to the core types of PHP + the classes
that are part of the hierarchy the overload is designed to interact with.
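The decision list above can be sketched in today's PHP with ordinary methods. This is a model under assumptions, not the RFC's implementation: the HasPlus interface, the plus() method and the resolveHandler() and applyPlus() helpers are hypothetical names; "descendant" is read here as any subclass (via is_subclass_of), which may be broader than what is intended by "direct descendant"; and the OperandPosition argument is left out for brevity.

```php
<?php
declare(strict_types=1);

// Hypothetical marker for "this class defines a + overload".
interface HasPlus
{
    public function plus(mixed $other): mixed;
}

class Foo implements HasPlus
{
    public function plus(mixed $other): mixed { return 'Foo handled it'; }
}

class Bar extends Foo
{
    public function plus(mixed $other): mixed { return 'Bar handled it'; }
}

// Picks the operand whose overload runs, following the steps above.
function resolveHandler(mixed $left, mixed $right): HasPlus
{
    $leftHas  = $left instanceof HasPlus;
    $rightHas = $right instanceof HasPlus;

    if (!$leftHas && !$rightHas) {
        throw new TypeError('No overload available');
    }
    if ($leftHas !== $rightHas) {
        return $leftHas ? $left : $right;  // only one side can handle it
    }
    if ($left::class === $right::class) {
        return $left;                      // same class: left wins
    }
    if (is_subclass_of($right, $left::class)) {
        return $right;                     // right is the descendant
    }
    return $left;  // left is the descendant, or the classes are unrelated
}

// Calls the chosen overload; a TypeError thrown by it is NOT retried
// on the other operand.
function applyPlus(mixed $left, mixed $right): mixed
{
    $handler = resolveHandler($left, $right);
    return $handler->plus($handler === $left ? $right : $left);
}

echo applyPlus(new Foo(), new Bar()), PHP_EOL;
echo applyPlus(new Bar(), new Foo()), PHP_EOL;
```

Under these assumptions both calls end up in Bar's implementation, regardless of which side Bar appears on.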
Finally, a very quick note on the OperandPosition enum: I think just a
"bool $isReversed" would be fine - the "natural" expansion of "$a+$b" is
"$a->operator+($b, false)"; the "fallback" is "$b->operator+($a, true)"
Regards,
--
Rowan Tommins
[IMSoP]
This is similar to what I originally designed, and I actually moved to an
enum based on feedback. The argument was something like $isReversed or
$left or so on is somewhat ambiguous, while the enum makes it extremely
explicit.
However, it's not a design detail I am committed to. I just want to let you
know why it was done that way.
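To compare the two styles side by side, here is a small PHP sketch. Everything in it is an assumption for illustration: the enum case names LeftSide and RightSide, the Quantity class, and the subtract()/subtractBool() methods that stand in for the proposed "operator -". The position is taken to mean the side of the operator that the called object sits on.

```php
<?php
declare(strict_types=1);

// Assumed shape of the enum; the case names are illustrative.
enum OperandPosition
{
    case LeftSide;
    case RightSide;
}

final class Quantity
{
    public function __construct(public readonly int $value) {}

    // Enum style: the position names the side that $this is on.
    public function subtract(int $other, OperandPosition $position): Quantity
    {
        return $position === OperandPosition::LeftSide
            ? new Quantity($this->value - $other)   // $this - $other
            : new Quantity($other - $this->value);  // $other - $this
    }

    // Boolean style: true means $this is the right-hand operand.
    public function subtractBool(int $other, bool $isReversed): Quantity
    {
        return $isReversed
            ? new Quantity($other - $this->value)   // $other - $this
            : new Quantity($this->value - $other);  // $this - $other
    }
}

$q = new Quantity(10);

// The two call sites compute the same thing (3 - 10); only readability differs.
echo $q->subtract(3, OperandPosition::RightSide)->value, PHP_EOL; // -7
echo $q->subtractBool(3, true)->value, PHP_EOL;                   // -7
```

The boolean is shorter, but the enum states at the call site which side the object is on.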
Jordan
To be clear: I’m very much in favor of operator overloading. I frequently work with both Money value objects, and DateTime objects that I need to manipulate through arithmetic with others of the same type.
What if I wanted to create a generic add($a, $b) function, how would I type hint the params to ensure that I only get “addable” things? I would expect that to be:
- Ints
- Floats
- Objects of classes with “operator+” defined
I think that an interface is the right solution for that, and you can just union with int/float type hints: add(int | float | Addable ...$operands) (or add(int | float | (Foo & Addable) ...$operands)).
Is this type of behavior even allowed? I think the intention is that it must be, otherwise the decision over which overload method gets called is drastically simplified.
Perhaps for a first iteration, operator overloads only work between objects of the same type or their descendants, and if a descendant overrides the overload, the descendant's version is used regardless of left/right precedence.
I suspect this will simplify the complexity of the magic, and solve the majority of cases where operator overloading is desired.
- Davey
The problem with providing interfaces is something that nikic addressed
very early in my design process and convinced me of: an Addable interface
will not actually tell you if two objects can be added together. A Money
class and a Vector2D class might both have an implementation for
operator +() and implement some kind of Addable interface. But there is no
sensible way in which they could actually be added. Knowing that an object
implements an overload is not enough in most cases to use operators with
it. This is part of the reason that I am skeptical of people who worry
about accidentally using random overloads.
The signature for the implementation in the Money class might look
something like this:
operator +(Money $other, OperandPosition $position): Money
while the signature for the implementation in the Vector2D class might
look something like this:
operator +(Vector2D|array $other, OperandPosition $position): Vector2D
Any attempt to add these two together will result in a TypeError.
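A short PHP sketch of that failure, under stated assumptions: Addable is a hypothetical marker interface, add() stands in for the proposed "operator +" (without the OperandPosition argument), and addAll() is an invented helper. Both arguments satisfy the interface type, yet the call still fails because of the parameter types on the implementations.

```php
<?php
declare(strict_types=1);

// Hypothetical marker interface, as suggested earlier in the thread.
interface Addable {}

final class Money implements Addable
{
    public function __construct(public readonly int $cents) {}

    // Stands in for "operator +(Money $other, ...): Money".
    public function add(Money $other): Money
    {
        return new Money($this->cents + $other->cents);
    }
}

final class Vector2D implements Addable
{
    public function __construct(
        public readonly float $x,
        public readonly float $y,
    ) {}

    // Stands in for "operator +(Vector2D|array $other, ...): Vector2D".
    public function add(Vector2D|array $other): Vector2D
    {
        if (is_array($other)) {
            $other = new Vector2D((float) $other[0], (float) $other[1]);
        }
        return new Vector2D($this->x + $other->x, $this->y + $other->y);
    }
}

// The Addable hints accept both objects, but say nothing about
// whether they can be added to each other.
function addAll(Addable $a, Addable $b): object
{
    return $a->add($b);
}

try {
    addAll(new Money(500), new Vector2D(1.0, 2.0));
} catch (TypeError $e) {
    // Money::add() rejects the Vector2D argument.
    echo 'TypeError: cannot add Money and Vector2D', PHP_EOL;
}
```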
Classes which have overloads that look like the following would be
something I think developers should be IMMEDIATELY suspicious of:
operator +(object $other, OperandPosition $position)
operator +(mixed $other, OperandPosition $position)
Does your implementation really have a plan for how to + with a stream
resource like a file handle, as well as an int? Can you just as easily use
+ with the DateTime class as you can with a Money class in your
implementation?
I think there are very few use cases that would survive code reviews or
feedback or testing that look like any of these signatures.
There are situations in which objects might accept objects from a different
class hierarchy. For instance, with the changes Saki has made there are now
objects for numbers in the BcMath extension. Those are objects that might
be quite widely accepted in overload implementations, since they represent
numbers in the same way that just an int or float might. But I highly doubt
that it's even possible for the overload to accept those sorts of things
without also being aware of them, and if the overload is aware of them it
can type-hint them in the signature.
Jordan
Good points. While Money objects are frequently added together, I would typically add DateInterval instances to DateTime instances, which breaks the limitation.
- Davey
1. Are we over-riding *operators* or *operations*? That is, is the user saying "this is what happens when you put a + symbol between two Foo objects", or "this is what happens when you add two Foo objects together"?
If we allow developers to define arbitrary code which is executed as a
result of an operator, we will always end up allowing the first one.
I don't think that's really true. Take the behaviour of comparisons in
your previous RFC: if that RFC had been accepted, the user would have
had no way to make $a < $b and $a > $b have different behaviour, because
the same overload would be called, with the same parameters, in both cases.
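A minimal PHP sketch of that point, assuming a single three-way comparison method from which the other comparison operators are derived. The Version class and the compare(), lessThan(), and greaterThan() names are hypothetical, and this only approximates what an engine would do.

```php
<?php
declare(strict_types=1);

final class Version
{
    public function __construct(public readonly int $major) {}

    // Stands in for a single comparison overload (hypothetical name).
    public function compare(Version $other): int
    {
        return $this->major <=> $other->major;
    }
}

// Both operators are derived from the same call with the same arguments,
// so user code cannot make "<" and ">" disagree with each other.
function lessThan(Version $a, Version $b): bool
{
    return $a->compare($b) < 0;
}

function greaterThan(Version $a, Version $b): bool
{
    return $a->compare($b) > 0;
}

var_dump(lessThan(new Version(1), new Version(2)));    // bool(true)
var_dump(greaterThan(new Version(1), new Version(2))); // bool(false)
```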
Slightly less strict is requiring groups of operators: the Haskell "Num"
typeclass (roughly similar to an interface) requires definitions for all
of "+", "*", "abs", "signum", "fromInteger", and either unary or binary
"-". It also defines the type signatures for each. If this was the only
way to overload the "+" operator, users would have to really go out of
their way to use it to mean something unrelated to addition.
As it happens, Haskell does allow arbitrary operator overloads, and in
fact goes to the other extreme and allows entirely new operators to be
invented. The same is true in PostgreSQL - you can implement the
<<//-^+^-//>> operator if you want to.
I think it's absolutely possible - and desirable - to choose a
philosophical position on that spectrum, and use it to drive design
decisions. The choice of "__add" vs "operator+" is one such decision.
The approach I plan to use for this question has a name: Polymorphic
Handler Resolution. The overload that is executed will be decided by
the following series of decisions:
- Are both of the operands objects? If not, use the overload on the
one that is. (NOTE: if neither are objects, the new code will be
bypassed entirely, so I do not need to handle this case)
- If they are both objects, are they both instances of the same
class? If they are, use the overload of the one on the left.
- If they are not objects of the same class, is one of them a direct
descendant of the other? If so, use the overload of the descendant.
- If neither of them are direct descendants of the other, use the
overload of the object on the left. Does it produce a type error
because it does not accept objects of the type in the other position?
Return the error and abort instead of re-trying by using the overload
on the right.
This is option (g) in my list, with the additional "prefer sub-classes"
rule (step 3), which I agree would be a good addition.
As noted, it doesn't provide symmetry, because step 4 depends on the
order in the source code. Option (c) is the same algorithm without step
4, so guarantees that $a + $b and $b + $a will always call the same method.
Options (d), (e), and (f) each add an extra step: one operand can signal
"I don't know" and the other operand gets a chance to answer. They're
essentially ways to "partially implement" an operator.
Options (a) and (b) perform the same kind of polymorphic resolution on
both operands, which is how many languages work for functions and/or
methods already.
Reading the C# spec, if there is more than one candidate overload which
is equally specific, an error is raised. I guess you could do the same
even with one implementation per class, by replacing step 4 in your
algorithm:
- If neither of them are direct descendants of the other, and only
one implements the operator, use it.
- If neither of them are direct descendants of the other, and both
implement the operator, throw an error.
Let's call that option (h) :)
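Option (h) could be sketched in userland PHP roughly as follows. This is an illustration only: add() stands in for the proposed "operator +", pickHandlerOptionH() and the Label class are hypothetical, and the use of LogicException is an arbitrary choice for the error.

```php
<?php
declare(strict_types=1);

// Hypothetical classes; add() stands in for the proposed "operator +".
class Money
{
    public function add(mixed $other): string
    {
        return 'Money::add';
    }
}

class Vector2D
{
    public function add(mixed $other): string
    {
        return 'Vector2D::add';
    }
}

class Label
{
    // No overload defined.
}

function pickHandlerOptionH(object $left, object $right, string $method = 'add'): object
{
    $leftHas = method_exists($left, $method);
    $rightHas = method_exists($right, $method);

    if (!$leftHas && !$rightHas) {
        throw new LogicException("Neither operand implements {$method}()");
    }
    // Only one side implements the operator, so use it.
    if ($leftHas !== $rightHas) {
        return $leftHas ? $left : $right;
    }
    // Both implement it: same class uses the left operand.
    if (get_class($left) === get_class($right)) {
        return $left;
    }
    // A descendant wins regardless of its position.
    if ($right instanceof $left) {
        return $right;
    }
    if ($left instanceof $right) {
        return $left;
    }
    // Unrelated classes that both implement the operator are ambiguous.
    throw new LogicException("Ambiguous {$method}() between unrelated classes");
}

echo get_class(pickHandlerOptionH(new Label(), new Money())), PHP_EOL; // Money
```

Unlike step 4 of the original algorithm, the outcome here does not depend on operand order: Money with Vector2D is an error either way round.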
By the way, searching online for the phrase "Polymorphic Handler
Resolution" finds no results other than you saying it is the name for
this algorithm.
This is similar to what I originally designed, and I actually moved to
an enum based on feedback. The argument was something like $isReversed
or $left or so on is somewhat ambiguous, while the enum makes it
extremely explicit.
Ah, fair enough. Explicitness vs conciseness is always a trade-off. My
thinking was that the "reversed" form would be far more rarely called
than the "normal" form; but that depends a lot on which resolution
algorithm is used.
Regards,
--
Rowan Tommins
[IMSoP]
On Tue, Sep 17, 2024 at 12:27 PM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
- Are we over-riding operators or operations? That is, is the user
saying "this is what happens when you put a + symbol between two Foo
objects", or "this is what happens when you add two Foo objects together"?
If we allow developers to define arbitrary code which is executed as a
result of an operator, we will always end up allowing the first one.
I don't think that's really true. Take the behaviour of comparisons in
your previous RFC: if that RFC had been accepted, the user would have had
no way to make $a < $b and $a > $b have different behaviour, because the
same overload would be called, with the same parameters, in both cases.
Slightly less strict is requiring groups of operators: the Haskell "Num"
typeclass (roughly similar to an interface) requires definitions for all of
"+", "*", "abs", "signum", "fromInteger", and either unary or binary "-".
It also defines the type signatures for each. If this was the only way to
overload the "+" operator, users would have to really go out of their way
to use it to mean something unrelated to addition.
As it happens, Haskell does allow arbitrary operator overloads, and in
fact goes to the other extreme and allows entirely new operators to be
invented. The same is true in PostgreSQL - you can implement the
<<//-^+^-//>> operator if you want to.
I think it's absolutely possible - and desirable - to choose a
philosophical position on that spectrum, and use it to drive design
decisions. The choice of "__add" vs "operator+" is one such decision.
Ah, I see. I suppose I never really entertained an idea like this because
in my mind it can't even handle non-trivial math, let alone the sorts of
things that people might want to use overloads for. Once you get past
arithmetic with real numbers into almost any other kind of math, which
operators are meaningful, and what they mean exactly, begins to depend a
lot on context. This is why I felt like even if we were limiting the use
cases to math projects, things like commutativity should not necessarily be
enforced.
The lines $a + $b and $b + $a are SUPPOSED to give different results for
certain types of math objects, for instance. The lines $a - $b and
$b - $a more obviously give different results to most people, because
subtraction is not commutative even for real numbers.
My personal opinion is that the RFC should not assume the overloads are
used in a particular domain (like real number arithmetic), and thus should
not attempt to enforce these kinds of behaviors. But, opinions like this
are actually what I was hoping to receive from this thread. This could be
the way forward that voters are more interested in, even if it wouldn't be
my own first preference as it will be highly limiting to the applicable
domains.
The approach I plan to use for this question has a name: Polymorphic
Handler Resolution. The overload that is executed will be decided by the
following series of decisions:
- Are both of the operands objects? If not, use the overload on the one
that is. (NOTE: if neither are objects, the new code will be bypassed
entirely, so I do not need to handle this case)
- If they are both objects, are they both instances of the same class? If
they are, use the overload of the one on the left.
- If they are not objects of the same class, is one of them a direct
descendant of the other? If so, use the overload of the descendant.
- If neither of them are direct descendants of the other, use the
overload of the object on the left. Does it produce a type error because it
does not accept objects of the type in the other position? Return the error
and abort instead of re-trying by using the overload on the right.
This is option (g) in my list, with the additional "prefer sub-classes"
rule (step 3), which I agree would be a good addition.
As noted, it doesn't provide symmetry, because step 4 depends on the order
in the source code. Option (c) is the same algorithm without step 4, so
guarantees that $a + $b and $b + $a will always call the same method.
Options (d), (e), and (f) each add an extra step: one operand can signal
"I don't know" and the other operand gets a chance to answer. They're
essentially ways to "partially implement" an operator.
Options (a) and (b) perform the same kind of polymorphic resolution on
both operands, which is how many languages work for functions and/or
methods already.
Reading the C# spec, if there is more than one candidate overload which is
equally specific, an error is raised. I guess you could do the same even
with one implementation per class, by replacing step 4 in your algorithm:
- If neither of them are direct descendants of the other, and only one
implements the operator, use it.
- If neither of them are direct descendants of the other, and both
implement the operator, throw an error.
Let's call that option (h) :)
By the way, searching online for the phrase "Polymorphic Handler
Resolution" finds no results other than you saying it is the name for this
algorithm.
Hmmm, I will see if I can find where I came across the term in my original
research then. I did about 4 months of research for my RFC, but that was
several years ago at this point, so I might be mistaken.
So I understand here that you're looking for commutativity in which
overload is actually called, even if it doesn't create commutativity in
the result of the operation. That the executed overload should be the
same no matter the order of the operands.
This was something I also was interested in, but I could not find a
solution I was happy with. All of the things you have detailed here have
tradeoffs that I'm unsure about. This is an open question of design that I
feel requires more input and more voices from others who are interested,
because I don't feel like any of these approaches (including the one that I
went with) are better, they are just different.
This is similar to what I originally designed, and I actually moved to an
enum based on feedback. The argument was something like $isReversed or
$left or so on is somewhat ambiguous, while the enum makes it extremely
explicit.
Ah, fair enough. Explicitness vs conciseness is always a trade-off. My
thinking was that the "reversed" form would be far more rarely called than
the "normal" form; but that depends a lot on which resolution algorithm is
used.
Regards,
--
Rowan Tommins
[IMSoP]
It would also depend on whether it is used with scalars.
For instance, $numObj - 5 and 5 - $numObj. For both of these, you want
to call the overload on $numObj, because it's the only avenue that won't
result in a fatal error (assuming that the overload knows how to work with
int values). The case of an object with an overload being used with an
operand that is a non-object will most likely result in reversed calls
quite frequently. This will be a prominent issue for some use cases (like
arbitrary precision math), and an almost non-existent issue for other use
cases (like currency or time).
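As a rough illustration of the scalar case described above, here is a sketch that runs on current PHP (8.1 or later). PHP has no operator overloads today, so the helper `dispatchSubtract()` stands in for what the engine would do. `ExampleNumber`, `subtract()`, and the `OperandPosition` cases are assumed names for illustration, not the RFC's final API.

```php
<?php

// Hypothetical enum, modelled on the operand-position idea discussed in this thread.
enum OperandPosition
{
    case LeftSide;
    case RightSide;
}

final class ExampleNumber
{
    public function __construct(public readonly int $value) {}

    // Stand-in for an overload of binary "-".
    // $position says where $this appeared in the original expression.
    public function subtract(int|ExampleNumber $other, OperandPosition $position): ExampleNumber
    {
        $otherValue = $other instanceof ExampleNumber ? $other->value : $other;

        return $position === OperandPosition::LeftSide
            ? new ExampleNumber($this->value - $otherValue)   // $this - $other
            : new ExampleNumber($otherValue - $this->value);  // $other - $this
    }
}

// Stand-in for the engine: prefer the left operand's overload, and fall back
// to the right operand with the "reversed" position when the left is a scalar.
function dispatchSubtract(int|ExampleNumber $left, int|ExampleNumber $right): ExampleNumber
{
    if ($left instanceof ExampleNumber) {
        return $left->subtract($right, OperandPosition::LeftSide);
    }
    if ($right instanceof ExampleNumber) {
        return $right->subtract($left, OperandPosition::RightSide);
    }
    throw new TypeError('No operand provides an overload');
}

$numObj = new ExampleNumber(8);

var_dump(dispatchSubtract($numObj, 5)->value); // $numObj - 5 gives int(3)
var_dump(dispatchSubtract(5, $numObj)->value); // 5 - $numObj gives int(-3), via a reversed call
```

In a scalar-heavy domain such as arbitrary precision math, the second call shape would be common, which is why the reversed path matters there.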
Jordan
I think it's absolutely possible - and desirable - to choose a philosophical position on that spectrum, and use it to drive design decisions. The choice of "__add" vs "operator+" is one such decision.
Ah, I see. I suppose I never really entertained an idea like this
because in my mind it can't even handle non-trivial math, let alone the
sorts of things that people might want to use overloads for. Once you
get past arithmetic with real numbers into almost any other kind of
math, which operators are meaningful, and what they mean exactly,
begins to depend a lot on context. This is why I felt like even if we
were limiting the use cases to math projects, things like commutativity
should not necessarily be enforced.
The lines $a + $b and $b + $a are SUPPOSED to give different results
for certain types of math objects, for instance. The lines $a - $b and
$b - $a more obviously give different results to most people, because
subtraction is not commutative even for real numbers.
My personal opinion is that the RFC should not assume the overloads are
used in a particular domain (like real number arithmetic), and thus
should not attempt to enforce these kinds of behaviors. But, opinions
like this are actually what I was hoping to receive from this thread.
This could be the way forward that voters are more interested in, even
if it wouldn't be my own first preference as it will be highly limiting
to the applicable domains.
I'm not sure where exactly in this thread to put this, so I'm putting it here...
Rowan makes an interesting point regarding operators vs operations. In particular, the way the <=> logic is defined, it is defining an operation: comparison. Using it for anything other than ordering comparison is simply not viable, nor useful. It's defining a custom implementation of a specific pre-existing action.
For all the other operators, the logic seems to be defined for an operator, the behavior of which is "whatever makes sense in your use case, idk." That is, to use Rowan's distinction, a philosophically different approach. Not a bad one, necessarily. In fact, I think it's a very good one.
But, as they are different, perhaps that suggests that comparison should instead not be implemented as an operator overload per se, but as a named magic method. The existing logic for it is, I think, fine, but it's a fair criticism that you're not defining "what happens for a method-ish named <=>", you're defining "how do objects compare." So I think it would make sense to replace the <=> override with a __compare(mixed $other): int, which any class could implement to opt-in to ordering comparisons, and thus work with <, >, ==, <=>, etc. (And, importantly, still keep the "specify the type(s) you want to be able to compare against" logic, already defined.)
A similar argument could probably be made for ==, though I've not fully thought through if I agree or not. Again, I think the previously defined logic is fine. It would be just changing the spelling from operator ==(mixed $other): bool to public function __equals(mixed $other): bool. But that again better communicates that it is a core language behavior that is being overridden, rather than an arbitrarily defined symbol-function-thing with domain-specific meaning.
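A minimal sketch of how the renamed hooks might look on a value class. `Money` and its fields are invented for illustration, and current PHP does not call `__compare()` or `__equals()` automatically, so the example invokes them explicitly where the proposal would have <, >, <=> and == route to them.

```php
<?php

final class Money
{
    public function __construct(
        public readonly int $cents,
        public readonly string $currency,
    ) {}

    // Hypothetical ordering hook. Narrowing the parameter type mirrors the
    // "specify the type(s) you want to compare against" idea.
    public function __compare(Money $other): int
    {
        if ($this->currency !== $other->currency) {
            throw new InvalidArgumentException('Cannot order different currencies');
        }

        return $this->cents <=> $other->cents;
    }

    // Hypothetical equality hook, kept separate from ordering.
    public function __equals(mixed $other): bool
    {
        return $other instanceof Money
            && $this->currency === $other->currency
            && $this->cents === $other->cents;
    }
}

$prices = [new Money(500, 'USD'), new Money(150, 'USD'), new Money(300, 'USD')];

// Explicit call today; under the proposal a plain $a <=> $b would do the same.
usort($prices, fn (Money $a, Money $b): int => $a->__compare($b));

var_dump(array_map(fn (Money $m): int => $m->cents, $prices)); // 150, 300, 500
var_dump($prices[0]->__equals(new Money(150, 'USD')));         // bool(true)
var_dump($prices[0]->__equals(new Money(150, 'EUR')));         // bool(false)
```

Keeping the two hooks separate also leaves room for types that are equatable but not orderable.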
There was an RFC for a Comparable interface back in the stone age (2010), but it looks like it never went to a vote: https://wiki.php.net/rfc/comparable
Arguably, this would then make more sense as a stand-alone RFC that happens to reuse a lot of the existing code and logic defined for operator overloads, which are all still just as valid.
That does not apply to the arithmetic, bitwise, or logic operators. Overriding + or / for a specific domain is not the same, as you're not hooking into engine behavior the way <=> or == are. For those, I'd prefer to stick to the current/previous implementation, with the operator keyword, for reasons I explained before.
Jordan, does that distinction make sense to you?
--Larry Garfield
On Tue, Sep 17, 2024 at 6:49 PM Larry Garfield larry@garfieldtech.com
wrote:
Yes, I certainly understand the distinction. The RFC does not treat all
operators equally. For many of them, the only opinion it holds is whether
or not the operator is unary or binary, which is something enforced by the
compiler anyway. But for comparisons, the RFC went out of its way to ensure
that the overloads cannot repurpose any comparisons for non-comparison,
non-ordering tasks usefully (as far as return value goes). In that sense,
yes, I see how that feels more like an operation instead of an operator.
The only hesitation I would have about that is the clunky/ugly feeling I
get of having some of them be symbols and some of them be names for a
reason that will be totally inscrutable to 95% of developers and just be
"one of those PHP quirks". In principle though, I do get what you're saying
here.
Jordan
- Are we over-riding operators or operations? That is, is the user
saying "this is what happens when you put a + symbol between two Foo
objects", or "this is what happens when you add two Foo objects together"?
If we allow developers to define arbitrary code which is executed as a result of an operator, we will always end up allowing the first one.
I don't think that's really true. Take the behaviour of comparisons in your previous RFC: if that RFC had been accepted, the user would have had no way to make $a < $b and $a > $b have different behaviour, because the same overload would be called, with the same parameters, in both cases.
Slightly less strict is requiring groups of operators: the Haskell "Num" typeclass (roughly similar to an interface) requires definitions for all of "+", "*", "abs", "signum", "fromInteger", and either unary or binary "-". It also defines the type signatures for each. If this was the only way to overload the "+" operator, users would have to really go out of their way to use it to mean something unrelated to addition.
As it happens, Haskell does allow arbitrary operator overloads, and in fact goes to the other extreme and allows entirely new operators to be invented. The same is true in PostgreSQL - you can implement the <<//-^+^-//>> operator if you want to.
I think it's absolutely possible - and desirable - to choose a philosophical position on that spectrum, and use it to drive design decisions. The choice of "__add" vs "operator+" is one such decision.
The approach I plan to use for this question has a name: Polymorphic Handler Resolution. The overload that is executed will be decided by the following series of decisions:
- Are both of the operands objects? If not, use the overload on the one that is. (NOTE: if neither are objects, the new code will be bypassed entirely, so I do not need to handle this case)
- If they are both objects, are they both instances of the same class? If they are, use the overload of the one on the left.
- If they are not objects of the same class, is one of them a direct descendant of the other? If so, use the overload of the descendant.
- If neither of them are direct descendants of the other, use the overload of the object on the left. Does it produce a type error because it does not accept objects of the type in the other position? Return the error and abort instead of re-trying by using the overload on the right.
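The four steps above can be sketched as a plain function on current PHP. This is only an illustration of the decision order: `Base`, `Child`, and `Unrelated` are hypothetical classes, the function reports which operand would be chosen instead of calling a real overload, and `instanceof` is used for step 3, so any descendant counts (an assumption about what "direct descendant" is meant to cover).

```php
<?php

// Hypothetical operand classes, used only to exercise the rules.
class Base {}
class Child extends Base {}
class Unrelated {}

// Returns 'left' or 'right': the operand whose overload would be called.
function resolveHandler(mixed $left, mixed $right): string
{
    // Step 1: if only one operand is an object, its overload is used.
    if (!is_object($left) || !is_object($right)) {
        if (is_object($left)) {
            return 'left';
        }
        if (is_object($right)) {
            return 'right';
        }
        throw new LogicException('No object operand; the existing engine logic applies');
    }

    // Step 2: same class, so the left operand wins.
    if ($left::class === $right::class) {
        return 'left';
    }

    // Step 3: one operand descends from the other's class, so the descendant wins.
    if ($right instanceof $left) {
        return 'right';
    }
    if ($left instanceof $right) {
        return 'left';
    }

    // Step 4: unrelated classes, so the left operand is used. If its overload
    // rejected the right operand's type, that error would propagate with no retry.
    return 'left';
}

echo resolveHandler(new Base(), 5), "\n";                // left
echo resolveHandler(5, new Base()), "\n";                // right
echo resolveHandler(new Base(), new Child()), "\n";      // right (descendant preferred)
echo resolveHandler(new Child(), new Base()), "\n";      // left
echo resolveHandler(new Unrelated(), new Base()), "\n";  // left (depends on source order)
```

Only the last case depends on the order of the operands in the source code.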
This is option (g) in my list, with the additional "prefer sub-classes" rule (step 3), which I agree would be a good addition.
As noted, it doesn't provide symmetry, because step 4 depends on the order in the source code. Option (c) is the same algorithm without step 4, so guarantees that $a + $b and $b + $a will always call the same method.
Options (d), (e), and (f) each add an extra step: one operand can signal "I don't know" and the other operand gets a chance to answer. They're essentially ways to "partially implement" an operator.
Options (a) and (b) perform the same kind of polymorphic resolution on both operands, which is how many languages work for functions and/or methods already.
Reading the C# spec, if there is more than one candidate overload which is equally specific, an error is raised. I guess you could do the same even with one implementation per class, by replacing step 4 in your algorithm:
- If neither of them are direct descendants of the other, and only one implements the operator, use it.
- If neither of them are direct descendants of the other, and both implement the operator, throw an error.
Let's call that option (h) :)
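A small sketch of what option (h) could look like for the unrelated-classes case, again as ordinary PHP. The assumption here is that a class "implements the operator" when it has an `add()` method; `Metres`, `Feet`, and `Label` are made-up classes.

```php
<?php

// Hypothetical classes: Metres and Feet both provide add(); Label does not.
final class Metres
{
    public function add(mixed $other): string
    {
        return 'Metres::add';
    }
}

final class Feet
{
    public function add(mixed $other): string
    {
        return 'Feet::add';
    }
}

final class Label {}

// Replacement for step 4 only: both operands are objects of unrelated classes.
function resolveUnrelated(object $left, object $right): object
{
    $leftHas = method_exists($left, 'add');
    $rightHas = method_exists($right, 'add');

    if ($leftHas && $rightHas) {
        // Equally specific candidates: refuse to guess.
        throw new TypeError('Ambiguous overload: both operands implement the operator');
    }
    if ($leftHas) {
        return $left;
    }
    if ($rightHas) {
        return $right;
    }
    throw new TypeError('Neither operand implements the operator');
}

echo resolveUnrelated(new Label(), new Metres())->add(null), "\n"; // Metres::add
echo resolveUnrelated(new Feet(), new Label())->add(null), "\n";   // Feet::add

try {
    resolveUnrelated(new Metres(), new Feet());
} catch (TypeError $e) {
    echo $e->getMessage(), "\n"; // reports the ambiguity
}
```

With this rule the chosen implementation no longer depends on operand order, at the cost of an error whenever two unrelated classes both define the operator.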
By the way, searching online for the phrase "Polymorphic Handler Resolution" finds no results other than you saying it is the name for this algorithm.
This is similar to what I originally designed, and I actually moved to an enum based on feedback. The argument was something like $isReversed or $left or so on is somewhat ambiguous, while the enum makes it extremely explicit.
Ah, fair enough. Explicitness vs conciseness is always a trade-off. My thinking was that the "reversed" form would be far more rarely called than the "normal" form; but that depends a lot on which resolution algorithm is used.
Regards,
--
Rowan Tommins
[IMSoP]
To be honest, this juggling of caller orders has me a bit concerned. For example, matrix multiplication isn’t commutative, and neither are non-abelian groups in general (quaternions being another popular system), but I am used to Scala, where the left-hand operand is always the one called.
I understand that this is what the operand position is for, but it strikes me that extreme care is called for when working with these types of objects when another object is involved. For example, quaternions can be multiplied by a matrix and the order is super important (used for 3D rotations), but it appears the actual method called may not be deterministic: because these classes may be unrelated, the one on the left is called, which may or may not result in a correct answer. All that is to say, this is just to illustrate how complex this ordering algorithm seems to be, depending on how the libraries are implemented and whether they are designed to work together.
I would prefer to see something simple, and easy to reason about. We can abuse some mathematical properties to result in something quite simple:
- If both are scalar, use existing logic.
- If one is scalar and the other is not, use existing logic.
- If one is scalar and the other overrides the operation, rearrange the operation per its commutative rules so the object is on the left. $scalar + $obj == $obj + $scalar; $scalar - $obj == -$obj + $scalar == -($obj - $scalar). It is generally accepted (IIRC) that when scalars are involved, we don’t need to be concerned with non-abelian groups.
- If both are objects, use the one on the left.
I think this is much easier to reason about (you either get a scalar or another object) that doesn’t involve a developer deeply understanding the inheritance of the objects in question or to understand the algorithm for choosing which one will be called.
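To make the rearrangement rule concrete, here is a sketch of the `$scalar - $obj` case rewritten as `-$obj + $scalar`. `BoxedInt`, `negate()`, and `add()` are hypothetical stand-ins for a class overloading unary "-" and binary "+". Note that the rewrite only works if the class supplies both, and if they obey the usual arithmetic identities.

```php
<?php

final class BoxedInt
{
    public function __construct(public readonly int $value) {}

    // Stand-in for an overload of unary "-".
    public function negate(): BoxedInt
    {
        return new BoxedInt(-$this->value);
    }

    // Stand-in for an overload of binary "+"; $this is always the left operand.
    public function add(int|BoxedInt $other): BoxedInt
    {
        $otherValue = $other instanceof BoxedInt ? $other->value : $other;

        return new BoxedInt($this->value + $otherValue);
    }
}

// $scalar - $obj rewritten as (-$obj) + $scalar, so the object ends up on the left.
function scalarMinusObject(int $scalar, BoxedInt $obj): BoxedInt
{
    return $obj->negate()->add($scalar);
}

var_dump(scalarMinusObject(10, new BoxedInt(4))->value); // int(6), the same as 10 - 4
```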
— Rob
I would prefer to see something simple, and easy to reason about. We
can abuse some mathematical properties to result in something quite
simple:
- If both are scalar, use existing logic.
- If one is scalar and the other is not, use existing logic.
- If one is scalar and the other overrides the operation, rearrange
the operation per its commutative rules so the object is on the
left. $scalar + $obj == $obj + $scalar; $scalar - $obj == -$obj +
$scalar == -($obj - $scalar). It is generally accepted (IIRC) that
when scalars are involved, we don’t need to be concerned with
non-abelian groups.
- If both are objects, use the one on the left.
Step 3 requires operators to be overloaded in groups: the rearrangement
of the binary "-" operator requires definitions of both the unary "-"
operator and the binary "+" operator; and definitions that meet the
appropriate mathematical rules.
IMO, that's a lot more complicated than calling the "+" overload with an
OperandPosition::RightSide flag; or Python's approach of separate "add"
and "reflected add" magic methods.
Since you mentioned Scala, I looked it up, and it seems to be on the
other end of the spectrum: operators are just methods, with no
mathematical meaning or special dispatch behaviour. In fact, "a plus b"
is just another way of writing "a.plus(b)", so "a + b" is just a way of
writing "a.+(b)"
Maybe it would be "useful enough" to just restrict to left-hand side:
- If the left operand is an object which implements the specified
operator, call that implementation with the right operand as argument
- Else, proceed as current PHP.
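The left-hand-only rule is short enough to sketch directly. `leftOnlyAdd()` plays the role of the engine, and `Counter` with its `add()` method is a hypothetical class standing in for one that overloads "+". The fallback branch simply uses today's built-in operator.

```php
<?php

final class Counter
{
    public function __construct(public readonly int $count) {}

    // Stand-in for an overload of "+"; only ever called with $this on the left.
    public function add(int $other): Counter
    {
        return new Counter($this->count + $other);
    }
}

function leftOnlyAdd(mixed $left, mixed $right): mixed
{
    // Rule 1: the left operand is an object that implements the operator.
    if (is_object($left) && method_exists($left, 'add')) {
        return $left->add($right);
    }

    // Rule 2: otherwise, proceed as current PHP does.
    return $left + $right;
}

var_dump(leftOnlyAdd(new Counter(2), 3)->count); // int(5)
var_dump(leftOnlyAdd(2, 3));                     // int(5)

try {
    // The object is on the right, so no overload is consulted and the
    // built-in "+" rejects the object operand.
    leftOnlyAdd(3, new Counter(2));
} catch (TypeError $e) {
    echo get_class($e), "\n"; // TypeError
}
```

The last call shows the cost of the simpler rule: `3 + $counter` cannot be supported without some extra mechanism.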
This is where gathering a good catalogue of use cases would come in
handy: which of them would be impossible, or annoyingly difficult, with
a more restrictive resolution method?
Regards,
--
Rowan Tommins
[IMSoP]
Since you mentioned Scala, I looked it up, and it seems to be on the
other end of the spectrum: operators are just methods, with no
mathematical meaning or special dispatch behaviour. In fact, "a plus b"
is just another way of writing "a.plus(b)", so "a + b" is just a way of
writing "a.+(b)"
Maybe it would be "useful enough" to just restrict to left-hand side:
In my opinion, this is the only reasonable way to implement operator
overloads in PHP. It is easy to understand, and what can easily be
understood is easy to explain, document, and to reason about. I do not
understand why we're talking about commutative operations; even the
inconspicuous plus operator is not commutative in PHP
(https://3v4l.org/nQcL5). Those who want to implement a numerical
tower (or whatever) can still implement the operations as being
commutative (where appropriate) by doing manual double-dispatch. Yeah,
doesn't fit with scalars, but where is the actual problem? And I
wouldn't want to restrict the functionality of overloading existing
operators. If a library completely goes overboard with operator
overloading, either only few will use it, or it might be a fantastic
tool which none of us could have even thought of.
Now, comparison operators pose a particular issue if overloads were
implemented this way, namely that the engine already swaps them; there
are no greater than (or equal) OPcodes. However, this already doesn't
work for uncomparable values, yielding "surprising" results (e.g.
#15773). As such, it might be worth considering to have a separate PR
regarding (overloading of) comparison operators.
And I would consider equality operator overloading as yet a different
issue, since that operation is (or at least should be) inherently
commutative. What we have now, however, is not that helpful, and breaks
encapsulation (https://3v4l.org/hTR2v); although without that it would
be completely useless.
Christoph
Maybe it would be "useful enough" to just restrict to left-hand side:
In my opinion, this is the only reasonable way to implement operator
overloads in PHP. It is easy to understand, and what can easily be
understood is easy to explain, document, and to reason about. I do not
understand why we're talking about commutative operations; even the
inconspicuous plus operator is not commutative in PHP
(https://3v4l.org/nQcL5).
There are really three different things we shouldn't confuse:
- Commutativity of the operation, as in $a + $b and $b + $a having the same result. As you say, this is a non-goal; we already have examples of non-commutative operators in PHP, and there are plenty more that have been given.
- Commutativity of the resolution. This is slightly subtler: if $a and $b both have implementations of the operator, should $a + $b and $b + $a call the same implementation? We can say "no", but it may be surprising to some users that if $b is a sub-class of $a, its version of + isn't used by preference.
- Resolution when only one side has an implementation. For instance, how do you define an overload for 1 / $object? Or for (new DateTime) + (new MySpecialDateOffset)? It's possible to work around this if the custom class has to be on the left, but probably not very intuitive.
It's also worth considering that the resolution of PHP's operators isn't currently determined by their left-hand side, e.g. int + float and float + int both return a float, which certainly feels like "preferring the float implementation regardless of order", even if PHP doesn't technically implement it that way.
Regards,
Rowan Tommins
[IMSoP]
On Wed, Sep 18, 2024 at 6:11 PM Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
How about doing it like in Python, where there is __add__ and __radd__?
And the engine could call $op1->add($op2) if $op1 is an object and add()
is implemented, or otherwise call $op2->rightAdd($op1) if $op2 is an object
and rightAdd() is implemented, or otherwise fail with an error.
We could have (distinct) interfaces for both add() and rightAdd().
Or use magic methods like __add() and __rightAdd() to allow stricter
types instead of mixed on the other operand. I think there is no extra
complexity for the engine by using magic methods or interfaces.
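A sketch of the interface variant of this idea, runnable on current PHP. The interface names `Addable` and `RightAddable`, the `Seconds` class, and the `dispatchAdd()` helper (standing in for the engine) are all assumptions made for illustration.

```php
<?php

// Hypothetical interfaces following the add()/rightAdd() suggestion.
interface Addable
{
    public function add(mixed $other): mixed;
}

interface RightAddable
{
    public function rightAdd(mixed $other): mixed;
}

final class Seconds implements Addable, RightAddable
{
    public function __construct(public readonly int $value) {}

    // Handles $this + $other.
    public function add(mixed $other): Seconds
    {
        return new Seconds($this->value + self::toInt($other));
    }

    // Handles $other + $this, used when the left operand cannot add.
    public function rightAdd(mixed $other): Seconds
    {
        return new Seconds(self::toInt($other) + $this->value);
    }

    private static function toInt(mixed $operand): int
    {
        return match (true) {
            $operand instanceof Seconds => $operand->value,
            is_int($operand) => $operand,
            default => throw new TypeError('Unsupported operand'),
        };
    }
}

// Stand-in for the engine: try the left operand's add(), then the right
// operand's rightAdd(), otherwise fail with an error.
function dispatchAdd(mixed $op1, mixed $op2): mixed
{
    if ($op1 instanceof Addable) {
        return $op1->add($op2);
    }
    if ($op2 instanceof RightAddable) {
        return $op2->rightAdd($op1);
    }
    throw new TypeError('Unsupported operand types for +');
}

var_dump(dispatchAdd(new Seconds(30), 15)->value); // int(45), via add()
var_dump(dispatchAdd(15, new Seconds(30))->value); // int(45), via rightAdd()
```

With `mixed` in the interface signatures, type checks move into the method body. The magic-method spelling mentioned above would let each class declare narrower parameter types instead.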
--
Alex
On Sat, Sep 14, 2024 at 11:51 PM Jordan LeDoux jordan.ledoux@gmail.com
wrote:
I'm not experienced with other languages and overloading, so consider this
reply as me not knowing enough about the subject. Rowan asked an
interesting question: "Are we over-riding operators or operations?"
which made me think about behaviors as a 3rd alternative. Instead of
individual operator overloading, could classes define how they would act as
certain primitives or types that have overloading under the hood? We have
Stringable with __toString, which might not be the best example but
does point in a similar direction. I don't know if this is a direction
worth exploring but wanted to at least bring it up.
interface IntBehavior {
    public function asInt(): int;
}
class PositiveInt implements IntBehavior {
    public readonly int $value;
    public function __construct(int $value) {
        $this->value = max(0, $value);
    }
    public function asInt(): int {
        return $this->value;
    }
}
// Proposed behaviour only: current PHP throws a TypeError for object operands.
var_dump(10 + new PositiveInt(5)); // 15
var_dump(new PositiveInt(10) + 15); // 25
var_dump(new PositiveInt(100) + new PositiveInt(100)); // 200
// leaves it to the developer to do:
$number = new PositiveInt(new PositiveInt(10) + 5);
On Sat, Sep 14, 2024 at 11:51 PM Jordan LeDoux jordan.ledoux@gmail.com
wrote:Hello internals,
This discussion will use my previous RFC as the starting point for
conversation: https://wiki.php.net/rfc/user_defined_operator_overloadsThere has been discussion on list recently about revisiting the topic of
operator overloads after the previous effort which I proposed was declined.
There are a variety of reasons, I think, this is being discussed, both on
list and off list.
As time has gone on, more people have come forward with use cases.
Often they are use cases that have been mentioned before, but it has become
more clear that these use cases are more common than was suggested
previously.Several voters, contributors, and participants have had more time
(years now) to investigate and research some of the related issues, which
naturally leads to changes in opinion or perspective.PHP has considered and been receptive toward several RFCs since my
original proposal which update the style of PHP in ways which are congruent
with the KIND of language that has operator overloads.I mentioned recently that I would not participate in another operator
overload RFC unless I felt that the views of internals had become more
receptive to the topic, and after some discussion with several people
off-list, I feel that it is at least worth discussing for the next version.Operator overloads has come up as a missing feature in several
discussions on list since the previous proposal was declined. This includes:[RFC] [Discussion] Support object type in BCMath 1
Native decimal scalar support and object types in BcMath 2
Custom object equality 3
pipes, scalar objects and on? 4
[RFC][Discussion] Object can be declared falsifiable 5
The request to support comparison operators (>, >=, ==, !=, <=, <, <=>)
has come up more frequently, but particularly in discussion around linear
algebra, arbitrary precision mathematics, and dimensional numbers (such as
currency or time), the rest of the operators have also come up.Typically, these use cases are themselves very niche, but the
capabilities operator overloads enable would be much more widely used. From
discussion on list, it seems likely that very few libraries would need to
implement operator overloads, but the libraries that do would be well used
and thus MANY devs would be consumers of operator overloads.I want to discuss what changes to the previous proposal people would be
seeking, and why. The most contentious design choice of the previous
proposal was undoubtedly theoperator
keyword and the decision to make
operator overload implementations distinct from normal magic methods. For
some of the voters who voted yes on the previous RFC, this was a "killer
feature" of the proposal, while for some of the voters who voted no it was
the primary reason they were against the feature.There are also several technical and tangentially related items that are
being worked on that would be necessary for operator overloads (and were
originally included in my implementation of the previous RFC). This
includes:
Adding a new opcode for LARGER and LARGER_OR_EQUAL so that operand
position can be preserved during ALL comparisons.
Updating ZEND_UNCOMPARABLE such that it has a value other than -1, 0,
or 1 which are typically reserved during an ordering comparison.
Allowing values to be equatable without also being orderable (such as
with matrices, or complex numbers).
These changes could and should be provided independent of operator
overloads. Gina has been working on a separate RFC which would cover all
three of these issues. You can view the work-in-progress on that RFC here:
https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md
I hope to start off this discussion productively and work towards
improving the previous proposal into something that voters are willing to
pass. To do that, I think these are the things that need to be discussed in
this thread:
Should the next version of this RFC use the operator keyword, or
should that approach be abandoned for something more familiar? Why do you
feel that way?
Should the capability to overload comparison operators be provided in
the same RFC, or would it be better to separate that into its own RFC? Why
do you feel that way?
Do you feel there were any glaring design weaknesses in the previous
RFC that should be addressed before it is re-proposed?
Do you feel that there is ANY design, version, or implementation of
operator overloads possible that you would support and be in favor of,
regardless of whether it matches the approach taken previously? If so, can
you describe any of the core ideas you feel are most important?
Jordan
I'm not experienced with other languages and overloading, so consider this
reply as me not knowing enough about the subject. Rowan asked an
interesting question: "Are we over-riding operators or operations?"
which made me think about behaviors as a 3rd alternative. Instead of
individual operator overloading, could classes define how they would act as
certain primitives or types that have overloading under the hood? We have
Stringable with __toString, which might not be the best example but
does point in a similar direction. I don't know if this is a direction
worth exploring but wanted to at least bring it up.
interface IntBehavior {
    public function asInt(): int;
}
class PositiveInt implements IntBehavior {
    public readonly int $value;
    public function __construct(int $value) {
        $this->value = max(0, $value);
    }
    public function asInt(): int {
        return $this->value;
    }
}
var_dump(10 + new PositiveInt(5)); // 15
var_dump(new PositiveInt(10) + 15); // 25
var_dump(new PositiveInt(100) + new PositiveInt(100)); // 200
// leaves it to the developer to do:
$number = new PositiveInt(new PositiveInt(10) + 5);
I actually did explore something like this during my initial design phases
before ever bringing it up on list the first time. I decided that it was
certainly useful, and perhaps even something I would also want, but did not
solve the problem I was trying to solve.
The problem I was trying to solve involved lots of things that cannot be
represented well by primitive types (which is presumably why they are
classes in the first place). Things like Complex Numbers, Matrices, or
Money. Money can be converted to a float of course (or an int depending on
implementation), but Money does not want to be added with something like
request count, which might also be an int. Or if it does, it probably wants
to know exactly what the context is. There are lots of these kinds of value
classes that might be representable with scalars, but would lose a lot of
their context and safety if that is done.
On the other hand, Money would probably not want to be multiplied with
other Money values. What would Money squared mean exactly? Things like this
are very difficult to control for if all you provide is a way to control
casting to scalar types.
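To make the Money point concrete, here is a minimal sketch in today's PHP using named methods. The class, the integer minor-unit representation, and the method names add() and times() are assumptions made up for illustration; none of them come from the RFC. The idea is that the class decides which operations exist at all, which is exactly what a cast to int or float cannot express.

```php
<?php
declare(strict_types=1);

// Hypothetical value class: amounts are stored in minor units (e.g. cents).
final class Money
{
    public function __construct(
        public readonly int $amount,
        public readonly string $currency,
    ) {}

    // Money + Money is only meaningful within one currency.
    public function add(Money $other): Money
    {
        if ($other->currency !== $this->currency) {
            throw new InvalidArgumentException('Cannot add different currencies');
        }
        return new Money($this->amount + $other->amount, $this->currency);
    }

    // Money * scalar is allowed; there is deliberately no Money * Money.
    public function times(int $factor): Money
    {
        return new Money($this->amount * $factor, $this->currency);
    }
}

$price = new Money(1250, 'USD');
$total = $price->add(new Money(250, 'USD'))->times(2);
echo $total->amount, ' ', $total->currency, "\n"; // 3000 USD
```

With operator overloads, the same rules could sit behind + and *, and an unsupported combination would fail instead of silently producing a meaningless number.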
Jordan
The problem I was trying to solve involved lots of things that cannot be
represented well by primitive types (which is presumably why they are
classes in the first place). Things like Complex Numbers, Matrices, or
Money. Money can be converted to a float of course (or an int depending
on implementation), but Money does not want to be added with something
like request count, which might also be an int. Or if it does, it
probably wants to know exactly what the context is. There are lots of
these kinds of value classes that might be representable with scalars,
but would lose a lot of their context and safety if that is done.
Even plain addition with plain numbers can be fraught. Cardinal numbers
(1,2,3...) can be added to and subtracted from each other. Ordinal
numbers (1st,2nd,3rd...), on the other hand, cannot be added together
and subtracting one from another results in an interval, not a number
(which means that addition involving ordinal numbers means things like
ordinal+interval=ordinal).
Then there are quantities. Quantities of arbitrary dimension (length,
duration, monetary value...) can be multiplied, resulting in a quantity
of a new dimension, but only quantities of the same dimension can be added.
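PHP's own date classes already model the ordinal/interval distinction, only through methods rather than operators. A small sketch with arbitrary dates, pinned to UTC so the result does not depend on the configured time zone:

```php
<?php
declare(strict_types=1);

$utc   = new DateTimeZone('UTC');
$start = new DateTimeImmutable('2024-01-01', $utc);
$end   = new DateTimeImmutable('2024-01-31', $utc);

// ordinal - ordinal = interval: the difference of two dates is not a date.
$gap = $start->diff($end);
var_dump($gap instanceof DateInterval); // bool(true)
echo $gap->days, "\n";                  // 30

// ordinal + interval = ordinal: adding an interval to a date gives a date.
$later = $end->add($gap);
echo $later->format('Y-m-d'), "\n";     // 2024-03-01
```

There is no method for adding two DateTimeImmutable values together, which matches the point that ordinal + ordinal has no sensible meaning.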
On the other hand, Money would probably not want to be multiplied with
other Money values. What would Money squared mean exactly? Things like
this are very difficult to control for if all you provide is a way to
control casting to scalar types.
While I can't think off the top of my head of a case where money
quantities might be multiplied by other money quantities, I can think of
situations where one might want to divide them...
Should the next version of this RFC use the operator keyword, or
should that approach be abandoned for something more familiar? Why do
you feel that way?
Should the capability to overload comparison operators be provided
in the same RFC, or would it be better to separate that into its own
RFC? Why do you feel that way?
Do you feel there were any glaring design weaknesses in the
previous RFC that should be addressed before it is re-proposed?
Do you feel that there is ANY design, version, or implementation of
operator overloads possible that you would support and be in favor of,
regardless of whether it matches the approach taken previously? If so,
can you describe any of the core ideas you feel are most important?
Hello Jordan,
Happy you are following up on operator overloads, as I was sad to see
the vote fail last time.
I think the RFC might benefit from focusing on the comparison operators
and the basic arithmetic operators this time, so ==, <=>, +, -, *, /
(and maybe % and **). I would especially leave out the bitwise operators
(for a possible future RFC), as those to me seem extra niche and not
very self-explanatory in terms of good use cases/examples. ==, <=>, +,
-, * and / would deliver almost all the benefits to operator overloading
I can currently think of.
Giving more concrete examples in the RFC of places in the current PHP
ecosystem where these operators would simplify code might be helpful -
the last RFC had mainly a generic list of use cases, but seeing actual
code would help to make it salient how some code can be a lot more
readable, especially if you now know about even more use cases than 3
years ago.
Otherwise I am hoping that more opponents of operator overloads will
chime in and give some constructive feedback.
Hello internals,
This discussion will use my previous RFC as the starting point for
conversation: https://wiki.php.net/rfc/user_defined_operator_overloads
Replying to the top to avoid dragging any particular side discussion into this...
I've seen a few people, both in this thread and previously, make the argument that "operator overloads are only meaningful for math, and seriously no one does fancy math in PHP, so operator overloads are not necessary." This statement is patently false, and I would ask that everyone stop using it as it is simply FUD. I don't mean just the "no one does fancy math in PHP", but "operators are only meaningful for math" is just an ignorant statement.
As someone asked for use cases, here's a few use cases for operator overloads that do not fall into the category of fancy numeric math. Jordan, feel free to borrow any of these verbatim for your next RFC draft if you wish. (I naturally haven't tried running any of them, so forgive any typos or bugs. It's just to get the point across of each use case.)
Pathlib
As I mentioned previously, Python's pathlib uses / to join different path fragments together. In PHP, one could implement such a library very simply with overloads. (A not-late-night implementation would probably be more performant than this, but it's just a POC.)
class Path implements Stringable
{
    private array $parts;
    public function __construct(?string $path = null) {
        $this->parts = array_filter(explode('/', $path ?? ''));
    }
    public static function fromArray(array $parts): self {
        $new = new self();
        $new->parts = $parts;
        return $new;
    }
    public function __toString(): string {
        return implode('/', $this->parts);
    }
    operator /(Path|string $other, OperandPosition $pos): Path {
        if ($other instanceof Path) {
            $other = (string)$other;
        }
        // Split the other operand into its path fragments.
        $otherParts = array_filter(explode('/', $other));
        return match ($pos) {
            OperandPosition::LeftSide => self::fromArray([...$this->parts, ...$otherParts]),
            OperandPosition::RightSide => self::fromArray([...$otherParts, ...$this->parts]),
        };
    }
}
$p = new Path('/foo/bar');
$p2 = $p / 'beep' / 'narf/poink';
Collections
In my research into collections in other languages, I found it was extremely common for collections to have operator overloads on them. Rather than repeat it here, I will just link to my results and recommendations for what operators would make sense for what operation:
Enum sets
Ideally, we would just use generic collections for this directly. However, even without generics, bitwise overloads would allow for this to be implemented fairly easily for a given enum. (Again, a smarter implementation is likely possible with actual effort.)
enum Perm {
    case Read;
    case Write;
    case Exec;
}
class Permissions {
    private array $cases = [];
    public function __construct(Perm ...$perms) {
        foreach ($perms as $case) {
            $this->cases[$case->name] = 1;
        }
    }
    // Add a single permission.
    operator +(Perm $other, OperandPosition $pos): Permissions {
        $new = clone($this);
        $new->cases[$other->name] = 1;
        return $new;
    }
    // Remove a single permission.
    operator -(Perm $other, OperandPosition $pos): Permissions {
        $new = clone($this);
        unset($new->cases[$other->name]);
        return $new;
    }
    // Union of two permission sets.
    operator |(Permissions $other, OperandPosition $pos): Permissions {
        $new = clone($this);
        foreach ($other->cases as $caseName => $v) {
            $new->cases[$caseName] = 1;
        }
        return $new;
    }
    // Intersection of two permission sets.
    operator &(Permissions $other, OperandPosition $pos): Permissions {
        $new = new self();
        $new->cases = array_intersect_key($this->cases, $other->cases);
        return $new;
    }
    // Not sure what operator makes sense here, so punting as this is just an example.
    public function has(Perm $p): bool {
        return array_key_exists($p->name, $this->cases);
    }
}
$p = new Permissions(Perm::Read);
$p2 = $p + Perm::Exec;
$p3 = $p2 | new Permissions(Perm::Write);
$p3 -= Perm::Exec;
$p3->has(Perm::Read);
Function composition
I have long argued that PHP needs both a pipe operator and a function composition operator. It wouldn't be ideal, but something like this is possible. (Ideally we'd use the string concat operator here, but the RFC doesn't show it. It would be a trivial change to use instead.)
class Composed {
    /** @var \Closure[] */
    private array $steps = [];
    public function __construct(?\Closure $c = null) {
        // Only record a step when one was actually passed.
        if ($c !== null) {
            $this->steps[] = $c;
        }
    }
    private static function fromList(array $cs): self {
        $new = new self();
        $new->steps = $cs;
        return $new;
    }
    public function __invoke(mixed $arg): mixed {
        foreach ($this->steps as $step) {
            $arg = $step($arg);
        }
        return $arg;
    }
    operator +(\Closure $other, OperandPosition $pos): self {
        return match ($pos) {
            OperandPosition::LeftSide => self::fromList([...$this->steps, $other]),
            OperandPosition::RightSide => self::fromList([$other, ...$this->steps]),
        };
    }
}
// The arrow function is parenthesized so its body does not absorb the next "+".
$fun = new Composed()
    + someFunc(...)
    + $obj->someMethod(...)
    + (fn(string $a) => $a . ' (archived)')
    + strlen(...);
$fun($input); // Calls each closure in turn.
Note that there are a half-dozen libraries in the wild that do something akin to this, just much more clumsily, including in Laravel. The code above would be vastly simpler and easier to maintain and debug.
Units
Others have mentioned this before, but to make clear what it could look like:
abstract readonly class MetricDistance {
    // A readonly class cannot give its properties default values, so the unit factor is a constant.
    protected const FACTOR = 1;
    public function __construct(private int $length) {}
    operator +(MetricDistance $other, OperandPosition $pos): static {
        return new static((int) floor(($this->length * static::FACTOR + $other->length * $other::FACTOR) / static::FACTOR));
    }
    operator -(MetricDistance $other, OperandPosition $pos): static {
        return match ($pos) {
            OperandPosition::LeftSide => new static((int) floor(($this->length * static::FACTOR - $other->length * $other::FACTOR) / static::FACTOR)),
            OperandPosition::RightSide => new static((int) floor(($other->length * $other::FACTOR - $this->length * static::FACTOR) / static::FACTOR)),
        };
    }
    public function __toString(): string {
        return (string) $this->length;
    }
}
readonly class Meters extends MetricDistance {
    protected const FACTOR = 1;
}
readonly class Kilometers extends MetricDistance {
    protected const FACTOR = 1000;
}
$m1 = new Meters(500);
$k1 = new Kilometers(3);
$m1 += $k1;
print $m1; // prints 3500
$m1 + 12; // Error. 12 what?
There's likely a bug in the above somewhere, but it's late and it still gets the point across for now.
(Side note: The previous RFC supported abstract operator declarations, but not declarations on interfaces. That seems necessary for completeness.)
Date and time
DateTimeImmutable and DateInterval already do this, and they're not "fancy math."
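As a quick illustration of the comparison side, the standard comparison operators already work on DateTimeImmutable objects and compare the instants they represent. The dates below are arbitrary.

```php
<?php
declare(strict_types=1);

$utc = new DateTimeZone('UTC');
$a = new DateTimeImmutable('2024-01-01 00:00:00', $utc);
$b = new DateTimeImmutable('2024-06-01 00:00:00', $utc);
$c = new DateTimeImmutable('2024-01-01 01:00:00', new DateTimeZone('+01:00'));

var_dump($a < $b);   // bool(true)
var_dump($a == $c);  // bool(true): same instant, different time zone
var_dump($a <=> $b); // int(-1)
```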
I consider all of the above to be reasonable, viable, and useful applications of operator overloading, none of which are fancy or esoteric math cases. Others may dislike them, stylistically. That's a subjective question, so opinions can differ. But the viability of the above cases is not disputable, so the claim that operator overloading is too niche to be worth it is, I would argue, demonstrably false.
--Larry Garfield
Hello internals,
This discussion will use my previous RFC as the starting point for conversation: https://wiki.php.net/rfc/user_defined_operator_overloads
There has been discussion on list recently about revisiting the topic of operator overloads after the previous effort which I proposed was declined. There are a variety of reasons, I think, this is being discussed, both on list and off list.
As time has gone on, more people have come forward with use cases. Often they are use cases that have been mentioned before, but it has become more clear that these use cases are more common than was suggested previously.
Several voters, contributors, and participants have had more time (years now) to investigate and research some of the related issues, which naturally leads to changes in opinion or perspective.
PHP has considered and been receptive toward several RFCs since my original proposal which update the style of PHP in ways which are congruent with the KIND of language that has operator overloads.
I mentioned recently that I would not participate in another operator overload RFC unless I felt that the views of internals had become more receptive to the topic, and after some discussion with several people off-list, I feel that it is at least worth discussing for the next version.
Operator overloads has come up as a missing feature in several discussions on list since the previous proposal was declined. This includes:
[RFC] [Discussion] Support object type in BCMath 1
Native decimal scalar support and object types in BcMath 2
Custom object equality 3
pipes, scalar objects and on? 4
[RFC][Discussion] Object can be declared falsifiable 5
The request to support comparison operators (>, >=, ==, !=, <=, <, <=>) has come up more frequently, but particularly in discussion around linear algebra, arbitrary precision mathematics, and dimensional numbers (such as currency or time), the rest of the operators have also come up.
Typically, these use cases are themselves very niche, but the capabilities operator overloads enable would be much more widely used. From discussion on list, it seems likely that very few libraries would need to implement operator overloads, but the libraries that do would be well used and thus MANY devs would be consumers of operator overloads.
I want to discuss what changes to the previous proposal people would be seeking, and why. The most contentious design choice of the previous proposal was undoubtedly the operator keyword and the decision to make operator overload implementations distinct from normal magic methods. For some of the voters who voted yes on the previous RFC, this was a "killer feature" of the proposal, while for some of the voters who voted no it was the primary reason they were against the feature.
There are also several technical and tangentially related items that are being worked on that would be necessary for operator overloads (and were originally included in my implementation of the previous RFC). This includes:
Adding a new opcode for LARGER and LARGER_OR_EQUAL so that operand position can be preserved during ALL comparisons.
Updating ZEND_UNCOMPARABLE such that it has a value other than -1, 0, or 1 which are typically reserved during an ordering comparison.
Allowing values to be equatable without also being orderable (such as with matrices, or complex numbers).
These changes could and should be provided independent of operator overloads. Gina has been working on a separate RFC which would cover all three of these issues. You can view the work-in-progress on that RFC here: https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md
I hope to start off this discussion productively and work towards improving the previous proposal into something that voters are willing to pass. To do that, I think these are the things that need to be discussed in this thread:
Should the next version of this RFC use the operator keyword, or should that approach be abandoned for something more familiar? Why do you feel that way?
Should the capability to overload comparison operators be provided in the same RFC, or would it be better to separate that into its own RFC? Why do you feel that way?
Do you feel there were any glaring design weaknesses in the previous RFC that should be addressed before it is re-proposed?
Do you feel that there is ANY design, version, or implementation of operator overloads possible that you would support and be in favor of, regardless of whether it matches the approach taken previously? If so, can you describe any of the core ideas you feel are most important?
Jordan
I haven't read much of the discussion, because it is at 40 emails long already.
However, as someone who used to be vehemently against operator overloading (because of C++ wackiness) and has come around, I will mention the following.
PHP supports operator overloading already, be that overloading the array access notation via the ArrayAccess interface, overloading comparison operators since at least PHP 5.3, or overloading binary operations since PHP 5.6 via the Internal operator overloading and GMP improvements RFC. 1
Yes the latter two may be reserved to internal classes, but it is extremely easy to expose this internal mechanism to userland by using a custom extension.
Therefore, userland has access to this capability in PHP today just in a very clunky way.
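For reference, the ArrayAccess part is the piece userland can already use directly. A minimal sketch follows; the Registry class is made up for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical container: the [] syntax is routed to these four methods.
final class Registry implements ArrayAccess
{
    private array $items = [];

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->items[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->items[] = $value; // handles $registry[] = ...
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->items[$offset]);
    }
}

$registry = new Registry();
$registry['answer'] = 42;
var_dump(isset($registry['answer'])); // bool(true)
var_dump($registry['answer']);        // int(42)
unset($registry['answer']);
var_dump(isset($registry['answer'])); // bool(false)
```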
Will exposing this capability in an "easier" fashion to userland lead to people abusing it? Possibly, but PHP doesn't need the help of userland for abusing operators.
Indeed the + operator works with arrays, and is not commutative, which is highly surprising.
This leads to a rather insane situation where the engine does not assume + is always commutative but does so for *, something which makes no sense as multiplication or product operations are (in the grand scheme of mathematics) rarely commutative.
So if an extension wants to provide operator overloading for vectors or matrices the current situation encourages using + for multiplication, and * for addition, which is bonkers.
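The array behaviour is easy to check. A short sketch with arbitrary values:

```php
<?php
declare(strict_types=1);

$a = [1, 2];
$b = [3, 4, 5];

// Array + is a key-wise union: the left operand wins on duplicate keys.
var_dump($a + $b === [1, 2, 5]);   // bool(true)
var_dump($b + $a === [3, 4, 5]);   // bool(true)
var_dump(($a + $b) === ($b + $a)); // bool(false)
```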
I am also very much in favour of using the sigils rather than names for overloading the operators.
But if people do not want the operator keyword, or want some aspects to be exposed via interfaces (like Comparable/Orderable), then we should find a way to be able to write:
public static function ==(MandatoryType $left, MandatoryType $right) {}
public static function <=>(MandatoryType $left, MandatoryType $right) {}
The other thing, which is different from the previous proposal, and becomes more relevant if we use functions, is that I think that they should be static methods that take the left and right operands instead of assuming it is on the left and/or needing a boolean argument indicating if it is on the left or not.
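To give a feel for the two-operand static shape in code that runs today, here is a sketch with ordinary named static methods standing in for the operator methods. The Version class and the names equals() and compare() are assumptions for illustration only, not proposed API.

```php
<?php
declare(strict_types=1);

// Hypothetical value class used only to demonstrate the calling convention.
final class Version
{
    public function __construct(
        public readonly int $major,
        public readonly int $minor,
    ) {}

    // Stand-in for a hypothetical static == method: both operands are explicit.
    public static function equals(Version $left, Version $right): bool
    {
        return self::compare($left, $right) === 0;
    }

    // Stand-in for a hypothetical static <=> method.
    public static function compare(Version $left, Version $right): int
    {
        return [$left->major, $left->minor] <=> [$right->major, $right->minor];
    }
}

$versions = [new Version(1, 10), new Version(1, 2), new Version(0, 9)];
usort($versions, Version::compare(...));

foreach ($versions as $v) {
    echo $v->major, '.', $v->minor, "\n"; // 0.9, then 1.2, then 1.10
}
var_dump(Version::equals(new Version(1, 2), new Version(1, 2))); // bool(true)
```

Because both operands arrive as parameters, the method body never needs to know which side of the operator the object was on.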
I am still planning on going through an overhaul of PHP comparison semantics, as indicated by my existing draft, but this might take a while.
Best regards,
Gina P. Banyard
This leads to a rather insane situation where the engine does not assume + is always commutative but does so for *, something which makes no sense as multiplication or product operations are (in the grand scheme of mathematics) rarely commutative.
Gosh, I had completely forgotten about ZEND_TRY_BINARY_OBJECT_OPERATION.
So I withdraw my statement about the only reasonable way to implement
operator overloading in PHP[1], and state that there is no reasonable
way to implement operator overloading in PHP at all (excluding
comparison and equality operations), since apparently, we cannot even
get the only two internal bundled classes to properly handle this.
Cf. https://3v4l.org/gksqI/rfc#vgit.master and
https://3v4l.org/o5Uhh/rfc#vgit.master.
[1] https://externals.io/message/125550#125621
Christoph
This leads to a rather insane situation where the engine does not assume + is always commutative but does so for *, something which makes no sense as multiplication or product operations are (in the grand scheme of mathematics) rarely commutative.
Gosh, I had completely forgotten about ZEND_TRY_BINARY_OBJECT_OPERATION.
So I withdraw my statement about the only reasonable way to implement
operator overloading in PHP[1], and state that there is no reasonable
way to implement operator overloading in PHP at all (excluding
comparison and equality operations), since apparently, we cannot even
get the only two internal bundled classes to properly handle this.
Cf. https://3v4l.org/gksqI/rfc#vgit.master and
The issue here is that GMP does not return FAILURE when it does not support an operand, but instead throws an exception, which blocks any polymorphic handling.
Being currently knee-deep in the implementation of GMP, I can say that it does some questionable stuff and should be refactored.
However, the fact that the binary operation can still succeed when only one operand is an object and its handler returns FAILURE is a bit of an issue which needs to be addressed regardless IMHO.
Best regards,
Gina P. Banyard