Hello internals,
I last brought this RFC up for discussion in August, and there was
certainly interesting discussion. Since then there have been many
improvements, and I'd like to re-open discussion on this RFC. I mentioned
in the first email to the list that I was planning on taking a while before
approaching a vote, however the RFC is much closer to vote-ready now, and
I'd like to open discussion with that in mind.
RFC Link: https://wiki.php.net/rfc/user_defined_operator_overloads
There is a patch for this RFC, however the latest commits are not playable.
It will build, but with various problems which are being worked on related
to enums. The last playable commit can be found by checking out this commit:
https://github.com/JordanRL/php-src/commit/e044f53830a9ded19f7c16a9542521601ac3f331
This commit however does not have the enum for operator position described
in the RFC. It uses a bool instead with true being the left side, and false
being the right side.
Implementation details still left:
- There are issues related to opcache/JIT still, so if you want to play
around with the playable commit disable both.
- Reflection has not been updated, but the proposed updates necessary are
described in the RFC.
It is a long RFC, but operator overloads are a complicated topic if done
correctly. Please review the FAQ section before asking a question, as it
covers many of the main objections or inquiries to the feature. I'd be
happy to expand on any of the answers there if prompted however.
Jordan
Hi Jordan,
Hello internals,
I last brought this RFC up for discussion in August, and there was
certainly interesting discussion. Since then there have been many
improvements, and I'd like to re-open discussion on this RFC.
In general I'm in favour of this RFC; a few months ago I was
programming something and operator overloads would have been a good
solution, but then I remembered I was using PHP, and they haven't been
made possible yet.
However.....I think the new 'operator' keyword is probably not the way
to go. Although it's cute, it has some significant downsides.
There are quite a few downstream costs for making a new type of
methods in classes. All projects that analyze code (rector,
codesniffer, PHPStorm, PhpStan/Psalm, PHPUnit's code coverage
annotations etc) would have to add a non-trivial amount of code to not
bork when reading the new syntax. Requiring more code to be added and
maintained in PHP's builtin Reflection extension is also a cost.
That's quite a bit of work for a feature that has relatively rare
use-cases.
I just don't agree with, or understand, some of the reasoning in the RFC
for why using symbols is preferable.
"In such a situation, using magic methods would not be desired, as any
combination of symbols may be used for the new infix. The restrictions
on function names, such as needing to reserve the & to mark a function
as being by-reference, would place limitations on such future scope."
I don't get this. The magic methods in previous drafts of the RFC
don't have a problem with & as the methods are named with 'two
underscores' + name e.g. __bitwiseAnd. That doesn't appear to cause a
problem with an ampersand?
"By representing the implementations by the symbols themselves, this
RFC avoids forcing implementations to be mislabeled with words or
names which do not match the semantic meaning of that symbol in the
program context."
The name of the function (e.g. __add) always refers to the symbol used
where it is used, not what it is doing.
If the code is $a + $b
then that is an addition operator, when the
code is read. If I was reading the code, and I saw that either $a or
$b were objects, I would know to go looking for an __add magic method.
" '// This function unions, it does not add'"
Then that is probably an example of an inappropriate use of operator
overloads, and so shouldn't be used as a justification for a syntax
choice.
"Non-Callable - Operand implementations cannot be called on an
instance of an object the way normal methods can."
I think this is just wrong, and makes the RFC unacceptable to me.
Although most of the code I write is code that just performs
operations as I see fit, some of the time the operations need to be
driven by user data. Even something simple like a
calculator-as-a-service would need to call the operations dynamically
from user provided data.
I also have an aesthetic preference when writing tests to be as explicit
as possible, rather than as concise as possible e.g.
$foo->__add(5, OperandPosition::LeftSide);
$foo->__add(5, OperandPosition::RightSide);
instead of:
$foo + 5;
5 + $foo
As I find that easier to reason about.
cheers
Dan
Ack
/congratulations on stunning the audience into silence otherwise though.
Hi Jordan,
Hello internals,
I last brought this RFC up for discussion in August, and there was
certainly interesting discussion. Since then there have been many
improvements, and I'd like to re-open discussion on this RFC.
In general I'm in favour of this RFC; a few months ago I was
programming something and operator overloads would have been a good
solution, but then I remembered I was using PHP, and they haven't been
made possible yet.
I too far prefer this RFC to its predecessors, and hope it passes in some form.
However.....I think the new 'operator' keyword is probably not the way
to go. Although it's cute, it has some significant downsides.
There are quite a few downstream costs for making a new type of
methods in classes. All projects that analyze code (rector,
codesniffer, PHPStorm, PhpStan/Psalm, PHPUnit's code coverage
annotations etc) would have to add a non-trivial amount of code to not
bork when reading the new syntax. Requiring more code to be added and
maintained in PHP's builtin Reflection extension is also a cost.
That's quite a bit of work for a feature that has relatively rare
use-cases.
I just don't agree with, or understand, some of the reasoning in the RFC
for why using symbols is preferable.
"In such a situation, using magic methods would not be desired, as any
combination of symbols may be used for the new infix. The restrictions
on function names, such as needing to reserve the & to mark a function
as being by-reference, would place limitations on such future scope."
I don't get this. The magic methods in previous drafts of the RFC
don't have a problem with & as the methods are named with 'two
underscores' + name e.g. __bitwiseAnd. That doesn't appear to cause a
problem with an ampersand?
"By representing the implementations by the symbols themselves, this
RFC avoids forcing implementations to be mislabeled with words or
names which do not match the semantic meaning of that symbol in the
program context."
The name of the function (e.g. __add) always refers to the symbol used
where it is used, not what it is doing.
If the code is
$a + $b
then that is an addition operator, when the
code is read. If I was reading the code, and I saw that either $a or
$b were objects, I would know to go looking for an __add magic method.
" '// This function unions, it does not add'"
Then that is probably an example of an inappropriate use of operator
overloads, and so shouldn't be used as a justification for a syntax
choice.
I believe the intent is for things like dot product.
int * int
This is known as "multiplication", or we could abbreviate it __mul if we felt like it.
vector * vector
This is known as a "Dot product", or "scalar product", and is not really the same as multiplication. It uses effectively the same operator sigil, though.
There's also cross-product, which nominally uses x
as a symbol in mathematics. Which is... often also translated to * in code, but is a very different operation and is not multiplication as we know it, nor is it the same as dot product.
So __mul() would be an incorrect name for either one. Using symbols, however, would (with some future extension to make it extensible) allow for:
operator *(Vector $v, $left) { ... }
operator x(Vector $v, $left) { ... }
which (for someone who knows vector math) is a lot more self-explanatory than "which one is being mistranslated as multiply?"
At least, that's my understanding of the argument for it. The point about making meta programming more difficult is certainly valid, though. Personally I think I could go either way on this one, as I see valid arguments either direction.
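To make the distinction concrete, here is a minimal sketch in plain PHP 8.1 using ordinary named methods. The `Vector3` class and its `dot()` and `cross()` methods are hypothetical names chosen for illustration and are not part of the RFC. It only shows that the two operations differ in kind: one returns a scalar, the other returns a vector.

```php
<?php
// Hypothetical Vector3 class, for illustration only; not from the RFC.
final class Vector3
{
    public function __construct(
        public readonly float $x,
        public readonly float $y,
        public readonly float $z,
    ) {}

    // Dot (scalar) product: the result is a number.
    public function dot(Vector3 $o): float
    {
        return $this->x * $o->x + $this->y * $o->y + $this->z * $o->z;
    }

    // Cross product: the result is another vector, and order matters.
    public function cross(Vector3 $o): Vector3
    {
        return new Vector3(
            $this->y * $o->z - $this->z * $o->y,
            $this->z * $o->x - $this->x * $o->z,
            $this->x * $o->y - $this->y * $o->x,
        );
    }
}

$a = new Vector3(1.0, 0.0, 0.0);
$b = new Vector3(0.0, 1.0, 0.0);

$dot = $a->dot($b);     // 0.0, a scalar
$cross = $a->cross($b); // Vector3(0.0, 0.0, 1.0), a vector
```

Neither result is what a reader would expect from a method called __mul(), which is the naming concern in a nutshell.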
"Non-Callable - Operand implementations cannot be called on an
instance of an object the way normal methods can."
I think this is just wrong, and makes the RFC unacceptable to me.
Although most of the code I write is code that just performs
operations as I see fit, some of the time the operations need to be
driven by user data. Even something simple like a
calculator-as-a-service would need to call the operations dynamically
from user provided data.
I largely agree here. I don't know if it's because of the operator
choice or not, but being able to call an operator dynamically is important in many use cases. It doesn't have to be a pristine syntax, but some way to do that dynamically (without having a big match statement everywhere you need to) would be very welcome.
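For reference, the "big match statement" fallback looks roughly like the sketch below. The `applyOperator()` helper is a hypothetical name, and the set of supported operators is an assumption for illustration. It runs on PHP 8.0 or later with plain numbers, and would apply to overloaded objects in the same way if the RFC were accepted.

```php
<?php
// Hypothetical helper: dispatch an operator chosen at runtime (e.g. from user input).
function applyOperator(string $op, mixed $left, mixed $right): mixed
{
    return match ($op) {
        '+' => $left + $right,
        '-' => $left - $right,
        '*' => $left * $right,
        '/' => $left / $right,
        default => throw new InvalidArgumentException("Unsupported operator: $op"),
    };
}

$sum = applyOperator('+', 2, 3);     // 5
$product = applyOperator('*', 4, 5); // 20
```

Every call site that needs dynamic dispatch would have to carry or share a table like this, which is the duplication being objected to.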
Another question: Can an interface or abstract class require an operator to be implemented? That's not currently discussed at all. (I would expect the answer to be Yes.)
--Larry Garfield
Using symbols, however, would (with some future extension to make it extensible) allow for:
I don't get how it's easier, other than being able to skip naming the
symbol. e.g. adding union and intersection operators
function __union(...){}
function __intersection(...){}
vs
operator ∪(...){}
operator ∩(...){}
In fact, I find one of those quite a bit easier to read...
Larry Garfield wrote:
It uses effectively the same operator sigil, though.
Yes, that's what I was trying to say.
Danack wrote:
The name of the function (e.g. __add) always refers to the symbol used
where it is used, not what it is doing.
If the naming is taken from the sigil, then it's always appropriate.
So if operator * has the magic method __asterisk instead of __mul, it
avoids any suggestion of what the operation actually means for the
object.
btw, I don't really care about this naming problem. My concern is that
it's being used as a reason for introducing a special new type of
function, when it's really not a big enough problem to deserve making
the language have special new syntax.
cheers
Dan
Ack
Using symbols, however, would (with some future extension to make it extensible) allow for:
I don't get how it's easier, other than being able to skip naming the
symbol. e.g. adding union and intersection operators
function __union(...){}
function __intersection(...){}
vs
operator ∪(...){}
operator ∩(...){}
In fact, I find one of those quite a bit easier to read...
If the list of operators is expanded by the engine, yes. The point is that IF it were decided in the future to allow user-space defined operators, that would be considerably easier with a separate keyword. Eg:
class Matrix {
operator dot(Matrix $other, bool $left) {
// ...
}
}
$result = $m1 dot $m2;
Whether that is something we want to do is another question, but the operator keyword makes that logistically easy, while using __mul or __asterisk makes it logistically hard.
Using an attribute instead to bind a method to an operator, as previously suggested, would also have that flexibility if we ever wanted it. There seems to be a lot of backpressure against using attributes for such things, though, and it wouldn't cleanly self-resolve the issue of keywords that make sense on methods being nonsensical on operators. (public, static, etc.). I'd probably be fine with it myself, but I cannot speak for others.
--Larry Garfield
If the list of operators is expanded by the engine, yes. The point is that IF it were decided in the future to allow user-space defined operators, that would be considerably easier with a separate keyword.
A real-life example of this approach would be PostgreSQL, where a
user-defined operator can be (almost) any combination of + - * / < > = ~
! @ # % ^ & | ` ?
It would be possible to have an open-ended naming scheme for these,
such as "function __atSign_leftAngle" for the operator @> (which
conventionally means "contains" in PostgreSQL) but it would be rather
awkward compared to "operator @>" or "#[Operator('@>')]".
Regards,
--
Rowan Tommins
[IMSoP]
Danack wrote:
btw, I don't really care about this naming problem. My concern is that
it's being used as a reason for introducing a special new type of
function, when it's really not a big enough problem to deserve making
the language have special new syntax.
Danack wrote:
I think you've taken the position that using the symbols is cool, and
you're reasoning about how the RFC should operate from that decision.
Ah, I see. That's a more fundamental objection than the technicals, I
think. It sort of implies that any arguments I provide are justifications
rather than arguments, which makes it difficult to have a productive
conversation about it. You expressed a similar concern about your efforts
to present arguments to me, which makes sense if this is your fundamental
concern.
First, let me start off by saying that I fully acknowledge and document in
the RFC that it is possible to provide a perfectly workable version of
this RFC without the operator keyword. I mention as much in the RFC. If
that is a true blocker for voters, I would at least consider it. However, I
do believe that's the incorrect decision. Not because it's "cool". The code
that handles the parsing of the new keyword is the only part of this RFC
that I didn't write from scratch, it was contributed by someone more
familiar with the parser. I feel like I could hardly have the "coolness" of
the work being my motivating factor when I did not in fact write that part
of the code.
But I do understand the concern. Adding complexity without reason is
foolish, particularly on a project that impacts many people and is
maintained by volunteers. As I immediately told you, I don't think your
concern is without merit, and I don't think it's something that should be
dismissed. But I clearly have (still) done a poor job communicating what I
perceive as the factors that outweigh this concern. It's not that I think
the concern is invalid or that it's small, it's that I view other things as
being an acceptable tradeoff. So I'll attempt one more time to communicate
why.
Forwards Compatibility
Other replies have touched on this, and the RFC talks about this too, but
perhaps the language used has been skipping a couple of steps. This is, by
far, the biggest driving factor for why I believe the operator keyword is
the correct decision, so I will spend most of my time here.
There are two main kinds of forward compatibility achieved with a new
keyword that are difficult to achieve with magic methods: compatibility
with arbitrary symbol combinations, and behavior modifiers that can be
scoped only to operators. You mention that the symbols could be replaced
with their symbol names in English, which avoids the issue of misnaming the
functions. But this would still require the engine to specifically support
every symbol combination that is allowed.
Now, in this RFC I am limiting overloads to not only symbols which are
already used, but to a specific subset of them which are predetermined.
This is for several reasons:
- The PHP developer community will have no direct experience with operator
overloads unless they have experience with another language such as C# or
Python which supports them. Giving developers an initial set of operators
that address 90% of use cases but are limited allows the PHP developer
community time to learn and experiment with the feature while avoiding some
of the most dangerous possible misuses, such as accidentally redefining the
&& operator in a way that breaks boolean algebra.
- This reduces the change necessary to the VM, to class entries, and to
the behavior of existing opcodes. This PR is already very large, and I
wanted to make sure that it wasn't impossible for the people who
participate here on their own time to actually consider the changes being
suggested.
- I am already aware of several people within internals that believe any
version of this feature will result in uncontrolled chaos in PHP codebases.
I think this is false, as I do not see that kind of uncontrolled chaos in
the languages which do have this feature. However I would think that
allowing arbitrary overloads would increase that proportion.
- This is limited to operator combinations with objects, which ALL
currently result in an error. That means there is no code that was working
on PHP 8.1 that will break with this included, as all such code currently
results in a fatal error. The current error is even the parent class of the
error after this RFC, so even the catch blocks, if they currently exist
in PHP codebases, should continue to work as before.
However, once a feature is added it is very difficult to change it. Not
only for backward compatibility reasons, but for the sheer inertia of the
massive impact that PHP has. I do not plan on ever proposing that arbitrary
symbol combinations be allowed for overloads myself. But I cannot possibly
know what internals might think of that possibility 10 years from now when
this feature has been in widespread usage for a long time. Using magic
methods makes it extremely difficult at any point in the future to allow
PHP developers the option of an overload for say +=+. What would such a
magic method be? __plus_equals_plus()? With some kind of magic in the
compiler to rename symbols in certain circumstances?
That sounds far less maintainable to me. It seems more likely that even
if it were a desired feature 10 years from now, it would be something that
would be extremely difficult to implement, maintain, and pass.
I also elaborate in the RFC as to why I think allowing operator specific
method modifiers is a very powerful bit of forwards compatibility as well.
Method modifiers simply result in a change to the function flags mask,
which is an extremely low cost lookup, which makes it very easy to
implement such features in the future if they are desired. I want to make
sure that once included, this feature doesn't result in a dead-end
implementation that boxes internals out of improvements that can be made
moving forward. I think that this is something that is far easier to do
with the operator keyword than it is with magic methods.
Code That Promotes Correct Usage
Enums, as an example, are classes. Internally, they are classes in most
respects. So why is a new keyword for enums useful? Not only for many of
the same reasons listed above, but also because it is useful for the
language to communicate to the developer that a certain thing should be
treated differently, even if it shares a syntax. The fact that PHP
developers can see that enums are different from classes in their code is
not a trivial and unimportant matter.
In the same way, operator overloads are methods. Internally, they are
methods in most respects. But it is useful for the language to
communicate that these methods will change engine behavior. It is
useful for it to communicate that they should be treated differently. The
fact that PHP developers will be able to see that operators are different
from methods will help avoid some of the concerns people have with misuse.
It will communicate that these are areas where new maxims and new habits
should apply, that new things must be learned and new rules followed.
This may seem like such an esoteric suggestion to some, but it follows from
an entire field of study: human-centered design. This is a rigorous field
which explores how technology can be designed to be used correctly.
Acceptance Of Restrictions
We can, of course, place restrictions on how operator overloads are used
when we are concerned about causing trouble. But such restrictions will
generate frustration and opposition in some circumstances. Enums are
another great example. Methods on enums are simply not allowed to do things
that will mutate the object. The engine simply prohibits it. This makes a
lot of sense for enums, but would such restrictions be possible if enums
were simply classes which have cases within them? Technically, certainly it
would be possible. But while I do not hear a lot of PHP developers
complaining about having method behavior restricted in enums, I expect that
there would be a lot of this unnecessary noise if instead PHP developers
saw them as "classes which have cases".
The fact that they are marked as a distinct construct simply makes such
restrictions make more sense to the people who use them.
These are engine hooks. People should not be shoving lots of other logic
into operator overloads. They should always be returning a result, they
should nearly always be implemented immutably, they should document the
logic of interaction with the given operator and nothing more. They
shouldn't be directly called, because they should not contain the kind of
logic that you want to directly call.
One of these restrictions that I included in this RFC was that typing the
parameters is not optional. This is extremely useful for operator
overloads, because you must document all the types that your implementation
understands how to interact with, and the engine will simply not allow for
undetermined or uncertain values to be handled. This restriction would feel
very out of place to many in a function, because other PHP functions do not
behave this way. But for a new thing, with a new keyword that marks itself
as something separate? Well now it makes sense. New things have their own
rules. Just like the restrictions on enum classes.
--
I think these things outweigh the cost of adding a new keyword,
particularly a new keyword that is limited only to the class definition and
that has behavior and syntax that is substantially similar to something
developers are already familiar with. I truly believe this is the better
way of doing this feature, I would not suggest it otherwise. And while an
implementation that doesn't include this is possible and workable, I feel
it is suboptimal and limiting. I feel that it is more likely to result in
problematic usage, complaints, and buggy code from PHP developers.
This new keyword required very minimal changes to the parser, and no
changes to the compiler. I think this is an acceptable tradeoff for the
benefits it brings. That is the reason that I am arguing for it, and no
other reason. I'm sorry if it seems like I am not listening to what you are
saying. That is not the case, I take the feedback of others on this list
very seriously. It's just that you haven't yet brought up a point which I
haven't considered and personally decided was worth the benefits. I agree
this will result in changes for tooling. I accept that those changes will
be larger with a new keyword. I do not think that it is worth delivering an
inferior version of this feature that is more prone to error and misuse,
and is more restricted in future scope.
Jordan
Hi!
- I am already aware of several people within internals that believe any
version of this feature will result in uncontrolled chaos in PHP codebases.
I think this is false, as I do not see that kind of uncontrolled chaos in
the languages which do have this feature. However I would think that
allowing arbitrary overloads would increase that proportion.
Depends on how you define "uncontrolled chaos". I have encountered
toolkits where the authors think it's cute to define "+" to mean
something that has nothing to do with mathematical addition (go read the
manual for an hour to figure what it actually does) or for += to mean
something different than + and assignment. Some people even learn to
love it, so far I haven't.
However, once a feature is added it is very difficult to change it. Not
only for backward compatibility reasons, but for the sheer inertia of the
massive impact that PHP has. I do not plan on ever proposing that arbitrary
symbol combinations be allowed for overloads myself. But I cannot possibly
know what internals might think of that possibility 10 years from now when
this feature has been in widespread usage for a long time. Using magic
methods makes it extremely difficult at any point in the future to allow
PHP developers the option of an overload for say +=+. What would such a
That's awesome. The only thing worse than a toolkit author that thinks
it's cute to play with "+" is the one that thinks it's cute to invent
some random combination of special characters and assign it some meaning
that of course is obvious to the author, but unfortunately that's the
only person in existence to whom it is obvious. Some people enjoy the
code being a puzzle that you need to untangle, doing Sherlockian
detective work to even begin to understand what is going on in this
code. Other people have work to do. And again, what's the intuitive
difference between operators +=+@-+ and ++--=!* ?
Of course, of course I know, every feature can be abused. The difference
here is that it's hard to think of any use that wouldn't be an abuse.
That sounds far less maintainable to me. It seems more likely that even
if it were a desired feature 10 years from now, it would be something that
would be extremely difficult to implement, maintain, and pass.
I must note that "if" carries a lot of load here.
Stas Malyshev
smalyshev@gmail.com
There are quite a few downstream costs for making a new type of
methods in classes. All projects that analyze code (rector,
codesniffer, PHPStorm, PhpStan/Psalm, PHPUnit's code coverage
annotations etc) would have to add a non-trivial amount of code to not
bork when reading the new syntax. Requiring more code to be added and
maintained in PHP's builtin Reflection extension is also a cost.
That's quite a bit of work for a feature that has relatively rare
use-cases.
While I'm not suggesting that this isn't an important consideration for
voters with this RFC, (I think it should be weighed for sure), I think
that all of the things you mentioned will need similar updates to work
correctly with this RFC even if it was done with plain old magic methods
instead. Again, I'm not saying it's not important, but isn't this true of
many RFCs that add new syntax, such as enums or even things like array
unpacking?
As for Reflection, I hadn't gotten to that yet, but I actually pushed the
commit for it with full test coverage of the changes last night. It wasn't
actually that big of a change, and the latest commit on the PR can be
checked out and played with if you want to test it, with support for the enum
casing as well. Operators get an additional function flag, ZEND_ACC_OPERATOR,
which made it fairly simple to implement the Reflection changes with minimal
additional code. The new methods I mentioned are actually smaller than the
implementations for the existing ones. As far as maintainability goes,
removing this aspect doesn't make this RFC more maintainable in core in my
opinion. It becomes harder to maintain it I think, as it requires more
special casing in other places that is more obtuse and obscure.
I don't get this. The magic methods in previous drafts of the RFC
don't have a problem with & as the methods are named with 'two
underscores' + name e.g. __bitwiseAnd. That doesn't appear to cause a
problem with an ampersand?
This was referring to continuing to use symbol names, but without a new
keyword.
The name of the function (e.g. __add) always refers to the symbol used
where it is used, not what it is doing.
If the code is
$a + $b
then that is an addition operator, when the
code is read. If I was reading the code, and I saw that either $a or
$b were objects, I would know to go looking for an __add magic method.
True, and I don't think it would be ambiguous this way. However, method
names for other methods tend to describe what the function does.
" '// This function unions, it does not add'"
Then that is probably an example of an inappropriate use of operator
overloads, and so shouldn't be used as a justification for a syntax
choice.
Larry's example with dot-product and cross-product is a better example, but
one that is less accessible to those who don't know vector math/linear
algebra. I used this example because this is exactly how PHP treats + for
arrays, so I would think most Collections would implement + as the union
operator in order to remain consistent with internals.
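For anyone who hasn't run into it, this is the existing array behavior being referred to. The snippet runs on current PHP; the variable names are arbitrary. The + operator on arrays keeps the left-hand value whenever a key exists on both sides, so it acts as a union by key rather than as addition.

```php
<?php
$defaults = ['x' => 1, 'y' => 2];
$extra    = ['y' => 20, 'z' => 30];

// Union by key: 'y' keeps the left-hand value.
$union = $defaults + $extra; // ['x' => 1, 'y' => 2, 'z' => 30]

// With list-style arrays the keys are 0, 1, 2, so only index 2 is taken from the right.
$list = [1, 2] + [3, 4, 5];  // [1, 2, 5]
```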
I think this is just wrong, and makes the RFC unacceptable to me.
Although most of the code I write is code that just performs
operations as I see fit, some of the time the operations need to be
driven by user data. Even something simple like a
calculator-as-a-service would need to call the operations dynamically
from user provided data.
I'm confused why this wouldn't still be possible? First, you can still get
the closure for the operator implementation from Reflection if you really,
really need it. But second, with user data couldn't you use a setter to
change the object state prior to the op, use a method specifically for
calling as a method, or just combine them with the operator?
$obj->setValue($userData);
$result = $obj + $userData;
I also have an aesthetic preference when writing tests to be as explicit
as possible, rather than as concise as possible e.g.
$foo->__add(5, OperandPosition::LeftSide);
$foo->__add(5, OperandPosition::RightSide);
instead of:
$foo + 5;
5 + $foo
As I find that easier to reason about.
This I can understand. Again, I don't think you're wrong here, I think this
is a matter of opinion and taste. I can understand your position, but I
think the long-term maintainability and support the keyword offers is worth
the short term pain.
Larry:
I largely agree here. I don't know if it's because of the operator
choice or not, but being able to call an operator dynamically is important
in many use cases. It doesn't have to be a pristine syntax, but some way
to do that dynamically (without having a big match statement everywhere you
need to) would be very welcome.
Overall, the point in making them non-callable was to force PHP developers
to stop thinking about these as methods. They are modifications to engine
behavior that are given directly to PHP devs. Using them as methods would
usually indicate incorrect usage. If that's what you need, then probably
you need a method, not an operator overload. If you need both, then what
I would suggest is implementing the logic as a normal method, and then
calling that method inside of the operator overload.
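A rough sketch of that pattern follows. The `Money` class and its `add()` method are hypothetical names used only for illustration. The operator declaration appears only as a comment, since the proposed syntax is not valid in any released PHP version; the rest runs on PHP 8.1.

```php
<?php
// Hypothetical value object, for illustration only.
final class Money
{
    public function __construct(public readonly int $cents) {}

    // The real logic lives in a normal, directly callable method.
    public function add(Money $other): Money
    {
        return new Money($this->cents + $other->cents);
    }

    // Under the RFC's proposed syntax the overload would simply delegate:
    //
    // operator +(Money $other, OperandPosition $operandPos): Money
    // {
    //     return $this->add($other);
    // }
}

// Dynamic use driven by data goes through the named method, not the operator.
$method = 'add';
$total = (new Money(150))->$method(new Money(250));
// $total->cents is 400
```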
Another question: Can an interface or abstract class require an operator
to be implemented? That's not currently discussed at all. (I would expect
the answer to be Yes.)
Yes. Both abstract classes and interfaces can require implementations of
the operator keyword as they would with a method.
Jordan
I think that all of the things you mentioned will need similar
updates to work correctly with this RFC even if it was done
with plain old magic methods instead.
No, that's not true. Codesniffer and all the other tools parse magic
methods just fine. Stuff like the coverage notation for PHPUnit would
understand @covers BigNumber::__plus
just fine.
The main piece of work each of them would need to do to support this
RFC, if it was based on magic methods, is being able to understand
that objects can work with operators:
$foo = new BigNumber(5);
$foo + 5; // Check that BigNumber implements the magic method __plus
That is far less work than having to add stuff to parse a new way to
declare functions.
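As a rough illustration of how small that check is, here is a sketch using method_exists(). The `BigNumber` class and the `supportsPlus()` helper are hypothetical, and `__plus` is just the method name used in this thread rather than anything PHP currently recognises. A real analyser would work on its own model of the code rather than on live objects.

```php
<?php
// Hypothetical class with a magic-method style overload, named as in this thread.
final class BigNumber
{
    public function __construct(public readonly int $value) {}

    public function __plus(int $other): BigNumber
    {
        return new BigNumber($this->value + $other);
    }
}

// What a tool would need to confirm before accepting `$foo + 5`.
function supportsPlus(object $operand): bool
{
    return method_exists($operand, '__plus');
}

$foo = new BigNumber(5);
$ok = supportsPlus($foo);              // true
$notOk = supportsPlus(new stdClass()); // false
```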
Jordan LeDoux wrote:
Danack wrote:
"Non-Callable - Operand implementations cannot be called on an
instance of an object the way normal methods can."
I think this is just wrong, and makes the RFC unacceptable to me.
First, you can still get the closure for the operator implementation
from Reflection if you really, really need it.
Sorry, but I just find that a bizarre thing to suggest. Introducing a
new type of function that can only be called in a particular way needs
to have really strong reasons for that, not "oh you can still call it
through reflection".
I think you've taken the position that using the symbols is cool, and
you're reasoning about how the RFC should operate from that decision.
I'm not sure I can make a reasonable argument against it that you
would find persuasive, but to me it's adding a non-trivial amount of
complexity, which tips the RFC from being acceptable, to not.
cheers
Dan
Ack
RFC Link: https://wiki.php.net/rfc/user_defined_operator_overloads
I'm not strongly opinionated on either approach (magic __add vs operator +)
although I have a very slight preference for your proposed operator + syntax,
however despite very occasionally thinking that this would be a useful
feature, I always start to think about all the awful ways this could be
used, making debugging and reasoning about code harder, since unexpected
magic behaviour can (and probably will be) introduced.
The proposal is technically reasonable, pending consideration of reflection
and so on, but I just think the desire to use this in ways it shouldn't be
used will be too great, and we'll end up with horrendous complexity.
Perhaps I'm too cynical.
The only mitigation for unnecessary complexity I can think of is to force
overloaded operators to be "arrow functions" to encourage only minimal
code, e.g.
operator +(Number $other, OperandPosition $operandPos): Number => return
new Number ($this->value + $other->value);
Perhaps I'm too cynical.
The only mitigation for unnecessary complexity I can think of is to force
overloaded operators to be "arrow functions" to encourage only minimal
code, e.g.
operator +(Number $other, OperandPosition $operandPos): Number => return
new Number ($this->value + $other->value);
I don't think that would be possible. As many of the examples in the RFC show, there are numerous cases where an operator function/callback/thing will need branching logic internally. Even if we did that, people could just sub-call to a single function which would be just as complex as if it were in the operator callback directly.
(Though I still would like to see "short functions" in the language, despite it having been rejected once already.)
--Larry Garfield
The only mitigation for unnecessary complexity I can think of is to force
overloaded operators to be "arrow functions" to encourage only minimal
code, e.g.
operator +(Number $other, OperandPosition $operandPos): Number => return
new Number ($this->value + $other->value);
I don't think that would be possible. As many of the examples in the RFC show, there are numerous cases where an operator function/callback/thing will need branching logic internally. Even if we did that, people could just sub-call to a single function which would be just as complex as if it were in the operator callback directly.
I don't know if this would actually be helpful, but you could force
the operator definition to be an alias for a normal method. That would
(at least partially) solve the "dynamic call" problem, because the
underlying method would be available with the existing dynamic call syntax.
Perhaps we could use an Attribute to bind the operator to the method,
which would also reduce the impact on tools that need to parse class
definitions:
class Collection{ #[Operator('+')]
public function union(Collection$other, OperandPosition$operandPos) {}
}
An interesting extension would be to have an optional argument to the
Attribute which binds separate methods for each direction of arguments,
rather than exposing it as a parameter:
class Number{ #[Operator('/', OperandPosition::LeftSide)]
public function divideBy(Number $divisor) {}
#[Operator('/', OperandPosition::RightSide)]
publicfunction fractionOf(Number $dividend) {}
}
Regards,
--
Rowan Tommins
[IMSoP]
Sorry about the whitespace mess in the above examples; this may or may not show better:
class Collection {
    #[Operator('+')]
    public function union(Collection $other, OperandPosition $operandPos) {}
}
class Number {
    #[Operator('/', OperandPosition::LeftSide)]
    public function divideBy(Number $divisor) {}
    #[Operator('/', OperandPosition::RightSide)]
    public function fractionOf(Number $dividend) {}
}
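For what it's worth, the binding can be prototyped in userland today with the existing attribute and reflection APIs. The sketch below is only an illustration: the Operator attribute class and the applyOperator() dispatcher are hypothetical, the OperandPosition parameter is left out to keep it short, and nothing here reflects how the engine would actually resolve `$a + $b`.

```php
<?php
// Hypothetical attribute that marks a method as an operator implementation.
#[Attribute(Attribute::TARGET_METHOD)]
final class Operator
{
    public function __construct(public string $symbol) {}
}

final class Collection
{
    public function __construct(public array $items) {}

    #[Operator('+')]
    public function union(Collection $other): Collection
    {
        return new Collection(array_values(array_unique(
            array_merge($this->items, $other->items)
        )));
    }
}

// Userland stand-in for the lookup the engine would do on `$left + $right`:
// find the method on the left operand that is bound to the given symbol.
function applyOperator(object $left, string $symbol, mixed $right): mixed
{
    foreach ((new ReflectionObject($left))->getMethods() as $method) {
        foreach ($method->getAttributes(Operator::class) as $attribute) {
            if ($attribute->newInstance()->symbol === $symbol) {
                return $method->invoke($left, $right);
            }
        }
    }
    throw new TypeError("No method bound to operator '$symbol'");
}

$a = new Collection([1, 2]);
$b = new Collection([2, 3]);
echo implode(', ', applyOperator($a, '+', $b)->items), PHP_EOL; // 1, 2, 3
// The bound method stays callable under its own name:
echo implode(', ', $a->union($b)->items), PHP_EOL;              // 1, 2, 3
```

The point of the design is visible in the last two lines: the same method is reachable both through the operator lookup and through the normal call syntax.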
Regards,
--
Rowan Tommins
[IMSoP]
The only mitigation for unnecessary complexity I can think of is to force
overloaded operators to be "arrow functions" to encourage only minimal
code, e.g.
The 'valid' use-cases for this aren't going to be minimal pieces of code.
Things like a matrix object that supports multiplying by the various
things that matrices can be multiplied by won't fit into an arrow
function.
Also, I fundamentally disagree with making stuff difficult to use to
'punish' users who want to use that feature. If you want to enforce
something like a max line length in a coding standard, or forbidding
usage of a feature, that is up to you and any code style tool you use.
I just think the desire to use this in ways it shouldn't be
used will be too great,
It might be an idea to add a list of bad examples to the RFC, so
people can refer to how not to use it, rather than each programming
team having to make the same mistakes.
cheers
Dan
Ack
Hello internals,
I last brought this RFC up for discussion in August, and there was
certainly interesting discussion. Since then there have been many
improvements, and I'd like to re-open discussion on this RFC. I mentioned
in the first email to the list that I was planning on taking a while before
approaching a vote, however the RFC is much closer to vote-ready now, and
I'd like to open discussion with that in mind.
RFC Link: https://wiki.php.net/rfc/user_defined_operator_overloads
Hi Jordan,
Thanks a lot for your work on this RFC! I like the direction this is going.
One thing that may be worthwhile looking into is the query builder use
case. I mentioned it before:
https://externals.io/message/115648#115771
Basically it would enable using plain PHP expressions instead of
strings. So instead of
$query->where('product.price < ?1')->setParameter(1, 100);
one could write:
$query->where(Price < 100);
Here Price is a class that represents a database column which has a
(static) overload of the '<' operator. The operator overload yields an
object representing a database expression, which gets passed to the
where() method.
In general I don't like this sort of clever construct which makes one
wonder what on earth is going on. The reason I do like this particular
use case is that it can simplify code and enable static analysis of
query expressions.
Now I'm not suggesting to support this creative use of operator
overloading in the current RFC. It may however be useful to consider if
this use case could be supported by a future RFC in a backward
compatible way. Perhaps the RFC could mention it as a possible future
extension.
Kind regards,
Dik Takken
$query->where(Price < 100);
Here Price is a class that represents a database column which has a
(static) overload of the '<' operator. The operator overload yields an
object representing a database expression, which gets passed to the
where() method.
The biggest problem with this particular example is not the operator
overloading, but the bare word "Price", which is currently a constant
lookup, not a class reference, as in:
const Price = 50;
var_dump(Price < 100);
However, with any version of operator overloading that didn't limit the
return values of the overloaded operator, you could do something like:
$query->where(Product::$price < 100)
Where the static property Product::$price is an object which overloads
the "<" operator to return some kind of Condition object which can be
used by the query builder.
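A minimal sketch of the shape this could take, with a named method standing in for the overloaded "<" (which current PHP cannot express). Column, Condition, and lessThan() are hypothetical names, and the SQL fragment format is an assumption for illustration only.

```php
<?php
// Hypothetical condition object a query builder could consume.
final class Condition
{
    public function __construct(
        public string $sql,
        public array $parameters,
    ) {}
}

// Hypothetical column object. With an unrestricted `<` overload,
// `$column < 100` could forward to lessThan().
final class Column
{
    public function __construct(private string $name) {}

    public function lessThan(int|float $value): Condition
    {
        return new Condition("{$this->name} < ?", [$value]);
    }
}

final class Product
{
    public static Column $price;
}
Product::$price = new Column('product.price');

$condition = Product::$price->lessThan(100);
echo $condition->sql, PHP_EOL; // product.price < ?
```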
Regards,
--
Rowan Tommins
[IMSoP]
Cool as that would be, it poses a problem as it would mean the Price object could either be directly comparable, or query-builder-comparable, but not both. There's no way to have multiple <=> overrides in different contexts.
Also, for <=> in particular, that one is restricted to only return -1 | 0 | 1 anyway, so it wouldn't be able to return a query builder object.
--Larry Garfield
This is not a use case I highlighted because it's one that would be
difficult to support with this RFC. But as you say, it could be a good
future expansion. In particular, putting a query builder object into core
with some more advanced overloads built in may be the best way to
accomplish this, particularly if it is built with the idea in mind that the
entities themselves may also have overloads.
I can certainly add it to the future scope of this RFC however.
--
RE: The operator keyword and operator implementations being non-callable.
This was a limitation that I purposely placed on the operator keyword, as I
didn't like the idea of allowing syntax of the style $obj->{'+'}(); or
$obj->$op(); and I wanted developers to clearly understand that they
shouldn't treat these as normal methods in the vast majority of
circumstances. However, it seems this is one of the largest sticking points
for those who would otherwise support the RFC. To that end, I'm considering
removing that restriction on the operator keyword. If I were to do that,
you'd no longer need to wrap the operator in a closure to call it, though
the parser would still have problems with $obj->+(...);
I suppose my question then would be, is this an acceptable compromise on
the operator keyword? It removes one of the more annoying hurdles that
Danack mentioned and that others have pointed out, but retains much of the
benefits of the keyword that I expressed in my last email.
Jordan
On 16/12/2021 at 05:01, Jordan LeDoux wrote:
Hello,
I'm not an internals hacker nor someone who can vote, but here is my
opinion about operator overloading: I don't like it. Nevertheless, if it
has to be done, I'd like it to be less challenging for PHP newcomers or
everyday developers.
An operator is not much more than a function shortcut, gmp examples
prove that point quite well. I don't see why it is necessary to create a
new syntax. It seems in the discussion that the magic method ship has
sailed, but I'd much prefer it.
I don't see why a user wouldn't be able to call an operator
method/function outside of the operator context. It has a signature:
left operand and right operand are its parameters, and it has a return
type. The engine internally will just call this function as userland
code could do.
I think that adding a new syntax for it, and allowing weird function
names which are the operator symbols will probably create some mind fuck
in people's mind when reading the code. I like things being simple, and
I'd love operator overloads to be simple functions, no more no less, not
"a new thing". The more PHP syntax grows the more complex it is to learn
and read.
Regarding the naming debate about operator names being different
depending upon the context, I agree, but I don't care, if operators have
a name in the engine, I'd prefer methods to carry the same name even if
it yields a different name in the domain semantics of the object: if you
are the low level operator function developer, you know what you are
doing; furthermore, you can still comment code. If you're the end user,
you will read the domain API documentation, not the code itself in many
cases.
If the names are a problem, why not register those using an attribute?
If I remember well it was mentioned somewhere in the mail thread; it
would provide a way to explicitly register any method, with any name, as
being an operator implementation, so userland could keep both syntaxes
and use the one they wish (i.e. $newNumber = $number->add($otherNumber);
or $newNumber = $number + $otherNumber). It's not insane to keep this
possibility open; on the contrary, it'd leverage the fact that operator
overloading in some contexts is just a shiny eye-candy way of writing
some domain function shortcut.
That's my opinion and I don't know if it's worth a penny, but in my mind, an
operator is a function, and there's no reason that it'd be a different
thing.
Regards,
--
Pierre
Hello internals,
some concerns I have about operator overloading.
(I have seen and played with operator overloading a long time ago in
C++, this is my background for these points.)
- Searchable names.
Methods and functions have searchable and clickable names. Operators don't.
The "searchable" applies to grep searches in code, but also google
searches for documentation and support.
This adds to the concerns already raised by others, that we will see
arbitrary operators "just because we can".
- Symmetry
Operators like "+" or "==" are often expected to be symmetric / commutative.
(for "*" I would not universally expect this, e.g. matrix
multiplication is not symmetric)
https://en.wikipedia.org/wiki/Commutative_property
https://en.wikipedia.org/wiki/Symmetric_function
Having one side of the operator "own" the implementation feels wrong,
and could lead to problems with inheritance down the line.
From C++ I remember that at the time, there was a philosophy of
defining and implementing these kinds of operations outside of the
objects that hold the data.
- Lack of real parameter overloading
Unlike C++ (or C?), we don't have real method/function overloading
based on parameters.
I also don't see it being added in this operator RFC (and I would
disagree with adding it here, if we don't add it for functions/methods
first).
We have to solve this with inheritance override, and with if ($arg
instanceof ...) in the implementation.
What this does not give us is conditional return types:
This is what I would do in a language with parameter-based overloading:
class Matrix {
operator * (Matrix $other): Matrix {..}
operator * (float $factor): Matrix {..}
operator * (Vector $vector): Vector {..}
}
class Vector {
operator * (Vector $vector): float {..}
operator * (float $factor): Vector {..}
}
(Or if we also have templates/generics, we could put dimension
constraints on the types, so that we cannot multiply matrices where
dimensions mismatch.)
Without real parameter overloading, we have to use instanceof instead,
and we cannot have the conditional type hints.
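To illustrate the instanceof approach and the union return type it forces, here is a small sketch in current PHP. A named method multiply() stands in for the "*" operator implementation, and the Matrix and Vector classes (plain arrays of floats, no dimension checks) are assumptions made for the example.

```php
<?php
// Hypothetical minimal types: each just wraps arrays of floats.
final class Vector
{
    /** @param float[] $components */
    public function __construct(public array $components) {}
}

final class Matrix
{
    /** @param float[][] $rows */
    public function __construct(public array $rows) {}

    // One entry point for every right-hand type, as a single operator
    // implementation would be. The declared return type has to be a union.
    public function multiply(Matrix|Vector|float $other): Matrix|Vector
    {
        if ($other instanceof Vector) {
            return new Vector(array_map(
                fn (array $row) => self::dot($row, $other->components),
                $this->rows
            ));
        }
        if ($other instanceof Matrix) {
            $result = [];
            foreach ($this->rows as $i => $row) {
                foreach (array_keys($other->rows[0]) as $j) {
                    $result[$i][$j] = self::dot($row, array_column($other->rows, $j));
                }
            }
            return new Matrix($result);
        }
        // Remaining case: scale every entry by the float factor.
        return new Matrix(array_map(
            fn (array $row) => array_map(fn ($x) => $x * $other, $row),
            $this->rows
        ));
    }

    private static function dot(array $a, array $b): float
    {
        return array_sum(array_map(fn ($x, $y) => $x * $y, $a, $b));
    }
}

$m = new Matrix([[1.0, 2.0], [3.0, 4.0]]);
$v = $m->multiply(new Vector([1.0, 1.0])); // declared type: Matrix|Vector
echo implode(', ', $v->components), PHP_EOL; // 3, 7
```

The caller knows a Vector comes back from Matrix times Vector, but the declared return type cannot say so; only a static analysis tool could narrow it.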
-- Andreas
Methods and functions have searchable and clickable names. Operators don't.
The "searchable" applies to grep searches in code, but also google
That's one of the reasons why I prefer a magic methods based approach.
function __plus(...){}
can be searched for...and for future scope, something like:
function __union(...){}
is more self-documenting (imo) than:
operator ∪(...){}
Lack of real parameter overloading
Unlike C++ (or C?), we don't have real method/function overloading
based on parameters.
Java is probably a better comparison language than C.
I have a note on the core problem that method overloading would face
for PHP here: https://phpopendocs.com/rfc_codex/method_overloading
But although they both involved the word 'overloading', operator
overloading, and method overloading are really separate features.
we cannot have the conditional type hints.
btw you can just say 'types'.
Unlike some lesser languages, in PHP parameter types are enforced at
run-time; they aren't hints. I believe all references to hints (in
relation to types at least) have been removed from the PHP manual.
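A small illustration of that run-time enforcement, as it behaves in current PHP (the function name twice() is just an example): a non-numeric string passed to an int parameter is rejected with a TypeError when the call executes.

```php
<?php
function twice(int $value): int
{
    return $value * 2;
}

try {
    twice('not a number'); // rejected when the call executes
} catch (TypeError $error) {
    echo 'TypeError: ', $error->getMessage(), PHP_EOL;
}

echo twice(21), PHP_EOL; // 42
```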
cheers
Dan
Ack
Methods and functions have searchable and clickable names. Operators don't.
The "searchable" applies to grep searches in code, but also google
That's one of the reasons why I prefer a magic methods based approach.
function __plus(...){}
can be searched for...and for future scope, something like:
function __union(...){}
is more self-documenting (imo) than:
operator ∪(...){}
I don't mind using magic methods for this, compared to an operator keyword.
It is also what I found is happening in python.
However, this does not give us searchability in the calling place,
only where it is declared / implemented.
Lack of real parameter overloading
Unlike C++ (or C?), we don't have real method/function overloading
based on parameters.
Java is probably a better comparison language than C.
I have a note on the core problem that method overloading would face
for PHP here: https://phpopendocs.com/rfc_codex/method_overloading
But although they both involved the word 'overloading', operator
overloading, and method overloading are really separate features.
I see the distinction in overloading based on the object type on the
left, vs overloading based on parameter types.
For a method call $a->f($b), the implementation of ->f() is chosen
based on the type of $a, but not $b.
For an operator call "$a + $b", with the system proposed here, again,
the implementation of "+" will be chosen based on the type of $a, but
not $b.
For native operator calls, the implementation is chosen based on the
types of $a and $b, but in general they are cast to the same type
before applying the operator.
For global function calls f($a, $b), the implementation is always the same.
In a language with parameter-based overloading, the implementation can
be chosen based on the types of $a and $b.
This brings me back to the "symmetry" concern.
In a call "$a->f($b)", it is very clear that the implementation is owned by $a.
However, in an operator expression "$a + $b", it looks as if both
sides are on equal footing, whereas in reality $a "owns" the
implementation.
Add to this that due to the weak typing and implicit casting,
developers could be completely misled by looking at an operator
invocation, if a value (in our case just the left side) has an
unexpected type in some edge cases.
Especially if it is not clear whether the value is a scalar or an object.
With a named method call, at least it is constrained to classes that
implement a method with that name.
we cannot have the conditional type hints.
btw you can just say 'types'.
Unlike some lesser languages, in PHP parameter types are enforced at
run-time; they aren't hints. I believe all references to hints (in
relation to types at least) have been removed from the PHP manual.
Ok, what I mean is return type declarations.
In a class Matrix, operator(Matrix $other): Matrix {} can be declared
to always return Matrix, and operator(float $factor): float {} can be
declared to always return float.
However, with a generic operator(mixed $other): Matrix|float {}, we
cannot natively declare when the return value will be Matrix or float.
(a tool like psalm could still do it)
But even for parameters, if I just say "type" it won't be clear if I
mean the declared type on the parameter, or the actual type of the
argument value.
cheers
Dan
Ack
I see the distinction in overloading based on the object type on the
left, vs overloading based on parameter types.
For a method call $a->f($b), the implementation of ->f() is chosen
based on the type of $a, but not $b.
For an operator call "$a + $b", with the system proposed here, again,
the implementation of "+" will be chosen based on the type of $a, but
not $b.
For native operator calls, the implementation is chosen based on the
types of $a and $b, but in general they are cast to the same type
before applying the operator.
For global function calls f($a, $b), the implementation is always the same.
In a language with parameter-based overloading, the implementation can
be chosen based on the types of $a and $b.
This brings me back to the "symmetry" concern.
In a call "$a->f($b)", it is very clear that the implementation is owned by $a.
However, in an operator expression "$a + $b", it looks as if both
sides are on equal footing, whereas in reality $a "owns" the
implementation.
Add to this that due to the weak typing and implicit casting,
developers could be completely misled by looking at an operator
invocation, if a value (in our case just the left side) has an
unexpected type in some edge cases.
Especially if it is not clear whether the value is a scalar or an object.
With a named method call, at least it is constrained to classes that
implement a method with that name.
The RFC covers all of this, and the way it works around it. Absent method overloading (which I don't expect any time soon, especially given how vehemently Nikita is against it), it's likely the best we could do.
In a class Matrix, operator(Matrix $other): Matrix {} can be declared
to always return Matrix, and operator(float $factor): float {} can be
declared to always return float.
However, with a generic operator(mixed $other): Matrix|float {}, we
cannot natively declare when the return value will be Matrix or float.
(a tool like psalm could still do it)
I... have no idea what you're talking about here. The RFC as currently written is not a "generic operator". It's
operator *(Matrix $other, bool $left): Matrix
The implementer can type both $other and the return however they want. That could be Matrix in both cases, or it could be Matrix|float, or whatever. That's... the same as every other return type we have now.
--Larry Garfield
In a class Matrix, operator(Matrix $other): Matrix {} can be declared
to always return Matrix, and operator(float $factor): float {} can be
declared to always return float.
However, with a generic operator(mixed $other): Matrix|float {}, we
cannot natively declare when the return value will be Matrix or float.
(a tool like psalm could still do it)
I... have no idea what you're talking about here. The RFC as currently written is not a "generic operator". It's
operator *(Matrix $other, bool $left): Matrix
The implementer can type both $other and the return however they want. That could be Matrix in both cases, or it could be Matrix|float, or whatever. That's... the same as every other return type we have now.
Basically the same as others have been saying in more recent comments.
In a class Matrix, you might want to implement three variations of the
* operator:
- Matrix * Matrix = Matrix.
- Matrix * float = Matrix.
- Matrix * Vector = Vector.
Same for other classes and operators:
- Money / float = Money
- Money / Money = float
- Distance * Distance = Area
- Distance * float = Distance
Without parameter-based overloading, this needs union return types, IF
we want to support all variations with operators:
- Matrix * (Matrix|float|Vector) = Matrix|Vector.
- Money / (Money|float) = float|Money
- Distance * (Distance|float) = Area|Distance
Which gives you a return type with some ambiguity.
With methods, you could have different method names with dedicated return types.
The naming can be awkward, so I am giving different possibilities here.
- Matrix->mulFloat(float) = Matrix->scale(float) = Matrix
- Matrix->mul(Matrix) = Matrix::product(Matrix, Matrix) = Matrix
- Matrix->mulVector(Vector) = Vector
To me, the best seems a method name that somehow predicts the return type.
Possible solutions for the developer who is writing a Matrix class and
who wants to use overloaded operators:
- Accept the ambiguity of the return type, and use tools like psalm to
be more precise.
- Only use the * operator for one or 2 of the 3 variations (those that
return Matrix), and introduce a regular function for the third:
- Matrix * Matrix|float = Matrix
- Matrix->mulVector(Vector) = Vector
This "concern" is not a complete blocker for the proposal.
For math-related use cases like the above, the natural expectation to
use operators can be so strong that we can live with some return type
ambiguity.
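The return-type trade-off described above can be made concrete in today's PHP. This is only a sketch under stated assumptions: the Money class and the method names div(), dividedBy() and ratioTo() are hypothetical, and div() merely stands in for a single overloaded "/" handler, since the proposed operator syntax does not exist yet.

```php
<?php
// Hypothetical Money class; div() stands in for a single overloaded "/" handler.
final class Money
{
    public function __construct(public readonly float $amount) {}

    // One handler for every operand type: the declared return type must be a union.
    public function div(Money|float $other): Money|float
    {
        return $other instanceof Money
            ? $this->amount / $other->amount      // Money / Money = float
            : new Money($this->amount / $other);  // Money / float = Money
    }

    // Dedicated methods: each name predicts exactly one return type.
    public function dividedBy(float $divisor): Money
    {
        return new Money($this->amount / $divisor);
    }

    public function ratioTo(Money $other): float
    {
        return $this->amount / $other->amount;
    }
}

$ten = new Money(10.0);
$half = $ten->div(4.0);              // typed as Money|float, actually a Money
$ratio = $ten->div(new Money(4.0));  // typed as Money|float, actually a float
```

With the union-typed handler, callers or static analysis tools have to narrow the result themselves; the dedicated methods carry that information in their signatures.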
-- Andreas
--Larry Garfield
--
To unsubscribe, visit: https://www.php.net/unsub.php
In a class Matrix, you might want to implement three variations of the
* operator:
- Matrix * Matrix = Matrix.
- Matrix * float = Matrix.
- Matrix * Vector = Vector.
Same for other classes and operators:
- Money / float = Money
- Money / Money = float
- Distance * Distance = Area
- Distance * float = Distance
These are bad examples and a nightmare to maintain. I think even more with
lovely typed languages. Matrix * float is better implemented as a method here.
On Thu, Dec 16, 2021 at 4:21 AM Andreas Hennings andreas@dqxtech.net
wrote:
Having one side of the operator "own" the implementation feels wrong,
and could lead to problems with inheritance down the line.
From C++ I remember that at the time, there was a philosophy of
defining and implementing these kinds of operations outside of the
objects that hold the data.
This makes sense in C++ because they are all statically typed and compiled,
but the types of many things in PHP are not necessarily known until
runtime, so it's a bit different. First, there is no situation in which the
overloads will ever be called outside the context of an object instance.
Any situation that would trigger a lookup for overloads will involve an
object instance that has a value of some kind.
$val = $obj + 5;
They are not static methods because they aren't static methods in the
engine execution context. Making them static would break scope needlessly
in direct calls (if those are ever made), and would be a lie within the
engine. Python is a language that handles them exactly in this manner.
Under future scope I have a section titled Polymorphic Handler Resolution
that addresses the issue of inheritance. I decided to make that a separate
future scope for two reasons:
- It has some unique efficiency and optimization concerns around resolving
the inheritance structure during an operator opline. In general developers
expect operators to be very, very fast operations. Overloaded operators
will always be slower, but adding inheritance checks would exacerbate that.
I wanted to consider that separately so that it could receive its own focus.
- For those who have limited personal experience with operator overloads
in the context of object instances (basically, those who haven't used
Python or something similar), the reason for that requirement might be
difficult to see. It will be easier to demonstrate the need to voters on
this list after this RFC is implemented I think. There's already a lot of
domain-specific background to this. The + operator isn't commutative in
PHP right now, for instance. If you want to see how, consider array + array.
Whether an operator is commutative is always domain dependent.
The main edge-case that will be addressed by this future scope is:
$val = $parent + $child;
Where $child is a descendant of $parent, and both implement different
overloads. In such a case, you want to execute the overload of the child
class. An example of this would be a Number class and a Fraction class
that extends Number. You would want to execute the overload on Fraction
regardless of whether it was the left or right operand. This is how the
instanced overloads behave in Python as well. I plan on bringing that as a
follow up RFC before 8.2 feature freeze if this RFC passes.
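The child-first rule for $parent + $child can be sketched in userland PHP. Everything named here is an assumption made for illustration: add() stands in for an overloaded "+" handler, and resolveAdd() is a hypothetical helper that imitates what the engine might do; it is not the RFC's implementation.

```php
<?php
// Hypothetical classes; add() stands in for an overloaded "+" handler.
class Number
{
    public function __construct(public readonly int $value) {}

    public function toFloat(): float
    {
        return (float) $this->value;
    }

    // Only knows whole numbers: a Fraction operand gets truncated here.
    public function add(Number $other): Number
    {
        return new Number($this->value + (int) $other->toFloat());
    }
}

class Fraction extends Number
{
    public function __construct(public readonly int $num, public readonly int $den)
    {
        parent::__construct(intdiv($num, $den));
    }

    public function toFloat(): float
    {
        return $this->num / $this->den;
    }

    // Knows both types: a plain Number is treated as value/1.
    public function add(Number $other): Number
    {
        [$n, $d] = $other instanceof Fraction
            ? [$other->num, $other->den]
            : [$other->value, 1];
        return new Fraction($this->num * $d + $n * $this->den, $this->den * $d);
    }
}

// If the right operand is a proper subclass of the left operand's class,
// its handler runs. Addition is commutative here, so position is ignored.
function resolveAdd(Number $left, Number $right): Number
{
    if (is_subclass_of($right, $left::class)) {
        return $right->add($left);
    }
    return $left->add($right);
}

$two = new Number(2);
$half = new Fraction(1, 2);
$sum = resolveAdd($two, $half);   // Number + Fraction, handled by Fraction
```

Without the subclass check, $two->add($half) would run the parent handler and silently drop the fractional part, which is the edge case the future scope is meant to address.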
--
As a note, I have been swayed by the comments of many others at this point
to make operator implementations callable directly on objects, and remove
the callable restriction on them. The RFC has been updated to reflect this,
and I will update the PR for it as well when I have the time.
Jordan
Hi Jordan,
Thanks for the RFC. I have a couple questions:
Suppose I have classes Foo and Bar, and I want to support the following
operations:
- Foo * Bar (returns Foo)
- Bar * Foo (returns Foo)
If I understand correctly, there are three possible ways I could implement
this:
a) Implement the * operator in Foo, accepting a Foo|Bar, and use the
OperandPosition to determine if I am doing Foo * Bar or Bar * Foo and
implement the necessary logic accordingly.
b) Implement the * operator in Bar, accepting a Foo|Bar, and use the
OperandPosition to determine if I am doing Foo * Bar or Bar * Foo and
implement the necessary logic accordingly.
c) Implement the * operator in Foo, accepting a Bar (handles Foo * Bar
side); Implement the * operator in Bar, accepting a Foo (handles Bar * Foo
side)
Is this understanding correct? If so, which is the preferred approach and
why? If not, can you clarify the best way to accomplish this?
Next, suppose I also want to support int * Foo (returns int). To do this, I
must implement * in Foo, which would look like one of the following
(depending on which approach above)
public operator *(Foo|int $other, OperandPos $pos): Foo|int { ... }
public operator *(Foo|Bar|int $other, OperandPos $pos): Foo|int { ... }
Now, suppose I have an operation like 42 * $foo, which as described
above, should return int. It seems it is not possible to enforce this via
typing, is that correct? i.e. every time I use this, I am forced to do:
$result = 42 * $foo;
if (is_int($result)) {
// can't just assume it's an int because * returns Foo|int
}
Thanks,
--Matt
Hi!
Hi Jordan,
Thanks for the RFC. I have a couple questions:
Suppose I have classes Foo and Bar, and I want to support the following
operations:
- Foo * Bar (returns Foo)
- Bar * Foo (returns Foo)
If I understand correctly, there are three possible ways I could implement
this:
And that's one of the reasons I feel so uneasy with this. When reading
this code: $foo * $bar - how do I know which of the ways you took and
where should I look for the code that is responsible for it? When I see
$foo->times($bar) it's clear who's in charge and where I find the code.
Terse code is nice but not at the expense of making it write-only.
--
Stas Malyshev
smalyshev@gmail.com
On Fri, Dec 17, 2021 at 10:36 AM Stanislav Malyshev smalyshev@gmail.com
wrote:
And that's one of the reasons I feel so uneasy with this. When reading
this code: $foo * $bar - how do I know which of the ways you took and
where should I look for the code that is responsible for it? When I see
$foo->times($bar) it's clear who's in charge and where I find the code.
Terse code is nice but not at the expense of making it write-only.
I think that something on php.net that focuses on best practices and things
to watch out for could go a long way towards this. In general, when people
search for information on how to do something, if that information isn't in
the PHP manual, they'll end up getting a random answer from stackoverflow.
I'd definitely be willing to put in some work to help on such documentation.
I very much expect that this feature will result in community developed
standards, such as a PSR.
Jordan
When reading
this code: $foo * $bar - how do I know which of the ways you took and
where should I look for the code that is responsible for it? When I see
$foo->times($bar) it's clear who's in charge and where I find the code.
Terse code is nice but not at the expense of making it write-only.
Well, there's only two places to look with operator overloads, but
yes you're right, using operator overloads for single operation is not
a good example of how they make code easier to read. The more
complicated example from the introduction to the RFC
https://wiki.php.net/rfc/user_defined_operator_overloads#introduction
shows how they make complex maths easier to read.
The exact position of where that trade-off is 'worth it' is going to
be different for different people. But one of the areas where PHP is
'losing ground' to Python is how Python is better at processing data
with maths, and part of that is how even trivial things, such as
complex numbers, are quite difficult to implement and/or use in
userland PHP.
Stanislav Malyshev wrote:
And again, what's the intuitive
difference between operators +=+@-+ and ++--=!* ?
That's not part of the RFC.
There's enough trade-offs to discuss already; people don't need to
imagine more that aren't part of what is being proposed.
I have encountered
toolkits where the authors think it's cute to define "+" to mean
something that has nothing to do with mathematical addition
Rather than leaving everyone to make the same mistakes again, this RFC
might be improved by having a list of stuff that it really shouldn't
be used for. At least then anyone who violates those guidelines does
so at their own risk. Having guidelines would also help junior devs
point out to more senior devs that "you're trying to be clever and the
whole team is going to regret this".
I started a 'Guidelines for operator overloads' here
(https://github.com/Danack/GuidelinesForOperatorOverloads/blob/main/guidelines.md)
- if anyone has horrible examples they'd like to add, PRs are
welcome.
cheers
Dan
Ack
When reading
this code: $foo * $bar - how do I know which of the ways you took and
where should I look for the code that is responsible for it? When I see
$foo->times($bar) it's clear who's in charge and where I find the code.
Terse code is nice but not at the expense of making it write-only.
Well, there's only two places to look with operator overloads, but
yes you're right, using operator overloads for single operation is not
a good example of how they make code easier to read. The more
complicated example from the introduction to the RFC
https://wiki.php.net/rfc/user_defined_operator_overloads#introduction
shows how they make complex maths easier to read.
I think the example in the RFC is interesting, but not ideal to
advertise the RFC.
The example is with native scalar types and built-in operator implementations.
(I don't know how GMP works internally, but for an average user of PHP
it does not make sense to call this "overloaded")
In fact, if we add overloaded operators as in the RFC, the example
becomes less easy to read, because now we can no longer be sure by
just looking at the snippet:
- Are those variables scalar values, or objects?
- Are the operators using the built-in implementation or some custom
overloaded implementation? (depends on the operand types)
- Are the return values or intermediate values scalars or objects?
We need really good variable names, and/or other contextual
information, to answer those questions.
This said, I am sure we can find good examples.
In this thread, people already mentioned Matrix/Vector, Money/Currency
and Time/Duration.
Others would be various numbers with physical measuring units.
The exact position of where that trade-off is 'worth it' is going to
be different for different people. But one of the areas where PHP is
'losing ground' to Python is how Python is better at processing data
with maths, and part of that is how even trivial things, such as
complex numbers, are quite difficult to implement and/or use in
userland PHP.
Could be interesting to look for examples in Python.
I was not lucky so far, but there must be something..
Stanislav Malyshev wrote:
And again, what's the intuitive
difference between operators +=+@-+ and ++--=!* ?
That's not part of the RFC.
There's enough trade-offs to discuss already; people don't need to
imagine more that aren't part of what is being proposed.
I have encountered
toolkits where the authors think it's cute to define "+" to mean
something that has nothing to do with mathematical addition
Rather than leaving everyone to make the same mistakes again, this RFC
might be improved by having a list of stuff that it really shouldn't
be used for. At least then anyone who violates those guidelines does
so at their own risk. Having guidelines would also help junior devs
point out to more senior devs that "you're trying to be clever and the
whole team is going to regret this".
I started a 'Guidelines for operator overloads' here
(https://github.com/Danack/GuidelinesForOperatorOverloads/blob/main/guidelines.md)
- if anyone has horrible examples they'd like to add, PRs are
welcome.
I think it is a good start.
I would avoid appealing to "common sense" or "logical sense" though,
this can mean different things to different people, and is also
somewhat tautological, like "do good things, avoid bad things".
More meaningful terms can be "familiar", "expectations",
"predictable", "non-ambiguous".
(I see this language is coming from the C++ document, but either way I
don't like it)
Possible alternative language snippets:
For designing operators:
- Replicate familiar notations from the subject domain, e.g. maths,
physics, commerce. (this has some overlap with the first point)
- Return the type and value that people expect based on their
expectations and mental models.
- Use identifiers (class names, method names, variable names) from the
same subject domain language that inspires the operators.
- Avoid ambiguity: If different people will have different
expectations for return type and value, introduce well-named methods
instead of overloaded operators.
- Completeness trade-off: Understand the full range of operators, and
type combinations for the same operator, that is common in the subject
domain. Then decide which of those should be supported with operator
overloads, and which should be supported with methods instead.
- Take inspiration from code examples outside of PHP.
For using operators:
Use descriptive variable names, method names, other identifiers, and
other hints (@var comments etc), so that the type and role of each
variable and value can be easily understood.
E.g. "$duration = $tStart - $tEnd;".
"If you provide constructive operators, they should not change their operands."
I think we should use the term "immutable":
Operators should be immutable to the operands.
If operands are objects, the operator should return a new instance
instead of modifying the existing instance.
For +=, we should still recommend immutable behavior. It is just so
much better :)
$price = new PriceInDollar(5);
assert($price->amount() === 5);
$old_price = $price;
assert($price === $old_price); // same object.
$price *= 2; // Shortcut for "$price = $price * 2;", creating a new instance.
assert($price->amount() === 10);
assert($old_price->amount() === 5);
assert($price !== $old_price); // different object.
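The same immutability recommendation can be checked in current PHP without operator syntax. This is a minimal sketch, assuming a hypothetical times() method that stands in for an overloaded "*" handler; the reassignment line shows what "$price *= 2" would amount to under the behaviour described above.

```php
<?php
// Hypothetical value object; times() stands in for an overloaded "*" handler.
final class PriceInDollar
{
    public function __construct(private int $amount) {}

    public function amount(): int
    {
        return $this->amount;
    }

    // Immutable: returns a new instance and leaves $this untouched.
    public function times(int $factor): self
    {
        return new self($this->amount * $factor);
    }
}

$price = new PriceInDollar(5);
$old_price = $price;          // same object
$price = $price->times(2);    // the variable is rebound, the object is not modified
```

Only the variable $price changes; any other reference to the original object, such as $old_price, keeps seeing the old amount.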
Btw it would be really interesting to find such a list of
recommendations for Python.
The reference you added,
https://isocpp.org/wiki/faq/operator-overloading#op-ov-rules, is for
C++, which is less comparable to PHP than Python is.
-- Andreas
cheers
Dan
Ack
On Mon, Dec 20, 2021 at 4:43 PM Andreas Hennings andreas@dqxtech.net
wrote:
The exact position of where that trade-off is 'worth it' is going to
be different for different people. But one of the areas where PHP is
'losing ground' to Python is how Python is better at processing data
with maths, and part of that is how even trivial things, such as
complex numbers, are quite difficult to implement and/or use in
userland PHP.
Could be interesting to look for examples in Python.
I was not lucky so far, but there must be something..
...
Btw it would be really interesting to find such a list of
recommendations for Python.
The reference you added,
https://isocpp.org/wiki/faq/operator-overloading#op-ov-rules, is for
C++, which is less comparable to PHP than Python is.
During my research phase of this RFC I was able to review many different
takes on this from the Python space. Here is one example of a community
discussion about it:
One of the interesting things here is that most of the warnings for
Python users discussed at the link are actually designed to not be issues
within this RFC. That's on purpose of course, I tried to think about how
some of the design issues in Python could be improved.
In Python, the +, +=, and ++ operators are implemented independently. In
this RFC, you may only overload the + operator, and then the VM handles the
appropriate surrounding logic for the other operators. For instance, with
++$obj or $obj++, you want to return either a reference or a copy,
depending on if it's a pre- or post-increment. In this RFC, the handling of
when the ZVAL is returned is handled by the VM automatically, and a
subordinate call to the opcode for + is made when appropriate. The
reassignment += works similarly, with the ZVAL's being assigned
automatically and a subordinate call. This vastly reduces the surface for
inconsistency.
Another warning that is discussed is around overloading the == operator. A
big reason for this is that the Python overloads do NOT require the ==
overload to return a particular type. Because of this, overloading the ==
operator can result in situations in Python where it is difficult to
compare objects for equality. However, in this RFC the == operator can only
be overloaded to return a boolean, so the semantic meaning of the operator
remains the same. Though you could of course do something terrible and
mutate the object during an equality comparison, you must return a boolean
value, ensuring that the operator cannot be co-opted for other purposes
easily. Additionally, the != and == cannot be independently implemented in
this RFC, but can in Python.
In Python the inequality operators can be implemented independently: >, >=,
<=, <. They also are not required to return a boolean value. In this RFC,
independent overloads for the different comparisons are not provided.
Instead, you must implement the <=> operator and return an int. Further,
the int value you return is normalized to -1, 0, 1 within the engine. This
ensures that someone could not repurpose the > operator to pull something
out of a queue, for instance. (They could still repurpose >> to do so, but
since the shift left and shift right operators are not used in the context
of boolean algebra often in PHP, that's far less dangerous.) A future scope
that I plan on working on is actually having an Ordering enum that must be
returned by the <=> overload instead, that even more explicitly defines
what sorts of states can be returned from this overload.
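The normalization step described above can be illustrated in a few lines. This is a sketch, not the engine's code: normalizeComparison() and isGreater() are hypothetical helpers, and the Ordering enum with its case names is an assumption, since the thread only mentions such an enum as future scope.

```php
<?php
// Hypothetical enum: the thread only names an "Ordering" enum as future scope.
enum Ordering: int
{
    case Less = -1;
    case Equal = 0;
    case Greater = 1;
}

// Clamp any int returned by a comparison handler to -1, 0 or 1.
function normalizeComparison(int $result): int
{
    return $result <=> 0;
}

// Derive ">" from the normalized three-way result, so a handler
// cannot give ">" a meaning of its own.
function isGreater(int $handlerResult): bool
{
    return normalizeComparison($handlerResult) === 1;
}
```

Because every comparison operator is derived from one normalized result, a handler that returns 42 and one that returns 1 are indistinguishable to calling code.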
A lot of years of language design experience have been invested into
operator overloading across various languages. I wanted to at least try to
take advantage of all this experience when writing this RFC. It's why I say
that PHP will end up with the most restrictive operator overloads of any
language I'm aware of. There will still be pain points (returning union
types is not an easy thing to eliminate without full compile time type
resolution), but as far as buggy or problematic code, there's a lot about
this RFC that works to prevent it.
A determined programmer can still create problems, but I find this
(personally) an uncompelling argument against the feature. There are many
features in PHP that a determined programmer can create problems with. The
__get, __set, __call, and __callStatic magic methods can actually allow you
to overload the assignment operator for certain contexts. The __toString
magic method can already be used to mutate the object through a simple
concatenation. The ArrayAccess interface forces you to deal with the union
of all types (mixed), even when that doesn't make sense. And these are
just the PHP features that in some way already interact with operators and
objects in special circumstances.
Jordan
The exact position of where that trade-off is 'worth it' is going to
be different for different people. But one of the areas where PHP is
'losing ground' to Python is how Python is better at processing data
with maths, and part of that is how even trivial things, such as
complex numbers, are quite difficult to implement and/or use in
userland PHP.
Could be interesting to look for examples in Python.
I was not lucky so far, but there must be something..
...
Btw it would be really interesting to find such a list of
recommendations for Python.
The reference you added,
https://isocpp.org/wiki/faq/operator-overloading#op-ov-rules, is for
C++, which is less comparable to PHP than Python is.
During my research phase of this RFC I was able to review many different takes on this from the Python space. Here is one example of a community discussion about it:
One of the interesting things here is that most of the warnings for Python users discussed at the link are actually designed to not be issues within this RFC. That's on purpose of course, I tried to think about how some of the design issues in Python could be improved.
Right.
Your RFC might be the best we can do in the current PHP world, and
better than what exists in some other languages.
The remaining concerns would apply to any operator overloading RFC in
current PHP, and would not be special to this one.
In Python, the +, +=, and ++ operators are implemented independently. In this RFC, you may only overload the + operator, and then the VM handles the appropriate surrounding logic for the other operators. For instance, with ++$obj or $obj++, you want to return either a reference or a copy, depending on if it's a pre- or post-increment. In this RFC, the handling of when the ZVAL is returned is handled by the VM automatically, and a subordinate call to the opcode for + is made when appropriate. The reassignment += works similarly, with the ZVAL's being assigned automatically and a subordinate call. This vastly reduces the surface for inconsistency.
I see the "Implied Operators" section.
I assume this means that a new instance will be created, and stored on
the same variable, if the original operator is written in an
immutable way (which it should be)?
E.g.
$money = new Money(5);
$orig = $money;
$m2 = $money + new Money(2);
assert($money === $orig); // Checking object identity.
assert($m2 !== $orig);
$money += new Money(1); // Equivalent to $money = $money + new Money(1);
assert($money !== $orig);
assert($orig->amount() === 5);
I think we need a strong recommendation to implement operators as immutable.
Another warning that is discussed is around overloading the == operator. A big reason for this is that the Python overloads do NOT require the == overload to return a particular type. Because of this, overloading the == operator can result in situations in Python where it is difficult to compare objects for equality. However, in this RFC the == operator can only be overloaded to return a boolean, so the semantic meaning of the operator remains the same. Though you could of course do something terrible and mutate the object during an equality comparison, you must return a boolean value, ensuring that the operator cannot be co-opted for other purposes easily. Additionally, the != and == cannot be independently implemented in this RFC, but can in Python.
In Python the inequality operators can be implemented independently: >, >=, <=, <. They also are not required to return a boolean value. In this RFC, independent overloads for the different comparisons are not provided. Instead, you must implement the <=> operator and return an int. Further, the int value you return is normalized to -1, 0, 1 within the engine. This ensures that someone could not repurpose the > operator to pull something out of a queue, for instance. (They could still repurpose >> to do so, but since the shift left and shift right operators are not used in the context of boolean algebra often in PHP, that's far less dangerous.) A future scope that I plan on working on is actually having an Ordering enum that must be returned by the <=> overload instead, that even more explicitly defines what sorts of states can be returned from this overload.
A lot of years of language design experience have been invested into operator overloading across various languages. I wanted to at least try to take advantage of all this experience when writing this RFC. It's why I say that PHP will end up with the most restrictive operator overloads of any language I'm aware of. There will still be pain points (returning union types is not an easy thing to eliminate without full compile time type resolution), but as far as buggy or problematic code, there's a lot about this RFC that works to prevent it.
A determined programmer can still create problems, but I find this (personally) an uncompelling argument against the feature. There are many features in PHP that a determined programmer can create problems with. The __get, __set, __call, and __callStatic magic methods can actually allow you to overload the assignment operator for certain contexts. The __toString magic method can already be used to mutate the object through a simple concatenation. The ArrayAccess interface forces you to deal with the union of all types (mixed), even when that doesn't make sense. And these are just the PHP features that in some way already interact with operators and objects in special circumstances.
Jordan
On Tue, Dec 21, 2021 at 5:47 AM Andreas Hennings andreas@dqxtech.net
wrote:
I see the "Implied Operators" section.
I assume this means that a new instance will be created, and stored on
the same variable, if the original operator is written in an
immutable way (which it should be)?
E.g.
$money = new Money(5);
$orig = $money;
$m2 = $money + new Money(2);
assert($money === $orig); // Checking object identity.
assert($m2 !== $orig);
$money += new Money(1); // Equivalent to $money = $money + new Money(1);
assert($money !== $orig);
assert($orig->amount() === 5);
I think we need a strong recommendation to implement operators as
immutable.
Yes. The documentation for operator overloads should be much larger than
this RFC, and if this passes my focus for the rest of 8.2 will be on two
things:
- Working on a few smaller follow up RFCs (sorting/ordering enum,
polymorphic handler resolution)
- Working to help on the documentation of this feature
All of the examples in the documentation should be for immutable
implementations, and there should be an explicit recommendation for
immutable implementations as well. With operators, mutable versions are
created with the operators under the "Implied" section instead of by
creating an immutable implementation of the operator itself.
Jordan
I see the "Implied Operators" section.
I assume this means that a new instance will be created, and stored on
the same variable, if the original operator is written in an
immutable way (which it should be)?
E.g.
$money = new Money(5);
$orig = $money;
$m2 = $money + new Money(2);
assert($money === $orig); // Checking object identity.
assert($m2 !== $orig);
$money += new Money(1); // Equivalent to $money = $money + new Money(1);
assert($money !== $orig);
assert($orig->amount() === 5);
I think we need a strong recommendation to implement operators as immutable.
Yes. The documentation for operator overloads should be much larger than this RFC, and if this passes my focus for the rest of 8.2 will be on two things:
- Working on a few smaller follow up RFCs (sorting/ordering enum, polymorphic handler resolution)
- Working to help on the documentation of this feature
All of the examples in the documentation should be for immutable implementations, and there should be an explicit recommendation for immutable implementations as well. With operators, mutable versions are created with the operators under the "Implied" section instead of by creating an immutable implementation of the operator itself.
Right. But even for the "implied" operators, I would say "mutable"
should refer to the variable, but not to the object.
This is what I tried to communicate with the code example.
Jordan
I think the example in the RFC is interesting, but not ideal to
advertise the RFC.
The example is with native scalar types and built-in operator implementations.
(I don't know how GMP works internally, but for an average user of PHP
it does not make sense to call this "overloaded")
I think you have misunderstood the example. GMP doesn't work with scalar
types, it works with its own objects; the general approach is to call
gmp_init() with a string describing a large number that cannot be
represented by a PHP integer. This gives you an object which doesn't
have any methods (it replaced a resource in older versions), but can be
used with the gmp_* functions, and with mathematical operators
overloaded in the engine.
So the questions you posed are not hypothetical:
- Are those variables scalar values, or objects?
- Are the operators using the built-in implementation or some custom
overloaded implementation? (depends on the operand types)
- Are the return values or intermediate values scalars or objects?
They are objects, using an overloaded implementation of the operators,
and returning more objects.
The only difference is that right now, you can only overload operators
in an extension, not in userland code.
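The engine-level overloading described above can be observed directly when the GMP extension is available. This is a small sketch that assumes ext-gmp is installed; the guard only keeps it from failing where the extension is missing, and the variable names are illustrative.

```php
<?php
// Requires the GMP extension; the guard avoids a fatal error without it.
if (extension_loaded('gmp')) {
    $big = gmp_init('123456789012345678901234567890'); // too large for a PHP int
    $doubled = $big * 2;   // engine-level overload: GMP * int yields a GMP object
    $flipped = 2 * $big;   // also works with the object on the right-hand side
    $plain = 3 * 4;        // no GMP operand involved: ordinary int arithmetic
}
```

Nothing in the expression "$big * 2" reveals that an overloaded implementation is used; that depends entirely on the operand types at runtime.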
Regards,
--
Rowan Tommins
[IMSoP]
I think the example in the RFC is interesting, but not ideal to
advertise the RFC.
The example is with native scalar types and built-in operator implementations.
(I don't know how GMP works internally, but for an average user of PHP
it does not make sense to call this "overloaded")
I think you have misunderstood the example. GMP doesn't work with scalar
types, it works with its own objects; the general approach is to call
gmp_init() with a string describing a large number that cannot be
represented by a PHP integer. This gives you an object which doesn't
have any methods (it replaced a resource in older versions), but can be
used with the gmp_* functions, and with mathematical operators
overloaded in the engine.
Wow, you are right. I should read more before I post.
Thank you Rowan!
Sorry everybody for the distraction.
So the questions you posed are not hypothetical:
Indeed.
The "concern" already applies for those extension-provided operator overloads.
- Are those variables scalar values, or objects?
- Are the operators using the built-in implementation or some custom
overloaded implementation? (depends on the operand types)
- Are the return values or intermediate values scalars or objects?
They are objects, using an overloaded implementation of the operators,
and returning more objects.
Well the initial values could be scalar or GMP. As soon as we hit any
gmp_*() function, the return type is going to be GMP.
In the rewritten example using mostly operators, the gmp_invert() is
the only part that guarantees the return type to be GMP.
Without that gmp_invert(), the return value could as well be scalar,
if all initial variables are.
float|GMP * float|GMP = float|GMP
gmp_mul(float|GMP, float|GMP) = GMP
The only difference is that right now, you can only overload operators
in an extension, not in userland code.
So the same "concern" already applies here,
But it can be outweighed by the benefit.
Regards,
--
Rowan Tommins
[IMSoP]
Hi Jordan,
Thanks for the RFC. I have a couple questions:
Suppose I have classes Foo and Bar, and I want to support the following
operations:
- Foo * Bar (returns Foo)
- Bar * Foo (returns Foo)
If I understand correctly, there are three possible ways I could implement
this:
a) Implement the * operator in Foo, accepting a Foo|Bar, and use the
OperandPosition to determine if I am doing Foo * Bar or Bar * Foo and
implement the necessary logic accordingly.
b) Implement the * operator in Bar, accepting a Foo|Bar, and use the
OperandPosition to determine if I am doing Foo * Bar or Bar * Foo and
implement the necessary logic accordingly.
c) Implement the * operator in Foo, accepting a Bar (handles Foo * Bar
side); Implement the * operator in Bar, accepting a Foo (handles Bar * Foo
side)
Is this understanding correct? If so, which is the preferred approach and
why? If not, can you clarify the best way to accomplish this?
You are correct in your understanding. All three of these would accomplish
what you want, but would have varying levels of maintainability. Which you
choose would depend on the specifics of the Foo and Bar class. For
instance, if the Bar class was one that you didn't ever expect to use on
its own with operators, only in combination with Foo, then it would make
sense to use option (a). The inverse would be true if Bar was the only one
you ever expected to use with operators on its own.
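As a rough illustration of the first option (the overload lives only in Foo), here is a sketch that runs on PHP 8.1 today. Since `operator *` is proposed syntax and not something current PHP can parse, an ordinary method named `mul` stands in for it, and the enum is declared locally only so the example runs. The `label` property and the string concatenation are made up purely to make the operand order visible, and the parameter is narrowed to Bar for brevity.

```php
<?php
declare(strict_types=1);

// Mirrors the enum described in the RFC; declared here only so the sketch runs.
enum OperandPosition
{
    case LeftSide;
    case RightSide;
}

final class Bar
{
    public function __construct(public readonly string $label) {}
}

final class Foo
{
    public function __construct(public readonly string $label) {}

    // Hypothetical stand-in for `public operator *(Bar $other, OperandPosition $pos): Foo`.
    // $pos says which side of the operator $this was on.
    public function mul(Bar $other, OperandPosition $pos): Foo
    {
        return match ($pos) {
            OperandPosition::LeftSide  => new Foo($this->label . '*' . $other->label), // Foo * Bar
            OperandPosition::RightSide => new Foo($other->label . '*' . $this->label), // Bar * Foo
        };
    }
}

$foo = new Foo('foo');
$bar = new Bar('bar');

// Roughly what the engine would dispatch for `$foo * $bar` and `$bar * $foo`:
$left  = $foo->mul($bar, OperandPosition::LeftSide);
$right = $foo->mul($bar, OperandPosition::RightSide);

echo $left->label, "\n";  // foo*bar
echo $right->label, "\n"; // bar*foo
```

Both orderings are handled by Foo alone, so Bar needs no operator code at all.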
The better way, in general, would be for Foo and Bar to extend a common
class that implements the overload in the same way for both. In most
circumstances (but not all), if you have two different objects used with
each other with operators, they should probably share a parent class or be
instances of the same class. Like I said, this isn't always true, but for
the majority of use cases I would expect it is.
Next, suppose I also want to support int * Foo (returns int). To do this,
I must implement * in Foo, which would look like one of the following
(depending on which approach above):
public operator *(Foo|int $other, OperandPos $pos): Foo|int { ... }
public operator *(Foo|Bar|int $other, OperandPos $pos): Foo|int { ... }
Now, suppose I have an operation like 42 * $foo, which as described
above, should return int. It seems it is not possible to enforce this via
typing, is that correct? i.e. every time I use this, I am forced to do:
$result = 42 * $foo;
if (is_int($result)) {
// can't just assume it's an int because * returns Foo|int
}
In general I would say that returning a union from an operator overload is
a recipe for problems. I would either always return an int, or always
return an instance of the calling class. Mostly, this is because any scalar
can be easily represented with a class as well.
Jordan
Hi Jordan,
Thanks for the info. I share Stas's unease with having many different
places we must look in order to understand what $foo * $bar actually
executes. I'm also uneasy with the requirement of union typing in order for
an operator to support multiple types. This will lead to implementations
which are essentially many methods packed into one: one "method" for each
type in the union, and potentially one "method" for each LHS vs. RHS. When
combined, these two issues will make readability difficult. It will be
difficult to know what $foo * $bar actually executes, and once we find it,
the implementation may be messy.
I agree that returning a union is a recipe for a problem, but the fact that
the input parameter must be a union can imply that the return value must
also be a union. For example, Num * Num may return Num, but Num * Vector3
may return Vector3, or Vector3 * Vector3 may represent dot product and
return Num. But let's not get hung up on specific scenarios; it's a problem
that exists in the general sense, and I believe that if PHP is to offer
operator overloading, it should do so in a way that is type safe and
unambiguous.
Method overloading could address both issues (LHS always "owns" the
implementation, and has a separate implementation for each type allowed on
the RHS). But I see this as a non-starter because it would not allow scalar
types on the LHS.
It's difficult to think of a solution that addresses both of these issues
without introducing more. One could imagine something like the following:
register_operator(*, function (Foo $lhs, Bar $rhs): Foo { ...});
register_operator(*, function (Bar $lhs, Foo $rhs): Foo { ...});
register_operator(*, function (int $lhs, Foo $rhs): int { ...});
But this just brings a new set of problems, including visibility issues
(i.e. can't use private fields in the implementation), and the fact that
this requires executing a function at runtime rather than being defined at
compile time.
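To show what such a runtime registry might look like, and why visibility becomes a problem, here is a small userland sketch. Everything in it is hypothetical: `OperatorRegistry`, its `register` and `apply` methods, and the Foo class are invented for illustration, and `apply` stands in for the dispatch the engine would have to perform on every `*`.

```php
<?php
declare(strict_types=1);

final class OperatorRegistry
{
    /** @var array<string, callable> keyed by "operator|lhsType|rhsType" */
    private static array $handlers = [];

    public static function register(string $op, string $lhs, string $rhs, callable $fn): void
    {
        $key = "$op|$lhs|$rhs";
        if (isset(self::$handlers[$key])) {
            throw new LogicException("Operator already registered for $key");
        }
        self::$handlers[$key] = $fn;
    }

    public static function apply(string $op, mixed $lhs, mixed $rhs): mixed
    {
        // get_debug_type() returns "int" for integers and the class name for objects.
        $key = $op . '|' . get_debug_type($lhs) . '|' . get_debug_type($rhs);
        $fn = self::$handlers[$key] ?? throw new TypeError("No handler for $key");
        return $fn($lhs, $rhs);
    }
}

final class Foo
{
    // Must be public: a closure defined outside the class cannot read a private property.
    public function __construct(public readonly int $value) {}
}

// Registration happens when this code executes, not when it is compiled.
OperatorRegistry::register('*', 'int', Foo::class, fn (int $lhs, Foo $rhs): int => $lhs * $rhs->value);
OperatorRegistry::register('*', Foo::class, 'int', fn (Foo $lhs, int $rhs): Foo => new Foo($lhs->value * $rhs));

$result = OperatorRegistry::apply('*', 42, new Foo(2));
var_dump($result); // int(84)
```

Each type pair gets its own precisely typed handler, but the handlers sit outside the classes they operate on and only exist once the registration code has run.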
I don't have any ideas that address all of these issues, but I do think
they deserve further thought.
Thanks,
--Matt
Hello internals,
Since this is going deeply into magic land anyways, we could go another step further
and make this a builtin/"macro" that does happen at compile-time, but also can
impose additional restrictions on what is
allowed - namely, that the registered function must not be inlined but a static
method on one of the arguments.
For example: (syntax completely imaginary here but slightly inspired by rust):
register_operator!(+, lhs: Bar, rhs: Foo, ret: Bar, Bar::addFooBar);
register_operator!(+, lhs: Bar, rhs: Bar, ret: Bar, Bar::addBar);
register_operator!(+, lhs: int, rhs: Bar, ret: int, Bar::addBarInt);
register_operator!(+, lhs: Foo, rhs: int, ret: Foo, Foo::addFooInt, commutative: true);
with
class Bar {
...
public static function addFooBar(Bar $bar, Foo $foo): Bar { }
// etc.
}
Advantages:
- Explicitly named methods that can be called/tested separately
- Slightly improved searchability - grepping "register_operator" will show all operator
combinations inside a code base
- Cannot implement operators for arbitrary classes that one does not own - the method
must be from one of the operands
- Multiple distinct methods per operand/class without full method overloading
- No restrictions around having scalar types only as rhs
If I am not mistaken, the engine should also be able to typecheck the methods
to ensure that the types are correct, and additionally also be able to disallow
overlaps (e.g. defining Foo+Bar commutatively as well as Bar+Foo),
which should throw an error as soon as the second definition is encountered.
Disadvantage: This sounds like a lot of work to implement, and I am not sure
if the checks are even possible the way I'm imagining them (with classes being
loaded on demand, etc.).
Also, this syntax would definitely need work, I just wanted to point out that on the
drawing board, many of these design problems are solvable.
Whether they are worth the effort, and whether this is a good idea at all, is left
for others to decide.
Regards,
Mel
With respect, these are not things that were overlooked. Method overloading
is something that I understand to be a complete non-starter within PHP. I
do not want to speak for other people, but I have been told multiple times
by multiple people that this is a feature which there is significant
resistance to, to the point of being something which should be avoided.
Certainly, it is a separate feature from operator overloading, and
shouldn't be included as part of this RFC.
As you noted, all of the alternatives have multiple other issues. I
considered many different ways to implement this, and I decided that this
particular way of doing it presented the fewest problems. The reason I made
that decision was that problems such as visibility issues would affect
nearly every implementation. But the issue of non-sibling type resolution
is something which would only affect a small subset of very complicated
programs in general. So I chose to confine the issues to the more complex
implementations, because these are likely also the ones where the developer
is more experienced or has more resources to solve the issues presented.
In general, unioning types should be seen as a "code smell" with this
feature in my personal opinion. If you start to see 4, 5, 6 different types
in your parameters, it should be a signal that you want to re-examine how
you are implementing them. I think it works well for this purpose, as many
developers already try to refactor code which has very complicated type
unions. Given that method overloads were off the table, and that the only
realistic way to provide for visibility concerns was to place the overloads
on classes, I see the requirement of union typing the operators as a guard
rail to help developers avoid implementations which are prone to error or
make the program excessively complex to understand.
If we created something instead that was a global register of type
combinations, such as those suggested by Mel, the implementations would
likely be all in one place (some kind of bootstrap or header file), but now
would be completely separated from the actual implementations.
I did consider all these issues quite extensively. I think that the
solution I'm presenting creates the smallest amount of issues for the
smallest set of users. In practice, the two most common usages for this
feature (in my estimation) are likely to be userland scalar object
implementations, and currency objects. Both of these are very
self-contained, and unlikely to want to interact with external objects. The
main applications that would be interested in doing that are complex
mathematical libraries (the kind of application that would fit your example
of Vector * Num). Such libraries are very likely to make subordinate calls
within the operator overloads, as the implementations of the mathematics
themselves are already very complex and likely used in multiple ways at
different times (spoken from experience as someone who maintains a complex
mathematics library). For those kinds of applications, the library itself
is inherently complex, and I very much doubt that operator overloads will
be the main source of complexity and confusion. When dealing with such
math, the more difficult parts to use are things that are related to the
math itself, such as the idea that complex numbers don't have a <=>
relationship to other numbers but do have a == relationship, or the concept
of stochastic rounding for applications such as machine learning.
I am definitely open to improvements and suggestions, I just want to be
clear that this wasn't overlooked. As you wrote out, the alternatives that
are obvious to explore present problems that would be experienced on a more
widespread basis, and I felt it was best to avoid that. I looked at how
other languages implement this feature as well, including Python, R, and
C++, to examine how those programming communities interact with different
language designs. This RFC is closest to the design of Python, as the
concerns within Python are much more similar to the concerns within PHP. If
you find another alternative to explore I am happy to discuss it. These
same trade-offs exist in other languages which have this feature. Again,
I'd look at Python for the closest analogue to this RFC, where operator
overloads are used extensively by many of the applications you would
expect, but do not appear to present these unstoppable complexity problems
to most applications.
They are more widely problematic in C++, but several of the most common
sources of pain with C++ operator overloading are entirely avoided (on
purpose) in this RFC. You cannot overload the assignment operator, you
cannot overload the logical operators, you cannot implement == and != with
different logic. Even Python allows you to define > and < with
different logic (it doesn't even require a boolean return value). If this
RFC were to be accepted, PHP would have some of the most restrictive and
logically consistent operator overloads of any language I've investigated
as part of this RFC.
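One way to picture the consistency restriction described above: the class author supplies only an equality check and a three-way comparison, and every other comparison is derived from those two. The sketch below uses ordinary methods (`equals`, `compare`) as stand-ins for `operator ==` and `operator <=>`; the Version class and the three helper functions are hypothetical, and the derivation rules reflect my reading of this thread rather than the RFC text itself.

```php
<?php
declare(strict_types=1);

final class Version
{
    public function __construct(
        public readonly int $major,
        public readonly int $minor,
    ) {}

    // Hypothetical stand-in for `operator ==`.
    public function equals(Version $other): bool
    {
        return $this->major === $other->major && $this->minor === $other->minor;
    }

    // Hypothetical stand-in for `operator <=>`.
    public function compare(Version $other): int
    {
        return [$this->major, $this->minor] <=> [$other->major, $other->minor];
    }
}

// Derived operations: the class author never writes these, so they cannot disagree.
function notEquals(Version $a, Version $b): bool { return !$a->equals($b); }
function lessThan(Version $a, Version $b): bool { return $a->compare($b) < 0; }
function greaterThan(Version $a, Version $b): bool { return $a->compare($b) > 0; }

$old = new Version(8, 0);
$new = new Version(8, 1);

var_dump(notEquals($old, $new));   // bool(true)
var_dump(lessThan($old, $new));    // bool(true)
var_dump(greaterThan($old, $new)); // bool(false)
```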
Is my proposal perfect? I very much doubt that. There is always room for
improvement. But an extreme amount of care went into trying to limit the
amount of "gunk" this feature will generate, some of it not obvious at
first glance of the RFC.
Jordan
In general, unioning types should be seen as a "code smell" with this
feature in my personal opinion. If you start to see 4, 5, 6 different types
in your parameters, it should be a signal that you want to re-examine how
you are implementing them. I think it works well for this purpose, as many
developers already try to refactor code which has very complicated type
unions.
I'm not sure this argument really makes sense in context, because the
usual way to refactor a method with a lot of unioned types would be to
create multiple methods with different names; with operator overloads,
you clearly can't do that.
In one of the previous discussions, I shared a real life C# Money
example: https://externals.io/message/115648#115666 I thought it would
be interesting to see how that would look in the current proposal. Most
of the operators are straight-forward:
public operator - (Money $other, OperandPosition $operandPos): Money
public operator + (Money $other, OperandPosition $operandPos): Money
public operator * (float $multiple, OperandPosition $operandPos): Money
public operator == (Money $other, OperandPosition $operandPos): bool
public operator <=> (Money $other, OperandPosition $operandPos): int
The division cases however are a little awkward:
/**
 * @param float|Money $divisor A float to calculate a fraction, or another Money to calculate a ratio
 * @return Money|float Money if $divisor is float, float if $divisor is Money
 * @throws TypeError if $divisor is float, and OperandPosition is OperandPosition::RightSide
 */
public operator / (float|Money $divisor, OperandPosition $operandPos): Money|float
The intent is to support Money / float returning Money, and Money /
Money returning float, but not float / Money.
I don't think this kind of type list would be unusual, but it may be a
compromise we have to live with given PHP's type system.
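For what it is worth, the body of that division overload might look roughly like the sketch below. An ordinary method named `div` stands in for `operator /`, the enum is declared locally so the code runs on PHP 8.1, and the `amount` property (a float, purely for brevity) is an assumption and not something taken from the C# original.

```php
<?php
declare(strict_types=1);

// Mirrors the enum described in the RFC; declared here only so the sketch runs.
enum OperandPosition
{
    case LeftSide;
    case RightSide;
}

final class Money
{
    public function __construct(public readonly float $amount) {}

    // Hypothetical stand-in for
    // `public operator / (float|Money $divisor, OperandPosition $operandPos): Money|float`.
    public function div(float|Money $divisor, OperandPosition $operandPos): Money|float
    {
        if ($divisor instanceof Money) {
            // Money / Money: a ratio. $this is the numerator only when it is on the left.
            return $operandPos === OperandPosition::LeftSide
                ? $this->amount / $divisor->amount
                : $divisor->amount / $this->amount;
        }
        if ($operandPos === OperandPosition::RightSide) {
            // float / Money is deliberately unsupported.
            throw new TypeError('Unsupported operand types: float / Money');
        }
        return new Money($this->amount / $divisor); // Money / float: a fraction
    }
}

$ten   = new Money(10.0);
$part  = $ten->div(4.0, OperandPosition::LeftSide);            // Money
$ratio = $ten->div(new Money(4.0), OperandPosition::LeftSide); // float

try {
    $ten->div(4.0, OperandPosition::RightSide); // float / Money
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

The single method has to branch on both the argument type and the operand position, and the caller still cannot tell from the signature alone which of Money or float comes back.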
Regards,
--
Rowan Tommins
[IMSoP]
If the names are a problem, why not registering those using an attribute?
If there is a strong reason to use attributes, then the argument
should start from there.
Starting from "well we could just use an attribute" and then putting
the pressure on other people to find a reason to not use an
attribute, is a terrible design process.
Every language that has annotations ends up with far too many of them;
PHP is likely to end up with too many of them also. The time to push
back against using them is now, not when the damage has been done.
But to repeat, I don't think the names of magic methods are a problem.
Documenting that 'the name refers to the operator sigil, not to what
the function does', avoids it being a problem to be solved.
cheers
Dan
Ack
It seems that most of the discussion and questions have happened. As such,
I'll be opening voting on the RFC on January 3rd unless anyone believes
there are further outstanding issues which should be discussed prior.
I've put together a small set of rules for operator overloads, guidelines
for implementations, that the PHP community could use to start learning the
limitations of this feature while the implementation is being finished and
the documentation for PHP.net is being worked on:
https://github.com/JordanRL/operator-overloads-in-php/blob/master/README.md
Jordan