[RFC] Differentiate op from assign-op in operator overloading

9 years ago by Sara Golemon

This is a separate proposal from the userspace operator overloading I
put up for Patricio yesterday and aims to fix what I see as a bug in
our operator overloading implementation (though some may disagree).

It specifically only seeks to differentiate const operations which
produce a new value from mutations which alter an existing overloaded
object.

https://wiki.php.net/rfc/assignment-overloading

9 years ago by Nikita Popov

This is a separate proposal from the userspace operator overloading I
put up for Patricio yesterday and aims to fix what I see as a bug in
our operator overloading implementation (though some may disagree).

It specifically only seeks to differentiate const operations which
produce a new value from mutations which alter an existing overloaded
object.

https://wiki.php.net/rfc/assignment-overloading

I'd like to provide some context as to why the current implementation works
as it does. The basic idea is that, prior to the introduction of operator
overloading, primitive operators only worked on types with by-value
semantics (or rather: only returned types with by-value semantics). For
such types it holds that

$a op= $b   is equivalent to     $a = $a op $b.

We actually transform the latter operation into the former in pass3
optimization.
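
For plain integers the equivalence is easy to check. The following is only a minimal sketch using built-in integer arithmetic; the variable names are illustrative and not taken from the thread.

```php
<?php
// For by-value scalars, compound assignment and its expanded form agree,
// and neither touches another variable that was assigned the same value.
$a = 5;
$copy = $a;          // by-value copy of the integer

$x = $a;
$x += 3;             // compound assignment

$y = $a;
$y = $y + 3;         // expanded form

$same = ($x === $y); // true: both are int(8)
// $copy is still int(5)
```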

Operator overloading was implemented in such a way that this invariant
still holds. To illustrate why keeping this invariant might be
beneficial, consider this example from the GMP overloading RFC:

The example appeared in the context that, with operator overloading
supported, the above function could be transparently used either with
ordinary integers or with overloaded GMP objects. However, this would not be
true if compound assignment operators modified the object instead of
creating a new one. In this case the argument passed to $exponent would
always be GMP(0) after the function finishes running. I expect that this
would be somewhat unexpected.
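
The listing referred to above is not reproduced in this message. As a rough sketch of the kind of function being described, assume a square-and-multiply modular exponentiation written only with plain operators. The function body, the MutableInt class and powmMutating() are all hypothetical stand-ins (PHP 8.0+ syntax): the explicit decrement() and halve() methods imitate what "--" and "/=" would do if compound assignment mutated the object in place.

```php
<?php
// Hypothetical reconstruction, not the listing from the RFC.
function powm($base, $exponent, $modulus)
{
    $result = 1;
    while ($exponent > 0) {
        if ($exponent % 2 == 1) {
            $result = $result * $base % $modulus;
            $exponent--;
        }
        $exponent /= 2;                    // exponent is even here, so this stays an int
        $base = $base * $base % $modulus;
    }
    return $result;
}

// Hypothetical mutable number; its methods change the object itself.
final class MutableInt
{
    public function __construct(public int $value) {}
    public function decrement(): void { $this->value--; }
    public function halve(): void { $this->value = intdiv($this->value, 2); }
}

// Same algorithm, but every "compound assignment" mutates the argument.
function powmMutating(int $base, MutableInt $exponent, int $modulus): int
{
    $result = 1;
    while ($exponent->value > 0) {
        if ($exponent->value % 2 === 1) {
            $result = $result * $base % $modulus;
            $exponent->decrement();
        }
        $exponent->halve();
        $base = $base * $base % $modulus;
    }
    return $result;
}

$n = 13;
$r1 = powm(4, $n, 497);          // 445; $n is still 13 afterwards

$m = new MutableInt(13);
$r2 = powmMutating(4, $m, 497);  // 445; but $m->value is now 0
```

Both calls compute the same result; the difference is only in what the caller's argument looks like afterwards, which is the surprise described above.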

Basically the lack of distinction between assign-op and op-and-assign
enforces that the object behaves as an immutable value object (at least as
far as overloading is concerned).

Nikita

9 years ago by Sara Golemon

I'd like to provide some context as to why the current implementation works
as it does.

Thanks for the context, Niki. It makes sense that, with GMP as the
flagship target of operator overloading, stripping away the by-ref
semantics of objects would be appealing. This way GMP objects just
look like regular numbers.

Except they're not. Nor are they really objects at this point. They
have no methods, no properties, they're not instantiated with the new
keyword, and they have by-value semantics (mostly).

That feels really odd to me, that GMP objects, at the moment of their
inception were /designed/ to be non-object objects. Would this carry
forward to any OOP API we might introduce in the future? i.e.
$g->add(123); wouldn't mutate $g, but would return a new instance?

Leaving GMP out of the equation for now (and I think we need to have a
longer discussion about this, but in another thread), I think the
question which remains is: Do we want more non-object objects?
Should, for example, SimpleXML be constrained by GMP's goals when
implementing overloaded operators (no, I don't know what this would
look like, it's just a for-instance).

If the answer to that is "No" (which I think it is), then the question
is: Can we modify the do_operation API in a way that allows GMP to
remain pseudo-by-value while still allowing other internal objects
which implement overloading to be more correct? I think we can.
I'll cobble together a gist of how that might look if there's
tentative buy-in, but I'm pretty sure PHP7's variable model does let
us do that.

-Sara

9 years ago by Nikita Popov

I'd like to provide some context as to why the current implementation works
as it does.

Thanks for the context, Niki. It makes sense that, with GMP as the
flagship target of operator overloading, stripping away the by-ref
semantics of objects would be appealing. This way GMP objects just
look like regular numbers.

Except they're not. Nor are they really objects at this point. They
have no methods, no properties, they're not instantiated with the new
keyword, and they have by-value semantics (mostly).

That feels really odd to me, that GMP objects, at the moment of their
inception were /designed/ to be non-object objects. Would this carry
forward to any OOP API we might introduce in the future? i.e.
$g->add(123); wouldn't mutate $g, but would return a new instance?

GMP objects are, with the exceptions of gmp_setbit and gmp_clrbit,
immutable value objects. And yes, that's exactly what I would expect any
object representing a number to be. If $g->add(123) would modify $g instead
of returning a new object, that would be a major WTF moment for me (and for
that matter, make the usage very inconvenient.)

I don't get why you denounce immutable value objects as being "non-object
objects". Seems like very standard usage to me, and one that seems to be
increasingly preferred.

Leaving GMP out of the equation for now (and I think we need to have a
longer discussion about this, but in another thread), I think the
question which remains is: Do we want more non-object objects?
Should, for example, SimpleXML be constrained by GMP's goals when
implementing overloaded operators (no, I don't know what this would
look like, it's just a for-instance).

If we leave GMP out of the equation, then yes, I agree. Whether $a op= $b
should create a new object or modify an existing one depends on the nature
of that object. For GMP (and Rational and Complex and Currency and most
other things that tend to be immutable value objects) I think the behavior
we currently provide is preferable (though I guess that is subject to
discussion), but for some other applications of operator overloading, this
choice wouldn't really make sense.

An example for such a case I came across today in an extension for
collection objects, is the implementation of set union, intersection and
symmetric difference using bitwise operators. Sets usually aren't immutable
and $set |= $addSet creating a new object for the result would likely lead
to much WTF with sets being passed around.
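
To make the set case concrete, here is a small sketch. IntSet and its methods are hypothetical and not the collection extension mentioned above: union() stands in for "$a | $b" and unionInPlace() for a mutating "$a |= $b".

```php
<?php
// Hypothetical set of integers, stored as array keys.
final class IntSet
{
    /** @var array<int, true> */
    private array $items = [];

    public function __construct(int ...$values)
    {
        foreach ($values as $v) {
            $this->items[$v] = true;
        }
    }

    // Models "$a | $b": returns a new set, leaves $this untouched.
    public function union(IntSet $other): IntSet
    {
        $result = clone $this;
        $result->items += $other->items;
        return $result;
    }

    // Models a mutating "$a |= $b": changes $this in place.
    public function unionInPlace(IntSet $other): void
    {
        $this->items += $other->items;
    }

    public function has(int $v): bool
    {
        return isset($this->items[$v]);
    }
}

$set = new IntSet(1, 2);
$alias = $set;                        // same object handle, e.g. passed elsewhere

$set->unionInPlace(new IntSet(3));    // mutation: visible through $alias
$sawMutation = $alias->has(3);        // true

$set = $set->union(new IntSet(4));    // rebinding: $set now names a new object
$sawRebind = $alias->has(4);          // false
```

With the rebinding form, whoever else holds the original set silently stops seeing updates, which is the behavior the paragraph above warns about.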

If the answer to that is "No" (which I think it is), then the question
is: Can we modify the do_operation API in a way that allows GMP to
remain pseudo-by-value while still allowing other internal objects
which implement overloading to be more correct? I think we can.
I'll cobble together a gist of how that might look if there's
tentative buy-in, but I'm pretty sure PHP7's variable model does let
us do that.

It's currently already possible to distinguish this based on whether result
== op1. However we reserve the right (i.e. have some currently non-default
optimizations doing that) to have result == op1 in $a = $a + $b operations
as well, so this check is not robust. So, it would be good to change things
to pass in separate ZEND_ASSIGN_* flags. Will make the implementation even
more ugly though :)

Nikita

9 years ago by Sara Golemon

GMP objects are, with the exceptions of gmp_setbit and gmp_clrbit, immutable
value objects. And yes, that's exactly what I would expect any object
representing a number to be. If $g->add(123) would modify $g instead of
returning a new object, that would be a major WTF moment for me (and for
that matter, make the usage very inconvenient.)

Sure, but if I called $g->assignAdd(123); then I would expect $g to be modified.
That's the analogue to += so while I don't disagree with your
conclusion about add(), I do disagree with your analogy.

I don't get why you denounce immutable value objects as being "non-object
objects". Seems like very standard usage to me, and one that seems to be
increasingly preferred.

It's not their immutability which makes me call them that. It's the
lack of anything you would normally see on an object:

No methods
No properties
No constants
Not initialized via new

GMP objects are just resources by another name, and operator
overloading got built around that model, rather than the model enjoyed
by normal objects. That's why I denounce them as non-object objects.

Leaving GMP out of the equation for now (and I think we need to have a
longer discussion about this, but in another thread), I think the
question which remains is: Do we want more non-object objects?
Should, for example, SimpleXML be constrained by GMP's goals when
implementing overloaded operators (no, I don't know what this would
look like, it's just a for-instance).

If we leave GMP out of the equation, then yes, I agree. Whether $a op= $b
should create a new object or modify an existing one depends on the nature
of that object. For GMP (and Rational and Complex and Currency and most
other things that tend to be immutable value objects) I think the behavior
we currently provide is preferable (though I guess that is subject to
discussion), but for some other applications of operator overloading, this
choice wouldn't really make sense.

Agreed. The choice of mutability needs to be class specific, which is
why I propose letting the class enforce it rather than dictating it
from the engine and not giving the classes a choice.

It's currently already possible to distinguish this based on whether result
== op1. However we reserve the right (i.e. have some currently non-default
optimizations doing that) to have result == op1 in $a = $a + $b operations
as well, so this check is not robust. So, it would be good to change things
to pass in separate ZEND_ASSIGN_* flags. Will make the implementation even
more ugly though :)

Right, that optimization pass is slightly more complex. I haven't had
the bandwidth to dig into it, but assuming a clean/efficient solution
is available, would you be okay with the rest of the proposal overall?
(Non-binding)

-Sara

9 years ago by Nikita Popov

GMP objects are, with the exceptions of gmp_setbit and gmp_clrbit, immutable
value objects. And yes, that's exactly what I would expect any object
representing a number to be. If $g->add(123) would modify $g instead of
returning a new object, that would be a major WTF moment for me (and for
that matter, make the usage very inconvenient.)

Sure, but if I called $g->assignAdd(123); then I would expect $g to be modified.
That's the analogue to += so while I don't disagree with your
conclusion about add(), I do disagree with your analogy.

I picked the $g->add(123) example from your mail, maybe that was only a
typo?

To be clear, I don't really have a problem with something like
$g->assignAdd(123) as a performance optimization. GMP is potentially
critical code and depending on the application, the overhead of object and
mpz allocations may be significantly higher than the cost of the operations
themselves. My point here is only that I don't think these kinds of mutable
APIs on numbers are a good choice for being part of the primary, go-to API
(which for GMP now are the overloaded operators).

I don't get why you denounce immutable value objects as being "non-object
objects". Seems like very standard usage to me, and one that seems to be
increasingly preferred.

It's not their immutability which makes me call them that. It's the
lack of anything you would normally see on an object:

No methods

No properties

No constants

Not initialized via new

GMP objects are just resources by another name, and operator
overloading got built around that model, rather than the model enjoyed
by normal objects. That's why I denounce them as non-object objects.

Thanks, that makes more sense.

Leaving GMP out of the equation for now (and I think we need to have a
longer discussion about this, but in another thread), I think the
question which remains is: Do we want more non-object objects?
Should, for example, SimpleXML be constrained by GMP's goals when
implementing overloaded operators (no, I don't know what this would
look like, it's just a for-instance).

If we leave GMP out of the equation, then yes, I agree. Whether $a op= $b
should create a new object or modify an existing one depends on the nature
of that object. For GMP (and Rational and Complex and Currency and most
other things that tend to be immutable value objects) I think the behavior
we currently provide is preferable (though I guess that is subject to
discussion), but for some other applications of operator overloading, this
choice wouldn't really make sense.

Agreed. The choice of mutability needs to be class specific, which is
why I propose letting the class enforce it rather than dictating it
from the engine and not giving the classes a choice.

It's currently already possible to distinguish this based on whether result
== op1. However we reserve the right (i.e. have some currently non-default
optimizations doing that) to have result == op1 in $a = $a + $b operations
as well, so this check is not robust. So, it would be good to change things
to pass in separate ZEND_ASSIGN_* flags. Will make the implementation even
more ugly though :)

Right, that optimization pass is slightly more complex. I haven't had
the bandwidth to dig into it, but assuming a clean/efficient solution
is available, would you be okay with the rest of the proposal overall?
(Non-binding)

Yeah, I totally agree with part 1+2 of the proposal :)

Nikita

9 years ago by Bob Weinand

On 04.01.2016 at 01:37, Sara Golemon <pollita@php.net> wrote:

This is a separate proposal from the userspace operator overloading I
put up for Patricio yesterday and aims to fix what I see as a bug in
our operator overloading implementation (though some may disagree).

It specifically only seeks to differentiate const operations which
produce a new value from mutations which alter an existing overloaded
object.

https://wiki.php.net/rfc/assignment-overloading

Hey,

I think this RFC is attempting to solve the wrong problem... Let me explain why:

a) What do you do in cases like:
$a = gmp_init(125);
$b = $a;
$b += 10;

$a and $b hold the same object reference, so we'd need to do an implicit clone (separation) before passing it to the assign ops, or we are going to break the immutable-value-object assumption.
That risks being problematic. As long as it is internal, we could deal with it internally. When we leak operator overloading to userland, not so much.
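
The aliasing in case a) can be reproduced with any ordinary object. The sketch below uses a hypothetical Num class with a public property (PHP 8.0+ syntax) rather than GMP, and a direct property update stands in for an assign-op handler that mutates in place.

```php
<?php
// Hypothetical number wrapper; not a GMP object.
final class Num
{
    public function __construct(public int $value) {}
}

// Without separation: both variables hold the same object handle.
$a1 = new Num(125);
$b1 = $a1;
$b1->value += 10;    // in-place mutation through $b1
// $a1->value is now 135 as well

// With explicit separation before the mutation.
$a2 = new Num(125);
$b2 = clone $a2;     // separate first
$b2->value += 10;
// $a2->value is still 125
```

The second half is what the engine would have to do implicitly for every shared handle if value-object behavior is to be preserved.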

b) The main goal is to avoid copies when they are unnecessary. This could then just as well apply to simple cases like:
$a = gmp_init(125);
$b = $a + 10;
/* $a isn't used anymore later */
or just explicit:
$a = $a + 10; # instead of $a += 10; // under some circumstances that may be more readable...

With opcache analysis we'll probably be able to determine when we won't use the symtable directly (via varvars, get_defined_variables() etc.), where this will present another valuable optimization possibly.

At that point you would possibly rather explicitly check: "are we at the end of a variable range (if CV) and the refcount of the object == 1?" and eventually forward that information to userland as optional third param.
Which would give you the most optimization possible; and even applicable in the example Nikita provided with sets.

Example:
function or($a, $b, $mutable = OVERLOAD::NO_REUSE) {
    switch ($mutable) {
        case OVERLOAD::REUSE_OP1_OP2:
        case OVERLOAD::REUSE_OP1:
            $a->add($b);
            return $a;
        case OVERLOAD::REUSE_OP2:
            $b->add($a);
            return $b;
        default:
            $a = clone $a;
            $a->add($b);
            return $a;
    }
}

Bob

9 years ago by Sara Golemon

I think this RFC is attempting to solve the wrong problem... Let me explain why:

a) What do you do in cases like:
$a = gmp_init(125);
$b = $a;
$b += 10;

$a and $b hold the same object reference, so we'd need to do an implicit clone
(separation) before passing it to the assign ops, or we are going to break the
immutable-value-object assumption.

I do understand that. My point is that the immutable-value-object
assumption shouldn't be forced on every class which chooses to
implement overloading. It makes sense for GMP, so GMP should
certainly be allowed to implement it, but GMP's use case isn't
everyone's use case.

That risks being problematic. As long as it is internal, we could deal with it internally.
When we leak operator overloading to userland, not so much.

That's a big IF, but I agree we should have a mind to that leak being
someday possible and not shoot our future selves in the foot.

b) The main goal is to avoid copies when they are unnecessary. This could then just as well apply to simple cases like:

$a = gmp_init(125);
$b = $a + 10;
/* $a isn't used anymore later */
or just explicit:
$a = $a + 10; # instead of $a += 10; // under some circumstances that may be more readable...

I'm confused. Are you saying that avoiding copies is the main goal of
the RFC? Because no, it's not. Being able to take consistent when
implementing overloading is the main goal. An occasional win from not
having to clone is just a side benefit.

Or are you saying that avoiding copies is a benefit of what we have
now, because it's not. What we have now is clone-always, regardless
of self-assignment.

At that point you would possibly rather explicitly check: "are we at the end of a variable range
(if CV) and the refcount of the object == 1?" and eventually forward that information to
userland as optional third param.
Which would give you the most optimization possible; and even applicable in the example
Nikita provided with sets.

I think that approach is piling a whole lot of complexity on for
minimal theoretical benefit.

Again, this RFC is about correctness, not performance.

-Sara

9 years ago by Bob Weinand

On 07.01.2016 at 20:29, Sara Golemon <pollita@php.net> wrote:

I think this RFC is attempting to solve the wrong problem... Let me explain why:

a) What do you do in cases like:
$a = gmp_init(125);
$b = $a;
$b += 10;

$a and $b hold the same object reference, so we'd need to do an implicit clone
(separation) before passing it to the assign ops, or we are going to break the
immutable-value-object assumption.

I do understand that. My point is that the immutable-value-object
assumption shouldn't be forced on every class which chooses to
implement overloading. It makes sense for GMP, so GMP should
certainly be allowed to implement it, but GMP's use case isn't
everyone's use case.

That risks being problematic. As long as it is internal, we could deal with it internally.
When we leak operator overloading to userland, not so much.

That's a big IF, but I agree we should have a mind to that leak being
someday possible and not shoot our future selves in the foot.

Absolutely.

b) The main goal is to avoid copies when they are unnecessary. This could then just as well apply to simple cases like:

$a = gmp_init(125);
$b = $a + 10;
/* $a isn't used anymore later */
or just explicit:
$a = $a + 10; # instead of $a += 10; // under some circumstances that may be more readable...

I'm confused. Are you saying that avoiding copies is the main goal of
the RFC? Because no, it's not. Being able to take consistent when
implementing overloading is the main goal. An occasional win from not
having to clone is just a side benefit.

Or are you saying that avoiding copies is a benefit of what we have
now, because it's not. What we have now is clone-always, regardless
of self-assignment.

Hmm, re-read the RFC right now. Must have misunderstood something… you are saying the operations should apply to the objects behind and not to the variables?
I think that's a bad idea.

It's fine to optimize clones away, but I think having $a += $b; mean anything else (result-wise) than $a = $a + $b; is a bad idea (see below for why).

At that point you would possibly rather explicitly check: "are we at the end of a variable range
(if CV) and the refcount of the object == 1?" and eventually forward that information to
userland as optional third param.
Which would give you the most optimization possible; and even applicable in the example
Nikita provided with sets.

I think that approach is piling a whole lot of complexity on for
minimal theoretical benefit.

Again, this RFC is about correctness, not performance.

I'd like to emphasize that by-object behavior is anything but "by-reference-like" behavior.
There's a big difference between:

$a = gmp_init(125);
$b = $a; $b += 2;
// or
$b = &$a; $b += 2;

It would be a mistake to make both behaviors equal. After all, I consider operator overloading a static operation on two items yielding a new result.

Another quirk here is:
$a = gmp_init(125);
$b = 5;
$c = $b;
$b += $a;
// compared to
$a = 5;
$b = gmp_init(125);
$c = $b;
$b += $a;

In the first example (obviously!) we get $b != $c, but in the second one $b == $c with this RFC. Especially with GMP, we try to have faked by-value (which is very nice for pseudo-scalars).
It'd basically make this impossible now.

So, the big question boils down to: Is $a += $b; something different to $a = $a + $b; ? (As you say in the RFC.)
My answer is no. Objects are not references. Operators operate on the variables, not on the values.

Additional bonus point: This RFC requires that userland operator overloading will alter $this based on the operand passed in. But maybe we'll decide on another RFC promoting static functions with two operands (assuming binary operator) instead.
In the latter case, a div function of an Integer class might return an instance of Fractional instead. Do you then think this should happen:
$a = new Integer(3);
$b = $a;
$a /= 2;
$b instanceof Fractional === true??

I don't think so (that would be a very clear violation of by-object anyway).
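
A sketch of the static two-operand style may make the point clearer. IntegerValue, FractionalValue and div() are hypothetical names chosen for illustration (PHP 8.0+ syntax); no userland overloading hook is assumed, so the compound assignment is written out as an explicit call plus rebinding.

```php
<?php
// Hypothetical result type for inexact division.
final class FractionalValue
{
    public function __construct(public int $num, public int $den) {}
}

final class IntegerValue
{
    public function __construct(public int $value) {}

    // Static two-operand form: always returns a fresh result and is free
    // to pick a different class for it.
    public static function div(IntegerValue $left, int $right): IntegerValue|FractionalValue
    {
        return $left->value % $right === 0
            ? new IntegerValue(intdiv($left->value, $right))
            : new FractionalValue($left->value, $right);
    }
}

$a = new IntegerValue(3);
$b = $a;
$a = IntegerValue::div($a, 2);   // models "$a /= 2" as "$a = $a / 2"

// $a is now a FractionalValue (3/2); $b still refers to IntegerValue(3).
```

Because the result is a new object bound only to $a, $b cannot change class behind the caller's back, which an in-place assign-op on the shared object could not guarantee.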

Hence, can we please delay further discussion until we have actually decided on the exact semantics of userland operator overloading?

Bob