Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:115658
User-Agent: Cyrus-JMAP/3.5.0-alpha0-552-g2afffd2709-fm-20210805.001-g2afffd27
Mime-Version: 1.0
Message-ID: <94696d46-c4e6-406a-b859-89144bff31bf@www.fastmail.com>
In-Reply-To: 
 <CAMrTa2E-pV+xP+3NJ=zd0S7N2agVRGiALaaJCBUaM=UwETEArw@mail.gmail.com>
References: 
 <CAMrTa2EO-5nyoCYkHhMVxQ4nucRoHJj0=M=W22kUaW0jpHkHLg@mail.gmail.com>
 <f01eed58-6ccb-91a9-16d0-0f0c4e97d98a@gmail.com>
 <CAMrTa2E-pV+xP+3NJ=zd0S7N2agVRGiALaaJCBUaM=UwETEArw@mail.gmail.com>
Date: Sat, 07 Aug 2021 17:28:44 -0500
To: "php internals" <internals@lists.php.net>
Content-Type: text/plain
Subject: Re: [PHP-DEV] Revisiting Userland Operator Overloads
From: larry@garfieldtech.com ("Larry Garfield")

On Sat, Aug 7, 2021, at 3:07 PM, Jordan LeDoux wrote:
> > a) Treating operators as arbitrary symbols, which can be assigned any
> operation which makes sense in a particular domain.
> > b) Treating operators as having a fixed meaning, and allowing custom
> types to implement them with that meaning.
> 
> I think this is the core design choice that will affect how the
> implementation is approached, and having some good discussion around it
> before I got into the implementation was the goal of this thread. :) Jan's
> proposal for 8.0 fell more into the a) category with each symbol being
> given an independent, unrelated, and unopinionated override. That RFC very
> nearly passed, the vote was 38 for and 28 against.
> 
> My one hesitation in pushing for a b) type implementation right now (which
> I favor slightly personally) is that the basic math operators do have very
> different meanings between arithmetic, matrix/vector math, and complex
> numbers, all of which are in the same domain of "math". Granted, only
> objects which represent a number valid for arithmetic could also be used
> with other math functions in PHP (such as the sqrt() or cos() functions).
> However, they are definitely use cases that are well treaded in userspace
> code and libraries.
> 
> Complex numbers, for example, couldn't implement a __compare() function at
> all, as they don't have any consistent and sensical definition of "greater
> than" or "less than". This means that if an object represented a complex
> number, the following code would be perhaps unexpected to some:
> 
> if (10 < $complex) {
>     // Never gets here
> }
> 
> if (10 > $complex) {
>     // Never gets here
> }
> 
> if (10 == $complex) {
>     // Never gets here (!!)
> }
> 
> $comparison = 10 <=> $complex; // Nonsensical, should throw an exception
> 
> So while I tend to lean more towards a b) type implementation myself, even
> within that I understood there to be some non-trivial considerations.
> "Numbers" in PHP are obviously real numbers, instead of matrices or
> complex, so all previous semantics of operators and math functions would
> reflect that. To me, an ideal implementation of operator overloading would
> be both:
> 
> 1. Flexible about the contextual meaning of a given operator.
> 2. Somewhat opinionated about the semantical meaning of an operator.
> 
> This is obviously challenging to accomplish, which is why I'm leaving
> myself nearly a whole year for discussion and implementation. I don't want
> to do this quickly and end up with something that gets accepted because we
> want some form of operator overloading, or something that gets rejected
> again despite putting in a great deal of work.
> 
> Jordan

Side note: Please remember to bottom-post.

I think Rowan's breakdown is a bit too pessimistic and binary.  There are definitely different possible ways to interpret operator overloading, but IMO there is a reasonable middle-ground.

At one end is the most restrictive, which would be clustering all "related" overloads together.  That would be something like this:

interface Arithmetic {
  public function __add($arg);
  public function __subtract($arg);
  public function __multiply($arg);
  public function __divide($arg);
}

The intent of clustering like that would be to "force" developers to use it only on number-like things.  However, I believe that has a number of problems.

1) What is a number-like thing?  How number-ish does it have to be?

As an example here, time units.  Adding two hour:minute time tuples together to get a new time (wrapping at the 24 hour mark) is an entirely reasonable thing to do.  But multiplication and division on time don't make any sense at all.  Or, maybe they do but only with ints (2:30 * 3 = 7:30?), kind of, but certainly not on the same type.  I'm sure we could come up with an infinite number of cases where one or more arithmetic operations are entirely reasonable and well-defined, but others are not.
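
To make that concrete, here is a rough sketch of such a time type.  The class name, add(), and scale() are all made up for illustration, and they are plain methods since no released PHP version maps + or * to anything like this.

```php
<?php
// Hypothetical value object: an hour:minute time of day.
final class TimeOfDay
{
    private const MINUTES_PER_DAY = 24 * 60;

    public function __construct(
        public readonly int $hour,
        public readonly int $minute,
    ) {}

    // Stand-in for an overloaded "+": wraps at the 24-hour mark.
    public function add(TimeOfDay $other): TimeOfDay
    {
        $total = ($this->hour + $other->hour) * 60 + $this->minute + $other->minute;
        $total %= self::MINUTES_PER_DAY;
        return new TimeOfDay(intdiv($total, 60), $total % 60);
    }

    // Scaling by an int is arguably meaningful (assumes $factor >= 0);
    // TimeOfDay * TimeOfDay has no obvious meaning, so it is simply absent.
    public function scale(int $factor): TimeOfDay
    {
        $total = (($this->hour * 60 + $this->minute) * $factor) % self::MINUTES_PER_DAY;
        return new TimeOfDay(intdiv($total, 60), $total % 60);
    }
}
```

The point is only that "addable" and "multipliable by the same type" come apart very quickly in practice.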

2) We know from experience that it doesn't work.

PHP already has ArrayAccess, which has four methods.  It's extremely common for people to implement ArrayAccess and stub out some of the methods with exceptions because they don't make sense in context.  I've seen it a bunch, and I've done it a bunch myself.  ArrayAccess is, basically, operator overloading for four different operators: [], [$key], isset(), and unset().  But plenty of use cases exist for wanting to do only some of those (eg, a read-only map, so stub out offsetUnset() and offsetSet()), and generally speaking, developers have responded to that conundrum by saying "screw it, Exceptions for everybody!"

If we went with a combined interface, I am 100% certain we would see people implementing Arithmetic and throwing exceptions from __multiply() and __divide().
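
For anyone who hasn't run into the ArrayAccess version of this, it usually looks something like the sketch below.  ReadOnlyMap is a made-up name and LogicException is just my choice of exception; the four method signatures are the ones ArrayAccess actually requires.

```php
<?php
// Hypothetical read-only map: only two of ArrayAccess's four methods make sense.
final class ReadOnlyMap implements ArrayAccess
{
    public function __construct(private array $items) {}

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->items[$offset] ?? null;
    }

    // The two write operations are stubbed out with exceptions.
    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('ReadOnlyMap cannot be modified.');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('ReadOnlyMap cannot be modified.');
    }
}
```

The type claims to support all four operations and then reneges on half of them at runtime.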

At the other extreme is arbitrary operator definition a la C++.  That would look something vaguely like:

class Foo {
  public function __override(+)($arg);
}

That would give the most flexibility to the developer.  On the one hand, this appeals to me greatly as within 30 seconds of it passing I would personally release an interface like this:

interface Monad {
  public function __override(>>=)(callable $arg): static;
}

And a few more along similar lines.  The downside is that 30 seconds after that, 15 other libraries would do the same in subtly incompatible ways, and then both Laravel and Symfony would release their own that are incompatible with each other, and it would just be a total mess because you would have NFI what any given operator is going to do.  Then FIG would try to define a few to standardize the madness, would take about 10-12 months to do so, but both Symfony and Laravel would go on using their own instead because they're big enough that they can do that, and we'll have a mess basically forever.  That is what my crystal ball tells me would happen.

So while this approach appeals to me personally, I think in the long run it's probably a bad idea.  My understanding is that many people consider C++'s adoption of this approach a mistake, although I'm not a C++ developer so cannot speak from first-hand experience.

The middle-ground is to give each overridable operator a dedicated named method:

interface Addable {
  public function __add($arg);
}

That way, people can opt-in to whatever meaning of "add" they want, but it still means that + always must mean "a method called __add()".  That provides some guidelines as to what you should do with an operator (if you implement __add() and have it return an object that contains less of something, there's a very strong argument that you're just being stupid and your code is bad), and precludes competing custom operators like >>, >>=, etc.  (Much as I would love to make use of them.)

This approach also has precedent in PHP, with, I would argue, far greater success than mega-interfaces.  Countable and Traversable are very often implemented together.  However, they do not have to be.  Sometimes you have something iterable that is uncountable (infinite list, lazy list, etc.), or something countable that it doesn't make sense to foreach() over.  So you opt-in to whichever bits make sense.  You could also separately opt-in to ArrayAccess, which sometimes also makes sense and sometimes not.
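
A quick sketch of the two "only one of them" cases, using invented class names.  IteratorAggregate and Countable are the real built-in interfaces.

```php
<?php
// Hypothetical infinite sequence: traversable, but count() would be meaningless.
final class NaturalNumbers implements IteratorAggregate
{
    public function getIterator(): Generator
    {
        for ($n = 1; ; $n++) {
            yield $n;
        }
    }
}

// Hypothetical tally: countable, but there is nothing sensible to foreach over.
final class VisitCounter implements Countable
{
    private int $visits = 0;

    public function record(): void
    {
        $this->visits++;
    }

    public function count(): int
    {
        return $this->visits;
    }
}
```

Neither class has to stub anything out, because neither was forced to promise more than it can deliver.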

I would argue that the micro-interface approach has a far better success rate in PHP, especially when it comes to "magic" behavior/engine hooks.  If we're going to adopt operator overloading, that is the safest middle-ground to take.

(Similarly, there's nothing that forces someone to return an actual count from Countable::count().  It has to be an int, but the language would happily let you return random_int() if you wanted.  But the vast majority of the time people use it responsibly and return an int that makes logical sense in context.)

We also have the advantage now of both union types and intersection types.  That means if you want to allow your object to add itself, or some other type, and behave differently, you can easily do so by defining __add(Foo|string $other) and tossing a match expression into your method body.  (Side note: Pattern matching would make that even better.)  Anything you don't explicitly allow already results in a TypeError.
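
Roughly what I have in mind, as a sketch only.  Label is a made-up class, and since + does not dispatch to __add() in any released PHP version, the method would have to be called explicitly to try this today.

```php
<?php
// Hypothetical class; __add() is an ordinary method here, not an engine hook.
final class Label
{
    public function __construct(public readonly string $text) {}

    // The union type is the whitelist of accepted right-hand operands.
    public function __add(Label|string $other): static
    {
        $suffix = match (true) {
            $other instanceof Label => $other->text,
            is_string($other) => $other,
        };
        return new static($this->text . $suffix);
    }
}
```

Passing something outside the union, such as an array, fails with a TypeError before the method body ever runs.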

Conversely, if you want to accept an object that is addable, subtractable, and comparable, you can type it exactly like that:

function foo(Addable&Subtractable&Comparable $var) {}

So the updated type system makes one-off interfaces a lot easier and more practical to work with than in the past.
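
Fleshed out a little, under the assumption that the single-method interfaces are named Addable, Subtractable, and Comparable and use __add(), __subtract(), and __compare().  None of those exist in PHP; Money and balance() are likewise invented for the sketch.  The intersection type itself needs PHP 8.1.

```php
<?php
// Hypothetical single-method interfaces in the style discussed above.
interface Addable { public function __add($arg); }
interface Subtractable { public function __subtract($arg); }
interface Comparable { public function __compare($arg): int; }

// Hypothetical type that opts in to all three.
final class Money implements Addable, Subtractable, Comparable
{
    public function __construct(public readonly int $cents) {}

    public function __add($arg): static { return new static($this->cents + $arg->cents); }
    public function __subtract($arg): static { return new static($this->cents - $arg->cents); }
    public function __compare($arg): int { return $this->cents <=> $arg->cents; }
}

// Accepts only values that implement all three interfaces.
function balance(Addable&Subtractable&Comparable $start, $credit, $debit)
{
    // Methods are called explicitly; no operator dispatches to them today.
    return $start->__add($credit)->__subtract($debit);
}
```

A type that implements only Addable is rejected at the call to balance() with a TypeError, which is exactly the opt-in granularity a combined Arithmetic interface cannot express.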

To be fair, this approach would not prevent weirdos like me from implementing __add() and using it as a Monadic bind operator or something silly like that.  However, I believe experience has shown that a combined Arithmetic interface wouldn't stop me from doing silly things either, given experience with ArrayAccess.

That leaves four remaining questions, which apply in any of the above cases:

1) What operators do we build in overloading for?  I think there are six to start with: The 4 arithmetic operators, concat, and compare.  compare should be essentially an internalized version of the custom sort function passed to usort() and friends.  The others are reasonably self-explanatory.
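
To illustrate the compare case: the method would have the same contract as a usort() callback, returning a negative, zero, or positive int.  Version and __compare() are hypothetical names for the sketch, and the method is passed to usort() by hand because nothing in the engine calls it.

```php
<?php
// Hypothetical value object with a compare method shaped like a usort() callback.
final class Version
{
    public function __construct(
        public readonly int $major,
        public readonly int $minor,
    ) {}

    public function __compare(Version $other): int
    {
        // Arrays of equal length compare element by element.
        return [$this->major, $this->minor] <=> [$other->major, $other->minor];
    }
}

$versions = [new Version(2, 0), new Version(1, 4), new Version(1, 10)];

// Today the comparison has to be wired in explicitly.
usort($versions, fn (Version $a, Version $b): int => $a->__compare($b));
```

With an overloadable compare, the idea is that <, >, and <=> on two Version objects would consult the same method.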

An interesting possibility I just realized as I was writing this is using bitwise operator overloading in combination with Enums.  Would that be "good enough" for enum sets?

enum FileAccess: int implements Andable, Orable {
  case Execute = 0b1;
  case Read = 0b10;
  case Write = 0b100;
  case ReadExecute = 0b11;
  case WriteExecute = 0b101;
  case ReadWrite = 0b110;
  case All = 0b111;

  public function __and(FileAccess $other): FileAccess {
    return self::from($this->value & $other->value);
  }

  public function __or(FileAccess $other): FileAccess {
    return self::from($this->value | $other->value);
  }
}

I don't know if I like that or not, but it's an interesting thought.  I'm not sure if negation makes sense to overload.  Once the basic pattern is established we could likely add new operators individually fairly easily.
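
One wrinkle worth noting with that sketch: BackedEnum::from() throws a ValueError when no case has the resulting value, so Execute & Read (which is 0) would blow up unless there is a case for it.  Here is a smaller variant showing both behaviors.  The enum and the method names intersect() and union() are made up, and they are ordinary methods since & and | cannot be overloaded today.

```php
<?php
// Hypothetical, smaller variant of the enum above.
enum Access: int
{
    case Execute = 0b001;
    case Read = 0b010;
    case ReadExecute = 0b011;

    public function intersect(Access $other): ?Access
    {
        // tryFrom() returns null when no case has the resulting value.
        return self::tryFrom($this->value & $other->value);
    }

    public function union(Access $other): Access
    {
        // from() throws a ValueError when no case has the resulting value.
        return self::from($this->value | $other->value);
    }
}
```

Whether an "empty" case or a nullable return is the better answer is an open design question; I only mean to show that every reachable bit pattern needs to be accounted for somehow.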

2) None of these approaches resolves the commutativity problem.  There is no guarantee that $a + $b === $b + $a, if $a or $b are objects that implement Addable.  I suspect that problem is fundamentally intractable, and if we want operator overloading we'll just have to suck it up and accept that we cannot guarantee that is always the case.  For some that may be a fatal problem, which is fair.  It's not a fatal problem for me, personally.
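
A tiny illustration of why the order can matter, assuming the left operand's method is the one that gets called.  Path is an invented class and __add() is invoked by hand.

```php
<?php
// Hypothetical type where operand order changes the result.
final class Path
{
    public function __construct(public readonly string $value) {}

    public function __add(Path $other): Path
    {
        return new Path($this->value . '/' . $other->value);
    }
}

$a = new Path('usr');
$b = new Path('lib');

$ab = $a->__add($b)->value; // "usr/lib"
$ba = $b->__add($a)->value; // "lib/usr"
```

Nothing here is abusive; joining paths is simply not a commutative operation, and the language has no way to know that.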

3) Should the methods in question be dynamic or static?  In my mind, the only argument for static is that it makes it more likely that they'll be implemented in an immutable way, viz, you'll return a new instance of the object rather than modifying either $this or $other.  However, there is no guarantee of that at all.  A static method has just as much access to private variables of its own class as a normal method does, so nothing would prevent a static method from modifying one or both of its operands even if we say not to.  That's the same as for a normal method.  I think the best we can do in either case is to document "please please don't modify the object in place" and move on.  For that reason I would favor a normal method, as a static method just makes things more complicated.
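
To show what I mean about static not buying any safety, here is a sketch with an invented Counter class.  Both forms compile and run today; only the convention differs.

```php
<?php
// Hypothetical class comparing a static and an instance "add".
final class Counter
{
    public function __construct(private int $n) {}

    // Static form: still able to mutate an operand's private state.
    public static function addStatic(Counter $left, Counter $right): Counter
    {
        $left->n += $right->n; // nothing in the language prevents this
        return $left;
    }

    // Instance form following the "return a new object" convention.
    public function add(Counter $other): Counter
    {
        return new Counter($this->n + $other->n);
    }

    public function value(): int
    {
        return $this->n;
    }
}
```

The static version quietly changes its left operand, while the instance version leaves both operands alone.  Immutability comes from how the method is written, not from the static keyword.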

4) What, if any, type enforcement should the language impose?  Eg, should __add() be required to return static, or do we leave that up to the implementer, as there are likely use cases we're not thinking of?  If the engine can handle it I would favor following the pattern of __invoke(): Let the implementer do whatever it wants for both params and return, but an interface can mandate __invoke() (or __add()) with certain parameter and return types if it wants.
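
The __invoke() pattern I'm referring to looks like this today.  The interface and class names are invented; the behavior of __invoke() itself is standard PHP.

```php
<?php
// Hypothetical interface that narrows the otherwise free-form __invoke().
interface IntTransformer
{
    public function __invoke(int $x): int;
}

// Bound by the interface: must accept an int and return an int.
final class Doubler implements IntTransformer
{
    public function __invoke(int $x): int
    {
        return $x * 2;
    }
}

// No interface: the signature is entirely up to the implementer.
final class Greeter
{
    public function __invoke(string $name, string $greeting = 'Hello'): string
    {
        return "$greeting, $name";
    }
}
```

An __add() that followed the same rules would let a library publish a strict Addable-style interface without the engine dictating one signature for everybody.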

--Larry Garfield