Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:116633
MIME-Version: 1.0
References: <CAMrTa2FB-EVGq29oLCU8fYkMYi+KsQiEAFv1xxMFO7vnm8e6=Q@mail.gmail.com>
 <CA+kxMuQqGpkuqwFLrW29G38aOUNbvXM6Dq=P8NXt=1zai_=zxA@mail.gmail.com>
 <7126a5cb-fdaf-4e50-b8af-7d95965d1125@www.fastmail.com> <CA+kxMuROkHkVVWd0AFDweQBBNqXCkPmGqJVggGduDF6=KyZEgg@mail.gmail.com>
In-Reply-To: <CA+kxMuROkHkVVWd0AFDweQBBNqXCkPmGqJVggGduDF6=KyZEgg@mail.gmail.com>
Date: Sun, 12 Dec 2021 23:17:59 -0800
Message-ID: <CAMrTa2Gkra=yA9x+C6mLCTmGSkxn__PCzAwsBxO9Uyk2meXocA@mail.gmail.com>
To: php internals <internals@lists.php.net>
Content-Type: multipart/alternative; boundary="000000000000f5652005d301dfe9"
Subject: Re: [PHP-DEV] [RFC] User Defined Operator Overloads (v0.6)
From: jordan.ledoux@gmail.com (Jordan LeDoux)

--000000000000f5652005d301dfe9
Content-Type: text/plain; charset="UTF-8"

Danack wrote:

> btw, I don't really care about this naming problem. My concern is that
> it's being used as a reason for introducing a special new type
> function, when it's really not a big enough problem to deserve making
> the language have special new syntax.
>
>
Danack wrote:
> I think you've taken the position that using the symbols are cool, and
> you're reasoning about how the RFC should operate from decision.

Ah, I see. That's a more fundamental objection than the technicals, I
think. It sort of implies that any arguments I provide are justifications
rather than arguments, which makes it difficult to have a productive
conversation about it. You expressed a similar concern about your efforts
to present arguments to me, which makes sense if this is your fundamental
concern.

First, let me say that I fully acknowledge, and document in the RFC, that
it is possible to provide a perfectly workable version of *this* RFC
without the operator keyword. If that is a true blocker for voters, I would
at least consider it. However, I do believe that would be the incorrect
decision, and not because the keyword is "cool". The code that handles the
parsing of the new keyword is the only part of this RFC that I didn't write
from scratch; it was contributed by someone more familiar with the parser.
The "coolness" of the work could hardly be my motivating factor when I did
not in fact write that part of the code.

But I do understand the concern. Adding complexity without reason is
foolish, particularly on a project that impacts many people and is
maintained by volunteers. As I immediately told you, I don't think your
concern is without merit, and I don't think it's something that should be
dismissed. But I clearly have (still) done a poor job communicating what I
perceive as the factors that outweigh this concern. It's not that I think
the concern is invalid or that it's small, it's that I view other things as
being an acceptable tradeoff. So I'll attempt one more time to communicate
why.

# Forwards Compatibility

Other replies have touched on this, and the RFC talks about this too, but
perhaps the language used has been skipping a couple of steps. This is, by
far, the biggest driving factor for why I believe the operator keyword is
the correct decision, so I will spend most of my time here.

There are two main kinds of forward compatibility achieved with a new
keyword that are difficult to achieve with magic methods: compatibility
with arbitrary symbol combinations, and behavior modifiers that can be
scoped only to operators. You mention that the symbols could be replaced
with their symbol names in English, which avoids the issue of misnaming the
functions. But this would still require the engine to specifically support
every symbol combination that is allowed.

Now, in this RFC I am limiting overloads to *not only* symbols which are
already used, but to a specific subset of them which are predetermined.
This is for several reasons:

1. The PHP developer community will have no direct experience with operator
overloads unless they have experience with another language such as C# or
Python which supports them. Giving developers an initial set of operators
that address 90% of use cases but are limited allows the PHP developer
community time to learn and experiment with the feature while avoiding some
of the most dangerous possible misuses, such as accidentally redefining the
&& operator in a way that breaks boolean algebra.
2. This reduces the change necessary to the VM, to class entries, and to
the behavior of existing opcodes. This PR is already very large, and I
wanted to make sure that it wasn't impossible for the people who
participate here on their own time to actually consider the changes being
suggested.
3. I am already aware of several people within internals who believe any
version of this feature will result in uncontrolled chaos in PHP codebases.
I think this is false, as I do not see that kind of uncontrolled chaos in
the languages which do have this feature. However, I would expect that
allowing arbitrary overloads would heighten that concern.
4. This is limited to operator combinations with objects, which *ALL*
currently result in an error. That means there is no code that was working
on PHP 8.1 that will break with this included, as all such code currently
results in a fatal error. The current error is even the parent class of the
error *after* this RFC, so even the catch blocks, if they currently exist
in PHP codebases, should continue to work as before.
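
A minimal sketch of point 4 as it behaves today, assuming PHP 8.0 or later.
The Money class and the tryAdd() helper are hypothetical names used only for
illustration; the catch block targets TypeError, which is what the engine
currently throws for unsupported operand types.

```php
<?php
// Hypothetical value object, used only for illustration.
final class Money
{
    public function __construct(public int $cents) {}
}

/**
 * Applies + to two objects and reports how the engine responded.
 */
function tryAdd(object $a, object $b): string
{
    try {
        // No overload exists in released PHP, so this line throws.
        $sum = $a + $b;

        return 'result: ' . get_debug_type($sum);
    } catch (TypeError $e) {
        return 'TypeError: ' . $e->getMessage();
    }
}

echo tryAdd(new Money(100), new Money(250)), PHP_EOL;
```

A catch block written this way would keep working if the error thrown after
this RFC is a subclass of the current one, as described in point 4.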

However, once a feature is added it is very difficult to change it. Not
only for backward compatibility reasons, but for the sheer inertia of the
massive impact that PHP has. I do not plan on ever proposing that arbitrary
symbol combinations be allowed for overloads myself. But I cannot possibly
know what internals might think of that possibility 10 years from now when
this feature has been in widespread usage for a long time. Using magic
methods makes it extremely difficult at *any* point in the future to allow
PHP developers the option of an overload for say +=+. What would such a
magic method be? __plus_equals_plus()? With some kind of magic in the
compiler to rename symbols in certain circumstances?

That sounds far *less* maintainable to me. It seems more likely that even
if it were a desired feature 10 years from now, it would be something that
would be extremely difficult to implement, maintain, and pass.
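
To illustrate the naming problem, here is a rough sketch of the lookup table
that a magic-method design implies. Every name in it (the constant, the
__add-style method names, and the helper) is hypothetical and is not taken
from the RFC or from the engine.

```php
<?php
// Hypothetical table: with magic methods, each overloadable symbol needs
// a hand-picked method name that the engine knows about.
const OPERATOR_MAGIC_METHODS = [
    '+'  => '__add',
    '-'  => '__sub',
    '*'  => '__mul',
    '**' => '__pow',
];

/**
 * Returns the (hypothetical) magic method name for an operator symbol,
 * or null when nobody has assigned a name to that symbol yet.
 */
function magicMethodFor(string $symbol): ?string
{
    return OPERATOR_MAGIC_METHODS[$symbol] ?? null;
}

var_dump(magicMethodFor('+'));   // a registered symbol resolves to its name
var_dump(magicMethodFor('+=+')); // an unregistered symbol has no name at all
```

With a keyword, the symbol itself serves as the name, so no such table of
invented names has to be extended for each new symbol combination.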

I also elaborate in the RFC as to why I think allowing operator specific
method modifiers is a very powerful bit of forwards compatibility as well.
Method modifiers simply result in a change to the function flags mask,
which is an extremely low cost lookup, which makes it very easy to
implement such features in the future if they are desired. I want to make
sure that once included, this feature doesn't result in a dead-end
implementation that boxes internals out of improvements that can be made
moving forward. I think that this is something that is far easier to do
with the operator keyword than it is with magic methods.
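
As a loose userland analogy for the flags-mask lookup (this is not the
engine code itself), method modifiers are already exposed as a bitmask
through Reflection. The Temperature class and the hasModifier() helper are
hypothetical; ReflectionMethod::getModifiers() and the IS_* constants are
existing APIs.

```php
<?php
// Hypothetical class with differently modified methods.
class Temperature
{
    public function __construct(private float $kelvin) {}

    final public static function absoluteZero(): self
    {
        return new self(0.0);
    }

    public function kelvin(): float
    {
        return $this->kelvin;
    }
}

/**
 * Checks a single modifier bit in the method's modifier bitmask.
 */
function hasModifier(string $class, string $method, int $flag): bool
{
    $modifiers = (new ReflectionMethod($class, $method))->getModifiers();

    // One bitwise AND answers the question, whatever the modifier is.
    return ($modifiers & $flag) !== 0;
}

var_dump(hasModifier(Temperature::class, 'absoluteZero', ReflectionMethod::IS_STATIC));
var_dump(hasModifier(Temperature::class, 'kelvin', ReflectionMethod::IS_STATIC));
```

In this picture, a modifier scoped only to operators would amount to one
more bit in the mask, checked the same way.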

# Code That Promotes Correct Usage

Enums, as an example, are classes. Internally, they are classes in most
respects. So why is a new keyword for enums useful? Not only for many of
the same reasons listed above, but also because it is *useful* for the
language to communicate to the developer that a certain thing should be
treated differently, even if it shares a syntax. The fact that PHP
developers can *see* that enums are different from classes in their code is
not a trivial and unimportant matter.

In the same way, operator overloads are methods. Internally, they are
methods in most respects. But it is *useful* for the language to
communicate that *these* methods will change engine behavior. It is
*useful* for it to communicate that they should be treated differently. The
fact that PHP developers will be able to see that operators are different
from methods will help avoid some of the concerns people have with misuse.
It will communicate that these are areas where new maxims and new habits
should apply, that new things must be learned and new rules followed.

This may seem like an esoteric suggestion to some, but it follows from
an entire field of study: human-centered design. This is a rigorous field
which explores how technology can be *designed* to be used correctly.

# Acceptance Of Restrictions

We can, of course, place restrictions on how operator overloads are used
when we are concerned about causing trouble. But such restrictions will
generate frustration and opposition in some circumstances. Enums are
another great example. Methods on enums are simply not allowed to do things
that will mutate the object. The engine simply prohibits it. This makes a
lot of sense for enums, but would such restrictions be possible if enums
were simply classes which have cases within them? Technically, certainly it
would be possible. But while I do not hear a lot of PHP developers
complaining about having method behavior restricted in enums, I expect that
there would be a lot of this unnecessary noise if instead PHP developers
saw them as "classes which have cases".

The fact that they are marked as a distinct construct simply makes such
restrictions make more sense to the people who use them.
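
A small sketch of the enum restriction mentioned above, assuming PHP 8.1.
The Suit enum and the mutationIsRejected() helper are hypothetical. Enums
cannot declare properties of their own, so the sketch instead attempts to
overwrite the built-in name property of a case.

```php
<?php
// Hypothetical enum, used only for illustration.
enum Suit: string
{
    case Hearts = 'H';
    case Spades = 'S';

    // Methods may compute and return values from the case.
    public function color(): string
    {
        return match ($this) {
            self::Hearts => 'red',
            self::Spades => 'black',
        };
    }
}

/**
 * Tries to overwrite the case's built-in name property and reports
 * whether the engine rejected the write.
 */
function mutationIsRejected(Suit $suit): bool
{
    try {
        $suit->name = 'Clubs';

        return false;
    } catch (Error $e) {
        return true;
    }
}

var_dump(Suit::Hearts->color());
var_dump(mutationIsRejected(Suit::Hearts));
```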

These are engine hooks. People should not be shoving lots of other logic
into operator overloads. They should always be returning a result, they
should nearly always be implemented immutably, they should document the
logic of interaction with the given operator and nothing more. They
*shouldn't* be directly called, because they should not contain the kind of
logic that you *want* to directly call.

One of these restrictions that I included in this RFC was that typing the
parameters is not optional. This is extremely useful for operator
overloads, because you must document all the types that your implementation
understands how to interact with, and the engine will simply not allow for
undetermined or uncertain values to be handled. This restriction would feel
very out of place to many in a function, because other PHP functions do not
behave this way. But for a new thing, with a new keyword that marks itself
as something separate? Well now it makes sense. New things have their own
rules. Just like the restrictions on enum classes.
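
The operator syntax proposed in the RFC is not valid in released PHP, so
the sketch below uses an ordinary method as a stand-in to show what
mandatory parameter types buy. The Length class, its add() method, and the
describeAdd() helper are all hypothetical.

```php
<?php
// Hypothetical value object; add() stands in for an overload of +.
final class Length
{
    public function __construct(public int $millimeters) {}

    // The union type documents every operand this implementation accepts.
    public function add(Length|int $other): Length
    {
        $mm = $other instanceof Length ? $other->millimeters : $other;

        return new Length($this->millimeters + $mm);
    }
}

/**
 * Calls add() with an arbitrary operand and reports the outcome.
 */
function describeAdd(Length $left, mixed $right): string
{
    try {
        return 'mm: ' . $left->add($right)->millimeters;
    } catch (TypeError $e) {
        // The engine rejects operands outside the declared types.
        return 'rejected: ' . get_debug_type($right);
    }
}

echo describeAdd(new Length(100), new Length(50)), PHP_EOL; // Length operand
echo describeAdd(new Length(100), 25), PHP_EOL;             // int operand
echo describeAdd(new Length(100), [1, 2]), PHP_EOL;         // undeclared type
```

In an ordinary method the type declaration is optional, which is why making
it mandatory would feel out of place there.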

--

I think these things outweigh the cost of adding a new keyword,
particularly a new keyword that is limited only to the class definition and
that has behavior and syntax that is substantially similar to something
developers are already familiar with. I truly believe this is the better
way of doing this feature; I would not suggest it otherwise. And while an
implementation that doesn't include this is possible and workable, I feel
it is suboptimal and limiting. I feel that it is more likely to result in
problematic usage, complaints, and buggy code from PHP developers.

This new keyword required very minimal changes to the parser, and no
changes to the compiler. I think this is an acceptable tradeoff for the
benefits it brings. That is the reason that I am arguing for it, and no
other reason. I'm sorry if it seems like I am not listening to what you are
saying. That is not the case; I take the feedback of others on this list
very seriously. It's just that you haven't yet brought up a cost that I
hadn't already considered and personally decided was worth paying for the
benefits. I agree
this will result in changes for tooling. I accept that those changes will
be larger with a new keyword. I do not think that it is worth delivering an
inferior version of this feature that is more prone to error and misuse,
and is more restricted in future scope.

Jordan

--000000000000f5652005d301dfe9--