Newsgroups: php.internals
Date: Sun, 12 Dec 2021 23:17:59 -0800
To: php internals
Subject: Re: [PHP-DEV] [RFC] User Defined Operator Overloads (v0.6)
From: jordan.ledoux@gmail.com (Jordan LeDoux)

Danack wrote:
> btw, I don't really care about this naming problem. My concern is that it's being used as a reason for introducing a special new type function, when it's really not a big enough problem to deserve making the language have special new syntax.
>
> Danack wrote:
> I think you've taken the position that using the symbols are cool, and you're reasoning about how the RFC should operate from decision.

Ah, I see. That's a more fundamental objection than the technicals, I think. It sort of implies that any arguments I provide are justifications rather than arguments, which makes it difficult to have a productive conversation about it.
You expressed a similar concern about your efforts to present arguments to me, which makes sense if this is your fundamental concern.

First, let me start off by saying that I fully acknowledge, and document in the RFC, that it is possible to provide a perfectly workable version of *this* RFC without the operator keyword. If that is a true blocker for voters, I would at least consider it.

However, I do believe that's the incorrect decision. Not because it's "cool". The code that handles the parsing of the new keyword is the only part of this RFC that I didn't write from scratch; it was contributed by someone more familiar with the parser. The "coolness" of the work could hardly be my motivating factor when I did not in fact write that part of the code.

But I do understand the concern. Adding complexity without reason is foolish, particularly on a project that impacts many people and is maintained by volunteers. As I immediately told you, I don't think your concern is without merit, and I don't think it's something that should be dismissed.

But I clearly have (still) done a poor job communicating what I perceive as the factors that outweigh this concern. It's not that I think the concern is invalid or that it's small; it's that I think the other benefits make it an acceptable tradeoff.

So I'll attempt one more time to communicate why.

# Forwards Compatibility

Other replies have touched on this, and the RFC talks about this too, but perhaps the language used has been skipping a couple of steps. This is, by far, the biggest driving factor for why I believe the operator keyword is the correct decision, so I will spend most of my time here.

There are two main kinds of forward compatibility achieved with a new keyword that are difficult to achieve with magic methods: compatibility with arbitrary symbol combinations, and behavior modifiers that can be scoped only to operators.
You mention that the symbols could be replaced with their symbol names in English, which avoids the issue of misnaming the functions. But this would still require the engine to specifically support every symbol combination that is allowed.

Now, in this RFC I am limiting overloads to *not only* symbols which are already used, but to a specific subset of them which are predetermined. This is for several reasons:

1. The PHP developer community will have no direct experience with operator overloads unless they have experience with another language such as C# or Python which supports them. Giving developers an initial set of operators that address 90% of use cases but are limited allows the PHP developer community time to learn and experiment with the feature while avoiding some of the most dangerous possible misuses, such as accidentally redefining the && operator in a way that breaks boolean algebra.
2. This reduces the change necessary to the VM, to class entries, and to the behavior of existing opcodes. This PR is already very large, and I wanted to make sure that it wasn't impossible for the people who participate here on their own time to actually consider the changes being suggested.
3. I am already aware of several people within internals that believe any version of this feature will result in uncontrolled chaos in PHP codebases. I think this is false, as I do not see that kind of uncontrolled chaos in the languages which do have this feature. However, I would think that allowing arbitrary overloads would increase that proportion.
4. This is limited to operator combinations with objects, which *ALL* currently result in an error. That means there is no code that was working on PHP 8.1 that will break with this included, as all such code currently results in a fatal error. The current error is even the parent class of the error *after* this RFC, so even the catch blocks, if they currently exist in PHP codebases, should continue to work as before.

However, once a feature is added it is very difficult to change it. Not only for backward compatibility reasons, but for the sheer inertia of the massive impact that PHP has.

I do not plan on ever proposing that arbitrary symbol combinations be allowed for overloads myself. But I cannot possibly know what internals might think of that possibility 10 years from now when this feature has been in widespread usage for a long time.

Using magic methods makes it extremely difficult at *any* point in the future to allow PHP developers the option of an overload for, say, +=+. What would such a magic method be? __plus_equals_plus()? With some kind of magic in the compiler to rename symbols in certain circumstances? That sounds far *less* maintainable to me. It seems more likely that even if it were a desired feature 10 years from now, it would be something that would be extremely difficult to implement, maintain, and pass.

I also elaborate in the RFC as to why I think allowing operator-specific method modifiers is a very powerful bit of forwards compatibility as well. Method modifiers simply result in a change to the function flags mask, which is an extremely low cost lookup, which makes it very easy to implement such features in the future if they are desired.

I want to make sure that once included, this feature doesn't result in a dead-end implementation that boxes internals out of improvements that can be made moving forward. I think that this is something that is far easier to do with the operator keyword than it is with magic methods.

# Code That Promotes Correct Usage

Enums, as an example, are classes. Internally, they are classes in most respects. So why is a new keyword for enums useful? Not only for many of the same reasons listed above, but also because it is *useful* for the language to communicate to the developer that a certain thing should be treated differently, even if it shares a syntax.
The fact that PHP developers can *see* that enums are different from classes in their code is not a trivial or unimportant matter.

In the same way, operator overloads are methods. Internally, they are methods in most respects. But it is *useful* for the language to communicate that *these* methods will change engine behavior. It is *useful* for it to communicate that they should be treated differently. The fact that PHP developers will be able to see that operators are different from methods will help avoid some of the concerns people have with misuse. It will communicate that these are areas where new maxims and new habits should apply, that new things must be learned and new rules followed.

This may seem like an esoteric suggestion to some, but it follows from an entire field of study: human-centered design. This is a rigorous field which explores how technology can be *designed* to be used correctly.

# Acceptance Of Restrictions

We can, of course, place restrictions on how operator overloads are used when we are concerned about causing trouble. But such restrictions will generate frustration and opposition in some circumstances.

Enums are another great example. Methods on enums are simply not allowed to do things that will mutate the object. The engine simply prohibits it. This makes a lot of sense for enums, but would such restrictions be possible if enums were simply classes which have cases within them? Technically, certainly it would be possible. But while I do not hear a lot of PHP developers complaining about having method behavior restricted in enums, I expect that there would be a lot of this unnecessary noise if instead PHP developers saw them as "classes which have cases". The fact that they are marked as a distinct construct simply makes such restrictions make more sense to the people who use them.

These are engine hooks. People should not be shoving lots of other logic into operator overloads.
They should always return a result, they should nearly always be implemented immutably, and they should document the logic of interaction with the given operator and nothing more. They *shouldn't* be directly called, because they should not contain the kind of logic that you *want* to directly call.

One of these restrictions that I included in this RFC was that typing the parameters is not optional. This is extremely useful for operator overloads, because you must document all the types that your implementation understands how to interact with, and the engine will simply not allow for undetermined or uncertain values to be handled. This restriction would feel very out of place to many in a function, because other PHP functions do not behave this way. But for a new thing, with a new keyword that marks itself as something separate? Well now it makes sense. New things have their own rules. Just like the restrictions on enum classes.

--

I think these things outweigh the cost of adding a new keyword, particularly a new keyword that is limited only to the class definition and that has behavior and syntax that is substantially similar to something developers are already familiar with.

I truly believe this is the better way of doing this feature; I would not suggest it otherwise. And while an implementation that doesn't include this is possible and workable, I feel it is suboptimal and limiting. I feel that it is more likely to result in problematic usage, complaints, and buggy code from PHP developers. This new keyword required very minimal changes to the parser, and no changes to the compiler. I think this is an acceptable tradeoff for the benefits it brings. That is the reason that I am arguing for it, and no other reason.

I'm sorry if it seems like I am not listening to what you are saying. That is not the case; I take the feedback of others on this list very seriously.
It's just that you haven't yet brought up a point that I hadn't already considered and personally decided was outweighed by the benefits. I agree this will result in changes for tooling. I accept that those changes will be larger with a new keyword. I do not think that it is worth delivering an inferior version of this feature that is more prone to error and misuse, and is more restricted in future scope.

Jordan