Newsgroups: php.internals
From: me@daveyshafik.com (Davey Shafik)
To: Jordan LeDoux
Cc: "Rowan Tommins [IMSoP]", internals@lists.php.net
Date: Tue, 17 Sep 2024 11:15:51 -0700
Subject: Re: [PHP-DEV] [Pre-RFC Discussion] User Defined Operator Overloads (again)
Message-ID: <4A0C727C-49F2-4333-89AE-47E79CD28C0E@daveyshafik.com>

> On Sep 17, 2024, at 11:11, Jordan LeDoux <jordan.ledoux@gmail.com> wrote:
>
> On Tue, Sep 17, 2024 at 10:55 AM Davey Shafik <me@daveyshafik.com> wrote:
>>
>>> On Sep 17, 2024, at 10:15, Jordan LeDoux <jordan.ledoux@gmail.com> wrote:
>>>
>>> On Tue, Sep 17, 2024 at 1:18 AM Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk> wrote:
>>>> On 14/09/2024 22:48, Jordan LeDoux wrote:
>>>> >
>>>> > 1. Should the next version of this RFC use the `operator` keyword, or
>>>> > should that approach be abandoned for something more familiar? Why do
>>>> > you feel that way?
>>>> >
>>>> > 2. Should the capability to overload comparison operators be provided
>>>> > in the same RFC, or would it be better to separate that into its own
>>>> > RFC? Why do you feel that way?
>>>> >
>>>> > 3. Do you feel there were any glaring design weaknesses in the
>>>> > previous RFC that should be addressed before it is re-proposed?
>>>> >
>>>>
>>>> I think there are two fundamental decisions which inform a lot of the
>>>> rest of the design:
>>>>
>>>> 1. Are we over-riding *operators* or *operations*? That is, is the user
>>>> saying "this is what happens when you put a + symbol between two Foo
>>>> objects", or "this is what happens when you add two Foo objects together"?
>>>
>>> If we allow developers to define arbitrary code which is executed as a result of an operator, we will always end up allowing the first one.
>>>
>>>> 2. How do we despatch a binary operator to one of its operands? That is,
>>>> given $a + $b, where $a and $b are objects of different classes, how do
>>>> we choose which implementation to run?
>>>>
>>>
>>> This is something not many other people have been interested in so far, but interestingly there is a lot of prior art on this question in other languages! :)
>>>
>>> The best approach, based on what I have seen of developer usage in other languages, is somewhat complicated to follow, but I will do my best to make it understandable to anyone following this thread on internals.
>>>
>>> The approach I plan to use for this question has a name: Polymorphic Handler Resolution. The overload that is executed will be decided by the following series of decisions:
>>>
>>> 1. Are both of the operands objects? If not, use the overload on the one that is. (NOTE: if neither is an object, the new code will be bypassed entirely, so I do not need to handle this case.)
>>> 2. If they are both objects, are they both instances of the same class? If they are, use the overload of the one on the left.
>>> 3. If they are not objects of the same class, is one of them a direct descendant of the other? If so, use the overload of the descendant.
>>> 4. If neither of them is a direct descendant of the other, use the overload of the object on the left. If that produces a type error because it does not accept objects of the type in the other position, return the error and abort instead of retrying with the overload on the right.
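Here is a minimal PHP sketch of the four steps above, under stated assumptions. PHP has no `operator +` syntax today, so each overload is modelled as an ordinary `plus()` method. The names `resolveAndAdd()`, `Foo`, `Bar`, and the `OperandPosition` case names `LeftSide`/`RightSide` are all hypothetical. "Descendant" is read here as any subclass (checked with `is_a()`), which is an interpretation rather than something the proposal pins down.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the enum discussed in this thread;
// the case names LeftSide/RightSide are assumptions.
enum OperandPosition
{
    case LeftSide;
    case RightSide;
}

// Sketch of the resolution steps for `$left + $right`. Each class's
// overload is modelled as a plain plus($other, $position) method.
function resolveAndAdd(mixed $left, mixed $right): mixed
{
    $leftIsObject = is_object($left);
    $rightIsObject = is_object($right);

    // Neither operand is an object: the overload machinery is bypassed.
    if (!$leftIsObject && !$rightIsObject) {
        return $left + $right;
    }
    // Step 1: exactly one operand is an object, so use its overload.
    if (!$rightIsObject) {
        return $left->plus($right, OperandPosition::LeftSide);
    }
    if (!$leftIsObject) {
        return $right->plus($left, OperandPosition::RightSide);
    }
    // Step 3: the right operand's class descends from the left's class,
    // so the descendant's overload wins even though it is on the right.
    if ($left::class !== $right::class && is_a($right, $left::class)) {
        return $right->plus($left, OperandPosition::RightSide);
    }
    // Steps 2 and 4 (and step 3 when the left operand is the descendant):
    // use the left operand. A TypeError thrown here propagates; there is
    // no retry on the right operand.
    return $left->plus($right, OperandPosition::LeftSide);
}

class Foo
{
    public function plus(mixed $other, OperandPosition $position): string
    {
        return 'Foo::plus (' . $position->name . ')';
    }
}

class Bar extends Foo
{
    public function plus(mixed $other, OperandPosition $position): string
    {
        return 'Bar::plus (' . $position->name . ')';
    }
}

echo resolveAndAdd(new Foo(), new Bar()), "\n"; // Bar::plus (RightSide)
echo resolveAndAdd(new Bar(), new Foo()), "\n"; // Bar::plus (LeftSide)
echo resolveAndAdd(new Foo(), 5), "\n";         // Foo::plus (LeftSide)
```

With `Bar extends Foo`, both `Foo + Bar` and `Bar + Foo` resolve to `Bar`'s overload. Only the reported operand position differs.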

>>> This results from what it means to `extend` a class. Suppose you have a class `Foo` and a class `Bar` that extends `Foo`. If `Foo` implements an overload, `Bar` inherits it. `Bar`'s overload is then either the same as the one from `Foo`, in which case it shouldn't matter which is executed, or it has been overridden with more specific logic that is aware of the extra context `Bar` provides, in which case we want to execute the updated implementation.
>>>
>>> So the implementation on the left would almost always be executed, unless the implementation on the right comes from a class that is a direct descendant of the class on the left. In both of the following, `Bar`'s overload is the one executed:
>>>
>>> `Foo + Bar`
>>> `Bar + Foo`
>>>
>>> In practice, you would very rarely (if ever) use two classes from entirely different inheritance hierarchies in the same overload. That would tie the two classes together closely in a way that most developers try to avoid, because the implementation would need to know how to handle every class it accepts as an argument.
>>>
>>> The one exception I can imagine is something like a container, which may not care what class the other object is because it only stores it and never mutates it.
>>>
>>> But for virtually every real-world use case, executing the overload of the child class regardless of its position would be preferred, because overloads will tend to be confined to PHP's core types plus the classes in the hierarchy the overload is designed to interact with.
>>>
>>>>
>>>> Finally, a very quick note on the OperandPosition enum: I think just a
>>>> "bool $isReversed" would be fine - the "natural" expansion of "$a+$b" is
>>>> "$a->operator+($b, false)"; the "fallback" is "$b->operator+($a, true)"
>>>>
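To make the suggested expansion concrete, here is a hedged sketch using a hypothetical string-wrapper class `Str`. Its `+` concatenates, so operand order is visible in the result. The `plus()` method and the `plusExpansion()` helper are illustrative stand-ins for `operator+` and for the engine's expansion; neither exists in PHP.

```php
<?php
declare(strict_types=1);

// Hypothetical string wrapper whose "+" concatenates, so operand order
// matters. plus() stands in for the proposed `operator +`, and the bool
// parameter is the suggested $isReversed flag.
final class Str
{
    public function __construct(public readonly string $value) {}

    // $isReversed === true means this object was the right-hand operand.
    public function plus(Str|string $other, bool $isReversed): Str
    {
        $otherValue = $other instanceof Str ? $other->value : $other;

        return new Str($isReversed
            ? $otherValue . $this->value   // other + this
            : $this->value . $otherValue); // this + other
    }
}

// Expansion of `$a + $b`: the "natural" call on the left operand, or the
// "fallback" call on the right operand with the flag set.
function plusExpansion(Str|string $a, Str|string $b): Str
{
    if ($a instanceof Str) {
        return $a->plus($b, false); // $a->operator+($b, false)
    }
    if ($b instanceof Str) {
        return $b->plus($a, true);  // $b->operator+($a, true)
    }
    throw new TypeError('Neither operand overloads +');
}

echo plusExpansion(new Str('a'), 'b')->value, "\n"; // ab
echo plusExpansion('b', new Str('a'))->value, "\n"; // ba
```

The flag only changes the outcome for non-commutative operations like this one. For ordinary numeric addition, both calls would produce the same value.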
>>>> Regards,
>>>>
>>>> -- 
>>>> Rowan Tommins
>>>> [IMSoP]
>>>
>>> This is similar to what I originally designed; I moved to an enum based on feedback. The argument was that something like `$isReversed` or `$left` is somewhat ambiguous, while the enum makes the position extremely explicit.
>>>
>>> However, it's not a design detail I am committed to. I just want to let you know why it was done that way.
>>>
>>> Jordan

>> To be clear: I'm very much in favor of operator overloading. I frequently work with both Money value objects and DateTime objects that I need to manipulate through arithmetic with others of the same type.
>>
>> What if I wanted to create a generic `add($a, $b)` function? How would I type-hint the parameters to ensure that I only get "addable" things? I would expect those to be:
>>
>> - Ints
>> - Floats
>> - Objects of classes with "operator+" defined
>>
>> I think an interface is the right solution for that, and you can just union it with int/float type hints: `add(int|float|Addable ...$operands)` (or `add(int|float|(Foo&Addable) ...$operands)`).
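Here is a sketch of what that signature could look like. It assumes a hypothetical `Addable` interface whose `plus()` method stands in for `operator+`; `Money` and `add()` are also illustrative. The parenthesised `(Foo&Addable)` form in the second signature is a disjunctive normal form type, which PHP accepts from 8.2 onward.

```php
<?php
declare(strict_types=1);

// Hypothetical interface for "has an operator+ overload"; a plain plus()
// method stands in for the proposed `operator +`.
interface Addable
{
    public function plus(mixed $other): mixed;
}

// Illustrative value object that only knows how to add other Money.
final class Money implements Addable
{
    public function __construct(public readonly int $cents) {}

    public function plus(mixed $other): Money
    {
        if (!$other instanceof Money) {
            throw new TypeError('Money can only be added to Money');
        }
        return new Money($this->cents + $other->cents);
    }
}

// Generic add(): the union type admits ints, floats and any Addable.
function add(int|float|Addable ...$operands): int|float|Addable
{
    if ($operands === []) {
        return 0;
    }
    $total = array_shift($operands);
    foreach ($operands as $operand) {
        if ($total instanceof Addable) {
            $total = $total->plus($operand);
        } elseif ($operand instanceof Addable) {
            // Scalar on the left, object on the right: assumes + commutes.
            $total = $operand->plus($total);
        } else {
            $total = $total + $operand;
        }
    }
    return $total;
}

var_dump(add(1, 2, 3.5));                             // float(6.5)
var_dump(add(new Money(100), new Money(250))->cents); // int(350)
```

When a scalar is on the left and an `Addable` on the right, this sketch simply calls the object's `plus()`, which assumes `+` is commutative for that pair.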

>> Is this type of behavior even allowed? I assume it is intended to be, since otherwise the decision over which overload method gets called would be drastically simpler.
>>
>> Perhaps, for a first iteration, operator overloads could only work between objects of the same type or their descendants, and if a descendant overrides the overload, the descendant's version is used regardless of left/right position.
>>
>> I suspect this would reduce the complexity of the magic while still covering the majority of cases where operator overloading is desired.
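The restricted rule can be sketched as a small resolver, with the caveat that `resolveRestricted()` and the classes `Money`, `UsdMoney`, and `Vector2D` are hypothetical. It returns the operand whose overload should run, preferring the more-derived class regardless of side. For unrelated classes it returns `null`, which such a first iteration would reject.

```php
<?php
declare(strict_types=1);

// Hypothetical classes: UsdMoney descends from Money; Vector2D is unrelated.
class Money {}
class UsdMoney extends Money {}
class Vector2D {}

/**
 * Restricted rule: overloads apply only between objects of the same class
 * or of classes in a parent/child relationship, and the more-derived
 * operand's overload is chosen regardless of its position. Returns the
 * operand whose overload should run, or null when the pair is rejected.
 */
function resolveRestricted(object $left, object $right): ?object
{
    if ($left::class === $right::class) {
        return $left;                  // same class: left operand
    }
    if (is_a($right, $left::class)) {
        return $right;                 // right descends from left
    }
    if (is_a($left, $right::class)) {
        return $left;                  // left descends from right
    }
    return null;                       // unrelated classes: rejected
}

var_dump(resolveRestricted(new Money(), new UsdMoney()) instanceof UsdMoney); // bool(true)
var_dump(resolveRestricted(new UsdMoney(), new Money()) instanceof UsdMoney); // bool(true)
var_dump(resolveRestricted(new Money(), new Vector2D()));                     // NULL
```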

>> - Davey

> The problem with providing interfaces is something nikic addressed very early in my design process and convinced me of: an `Addable` interface will not actually tell you whether two objects can be added together. A `Money` class and a `Vector2D` class might both have an implementation for `operator +()` and implement some kind of `Addable` interface, but there is no sensible way in which they could actually be added. In most cases, knowing that an object implements an overload is not enough to use operators with it. This is part of the reason I am skeptical of people who worry about accidentally using random overloads.
>
> The signature for the implementation in the `Money` class might look something like this:
>
> `operator +(Money $other, OperandPosition $position): Money`
>
> while the signature for the implementation in the `Vector2D` class might look something like this:
>
> `operator +(Vector2D|array $other, OperandPosition $position): Vector2D`
>
> Any attempt to add these two together will result in a `TypeError`.
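Here is a runnable approximation of this point. It assumes plain `plus()` methods in place of `operator +` and a hypothetical empty `Addable` marker interface. The interface is left empty on purpose: PHP does not let an implementing method narrow a parameter type declared by an interface, so a shared `plus()` declaration could not carry the `Money` or `Vector2D|array` types shown above.

```php
<?php
declare(strict_types=1);

// Hypothetical marker interface: "this class overloads +". It declares no
// method, because implementations could not narrow a declared parameter.
interface Addable {}

final class Money implements Addable
{
    public function __construct(public readonly int $cents) {}

    // Stand-in for `operator +(Money $other, OperandPosition $position): Money`.
    public function plus(Money $other): Money
    {
        return new Money($this->cents + $other->cents);
    }
}

final class Vector2D implements Addable
{
    public function __construct(public readonly float $x, public readonly float $y) {}

    // Stand-in for `operator +(Vector2D|array $other, OperandPosition $position): Vector2D`.
    public function plus(Vector2D|array $other): Vector2D
    {
        [$ox, $oy] = $other instanceof Vector2D ? [$other->x, $other->y] : $other;
        return new Vector2D($this->x + $ox, $this->y + $oy);
    }
}

$money = new Money(500);
$vector = new Vector2D(1.0, 2.0);

// Both objects pass an "is it addable?" check...
var_dump($money instanceof Addable && $vector instanceof Addable); // bool(true)

// ...but the parameter types still reject the mixed pair.
try {
    $money->plus($vector);
} catch (TypeError $e) {
    echo 'Caught ', $e::class, "\n"; // Caught TypeError
}
```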

> Classes with overloads that look like the following are something I think developers should be IMMEDIATELY suspicious of:
>
> `operator +(object $other, OperandPosition $position)`
> `operator +(mixed $other, OperandPosition $position)`
>
> Does your implementation really have a plan for how to `+` with a stream resource like a file handle, as well as an int? Can you use `+` with the `DateTime` class just as easily as with a `Money` class in your implementation?
>
> I think very few use cases with signatures like these would survive code review, feedback, or testing.
>
> There are situations in which objects might accept objects from a different class hierarchy. For instance, with the changes Saki has made, there are now objects for numbers in the BcMath extension. Those objects might be quite widely accepted in overload implementations, since they represent numbers in the same way an int or float does. But I highly doubt that an overload could accept those sorts of objects without also being aware of them, and if the overload is aware of them, it can type-hint them in its signature.
>
> Jordan

Good points. While Money objects are frequently added together, I would typically add DateInterval instances to DateTime instances, which breaks the same-type-or-descendant limitation I suggested.
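For reference, this pairing already exists in PHP as a method call: `DateTimeImmutable::add()` takes a `DateInterval`. The short sketch below uses only those real classes. The `$start + $step` spelling in its comment is hypothetical syntax that PHP does not support today. The sketch shows that neither class belongs to the other's hierarchy, which is why a same-type-or-descendant rule would not cover this case.

```php
<?php
declare(strict_types=1);

// Real PHP API: DateTimeImmutable::add() accepts a DateInterval.
$start = new DateTimeImmutable('2024-09-17 00:00:00', new DateTimeZone('UTC'));
$step = new DateInterval('P1D');

$next = $start->add($step);
echo $next->format('Y-m-d'), "\n"; // 2024-09-18

// The operands come from unrelated hierarchies, so a "same type or
// descendant only" rule would reject a hypothetical `$start + $step`.
var_dump(is_a($step, DateTimeInterface::class)); // bool(false)
var_dump(is_a($start, DateInterval::class));     // bool(false)
```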

- Davey
