From: imsop.php@rwec.co.uk ("Rowan Tommins [IMSoP]")
Newsgroups: php.internals
To: internals@lists.php.net
Date: Thu, 4 Apr 2024 23:28:29 +0100
Subject: Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Message-ID: <9f66fd2b-d93e-42d1-906a-ca83cc51b08e@rwec.co.uk>

On 03/04/2024 00:01, Ilija Tovilo wrote:
> Data classes are classes with a single additional
> zend_class_entry.ce_flags flag. So unless customized, they behave as
> classes. This way, we have the option to tweak any behavior we would
> like, but we don't need to.
>
> Of course, this will still require an analysis of what behavior we
> might want to tweak.

Regardless of the implementation, there are a lot of interactions we will want to consider; and we will have to keep considering new ones as we add to the language. For instance, the Property Hooks RFC would probably have needed a section on "Interaction with Data Classes".

On the other hand, maybe having two types of objects to consider each time is better than having to consider combinations of lots of small features.


On a practical note, a few things I've already thought of to consider:

- Can a data class have readonly properties (or be marked "readonly data class")? If so, how will they behave?
- Can you explicitly use the "clone" keyword with an instance of a data class? Does it make any difference?
- Tied into that: can you implement __clone(), and when will it be called?
- If you implement __set(), will copy-on-write be triggered before it's called?
- Can you implement __destruct()? Will it ever be called?



> Consider this example, which would
> work with the current approach:
>
> $shapes[0]->position->zero!();

I find this concise example confusing, and I think there are a few things to unpack here...


Firstly, there's putting a data object in an array:

$numbers = [ new Number(42) ];
$cow = $numbers;
$cow[0]->increment!();
assert($numbers !== $cow);

This is fairly clearly equivalent to this:

$numbers = [ 42 ];
$cow = $numbers;
$cow[0]++;
assert($numbers !== $cow);

CoW is triggered on the array for both, because ++ and ->increment!() are both clearly modifications.
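
For comparison, here is a minimal sketch of what today's PHP does in the same two situations. It assumes Number is written as a normal class (the class body below is a hypothetical stand-in, and increment() is an ordinary method because the "!" call syntax does not exist yet). The scalar array separates on write, but the array of objects does not, which is the behaviour a data class would change:

```php
<?php
// Hypothetical stand-in for Number, written as a normal (non-data) class.
final class Number
{
    public function __construct(public int $value) {}

    public function increment(): void
    {
        $this->value++;
    }
}

// Scalars: writing through $cowInts separates the array.
$ints = [42];
$cowInts = $ints;
$cowInts[0]++;
var_dump($ints !== $cowInts);       // bool(true)

// Normal objects: both arrays hold the same object handle, and the
// method call never writes to the array, so nothing is separated.
$numbers = [new Number(42)];
$cowNumbers = $numbers;
$cowNumbers[0]->increment();
var_dump($numbers === $cowNumbers); // bool(true)
var_dump($numbers[0]->value);       // int(43)
```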


Second, there's putting a data object into another data object:

$shape = new Shape(new Position(42,42));
$cow = $shape;
$cow->position->zero!();
assert($shape !== $cow);

This is slightly less obvious, because it presumably depends on the definition of Shape. Assuming Position is a data class:

- If Shape is a normal class, changing the value of $cow->position just happens in place, and the assertion fails

- If Shape is a readonly class (or position is a readonly property on a normal class), changing the value of $cow->position shouldn't be allowed, so this will presumably give an error

- If Shape is a data class, changing the value of $cow->position implies a "mutation" of $cow itself, so we get a separation before anything is modified, and the assertion passes

Unlike in the array case, this behaviour can't be resolved until you know the run-time type of $shape.
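
The three cases can be roughly approximated in today's PHP (8.1 or later assumed, for readonly properties). This is only a sketch: every class name below is hypothetical, Position is a normal class here rather than a data class, and the "data class" case is emulated by cloning by hand, which is the step the proposal would perform implicitly:

```php
<?php
// Hypothetical classes; names are illustrative only.
final class MutablePosition
{
    public function __construct(public int $x, public int $y) {}
}

final class PlainShape
{
    public function __construct(public MutablePosition $position) {}
}

final class FrozenShape
{
    public function __construct(public readonly MutablePosition $position) {}
}

// Normal class: $cow is the same object, so the change is shared.
$shape = new PlainShape(new MutablePosition(42, 42));
$cow = $shape;
$cow->position->x = 0;
var_dump($shape === $cow);        // bool(true)
var_dump($shape->position->x);    // int(0)

// Readonly property: replacing the value throws an Error.
$frozen = new FrozenShape(new MutablePosition(42, 42));
$readonlyError = false;
try {
    $frozen->position = new MutablePosition(0, 0);
} catch (\Error $e) {
    $readonlyError = true;
}
var_dump($readonlyError);         // bool(true)

// Emulated data class: separate the outer and inner objects by hand
// before writing, so the original keeps its value.
$shape2 = new PlainShape(new MutablePosition(42, 42));
$cow2 = clone $shape2;
$cow2->position = clone $cow2->position;
$cow2->position->x = 0;
var_dump($shape2 === $cow2);      // bool(false)
var_dump($shape2->position->x);   // int(42)
```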


Now, back to your example:

$shapes = [ new Shape(new Position(42,42)) ];
$cow = $shapes;
$shapes[0]->position->zero!();
assert($cow !== $shapes);

This combines the two, meaning that now we can't know whether to separate the array until we know (at run-time) whether Shape is a normal class or a data class.

But once that is known, the whole of "->position->zero!()" is a modification to $shapes[0], so we need to separate $shapes.


> Without such a class-wide marker, you'll need to remember to add the
> special syntax exactly where applicable.
>
> $shapes![0]!->position!->zero();


The array access doesn't need any special marker, because there's no ambiguity. The ambiguous call is the reference to ->position: in your current proposal, this represents a modification *if Shape is a data class, and is itself being modified*. My suggestion (or really, thought experiment) was that it would represent a modification *if it has a ! in the call*.

So if Shape is a readonly class:

$shapes[0]->position->!zero();
// Error: attempting to modify readonly property Shape::$position

$shapes[0]->!position->!zero();
// OK; an optimised version of:
$shapes[0] = clone $shapes[0] with [
    'position' => (clone $shapes[0]->position with ['x' => 0, 'y' => 0])
];

If ->! is only allowed if the RHS is either a readonly property or a mutating method, then this can be reasoned about statically: it will either error, or cause a CoW separation of $shapes. It also allows classes to mix aspects of "data class" and "normal class" behaviour, which might or might not be a good idea.
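
Since neither "->!" nor "clone ... with" exists in PHP today, the expansion can only be sketched with hand-written "wither" methods on readonly classes (PHP 8.2 or later assumed). The class names and the withZero() and withPosition() methods are hypothetical, not part of any proposal:

```php
<?php
// Hypothetical readonly classes standing in for Position and Shape.
final readonly class ReadonlyPosition
{
    public function __construct(public int $x, public int $y) {}

    public function withZero(): self
    {
        return new self(0, 0);
    }
}

final readonly class ReadonlyShape
{
    public function __construct(public ReadonlyPosition $position) {}

    public function withPosition(ReadonlyPosition $position): self
    {
        return new self($position);
    }
}

$shapes = [new ReadonlyShape(new ReadonlyPosition(42, 42))];
$cow = $shapes;

// The assignment to $shapes[0] is an ordinary array write, so the
// array is separated from $cow before the new Shape is stored.
$shapes[0] = $shapes[0]->withPosition($shapes[0]->position->withZero());

var_dump($cow[0]->position->x);    // int(42)
var_dump($shapes[0]->position->x); // int(0)
var_dump($cow !== $shapes);        // bool(true)
```

Written this way, the modification is visible in the code as a plain assignment to $shapes[0], without needing to know the run-time type of the element.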


This is mostly just a thought experiment, but I am a bit concerned that code like this is going to be confusingly ambiguous:

$item->shape->position->zero!();

What is going to be CoW cloned, and what is going to be modified in place? I can't actually know without knowing the definition behind both $item and $item->shape. It might even vary depending on input.


Regards,

-- 
Rowan Tommins
[IMSoP]