Newsgroups: php.internals
Xref: news.php.net php.internals:122874
List-Id: internals.lists.php.net
Message-ID: <1a4a44c5-d4f5-40ca-b3dc-13d64c2b4425@app.fastmail.com>
Date: Tue, 02 Apr 2024 15:30:23 +0000
To: "php internals"
Subject: Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
From: larry@garfieldtech.com ("Larry Garfield")

On Tue, Apr 2, 2024, at 12:17 AM, Ilija Tovilo wrote:
> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).

*gets popcorn*

> In a nutshell, data classes are classes with value semantics.
> Instances of data classes are implicitly copied when assigned to a
> variable, or when passed to a function. When the new instance is
> modified, the original instance remains untouched. This might sound
> familiar: It's exactly how arrays work in PHP.
>
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
>
> You may think that copying the array on each assignment is expensive,
> and you would be right. PHP uses a trick called copy-on-write, or CoW
> for short. `$a` and `$b` actually share the same array until `$b[] =
> 4;` modifies it. It's only at this point that the array is copied and
> replaced in `$b`, so that the modification doesn't affect `$a`. As
> long as a variable is the sole owner of a value, or none of the
> variables modify the value, no copy is needed. Data classes use the
> same mechanism.
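
For anyone following along, here is a minimal sketch in today's PHP (8.0+) of the contrast being described. It is not taken from the PoC, and the `Point` class is a made-up example: arrays already get value semantics, while ordinary objects are shared by handle unless you remember to `clone`.

```php
<?php

// Arrays have value semantics: assignment shares storage until a write.
$a = [1, 2, 3];
$b = $a;       // No copy yet; both variables share one array.
$b[] = 4;      // The write separates $b from $a; $a is untouched.

// Hypothetical ordinary class, for illustration only.
final class Point
{
    public function __construct(public int $x = 0, public int $y = 0) {}
}

$p = new Point(1, 2);
$q = $p;       // Copies the object handle, not the object.
$q->x = 10;    // Also visible through $p: the "spooky action at a distance".

$r = clone $p; // Defensive clone: an independent object.
$r->x = 99;    // Does not affect $p or $q.

var_dump($a === [1, 2, 3]);    // bool(true)
var_dump($b === [1, 2, 3, 4]); // bool(true)
var_dump($p->x);               // int(10)
var_dump($r->x);               // int(99)
```

As I read the proposal, a data class would behave like the array half of this sketch, without the explicit `clone`.
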
>
> But why value semantics in the first place? There are two major flaws
> with by-reference semantics for data structures:
>
> 1. It's very easy to forget cloning data that is referenced somewhere
> else before modifying it. This will lead to "spooky actions at a
> distance". Having recently used JavaScript (where all data structures
> have by-reference semantics) for an educational IR optimizer,
> accidental mutations of shared arrays/maps/sets were my primary source
> of bugs.
> 2. Defensive cloning (to avoid issue 1) will lead to useless work when
> the value is not referenced anywhere else.
>
> PHP offers readonly properties and classes to address issue 1.
> However, they further promote issue 2 by making it impossible to
> modify values without cloning them first, even if we know they are not
> referenced anywhere else. Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
>
> As you may have noticed, arrays already solve both of these issues
> through CoW. Data classes allow implementing arbitrary data structures
> with the same value semantics in core, extensions or userland. For
> example, a `Vector` data class may look something like the following:
>
> ```php
> data class Vector {
>     private $values;
>
>     public function __construct(...$values) {
>         $this->values = $values;
>     }
>
>     public mutating function append($value) {
>         $this->values[] = $value;
>     }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>
> An internal Vector implementation might offer a faster and stricter
> alternative to arrays (e.g. Vector from php-ds).
>
> Some other things to note about data classes:
>
> * Data classes are ordinary classes, and as such may implement
> interfaces, methods and more. I have not decided whether they should
> support inheritance.

What would be the reason not to? As you indicated in another reply, the main reason some languages don't is to avoid large stack copies, but PHP doesn't have large stack copies for objects anyway so that's a non-issue. I've long argued that the fewer differences there are between service classes and data classes, the better, so I'm not sure what advantage this would have other than "ugh, inheritance is such a mess" (which is true, but that ship sailed long ago).

> * Mutating method calls on data classes use a slightly different
> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> marked as `mutating`. The reason for this is twofold: 1. It signals to
> the caller that the value is modified. 2. It allows `$vector` to be
> cloned before knowing whether the method `append` is modifying, which
> hugely reduces implementation complexity in the engine.

As discussed in R11, it would be very beneficial if this marker could be on the method definition, not the method invocation. You indicated that would be Hard(tm), but I think it's worth some effort to see if it's surmountably hard. (Or at least less hard than just auto-detecting it, which you indicated is Extremely Hard(tm).)

> * Data classes customize identity (`===`) comparison, in the same way
> arrays do. Two data objects are identical if all their properties are
> identical (including order for dynamic properties).
> * Sharing data classes by-reference is possible using references, as
> you would for arrays.
>
> * We may decide to auto-implement `__toString` for data classes,
> amongst other things. I am still undecided whether this is useful for
> PHP.

For reference:

Java record classes auto-generate equals(), toString(), hashCode(), and same-name methods (we don't need that).

Kotlin data classes auto-generate equals(), toString(), hashCode(), same-name methods, and a copy() method that is basically what we've been discussing as clone-with.

C# record classes auto-generate equals() and ToString(), and are immutable. They also support "with expressions" ($foo with { new args }, basically clone-with).

C# record structs auto-generate equals() and ToString(), and are mutable. (Go figure.)

Python data classes are highly configurable, but by default generate toString(), a var-dump-targeted string (__repr__), a hash function, and some other Python-specific things with no PHP equivalent. They can also opt in to generating ordering overrides (op overloads), being readonly (frozen), or being named-args-only.

Swift structs, from what I can find just briefly, don't seem to auto-generate anything. (I could be wrong here.)

(In basically all cases above, providing your own implementation in the data class overrides the default generated one.)

The concept doesn't exist in C/C++, Go, or Rust, at least not in a usefully equivalent way. TypeScript doesn't seem to have them from what I can find.

So to the extent there is a consensus, equality, stringifying, and a hashcode (which we don't have yet, but will need in the future for some things, I suspect) seem to be the rough expected defaults.

> * Data classes protect from interior mutability. More concretely,
> mutating nested data objects stored in a `readonly` property is not
> legal, whereas it would be if they were ordinary objects.
> * In the future, it should be possible to allow using data classes in
> `SplObjectStorage`. However, because hashing is complex, this will be
> postponed to a separate RFC.

Would data class properties only be allowed to be other data classes, or could they hold a non-data class? My knee-jerk response is they should be data classes all the way down; the only counter-argument I can think of is how much existing code is out there that is a "data class" in all but name. I still fear someone adding a DB connection object to a data class and everything going to hell, though.
:-)

> One known gotcha is that we cannot trivially enforce placement of
> `mutating` on methods without a performance hit. It is the
> responsibility of the user to correctly mark such methods.
>
> Here's a fully functional PoC, excluding JIT:
> https://github.com/php/php-src/pull/13800
>
> Let me know what you think. I will start working on an RFC draft once
> work on property hooks concludes.
>
> Ilija

--Larry Garfield