List-Id: internals.lists.php.net
Date: Tue, 02 Apr 2024 22:02:19 +0000
From: larry@garfieldtech.com ("Larry Garfield")
To: "php internals"
Subject: Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)

On Tue, Apr 2, 2024, at 6:04 PM, Ilija Tovilo wrote:

>> What would be the reason not to? As you indicated in another reply, the main reason some languages don't is to avoid large stack copies, but PHP doesn't have large stack copies for objects anyway so that's a non-issue.
>>
>> I've long argued that the fewer differences there are between service classes and data classes, the better, so I'm not sure what advantage this would have other than "ugh, inheritance is such a mess" (which is true, but that ship sailed long ago).
>
> One issue that just came to mind is object identity. For example:
>
> class Person {
>     public function __construct(
>         public string $firstname,
>         public string $lastname,
>     ) {}
> }
>
> class Manager extends Person {
>     public function bossAround() {}
> }
>
> $person = new Person('Boss', 'Man');
> $manager = new Manager('Boss', 'Man');
> var_dump($person === $manager); // ???
>
> Equality for data objects is based on data, rather than the object
> handle. How does this interact with inheritance? Technically, Person
> and Manager represent the same data. Manager contains additional
> behavior, but does that change identity?
>
> I'm not sure what the answer is. That's just the first thing that came
> to mind.
> I'm confident we'll discover more such edge cases. Of course,
> I can invest the time to find the questions before deciding to
> disallow inheritance.

As Bruce already demonstrated, equality should include type, not just properties. Even without inheritance that is necessary.

There may be good reason to omit inheritance, as we did on enums, but that shouldn't be the starting point. (I'd have to research and see what other languages do. I think it's a mixed bag.) We should try to ferret out those edge cases and see if there are reasonable solutions to them.

>> > * Mutating method calls on data classes use a slightly different
>> > syntax: `$vector->append!(42)`. All methods mutating `$this` must be
>> > marked as `mutating`. The reason for this is twofold: 1. It signals to
>> > the caller that the value is modified. 2. It allows `$vector` to be
>> > cloned before knowing whether the method `append` is modifying, which
>> > hugely reduces implementation complexity in the engine.
>>
>> As discussed in R11, it would be very beneficial if this marker could be on the method definition, not the method invocation. You indicated that would be Hard(tm), but I think it's worth some effort to see if it's surmountably hard. (Or at least less hard than just auto-detecting it, which you indicated is Extremely Hard(tm).)
>
> I think you misunderstood. The intention is to mark both call-site and
> declaration. Call-site is marked with ->method!(), while declaration
> is marked with "public mutating function". Call-site is required to
> avoid the engine complexity, as previously mentioned. But
> declaration-site is required so that the user (and IDEs) even know
> that you need to use the special syntax at the call-site.

Ah, OK. That's... unfortunate, but I defer to you on the implementation complexity.
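
Going back to the equality point for a moment: the loose comparison operator in current PHP already treats type as part of equality for ordinary objects, which may be a useful reference when deciding what data classes should do. A minimal sketch, reusing the Person/Manager names from the quoted example; these are plain classes, since the proposed `data class` syntax does not exist yet, and the `$twin` variable is added here purely for illustration.

```php
<?php
declare(strict_types=1);

// Plain classes in current PHP; the proposed `data class` syntax is not used here.
class Person {
    public function __construct(
        public string $firstname,
        public string $lastname,
    ) {}
}

class Manager extends Person {
    public function bossAround(): void {}
}

$person  = new Person('Boss', 'Man');
$twin    = new Person('Boss', 'Man');   // illustrative extra instance
$manager = new Manager('Boss', 'Man');

// == on objects requires the same class as well as equal property values.
$sameClassSameData = ($person == $twin);
$subclassSameData  = ($person == $manager);

// === on ordinary objects compares the handle, so distinct instances never match.
$identical = ($person === $twin);

var_dump($sameClassSameData, $subclassSameData, $identical);
```

Whether data classes should follow the `==` rule for their own `===` is exactly the open question; the sketch only shows that "same data, different class" already compares unequal today.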
>> So to the extent there is a consensus, equality, stringifying, and a hashcode (which we don't have yet, but will need in the future for some things I suspect) seem to be the rough expected defaults.
>
> I'm just skeptical whether the default __toString() is ever useful. I
> can see an argument for it for quick debugging in languages that don't
> provide something like var_dump(). In PHP this seems much less useful.
> It's impossible to provide a default implementation that works
> everywhere (or pretty much anywhere, even).
>
> Equality is already included. Hashing should be added separately, and
> probably not just to data classes.

The equivalent of Python's __repr__ (which it auto-generates) would be __debugInfo(). Arguably its current output is what the default would likely be anyway, though. I believe the typical auto-toString output is the same data, but presented in a more human-friendly way. (So yes, mainly useful for debugging.)

Equality, well, we've already debated whether or not we should make that a general feature. :-) Of note, though, in languages with equals(), it's also user-overridable.

>> > * In the future, it should be possible to allow using data classes in
>> > `SplObjectStorage`. However, because hashing is complex, this will be
>> > postponed to a separate RFC.

I believe this is where we would want/need a __hash() method or similar; Derick and I encountered that while researching collections in other languages. Leaving it out for now is fine, but it would be important for any future list-of functionality.

>> Would data class properties only be allowed to be other data classes, or could they hold a non-data class? My knee jerk response is they should be data classes all the way down; the only counter-argument I can think of would be how much existing code is out there that is a "data class" in all but name. I still fear someone adding a DB connection object to a data class and everything going to hell, though.
>> :-)
>
> Disallowing ordinary by-ref objects is not trivial without additional
> performance penalties, and I don't see a good reason for it. Can you
> provide an example on when that would be problematic?
>
> Ilija

There are two aspects to it, that I see.

data class A {
    public function __construct(public string $name) {}
}

data class B {
    public function __construct(
        public A $a,
        public PDO $conn,
    ) {}
}

$b = new B(new A('Original'), $pdoConnection);

function stuff(B $b2) {
    $b2->a->name = 'Larry';
    // This triggers a CoW on $b2, separating it from $b, and also creating a new instance of A. What about $conn?
    // Does it get cloned? That would be bad. Does it not get cloned? That seems weird that it's still the same on
    // a data object.

    $b2->conn->beginTransaction();
    // This I would say is technically a modification, since the state of the connection is changing. But then
    // should this trigger $b2 cloning from $b? Neither answer is obvious to me.
}

In a sense, it's similar to the "PSR-7 is immutable, asterisk, streams" issue that has often been pointed out. "Data objects are safe to pass around and will self-clone when needed, asterisk, unless there's a normal object in it and then it's non-obvious" doesn't sound like a good mental model to give people.

Or consider DateTime. It's mutable. Should mutating it clone an object that has a DateTime property? I can realistically argue both ways, and I'm not convinced either is right; just that neither is intuitive.

"Data classes all the way down" resolves this problem.

The caveat would be that a genuinely immutable object would (probably?) be safe (DateTimeImmutable, or a readonly class), so maybe we can make readonly classes an exception? Ah, no, we cannot, because despite what PHPStan insists, there's no reason that the single write to a readonly property must happen at construction. It can easily happen as a side effect of another method (eg, a cache value), meaning readonly objects are not truly immutable.
In fact, readonly objects can have non-readonly objects on their properties, too. So I don't think that's safe, either.

The other aspect is, eg, serialization. People will come to expect (reasonably) that a data class will have certain properties (in the abstract sense, not lexical sense). For instance, most classes are serializable, but a few are not. (Eg, if they have a reference to PDO or a file handle or something unserializable.) Data classes seem like they should be safe to serialize always, as they're "just data". If data classes are limited to primitives and data classes internally, that means we can effectively guarantee that they will be serializable, always. If one of the properties could be a non-serializable object, that assumption breaks.

There are probably other similar examples besides serialization where "think of this as data" and "think of this as logic" is how you'd want to think, which leads to different assumptions, which we shouldn't stealthily break.

--Larry Garfield
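
For reference, a minimal sketch of the caveats above in current PHP (8.1+): a readonly property written lazily outside the constructor, a readonly property holding a mutable object, and an object that cannot be serialized because of what it holds. The class names `CachedCount` and `Handler` are hypothetical, and a Closure stands in for PDO as the unserializable member so the snippet runs without a database extension.

```php
<?php
declare(strict_types=1);

// Hypothetical class: every property is readonly, yet its state still changes.
final class CachedCount {
    public readonly int $count; // deliberately NOT set in the constructor

    public function __construct(
        public readonly array $items,
        public readonly DateTime $when, // mutable object behind a readonly property
    ) {}

    public function count(): int {
        // The single permitted write happens lazily, as a side effect of a getter.
        if (!isset($this->count)) {
            $this->count = count($this->items);
        }
        return $this->count;
    }
}

$c = new CachedCount([1, 2, 3], new DateTime('2024-04-02'));

$initializedBefore = isset($c->count); // still uninitialized at this point
$n = $c->count();                      // triggers the lazy write
$initializedAfter = isset($c->count);

// readonly only protects the property slot, not the object stored in it.
$c->when->modify('+1 day');
$date = $c->when->format('Y-m-d');

// Hypothetical class holding something unserializable (Closure as a stand-in for PDO).
final class Handler {
    public function __construct(
        public string $name,
        public Closure $callback,
    ) {}
}

$serializeFailed = false;
try {
    serialize(new Handler('demo', fn () => null));
} catch (Exception $e) {
    // PHP refuses to serialize closures, so the whole object graph fails.
    $serializeFailed = true;
}

var_dump($initializedBefore, $n, $initializedAfter, $date, $serializeFailed);
```

None of this says how data classes would behave; it only illustrates why "readonly" and "holds arbitrary objects" each break the "just data" assumption today.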