Hello internals,
Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
This adds a new class modifier: data. This modifier drastically changes how classes work: instances are compared by value instead of by reference, and mutations behave more like those on arrays than on objects (by value). If desired, it can be combined with other modifiers, such as readonly, to enforce immutability.
I've been playing with this feature for a few days now, and it is surprisingly intuitive to use. There is a (mostly) working implementation available on GitHub (https://github.com/php/php-src/pull/16904) if you want to have a go at it.
Example:
data class UserId { public function __construct(public int $id) {} }
$user = new UserId(12);
// later
$admin = new UserId(12);
if ($admin === $user) { /* do something */ } // true
Data classes are true value objects, with full copy-on-write optimizations:
data class Point {
    public function __construct(public int $x, public int $y) {}
    public function add(Point $other): Point {
        // illustrating value semantics, no copy yet
        $previous = $this;
        // a copy happens on the next line
        $this->x = $this->x + $other->x;
        $this->y = $this->y + $other->y;
        assert($this !== $previous); // passes
        return $this;
    }
}
I think this would be an amazing addition to PHP.
Sincerely,
— Rob
Hello
If I remember correctly, the whole concept of "value" is fully described in
the DDD book by Eric Evans. If that's the point of the RFC, I wonder if
there's any point in not making such classes immutable by default, and in
keeping only one instance of a value object per given set of properties in
memory, thereby eliminating cloning altogether and optimizing memory usage.
As I mentioned on the "Records" thread, guaranteeing that every
combination of values will exist exactly once in memory could create
more overhead than it saves.
Certainly if you write this, sharing memory makes a lot of sense:
$arr = []; $i = 0;
while ( $i++ < 100 ) {
    $arr[] = new Point(0,0);
}
But if you instead write this, maintaining the cache will end up more
expensive than just allocating each object/record/struct directly:
$arr = []; $i = 0;
while ( $i++ < 100 ) {
    $arr[] = new Point($i, $i);
}
If the guarantee is copy-on-write, caching could be a compile-time
optimisation; e.g. an OpCache pass might rewrite the first loop to the
equivalent of this:
$arr = []; $i = 0;
$__cachedPoint = new Point(0,0);
while ( $i++ < 100 ) {
    $arr[] = $__cachedPoint;
}
The main thing that would prevent this optimisation is a custom
constructor which might make the number of "new" calls observable.
Either the optimiser would have to detect a custom constructor, or data
classes / structs / records would have to prohibit defining one.
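To make the cache overhead concrete, here is a minimal userland sketch of "one instance per set of values" in current PHP. The of() factory, the $pool table and poolSize() are hypothetical names invented for this illustration; nothing like them is part of either RFC.

```php
<?php
declare(strict_types=1);

final class Point
{
    /** @var array<string, Point> hypothetical intern table, keyed by the property values */
    private static array $pool = [];

    private function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}

    // Every construction pays for building a key and a hash lookup.
    public static function of(int $x, int $y): self
    {
        $key = $x . ',' . $y;
        return self::$pool[$key] ??= new self($x, $y);
    }

    public static function poolSize(): int
    {
        return count(self::$pool);
    }
}

// First loop: 100 requests for the same value share one object.
$same = [];
for ($i = 0; $i < 100; $i++) {
    $same[] = Point::of(0, 0);
}

// Second loop: every value is distinct, so the pool only adds work and memory.
$distinct = [];
for ($i = 1; $i <= 100; $i++) {
    $distinct[] = Point::of($i, $i);
}

var_dump($same[0] === $same[99]); // bool(true)
var_dump(Point::poolSize());      // int(101)
```

Note that this naive pool also keeps every Point alive for the whole request, which is one more cost the second loop pays for no benefit.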
--
Rowan Tommins
[IMSoP]
Hello!
Don't forget to bottom-post!
I wonder if there's any point in not making such classes immutable by default, and to keep only one instance of value object unique per given set of properties in memory, thereby eliminating cloning altogether and optimizing the memory usage.
This was the entire point of the records RFC ;) it was immutable by default, but people were wondering what it would look like if it were more 'composable' vs. 'dedicated syntax'.
— Rob
Born from the Records RFC (https://wiki.php.net/rfc/records)
discussion, I would like to introduce to you a competing RFC: Data
Classes (https://wiki.php.net/rfc/dataclass).
Thank you for continuing to think about this, and PoC code is always
useful to work through the implications.
It seems like this is going in a very similar direction to the work
Ilija shared in April: https://externals.io/message/122845 and
https://github.com/php/php-src/pull/13800
My knowledge of the engine isn't good enough to compare the two PRs, but
the descriptions seem very similar. The main differences seem to be
details mentioned in one draft and not the other:
- You have described details for constructors and inheritance, which
  Ilija left as open questions
- Ilija had considered how instance methods should behave, proposing a
  "mutating" keyword and "!" call-site marker. Your RFC doesn't discuss
  this - the changeName example shows behaviour inside the method, but
  not behaviour when calling it
Is it just an oversight that you didn't link to the previous discussion,
or had you not realised how similar the proposals would end up? Either
way, this looks like ripe ground for collaboration, unless there is some
fundamental disagreement about the approach.
--
Rowan Tommins
[IMSoP]
Yes, this is mostly about "composability" vs. dedicated syntax. A bare "data class" is very similar to struct while a "final readonly data class" is very similar to records.
Is it just an oversight that you didn't link to the previous
discussion, or had you not realised how similar the proposals
would end up?
Yes, it is an oversight! I didn't even think to link to it. To be fair, I also didn't link to the records RFC. I've updated the RFC with links. While some behavior is similar to what Ilija described, it is mostly a natural progression of adding a data modifier to classes. There's no special syntax because classes already have a well-defined syntax.
Your RFC
doesn't discuss this - the changeName example shows behaviour
inside the method, but not behaviour when calling it
An interesting observation, can you explain more as to what you mean? The changeName example is simply about constructors—whose behavior is mostly due to engine limitations. You can't very easily see what happens inside a constructor from outside a constructor (usually).
— Rob
Your RFC doesn't discuss this - the changeName example shows behaviour
inside the method, but not behaviour when calling it
An interesting observation, can you explain more as to what you mean?
Looking closer, there's a hint at what you expect to happen in your
Rectangle example:
$bigRectangle = $rectangle->resize(10, 20);
assert($bigRectangle !== $rectangle); // true
It seems that modifications to $this aren't visible outside the method,
creating a purely local clone, which would be discarded if it wasn't
returned (or saved somewhere).
I can see the logic, but the result is a bit unintuitive:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++;
    }
}
$foo = new Example(0);
$foo->x++;
$foo->inc();
echo $foo->x; // 1, not 2
I think it would be clearer to prevent direct modification of $this:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++; // ERROR: Can not mutate $this in data class
    }
    public function withInc(): static {
        $new = $this; // explicitly make a local copy of $this
        $new->x++; // copy-on-write separates $new from $this
        return $new;
    }
}
That would still be compatible with Ilija's suggestion, which was to add
special "mutating methods":
data class Example {
    public function __construct(public int $x) {}
    public mutating function inc(): void {
        $this->x++;
    }
}
$foo = new Example(0);
$foo->x++;
$foo->inc!(); // copy-on-write triggered before the method is called
echo $foo->x; // 2
--
Rowan Tommins
[IMSoP]
I can see the logic, but the result is a bit unintuitive:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++;
    }
}
$foo = new Example(0);
$foo->x++;
$foo->inc();
echo $foo->x; // 1, not 2
Interesting! I actually found it to be intuitive.
Think of it like this:
function increment(array $array) {
    $array[0]++;
}
$arr = [0];
increment($arr);
echo $arr[0]; // is 0
We don't expect $arr to be any different outside of the function because $arr is a value, not a reference. "Data classes" are values, not references to values; thus when you modify $this, you modify the value, and it doesn't affect values elsewhere. If you want to keep track of that value, you have to put it somewhere you can reference it: a return value, a global variable, a property in a regular class, etc. In any case, let's keep going to see if there is a better way.
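For contrast, here is a small sketch of how current PHP treats the two cases, runnable without the proposed data modifier. Counter, incrementArray() and incrementObject() are names made up for the illustration.

```php
<?php
declare(strict_types=1);

final class Counter
{
    public int $x = 0;
}

// Arrays are values: the function works on its own logical copy.
function incrementArray(array $a): void
{
    $a[0]++;
}

// Regular objects are passed as handles: the caller sees the change.
function incrementObject(Counter $c): void
{
    $c->x++;
}

$arr = [0];
incrementArray($arr);

$obj = new Counter();
incrementObject($obj);

var_dump($arr[0]); // int(0)
var_dump($obj->x); // int(1)
```

As I understand the proposal, a data class would make the second call behave like the first.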
I think it would be clearer to prevent direct modification of $this:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++; // ERROR: Can not mutate $this in data class
    }
    public function withInc(): static {
        $new = $this; // explicitly make a local copy of $this
        $new->x++; // copy-on-write separates $new from $this
        return $new;
    }
}
Not that I disagree (see the records RFC), but at that point, why not make data classes implicitly readonly?
That would still be compatible with Ilija's suggestion, which was to add special "mutating methods":
data class Example {
    public function __construct(public int $x) {}
    public mutating function inc(): void {
        $this->x++;
    }
}
$foo = new Example(0);
$foo->x++;
$foo->inc!(); // copy-on-write triggered before the method is called
echo $foo->x; // 2
I actually find this appealing, but it is strange to me to allow this syntax on classes. Is there precedent for that? Or is there a way we can do it using "regular looking PHP"; or are structs the way to go?
Another alternative would be that mutations still trigger a copy-on-write, but the outer variable is updated with $this upon return. So this would work:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++;
    }
}
$foo = new Example(1);
$bar = $foo;
$foo->inc(); // foo is copied on mutation, and $foo points at the new value on return.
echo $bar->x; // 1
echo $foo->x; // 2
To me, this seems like it would be even more intuitive; $foo has the value you would expect from outside the class and doesn't require you keeping track of the value yourself. Though, there are some footguns here too:
class Foo {
    public Example $bar; // typed properties need a visibility modifier
    function baz() {
        $bar = $this->bar; // should be $bar = &$this->bar
        $bar->inc();
        echo $this->bar->x; // not incremented
    }
}
I could go either way on this one, honestly; so it makes sense to me why structs would have a dedicated syntax for whichever you prefer. On the one hand, it currently requires you to be explicit all the time (which can be annoying), and on the other hand, there's the implicit copy here, which requires you to be explicit when you want a reference.
I suppose there is a third option as well, and that is to not do any of these and just have data classes that always compare by value.
— Rob
Interesting! I actually found it to be intuitive.
Think of it like this:
function increment(array $array) {
    $array[0]++;
}
$arr = [0];
increment($arr);
echo $arr[0]; // is 0
We don't expect $arr to be any different outside of the function
because $arr is a value, not a reference.
My mental model, rightly or wrongly, is that passing to a parameter is a
bit like an assignment to a local variable:
function increment(...$__args) {
    $array = $__args[0];
    $array[0]++;
}
(This is explicitly how subroutine parameters work in Perl; I don't know
if that's affected my mental model, or just means Larry Wall pictured it
the same way.)
You can even assign a new value to it, like any other variable:
function whatever(array $array) {
    $array = 'not even an array any more';
}
But in PHP, $this isn't a parameter, and it's never possible to assign a
new value to $this; so it feels completely alien to have a method where
$this stops referring to the current object, and becomes a local variable.
I think it would be clearer to prevent direct modification of $this:
Not that I disagree (see the records RFC), but at that point, why not
make data classes implicitly readonly?
I'm only suggesting restricting mutation on $this, not on the object
itself. $foo->x++ would still work, and have automatic copy-on-write;
but $this->x++ would be an error on a data class, just as $this=$bar is
an error on all existing objects.
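For comparison, the closest equivalent of withInc() in current PHP needs an explicit clone, because "$new = $this" only copies the object handle. This is a sketch in today's syntax, not the proposed data class behaviour:

```php
<?php
declare(strict_types=1);

final class Example
{
    public function __construct(public int $x) {}

    public function withInc(): static
    {
        $new = clone $this; // explicit copy; $this itself is never mutated
        $new->x++;
        return $new;
    }
}

$foo = new Example(0);
$bar = $foo->withInc();

var_dump($foo->x); // int(0)
var_dump($bar->x); // int(1)
```

Under the restriction I'm suggesting, the clone would become implicit copy-on-write, but the shape of the method would stay the same.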
That would still be compatible with Ilija's suggestion, which was to
add special "mutating methods":
I actually find this appealing, but it is strange to me to allow this
syntax on classes. Is there precedent for that? Or is there a way we
can do it using "regular looking PHP"; or are structs the way to go?
The way I see it, it's just a third type of method, to add to the two we
already have:
- instance methods: $this refers to the current instance
- static methods: $this is forbidden
- mutating methods: $this refers to the desired result of the mutation
In fact, it's a bit like __construct or __clone, where $this refers to
the newly created/copied object, before anything else points to it.
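The __clone comparison can be checked in current PHP. In this small sketch (Node and its $log property are invented for the example), $this inside __clone is the fresh copy, and the original is untouched:

```php
<?php
declare(strict_types=1);

final class Node
{
    /** @var list<string> */
    public array $log = [];

    public function __clone(): void
    {
        // Runs on the copy: $this is the new object, not the one being cloned.
        $this->log[] = 'cloned';
    }
}

$a = new Node();
$b = clone $a;

var_dump($a->log); // empty array
var_dump($b->log); // array containing 'cloned'
```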
--
Rowan Tommins
[IMSoP]
Rob Landers rob@bottled.codes wrote on 23.11.2024 at 14:11 CET:
Data classes are true value objects, with full copy-on-write optimizations:
data class Point {
    public function __construct(public int $x, public int $y) {}
    public function add(Point $other): Point {
        // illustrating value semantics, no copy yet
        $previous = $this;
        // a copy happens on the next line
        $this->x = $this->x + $other->x;
        $this->y = $this->y + $other->y;
        assert($this !== $previous); // passes
        return $this;
    }
}
Thanks for the rfc!
From a userland perspective I would prefer the cloning to be more explicit, e.g.
return clone $this($this->x + $other->x, $this->y + $other->y);
or
return clone $this(x: $this->x + $other->x); // clone with y unchanged
Best Regards
Thomas
Oh boy. Again, I think there's too much going on here, but I think that's because different people are operating under a different definition of what "value semantics" means. Let me try to break down what I think are the constituent parts.
- Pass-by-value. This is what arrays, ints, strings, etc. do. When you pass a value to a function, what you get is logically a new value. It may be equal to the old one, it may be the same memory location as the old one, but that's hidden from you. Logically, it's a new value. (And if there's a shared memory location, CoW hides that from you, too.) The intent here is to avoid "spooky action at a distance" (SAAAD) (that is, changing a value inside a function is guaranteed to not have any effect on the function that called it).
- Logical equality. This only applies to compound values (arrays and objects), but would imply checking equality by recursively checking equality on sub-elements. (Properties in the case of objects, keys in the case of arrays.)
- Physical equality. This is what === does, and checks that two variables refer to the same memory location. Physical equality implies logical equality, but not vice versa.
- Immutability. A given variable's value cannot change.
- Product types. A type that is based on two or more other types. (Eg, Point is a product of int and int.)
These are all circling around the same problem space, but are all different things. For instance, rigidly immutable values make pass-by-value irrelevant, while pass-by-value avoids SAAAD without needing immutability.
I think that's the key place where Rob's approach and Ilija's approach differ. Rob's proposals (records and dataclass) are trying to solve SAAAD through immutability, one way or another. Ilija's approach is trying to solve SAAAD through pass-by-value semantics.
By-value semantics would be really easy to implement by just auto-cloning an object at a function boundary. However, that's also very wasteful, as the object probably won't be modified, making the clone just a memory hog. The issue is that detecting a modification on nested objects is not particularly easy, which is how Ilija ended up with an explicit syntax to mark such modification. (I personally dislike it, from a DX perspective, but I don't have any suggestions on how to avoid it. If someone else does, please speak up.)
Immutability semantics, as we've seen, seem easy but are actually quite logically complex once you get past the bare minimum. (The bare minimum is already provided by readonly classes. Problem solved.)
So I'm not sure we're all talking about solving the same problem, or solving it in the same way.
Moreover, I don't think we all agree on the use cases we're solving. Let me offer a few examples.
- Fancy typed values
readonly class UserID {
    public function __construct(public int $id) {}
}
This is already mostly supported, as above, just a bit verbose. In this case, it makes sense that two equivalent objects are ==, and if we can make them === then that's a nice memory optimization, but not a requirement. In this case, we're really just providing additional typing, and the immutability is trivial (and already supported).
- Product types (part 1)
class Point {
    public function __construct(public int $x, public int $y) {}
}
Now here's the interesting part. Should Point be immutable? Should modifications to Point inside a function affect values outside the function? MAYBE! It depends on the context. In most cases, probably not. However, consider a "registration/collection" use case of an event dispatcher:
class RegisterPluginsEvent {
    public function __construct(public array $pluginsToRegister) {}
}
This is a "data" class in that it is carrying data, and is not a service. However, we very clearly DO want SAAAD in this case. That's the whole reason it exists. Currently this case is solved by conventional classes, so I don't think there's anything to do here.
- Product types (part 2)
Where it gets interesting is when you do need to modify an object, and propagate those changes, but NOT propagate the ability to change it. Consider:
class Circle {
    // promoted public properties, so that $c->center is accessible below
    public function __construct(public Point $center, public int $radius) {}
}
$c = new Circle(new Point(1, 2), 5);
if ($some_user_data) {
    $c->center->x = 10;
}
draw($c);
Here, we do want the ability to modify $c after construction. However, we do NOT want to allow draw() to modify our $c. This case is currently unsolved in PHP.
As above, there's two approaches to solving it: Making $c immutable generally, or making a copy (immediately or delayed) when passing to draw(). Making $c immutable generally would, in this case, be bad, because we do want the ability to modify $c before passing it. It's just much more convenient than needing to compute everything ahead of time and pass it to the constructor like it's just a function.
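As a rough sketch of the "copy immediately" option in current PHP: the caller clones before the call, and the class has to deep-copy its nested Point itself, because clone is shallow by default. The body of draw() and the __clone implementation are assumptions made for the illustration.

```php
<?php
declare(strict_types=1);

final class Point
{
    public function __construct(public int $x, public int $y) {}
}

final class Circle
{
    public function __construct(public Point $center, public int $radius) {}

    public function __clone(): void
    {
        // Without this, the clone would share the same Point object.
        $this->center = clone $this->center;
    }
}

// Hypothetical callee that misbehaves by mutating its argument.
function draw(Circle $c): void
{
    $c->center->x = 999;
}

$c = new Circle(new Point(1, 2), 5);
$c->center->x = 10; // the owner can still modify it freely
draw(clone $c);     // eager copy at the function boundary

var_dump($c->center->x); // int(10)
```

This works, but it copies on every call whether or not draw() modifies anything, and every class with nested objects needs its own __clone.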
- Aggregate types
One of the main places that Ilija and I have discussed his structs proposal is collections[1]. In many languages, collections have both an in-place modifier and a clone-along-the-way modifier. For instance, sort() and sorted(), reverse() and reversed(), etc. (Details vary a little by language.) Some languages also have both mutable and immutable versions of each collection type (Seq, Set, Map), with the in-place methods only available on the mutable variant. There's also then methods to convert a mutable collection into an immutable one and vice versa, which (I believe) implies making a copy. Kotlin does both of the above, and is the model that I have been planning to pursue in PHP, eventually.
Ilija has argued that if we can flag collection classes as pass-by-value, then we don't need the immutable versions at all. The only reason for the immutable versions to exist is to prevent SAAAD. If that's already prevented by the passing semantics, then we don't need an explicitly immutable collection.
So that would mean:
$c = new List();
$c->add(1); // in place mutation.
$c->add(3); // in place mutation.
$c->add(2); // in place mutation.
function doStuff(List $l) {
    $l->sort(); // in-place mutation of a value-passed value.
    // do stuff with l.
}
doStuff($c);
var_dump($c); // Still ordered 1, 3, 2
So a sorted() method or an ImmutableList class wouldn't be necessary. (I can see a use for sorted() anyway, to make it chainable, just like another recent RFC proposed for the existing sort() function. That's related but a separate question.)
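Plain arrays already behave this way today, which gives a runnable approximation of the List example above. doStuff() here returns the sorted copy only so the result can be inspected:

```php
<?php
declare(strict_types=1);

function doStuff(array $l): array
{
    sort($l); // in-place sort of the function's own logical copy
    return $l;
}

$c = [];
$c[] = 1; // in-place mutation
$c[] = 3;
$c[] = 2;

$sorted = doStuff($c);

var_dump($c === [1, 3, 2]);      // bool(true): the caller's value is untouched
var_dump($sorted === [1, 2, 3]); // bool(true)
```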
This approach would not be possible if data/record/struct/whatever classes have any built-in immutability to them. They just become super cumbersome to work with. One way or another, you end up back at the withX() methods that we already have and use.
$c = new List();
$c = $c->add(1);
$c = $c->add(3);
$c = $c->add(2);
// ...
Eew. I can do that already today, and I don't want to.
Here's the important observation: Speaking as the leading functional programming in PHP fanboy, I don't really see much value at all to intra-function immutability. It's just... not useful in PHP. Immutability at function boundaries, that's super useful. But solving the problem at the object-immutability level is the wrong place in PHP. (It is arguably the right place in Haskell or ML, but PHP is not Haskell or ML.)
So IMO, the focus should be on just the function boundary semantics. The main issue is how to make that work without wonky new syntax. Again, I don't have a good answer, but would kindly request one. :-)
Finally, there's the question of equality. Be aware, PHP already does value equality for objects:
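A minimal sketch of that existing behaviour, using a plain class on current PHP: == compares class and property values, while === is true only for the same instance.

```php
<?php
declare(strict_types=1);

final class UserId
{
    public function __construct(public int $id) {}
}

$a = new UserId(12);
$b = new UserId(12);
$c = new UserId(13);

var_dump($a == $b);  // bool(true): same class, equal property values
var_dump($a === $b); // bool(false): two distinct instances
var_dump($a == $c);  // bool(false): property values differ
```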
The issue isn't that it's not there, it's that it cannot be controlled. I am not convinced that overriding === to mean logical equality rather than physical equality, but only for data objects, is wise. And we already have == handled. (I use that fact in my PHPUnit tests all the time.) What is missing is the ability to control how that == comparison is made.
class Rect {
    private int $area;
    public function __construct(public readonly int $h, public readonly int $w) {}
    public function area(): int {
        // compute once, cache in the private property, and return it
        return $this->area ??= $this->h * $this->w;
    }
}
$r1 = new Rect(4, 5);
$r2 = new Rect(4, 5);
print $r1->area(); // $area is private, so go through the method
var_dump($r1 == $r2); // What happens here?
Presumably, we'd want those to be equal without having to compute $area on $r2. Right now, that's impossible, and those objects would not be equal. Fixing that has... nothing to do with value semantics at all. It has to do with operator overloading, and I'm already on record that I am very in favor of addressing that.
I hope that gives a better lay of the land for everyone in this thread.
--Larry Garfield
[1] https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/#collections
Hi Rob
Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
As others have pointed out, your RFC is very similar to my proposal
for struct. I don't quite understand the reason to compete and race
each other to the finish line. Combined efforts are usually better.
One of the bigger differences between our proposals is the addition of
mutating methods in my proposal compared to yours. You show the
following example in your RFC:
data class Rectangle {
    public function __construct(public int $width, public int $height) {}
    public function resize(int $width, int $height): static {
        $this->height = $height;
        $this->width = $width;
        return $this;
    }
}
The resize method here modifies the instance and thus implicitly
creates a copy. That's fine for such a small structure. However,
note that this still leads to the performance issues we have
previously discussed for growable data structures.
data class Vector {
    public function append(mixed $value): static {
        /* Internal implementation, $values is some underlying storage. */
        $this->values[] = $value;
        return $this;
    }
}
Calling $vector->append(42); will increase the refcount of $vector, and
cause separation on the $this->values[] = ...; statement. If
$vector->values is a big storage, cloning will be very expensive.
Hence, appending becomes an O(n) operation (because each element in
the vector is copied to the new structure), and hence appending to an
array in a loop will tank your performance. That's the reason for the
introduction of the $vector->append!(42) syntax in my proposal. It
separates the value at call-site when necessary, and avoids separation
on $this in methods altogether.
There might be some general confusion on the performance issue. In one
of your e-mails in the last thread, you have mentioned:
Like Ilija mentioned in their email, there are significant performance optimizations to be had here that are simply not possible using regular (readonly) classes. I didn't go into detail as to how it works because it feels like an implementation detail, but I will spend some time distilling this and its consequences, into the RFC, over the coming days. As a simple illustration, there can be significant memory usage improvements:
100,000 arrays: https://3v4l.org/Z4CcV
100,000 readonly classes: https://3v4l.org/1vhNp
First off, the array example only uses less memory because [1, 2] is a
constant array. When you make it dynamic, they will become way less
efficient than objects. https://3v4l.org/pETM9
But this is not the point I was trying to make either. Rather, when it
comes to immutable, growable data structures, every mutation becomes
an extremely expensive operation because the entire data structure,
including its underlying storage, needs to be copied. For example:
class Vector {
    private $values;
    public function populate() {
        $this->values = range(1, 1_000_000);
    }
    public function appendMutable() {
        $this->values[] = 100_000_001;
    }
    public function appendImmutable() {
        $new = clone $this;
        $this->values[] = 100_000_001;
    }
}
appendMutable(): float(8.106231689453125E-6)
appendImmutable(): float(0.012187957763671875)
That's a factor of 1 500 difference for an array containing 1 million
numbers. Obviously, concrete numbers will vary, but the problem grows
the bigger the array becomes.
Ilija
Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
But this is not the point I was trying to make either. Rather, when it
comes to immutable, growable data structures, every mutation becomes
an extremely expensive operation because the entire data structure,
including its underlying storage, needs to be copied. For example:
Oops, appendImmutable() should of course have modified $new and
returned it. https://3v4l.org/IkSY1
But you get the point.
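For anyone who wants to try it locally, here is a self-contained sketch of the corrected comparison. The count() helper, the parameter names and the use of hrtime() are additions for this illustration; timings will vary by machine, so no expected numbers are shown.

```php
<?php
declare(strict_types=1);

final class Vector
{
    private array $values = [];

    public function populate(int $n): void
    {
        $this->values = range(1, $n);
    }

    // In-place append: the array has a single owner, so nothing is copied.
    public function appendMutable(int $value): void
    {
        $this->values[] = $value;
    }

    // Clone-then-append: the clone shares the array until the write,
    // which then separates it and copies every element.
    public function appendImmutable(int $value): static
    {
        $new = clone $this;
        $new->values[] = $value;
        return $new;
    }

    public function count(): int
    {
        return count($this->values);
    }
}

$vector = new Vector();
$vector->populate(1_000_000);

$start = hrtime(true);
$vector->appendMutable(1);
$mutableNs = hrtime(true) - $start;

$start = hrtime(true);
$copy = $vector->appendImmutable(2);
$immutableNs = hrtime(true) - $start;

printf("appendMutable():   %d ns\n", $mutableNs);
printf("appendImmutable(): %d ns\n", $immutableNs);
```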