Hello internals,
Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
This adds a new class modifier: data. This modifier drastically changes how classes work: instances are compared by value instead of by reference, and mutations behave like those of arrays rather than objects (by value). If desired, it can be combined with other modifiers, such as readonly, to enforce immutability.
I've been playing with this feature for a few days now, and it is surprisingly intuitive to use. There is a (mostly) working implementation available on GitHub (https://github.com/php/php-src/pull/16904) if you want to have a go at it.
Example:
data class UserId { public function __construct(public int $id) {} }
$user = new UserId(12);
// later
$admin = new UserId(12);
if ($admin === $user) { // true
    // do something
}
Data classes are true value objects, with full copy-on-write optimizations:
data class Point {
    public function __construct(public int $x, public int $y) {}
    public function add(Point $other): Point {
        // illustrating value semantics, no copy yet
        $previous = $this;
        // a copy happens on the next line
        $this->x = $this->x + $other->x;
        $this->y = $this->y + $other->y;
        assert($this !== $previous); // passes
        return $this;
    }
}
I think this would be an amazing addition to PHP.
Sincerely,
— Rob
Hello
If I remember correctly, the whole concept of "value" is fully described in
the DDD book by Eric Evans. If that's the point of the RFC, I wonder if there's
any point in not making such classes immutable by default, and to keep only
one instance of value object unique per given set of properties in memory,
thereby eliminating cloning altogether and optimizing the memory usage.
If I remember correctly, the whole concept of "value" is fully
described in the DDD book by Eric Evans. If that's the point of the RFC, I
wonder if there's any point in not making such classes immutable by
default, and to keep only one instance of value object unique per
given set of properties in memory, thereby eliminating cloning
altogether and optimizing the memory usage.
As I mentioned on the "Records" thread, guaranteeing that every
combination of values will exist exactly once in memory could create
more overhead than it saves.
Certainly if you write this, sharing memory makes a lot of sense:
$arr = []; $i = 0;
while ( $i++ < 100 ) {
    $arr[] = new Point(0,0);
}
But if you instead write this, maintaining the cache will end up more
expensive than just allocating each object/record/struct directly:
$arr = []; $i = 0;
while ( $i++ < 100 ) {
    $arr[] = new Point($i, $i);
}
If the guarantee is copy-on-write, caching could be a compile-time
optimisation; e.g. an OpCache pass might rewrite the first loop to the
equivalent of this:
$arr = []; $i = 0;
$__cachedPoint = new Point(0,0);
while ( $i++ < 100 ) {
    $arr[] = $__cachedPoint;
}
The main thing that would prevent this optimisation is a custom
constructor which might make the number of "new" calls observable.
Either the optimiser would have to detect a custom constructor, or data
classes / structs / records would have to prohibit defining one.
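To make the trade-off concrete, here is a minimal userland sketch of the "one instance per set of values" idea in current PHP. InternedPoint, of() and cacheSize() are hypothetical names invented for this illustration; nothing like them is proposed in either RFC, and an engine-level cache would look different.

```php
<?php
// Hypothetical userland interning: one shared object per distinct (x, y).
final class InternedPoint
{
    /** @var array<string, InternedPoint> cache keyed by "x,y" */
    private static array $cache = [];

    private function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}

    public static function of(int $x, int $y): self
    {
        // Every construction pays for building a key and a hash lookup.
        return self::$cache["$x,$y"] ??= new self($x, $y);
    }

    public static function cacheSize(): int
    {
        return count(self::$cache);
    }
}

// Loop 1: identical values, so the cache holds a single shared object.
$same = [];
for ($i = 0; $i < 100; $i++) {
    $same[] = InternedPoint::of(0, 0);
}
$sizeAfterSame = InternedPoint::cacheSize();
$shared = $same[0] === $same[99];

// Loop 2: distinct values, so every call misses the cache and adds an entry.
$distinct = [];
for ($i = 1; $i <= 100; $i++) {
    $distinct[] = InternedPoint::of($i, $i);
}
$sizeAfterDistinct = InternedPoint::cacheSize();

var_dump($sizeAfterSame);      // int(1)
var_dump($shared);             // bool(true)
var_dump($sizeAfterDistinct);  // int(101)
```

In the second loop the cache does pure bookkeeping: 100 lookups, 100 misses, and 100 entries that this sketch never releases.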
--
Rowan Tommins
[IMSoP]
Hello!
Don't forget to bottom-post!
I wonder if there's any point in not making such classes immutable by default, and to keep only one instance of value object unique per given set of properties in memory, thereby eliminating cloning altogether and optimizing the memory usage.
This was the entire point of the records RFC ;) it was immutable by default, but people were wondering what it would look like if it were more 'composable' vs. 'dedicated syntax'.
— Rob
Born from the Records RFC (https://wiki.php.net/rfc/records)
discussion, I would like to introduce to you a competing RFC: Data
Classes (https://wiki.php.net/rfc/dataclass).
Thank you for continuing to think about this, and PoC code is always
useful to work through the implications.
It seems like this is going in a very similar direction to the work
Ilija shared in April: https://externals.io/message/122845 and
https://github.com/php/php-src/pull/13800
My knowledge of the engine isn't good enough to compare the two PRs, but
the descriptions seem very similar. The main differences seem to be
details mentioned in one draft and not the other:
- You have described details for constructors and inheritance, which
Ilija left as open questions
- Ilija had considered how instance methods should behave, proposing a
"mutating" keyword and "!" call-site marker. Your RFC doesn't discuss
this - the changeName example shows behaviour inside the method, but
not behaviour when calling it
Is it just an oversight that you didn't link to the previous discussion,
or had you not realised how similar the proposals would end up? Either
way, this looks like ripe ground for collaboration, unless there is some
fundamental disagreement about the approach.
--
Rowan Tommins
[IMSoP]
Yes, this is mostly about "composability" vs. dedicated syntax. A bare "data class" is very similar to struct while a "final readonly data class" is very similar to records.
Is it just an oversight that you didn't link to the previous
discussion, or had you not realised how similar the proposals
would end up?
Yes, it is an oversight! I didn't even think to link to it. To be fair, I also didn't link to the records RFC. I've updated the RFC with links. While some behavior is similar to what Ilija described, it is mostly a natural progression of adding a data modifier to classes. There's no special syntax because classes already have a well-defined syntax.
Your RFC
doesn't discuss this - the changeName example shows behaviour
inside the method, but not behaviour when calling it
An interesting observation, can you explain more as to what you mean? The changeName example is simply about constructors—whose behavior is mostly due to engine limitations. You can't very easily see what happens inside a constructor from outside a constructor (usually).
— Rob
Your RFC doesn't discuss this - the changeName example shows behaviour
inside the method, but not behaviour when calling it
An interesting observation, can you explain more as to what you mean?
Looking closer, there's a hint at what you expect to happen in your
Rectangle example:
$bigRectangle = $rectangle->resize(10, 20);
assert($bigRectangle !== $rectangle); // true
It seems that modifications to $this aren't visible outside the method,
creating a purely local clone, which would be discarded if it wasn't
returned (or saved somewhere).
I can see the logic, but the result is a bit unintuitive:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++;
    }
}
$foo = new Example(0);
$foo->x++;
$foo->inc();
echo $foo->x; // 1, not 2
I think it would be clearer to prevent direct modification of $this:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++; // ERROR: Can not mutate $this in data class
    }
    public function withInc(): static {
        $new = $this; // explicitly make a local copy of $this
        $new->x++; // copy-on-write separates $new from $this
        return $new;
    }
}
That would still be compatible with Ilija's suggestion, which was to add
special "mutating methods":
data class Example {
    public function __construct(public int $x) {}
    public mutating function inc(): void {
        $this->x++;
    }
}
$foo = new Example(0);
$foo->x++;
$foo->inc!(); // copy-on-write triggered before the method is called
echo $foo->x; // 2
--
Rowan Tommins
[IMSoP]
Your RFC doesn't discuss this - the changeName example shows behaviour
inside the method, but not behaviour when calling it
An interesting observation, can you explain more as to what you mean?
Looking closer, there's a hint at what you expect to happen in your Rectangle example:
$bigRectangle = $rectangle->resize(10, 20);
assert($bigRectangle !== $rectangle); // true
It seems that modifications to $this aren't visible outside the method, creating a purely local clone, which would be discarded if it wasn't returned (or saved somewhere).
I can see the logic, but the result is a bit unintuitive:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++;
    }
}
$foo = new Example(0);
$foo->x++;
$foo->inc();
echo $foo->x; // 1, not 2
Interesting! I actually found it to be intuitive.
Think of it like this:
function increment(array $array) {
    $array[0]++;
}
$arr = [0];
increment($arr);
echo $arr[0]; // is 0
We don't expect $arr to be any different outside of the function because $arr is a value, not a reference. "data classes" are "values" and not references to values, thus when you modify $this, you modify the value, and it doesn't affect values elsewhere. If you want to keep track of that value, you have to put it somewhere where you can reference it: a return value, global variable, property in a regular class, etc. In any case, let's keep going to see if there is a better way.
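For comparison, here is how the same call pattern behaves in current PHP for an array and for an ordinary object. This is only a sketch of today's semantics; Counter is a hypothetical regular (non-data) class added for contrast.

```php
<?php
// Counter is a hypothetical ordinary class, not a data class.
class Counter
{
    public function __construct(public int $x = 0) {}
}

function incrementArray(array $array): void
{
    $array[0]++;    // the write separates a copy; the caller's array is untouched
}

function incrementObject(Counter $counter): void
{
    $counter->x++;  // the write goes through the shared object handle
}

$arr = [0];
incrementArray($arr);

$obj = new Counter();
incrementObject($obj);

echo $arr[0], "\n"; // 0: the caller's array is unchanged
echo $obj->x, "\n"; // 1: the caller's object was mutated
```

The data class proposal would make the second case behave like the first.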
I think it would be clearer to prevent direct modification of $this:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++; // ERROR: Can not mutate $this in data class
    }
    public function withInc(): static {
        $new = $this; // explicitly make a local copy of $this
        $new->x++; // copy-on-write separates $new from $this
        return $new;
    }
}
Not that I disagree (see the records RFC), but at that point, why not make data classes implicitly readonly?
That would still be compatible with Ilija's suggestion, which was to add special "mutating methods":
data class Example {
    public function __construct(public int $x) {}
    public mutating function inc(): void {
        $this->x++;
    }
}
$foo = new Example(0);
$foo->x++;
$foo->inc!(); // copy-on-write triggered before the method is called
echo $foo->x; // 2
I actually find this appealing, but it is strange to me to allow this syntax on classes. Is there precedent for that? Or is there a way we can do it using "regular looking PHP"; or are structs the way to go?
Another alternative would be that mutations still trigger a copy-on-write, but the outer variable is updated with $this upon return. So this would work:
data class Example {
    public function __construct(public int $x) {}
    public function inc(): void {
        $this->x++;
    }
}
$foo = new Example(1);
$bar = $foo;
$foo->inc(); // foo is copied on mutation, and $foo points at the new value on return.
echo $bar->x; // 1
echo $foo->x; // 2
To me, this seems like it would be even more intuitive; $foo has the value you would expect from outside the class and doesn't require you keeping track of the value yourself. Though, there are some footguns here too:
class Foo {
    public Example $bar;
    function baz() {
        $bar = $this->bar; // should be $bar = &$this->bar
        $bar->inc();
        echo $this->bar->x; // not incremented
    }
}
I could go either way on this one, honestly; so it makes sense to me why structs would have a dedicated syntax for whichever you prefer. On the one hand, it currently requires you to be explicit all the time (which can be annoying), and on the other hand, there's the implicit copy here, which requires you to be explicit when you want a reference.
I suppose there is a third option as well, and that is to not do any of these and just have data classes that always compare by value.
— Rob
Interesting! I actually found it to be intuitive.
Think of it like this:
function increment(array $array) {
    $array[0]++;
}
$arr = [0];
increment($arr);
echo $arr[0]; // is 0
We don't expect $arr to be any different outside of the function
because $arr is a value, not a reference.
My mental model, rightly or wrongly, is that passing to a parameter is a
bit like an assignment to a local variable:
function increment(array) {
    $array = $__args[0];
    $array[0]++;
}
(This is explicitly how subroutine parameters work in Perl; I don't know
if that's affected my mental model, or just means Larry Wall pictured it
the same way.)
You can even assign a new value to it, like any other variable:
function whatever(array $array) {
    $array = 'not even an array any more';
}
But in PHP, $this isn't a parameter, and it's never possible to assign a
new value to $this; so it feels completely alien to have a method where
$this stops referring to the current object, and becomes a local variable.
I think it would be clearer to prevent direct modification of $this:
Not that I disagree (see the records RFC), but at that point, why not
make data classes implicitly readonly?
I'm only suggesting restricting mutation on $this, not on the object
itself. $foo->x++ would still work, and have automatic copy-on-write;
but $this->x++ would be an error on a data class, just as $this=$bar is
an error on all existing objects.
That would still be compatible with Ilija's suggestion, which was to
add special "mutating methods":
I actually find this appealing, but it is strange to me to allow this
syntax on classes. Is there precedent for that? Or is there a way we
can do it using "regular looking PHP"; or are structs the way to go?
The way I see it, it's just a third type of method, to add to the two we
already have:
- instance methods: $this refers to the current instance
- static methods: $this is forbidden
- mutating methods: $this refers to the desired result of the mutation
In fact, it's a bit like __construct or __clone, where $this refers to
the newly created/copied object, before anything else points to it.
--
Rowan Tommins
[IMSoP]
On 23.11.2024 at 14:11 CET, Rob Landers rob@bottled.codes wrote:
Hello internals,
Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
— Rob
Thanks for the rfc!
From userland perspective I would prefer to have the cloning more explicitly, e.g.
return clone $this($this->x + $other->x, $this->y + $other->y);
or
return clone $this(x: $this->x + $other->x); // clone with y unchanged
Best Regards
Thomas
Oh boy. Again, I think there's too much going on here, but I think that's because different people are operating under a different definition of what "value semantics" means. Let me try to break down what I think are the constituent parts.
- Pass-by-value. This is what arrays, ints, strings, etc. do. When you pass a value to a function, what you get is logically a new value. It may be equal to the old one, it may be the same memory location as the old one, but that's hidden from you. Logically, it's a new value. (And if there's a shared memory location, CoW hides that from you, too.) The intent here is to avoid "spooky action at a distance" (SAAAD) (that is, changing a value inside a function is guaranteed to not have any effect on the function that called it).
- Logical equality. This only applies to compound values (arrays and objects), but would imply checking equality by recursively checking equality on sub-elements. (Properties in the case of objects, keys in the case of arrays.)
- Physical equality. This is what === does, and checks that two variables refer to the same memory location. Physical equality implies logical equality, but not vice versa.
- Immutability. A given variable's value cannot change.
- Product types. A type that is based on two or more other types. (Eg, Point is a product of int and int.)
These are all circling around the same problem space, but are all different things. For instance, rigidly immutable values make pass-by-value irrelevant, while pass-by-value avoids SAAAD without needing immutability.
I think that's the key place where Rob's approach and Ilija's approach differ. Rob's proposals (records and dataclass) are trying to solve SAAAD through immutability, one way or another. Ilija's approach is trying to solve SAAAD through pass-by-value semantics.
By-value semantics would be really easy to implement by just auto-cloning an object at a function boundary. However, that's also very wasteful, as the object probably won't be modified, making the clone just a memory hog. The issue is that detecting a modification on nested objects is not particularly easy, which is how Ilija ended up with an explicit syntax to mark such modification. (I personally dislike it, from a DX perspective, but I don't have any suggestions on how to avoid it. If someone else does, please speak up.)
Immutability semantics, as we've seen, seem easy but are actually quite logically complex once you get past the bare minimum. (The bare minimum is already provided by readonly classes. Problem solved.)
So I'm not sure we're all talking about solving the same problem, or solving it in the same way.
Moreover, I don't think we all agree on the use cases we're solving. Let me offer a few examples.
- Fancy typed values
readonly class UserID {
    public function __construct(public int $id) {}
}
This is already mostly supported, as above, just a bit verbose. In this case, it makes sense that two equivalent objects are ==, and if we can make them === then that's a nice memory optimization, but not a requirement. In this case, we're really just providing additional typing, and the immutability is trivial (and already supported).
- Product types (part 1)
class Point {
    public function __construct(public int $x, public int $y) {}
}
Now here's the interesting part. Should Point be immutable? Should modifications to Point inside a function affect values outside the function? MAYBE! It depends on the context. In most cases, probably not. However, consider a "registration/collection" use case of an event dispatcher:
class RegisterPluginsEvent {
    public function __construct(public array $pluginsToRegister) {}
}
This is a "data" class in that it is carrying data, and is not a service. However, we very clearly DO want SAAAD in this case. That's the whole reason it exists. Currently this case is solved by conventional classes, so I don't think there's anything to do here.
- Product types (part 2)
Where it gets interesting is when you do need to modify an object, and propagate those changes, but NOT propagate the ability to change it. Consider:
class Circle {
    // "public" promotes the parameters to properties, so $c->center works below
    public function __construct(public Point $center, public int $radius) {}
}
$c = new Circle(new Point(1, 2), 5);
if ($some_user_data) {
    $c->center->x = 10;
}
draw($c);
Here, we do want the ability to modify $c after construction. However, we do NOT want to allow draw() to modify our $c. This case is currently unsolved in PHP.
As above, there's two approaches to solving it: Making $c immutable generally, or making a copy (immediately or delayed) when passing to draw(). Making $c immutable generally would, in this case, be bad, because we do want the ability to modify $c before passing it. It's just much more convenient than needing to compute everything ahead of time and pass it to the constructor like it's just a function.
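As a point of reference, the "copy immediately" option can be approximated in current PHP with an explicit clone at the call site. This is only a sketch: the body of draw() is hypothetical, and the __clone() method is an addition needed because clone is shallow by default.

```php
<?php
class Point
{
    public function __construct(public int $x, public int $y) {}
}

class Circle
{
    public function __construct(public Point $center, public int $radius) {}

    // clone copies object handles only, so nested objects are copied by hand.
    public function __clone(): void
    {
        $this->center = clone $this->center;
    }
}

// Hypothetical draw() that modifies its argument.
function draw(Circle $circle): void
{
    $circle->center->x = 999;
}

$c = new Circle(new Point(1, 2), 5);
$c->center->x = 10;         // the owner can still mutate freely

draw(clone $c);             // eager copy at the function boundary

echo $c->center->x, "\n";   // 10: draw() only touched the copy
```

The copy is made on every call, whether or not draw() writes to it, which is the waste described above and the reason a delayed (copy-on-write) mechanism is attractive.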
- Aggregate types
One of the main places that Ilija and I have discussed his structs proposal is collections[1]. In many languages, collections have both an in-place modifier and a clone-along-the-way modifier. For instance, sort() and sorted(), reverse() and reversed(), etc. (Details vary a little by language.) Some languages also have both mutable and immutable versions of each collection type (Seq, Set, Map), with the in-place methods only available on the mutable variant. There's also then methods to convert a mutable collection into an immutable one and vice versa, which (I believe) implies making a copy. Kotlin does both of the above, and is the model that I have been planning to pursue in PHP, eventually.
Ilija has argued that if we can flag collection classes as pass-by-value, then we don't need the immutable versions at all. The only reason for the immutable versions to exist is to prevent SAAAD. If that's already prevented by the passing semantics, then we don't need an explicitly immutable collection.
So that would mean:
$c = new List();
$c->add(1); // in place mutation.
$c->add(3); // in place mutation.
$c->add(2); // in place mutation.
function doStuff(List $l) {
    $l->sort(); // in-place mutation of a value-passed value.
    // do stuff with l.
}
doStuff($c);
var_dump($c); // Still ordered 1, 3, 2
So a sorted() method or an ImmutableList class wouldn't be necessary. (I can see a use for sorted() anyway, to make it chainable, just like another recent RFC proposed for the existing sort() function. That's related but a separate question.)
This approach would not be possible if data/record/struct/whatever classes have any built-in immutability to them. They just become super cumbersome to work with. One way or another, you end up back at the withX() methods that we already have and use.
$c = new List();
$c = $c->add(1);
$c = $c->add(3);
$c = $c->add(2);
// ...
Eew. I can do that already today, and I don't want to.
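For completeness, this is roughly what that style looks like as a full class in current PHP. ImmutableIntList, withAdded() and sorted() are hypothetical names chosen for this sketch (List itself is not usable as a class name, since list is a reserved word).

```php
<?php
// Hypothetical immutable list: every "modifier" returns a new instance.
final class ImmutableIntList
{
    /** @param list<int> $items */
    public function __construct(public readonly array $items = []) {}

    public function withAdded(int $item): static
    {
        // Arrays are values, so this builds a new array for the new object.
        return new static([...$this->items, $item]);
    }

    public function sorted(): static
    {
        $items = $this->items;  // local copy; sort() works in place on it
        sort($items);
        return new static($items);
    }
}

$c = new ImmutableIntList();
$c = $c->withAdded(1);
$c = $c->withAdded(3);
$c = $c->withAdded(2);

$sorted = $c->sorted();

echo implode(',', $c->items), "\n";       // 1,3,2
echo implode(',', $sorted->items), "\n";  // 1,2,3
```

Every step needs a reassignment at the call site and a matching with-style method in the class.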
Here's the important observation: Speaking as the leading functional programming in PHP fanboy, I don't really see much value at all to intra-function immutability. It's just... not useful in PHP. Immutability at function boundaries, that's super useful. But solving the problem at the object-immutability level is the wrong place in PHP. (It is arguably the right place in Haskell or ML, but PHP is not Haskell or ML.)
So IMO, the focus should be on just the function boundary semantics. The main issue is how to make that work without wonky new syntax. Again, I don't have a good answer, but would kindly request one. :-)
Finally, there's the question of equality. Be aware, PHP already does value equality for objects:
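A minimal sketch of that existing behaviour (Money is a hypothetical class used only for illustration):

```php
<?php
// Money is a hypothetical class; any plain class behaves the same way.
class Money
{
    public function __construct(public int $amount, public string $currency) {}
}

$a = new Money(5, 'EUR');
$b = new Money(5, 'EUR');
$c = $a;

$looseAB  = ($a == $b);   // same class and equal properties
$strictAB = ($a === $b);  // two distinct instances
$strictAC = ($a === $c);  // same instance

var_dump($looseAB);   // bool(true)
var_dump($strictAB);  // bool(false)
var_dump($strictAC);  // bool(true)
```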
The issue isn't that it's not there, it's that it cannot be controlled. I am not convinced that overriding === to mean logical equality rather than physical equality, but only for data objects, is wise. And we already have == handled. (I use that fact in my PHPUnit tests all the time.) What is missing is the ability to control how that == comparison is made.
class Rect {
    private int $area;
    public function __construct(public readonly int $h, public readonly int $w) {}
    public function area(): int {
        return $this->area ??= $this->h * $this->w;
    }
}
$r1 = new Rect(4, 5);
$r2 = new Rect(4, 5);
print $r1->area();
var_dump($r1 == $r2); // What happens here?
Presumably, we'd want those to be equal without having to compute $area on $r2. Right now, that's impossible, and those objects would not be equal. Fixing that has... nothing to do with value semantics at all. It has to do with operator overloading, and I'm already on record that I am very in favor of addressing that.
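The Rect scenario can be reproduced in current PHP as follows. The equals() method is a hypothetical hand-written workaround added for this sketch; it is not a language feature and cannot hook into ==.

```php
<?php
final class Rect
{
    private int $area;  // lazily computed cache, uninitialized until first use

    public function __construct(
        public readonly int $h,
        public readonly int $w,
    ) {}

    public function area(): int
    {
        return $this->area ??= $this->h * $this->w;
    }

    // Hypothetical manual comparison that ignores the cached property.
    public function equals(self $other): bool
    {
        return $this->h === $other->h && $this->w === $other->w;
    }
}

$r1 = new Rect(4, 5);
$r2 = new Rect(4, 5);
$r1->area();                        // populates the cache on $r1 only

$builtIn = ($r1 == $r2);            // $area is initialized on one side only
$manual  = $r1->equals($r2);

var_dump($builtIn);  // bool(false)
var_dump($manual);   // bool(true)
```

The manual method works, but callers have to know to use it instead of ==, which is the gap that per-class control over comparison would close.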
I hope that gives a better lay of the land for everyone in this thread.
--Larry Garfield
[1] https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/#collections
Hi Larry,
I think that's a useful breakdown of the concepts involved. I don't
think it's a bad thing to have a feature that covers multiple of them -
common cases shouldn't need a long string of modifiers like "immutable
copyonwrite valueequality class Point { ... }" - but being explicit
about what we are and are not including is wise.
There is one point I'd like to nitpick on, though:
- Physical equality. This is what === does, and checks that two variables refer to the same memory location. Physical equality implies logical equality, but not vice versa.
PHP's === operator is not, in general, an identity operator; it is a
"strict equality" operator, whose exact meaning depends on the type of
its operands.
For "scalar" types, it checks the concrete type and the value: 1+1 ===
2, and strtoupper('hello') === 'HELLO'
For arrays, the definition is applied recursively: two arrays are
loosely equal if all their elements are loosely equal, and strictly
equal if all their elements are strictly equal.
Objects are really the outlier, overloading the operator to mean
"identity" rather than applying a strict value comparison.
Example: https://3v4l.org/udOoU
If we introduce some new "value type", it seems very reasonable to use
the same recursive definition of strict equality used for arrays.
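A short sketch of those rules in current PHP, for readers who prefer code to the linked example (Box is a hypothetical class; this is not a copy of the 3v4l snippet):

```php
<?php
// Box is a hypothetical class used only for illustration.
class Box
{
    public function __construct(public int $value) {}
}

// Scalars: same type and same value.
$scalarInt   = (1 + 1 === 2);
$scalarStr   = (strtoupper('hello') === 'HELLO');
$scalarMixed = (1 === 1.0);                              // int vs float

// Arrays: the comparison is applied element by element, recursively.
$arrayStrict      = ([1, [2, 3]] === [1, [2, 3]]);
$arrayLoose       = ([1, [2, 3]] == ['1', ['2', '3']]);
$arrayStrictMixed = ([1, [2, 3]] === ['1', ['2', '3']]);

// Objects: === means "same instance", not "same contents".
$objectLoose  = (new Box(1) == new Box(1));
$objectStrict = (new Box(1) === new Box(1));

var_dump($scalarInt, $scalarStr, $scalarMixed);
// bool(true) bool(true) bool(false)
var_dump($arrayStrict, $arrayLoose, $arrayStrictMixed);
// bool(true) bool(true) bool(false)
var_dump($objectLoose, $objectStrict);
// bool(true) bool(false)
```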
--
Rowan Tommins
[IMSoP]
Valid point, thank you. Though I'm still not convinced that value object equality behaving differently from normal object equality is going to be self-evident or intuitive.
Either way, my core point still stands: Give me a way to directly control how equality works per-class, like most of our sibling languages do, and we're all set. Problem solved.
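PHP has no hook for overriding == or === today, so the closest userland approximation of per-class equality is an explicit method. The sketch below assumes a hypothetical Equatable interface and Email class; neither comes from any RFC mentioned in this thread:

```php
<?php
// Hypothetical interface: the class itself decides what "equal" means.
interface Equatable
{
    public function equals(mixed $other): bool;
}

final class Email implements Equatable
{
    public function __construct(public readonly string $address) {}

    // Two addresses are treated as equal when they match case-insensitively.
    public function equals(mixed $other): bool
    {
        return $other instanceof self
            && strtolower($this->address) === strtolower($other->address);
    }
}

$a = new Email('Rob@example.com');
$b = new Email('rob@example.com');

$looselyEqual  = ($a == $b);      // false: the property values differ in case
$identical     = ($a === $b);     // false: two distinct instances
$logicallySame = $a->equals($b);  // true: the class-defined rule applies
```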
--Larry Garfield
Hi Rob
Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
As others have pointed out, your RFC is very similar to my proposal
for struct. I don't quite understand the reason to compete and race
each other to the finish line. Combined efforts are usually better.
One of the bigger differences between our proposals is the addition of
mutating methods in my proposal compared to yours. You show the
following example in your RFC:
data class Rectangle {
    public function __construct(public int $width, public int $height) {}
    public function resize(int $width, int $height): static {
        $this->height = $height;
        $this->width = $width;
        return $this;
    }
}
The resize method here modifies the instance and thus implicitly
creates a copy. That's fine for such a small structure. However,
note that this still leads to the performance issues we have
previously discussed for growable data structures.
data class Vector {
    public function append(mixed $value): static {
        /* Internal implementation, $values is some underlying storage. */
        $this->values[] = $value;
        return $this;
    }
}
Calling $vector->append(42); will increase the refcount of $vector, and
cause separation on $this->values[] = ...; If $vector->values is a big
storage, cloning will be very expensive.
Hence, appending becomes an O(n) operation (because each element in
the vector is copied to the new structure), and hence appending to an
array in a loop will tank your performance. That's the reason for the
introduction of the $vector->append!(42) syntax in my proposal. It
separates the value at call-site when necessary, and avoids separation
on $this in methods altogether.
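As a rough analogy using plain arrays (not the struct proposal's implementation, and not the append! syntax itself), "separate once where the value is shared, then mutate in place" can be sketched as follows. The function name appendInPlace is hypothetical:

```php
<?php
// Plain-array analogy only: arrays are PHP's existing copy-on-write value type.
function appendInPlace(array &$values, mixed $value): void
{
    // Writes through the reference. PHP copies the storage first only if
    // it is still shared with another variable.
    $values[] = $value;
}

$original = range(1, 5);
$working  = $original;        // no copy yet: both variables share one array

appendInPlace($working, 6);   // storage is shared, so this write separates it once
appendInPlace($working, 7);   // $working now owns its storage: no further copy
```

The original array is left untouched, and only the first write pays for a copy.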
There might be some general confusion on the performance issue. In one
of your e-mails in the last thread, you have mentioned:
Like Ilija mentioned in their email, there are significant performance optimizations to be had here that are simply not possible using regular (readonly) classes. I didn't go into detail as to how it works because it feels like an implementation detail, but I will spend some time distilling this and its consequences, into the RFC, over the coming days. As a simple illustration, there can be significant memory usage improvements:
100,000 arrays: https://3v4l.org/Z4CcV
100,000 readonly classes: https://3v4l.org/1vhNp
First off, the array example only uses less memory because [1, 2] is a
constant array. When you make it dynamic, they will become way less
efficient than objects. https://3v4l.org/pETM9
But this is not the point I was trying to make either. Rather, when it
comes to immutable, growable data structures, every mutation becomes
an extremely expensive operation because the entire data structure,
including its underlying storage, needs to be copied. For example:
class Vector {
    private $values;
    public function populate() {
        $this->values = range(1, 1_000_000);
    }
    public function appendMutable() {
        $this->values[] = 100_000_001;
    }
    public function appendImmutable() {
        $new = clone $this;
        $this->values[] = 100_000_001;
    }
}
appendMutable(): float(8.106231689453125E-6)
appendImmutable(): float(0.012187957763671875)
That's a factor of 1 500 difference for an array containing 1 million
numbers. Obviously, concrete numbers will vary, but the problem grows
the bigger the array becomes.
Ilija
Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
But this is not the point I was trying to make either. Rather, when it
comes to immutable, growable data structures, every mutation becomes
an extremely expensive operation because the entire data structure,
including its underlying storage, needs to be copied. For example:
Oops, appendImmutable() should of course have modified $new and
returned it. https://3v4l.org/IkSY1
But you get the point.
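The correction described above (modify $new and return it) might look like the following sketch. The element count, the count() helper, and the use of hrtime() for timing are assumptions added for illustration. This is not the code behind the linked 3v4l snippet, and timings will vary by machine:

```php
<?php
final class Vector
{
    private array $values = [];

    public function populate(int $n): void
    {
        $this->values = range(1, $n);
    }

    // Mutates in place: the array is not shared, so nothing is copied.
    public function appendMutable(mixed $value): void
    {
        $this->values[] = $value;
    }

    // Returns a modified copy: the clone shares the array until the write,
    // which then separates (copies) the whole underlying storage.
    public function appendImmutable(mixed $value): static
    {
        $new = clone $this;
        $new->values[] = $value;
        return $new;
    }

    public function count(): int
    {
        return count($this->values);
    }
}

$vector = new Vector();
$vector->populate(100_000);

$start = hrtime(true);
$vector->appendMutable(1);
$mutableNs = hrtime(true) - $start;

$start = hrtime(true);
$copy = $vector->appendImmutable(2);
$immutableNs = hrtime(true) - $start;

printf("appendMutable(): %d ns\nappendImmutable(): %d ns\n", $mutableNs, $immutableNs);
```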
Hi Rob
Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
As others have pointed out, your RFC is very similar to my proposal
for struct. I don't quite understand the reason to compete and race
each other to the finish line. Combined efforts are usually better.
This isn't a race or competition, and I meant "competing with records" and not "competing with Ilija." Though, I see how it could be interpreted that way.
To be honest, I don't like this RFC. I think records and/or structs are the right answer (dedicated behavior vs. trying to stuff more behavior into classes). As mentioned elsewhere in the thread, the behavior here is more of an emergent property than any actual thinking through it. So, the fact that it is similar to your proposal is merely coincidental; it wasn't planned that way (nor did I catch it before submitting the RFC).
This mailing list is the only real interaction I have with other php-src devs (and the occasional PR to php-src), so please forgive me if I don't notice something. Yes, that isn't a real excuse, but not having other people to bounce ideas off of usually means I'll make an idiot out of myself, eventually.
That being said, there are a ton of edge cases I've uncovered where everything breaks down in this RFC:
• PHP references are "evil", so I've discovered.
• "new" generates some strange opcodes—I'm currently investigating this—that make dealing with value types difficult. I'm trying to solve this in some way that doesn't require changing the generated opcodes, but that might be impossible. This solution would allow using "new" with records and solve the constructor problem in this RFC.
• Did I mention PHP references are evil?
In any case, I'd much rather help with your structs proposal, as the more I work on this, the more I don't like it.
One of the bigger differences between our proposals is the addition of
mutating methods in my proposal compared to yours. You show the
following example in your RFC:
data class Rectangle {
    public function __construct(public int $width, public int $height) {}
    public function resize(int $width, int $height): static {
        $this->height = $height;
        $this->width = $width;
        return $this;
    }
}
The resize method here modifies the instance and thus implicitly
creates a copy. That's fine for such a small structure. However,
note that this still leads to the performance issues we have
previously discussed for growable data structures.
data class Vector {
    public function append(mixed $value): static {
        /* Internal implementation, $values is some underlying storage. */
        $this->values[] = $value;
        return $this;
    }
}
Calling $vector->append(42); will increase the refcount of $vector, and
cause separation on $this->values[] = ...; If $vector->values is a big
storage, cloning will be very expensive.
Hence, appending becomes an O(n) operation (because each element in
the vector is copied to the new structure), and hence appending to an
array in a loop will tank your performance. That's the reason for the
introduction of the $vector->append!(42) syntax in my proposal. It
separates the value at call-site when necessary, and avoids separation
on $this in methods altogether.
As I mentioned to Larry in the records discussion, the biggest problem with a "data class" is that it really needs dedicated syntax to be done properly (such as the ones in structs, and I plan to remove some syntax features from records since I got a lot of negative feedback about the syntax there; shout out to reddit). I don't think that would belong to a general class but rather a dedicated type. There really isn't a "one size fits all" solution here.
There might be some general confusion on the performance issue. In one
of your e-mails in the last thread, you have mentioned:
Like Ilija mentioned in their email, there are significant performance optimizations to be had here that are simply not possible using regular (readonly) classes. I didn't go into detail as to how it works because it feels like an implementation detail, but I will spend some time distilling this and its consequences, into the RFC, over the coming days. As a simple illustration, there can be significant memory usage improvements:
100,000 arrays: https://3v4l.org/Z4CcV
100,000 readonly classes: https://3v4l.org/1vhNp
First off, the array example only uses less memory because [1, 2] is a
constant array. When you make it dynamic, they will become way less
efficient than objects. https://3v4l.org/pETM9
But this is not the point I was trying to make either. Rather, when it
comes to immutable, growable data structures, every mutation becomes
an extremely expensive operation because the entire data structure,
including its underlying storage, needs to be copied. For example:
class Vector {
    private $values;
    public function populate() {
        $this->values = range(1, 1_000_000);
    }
    public function appendMutable() {
        $this->values[] = 100_000_001;
    }
    public function appendImmutable() {
        $new = clone $this;
        $this->values[] = 100_000_001;
    }
}
appendMutable(): float(8.106231689453125E-6)
appendImmutable(): float(0.012187957763671875)
That's a factor of 1 500 difference for an array containing 1 million
numbers. Obviously, concrete numbers will vary, but the problem grows
the bigger the array becomes.
Ilija
I'm not quite focused on actual performance (yet), but I understand your point. To that, I have an idea I've been playing around with to have "layered hashmaps" where a copy/clone is just a layer on the original (made immutable) hashmap, thus having a near zero cost for copies. There's still extra indirection involved, so it potentially could be worse 🤷. I would think an average case O(1) would be faster than a copy, at least for large maps, but wall-clock time and Big O don't always correlate. I also note that the benchmarks in the repo are based on operations, not time, so it is quite difficult to show that a feature that increases the number of operations decreases the overall time. In other words, you may add 100 operations with a strong cache locality and remove 10 with poor cache locality -- quality vs. quantity, an age-old problem. Anyway, there's no way to tell the exact performance characteristics until I finish the implementation. I'd love to discuss this more, as it's actually pretty neat and interesting to solve in a performant way.
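One possible reading of the "layered hashmap" idea, sketched in userland PHP. The LayeredMap class, its fork() method, and the frozen flag are all hypothetical and say nothing about how an engine-level version would work. In this naive form a lookup walks every layer, so it shows the extra indirection mentioned above rather than an O(1) lookup:

```php
<?php
final class LayeredMap
{
    /** @var array<int|string, mixed> Writes made in this layer only. */
    private array $own = [];

    /** Set once another layer has been stacked on top of this one. */
    private bool $frozen = false;

    public function __construct(private ?LayeredMap $parent = null) {}

    // A "copy" is a new empty layer on top of this one: no elements are copied.
    public function fork(): self
    {
        $this->frozen = true;
        return new self($this);
    }

    public function set(int|string $key, mixed $value): void
    {
        if ($this->frozen) {
            throw new LogicException('Layer is frozen; call fork() and write to the new layer.');
        }
        $this->own[$key] = $value;
    }

    // Looks in the newest layer first, then falls back to older layers.
    public function get(int|string $key): mixed
    {
        for ($layer = $this; $layer !== null; $layer = $layer->parent) {
            if (array_key_exists($key, $layer->own)) {
                return $layer->own[$key];
            }
        }
        return null;
    }
}

$base = new LayeredMap();
$base->set('a', 1);

$copy = $base->fork();   // near-zero cost: nothing from $base is copied
$copy->set('a', 2);      // shadows the value in the lower layer
$copy->set('b', 3);
```

Freezing the lower layer is what keeps the copy stable: once forked, the original can no longer change underneath it.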
It could also turn out like the zend_string refactor I spent a few months on over the summer, where performance isn't affected (at least in wall-clock time) and the only added benefit shows up when you perform lots of string modifications. Its other benefits won't make sense without an RFC I was working on (which is still in the draft stage). So, for now, that branch will just gather dust.
Things like this are largely why I didn't put too much information about the performance characteristics of records in my RFC. It's really an implementation detail, and I didn't want people's decisions to be tied to an implementation detail that could change drastically. In other words, maybe the performance would be poor now, but I'm confident it can be improved later. IMHO, we should focus on whether we want the feature in the first place rather than worrying about how fast or slow it is. Many improvements don't make sense for the sake of improving them, and it isn't until there is a problem to be solved (such as a poor performing feature) that the improvement becomes worthwhile.
— Rob
This isn't a race or competition, and I meant "competing with records" and not "competing with Ilija." Though, I see how it could be interpreted that way.
Rob,
You are making it a race. You are now doing to Ilija what you tried to do to Gina a while back with function autoloading.
And no, this is not about the wording you chose.
I know you have your opinion about internals, I've seen your reddit complaints a couple of times, but I would urge you to open up to a different perspective.
I can understand really wanting to land a particular feature into PHP, but I would urge you to collaborate instead of making close copies of the same proposal.
Or if you have a different proposal, try to contact the authors of the currently running proposal to see if the combined efforts "can be greater than the sum of the individual parts".
This kind of behaviour isn't doing you credit.
If you're that eager to land changes in PHP, feel free to go through https://github.com/php/php-src/issues, there are always more issues than we can handle.
I usually only lurk on the mailing list, but the thread really stood out to me.
This is all I have to say for now, and I probably won't interact with this thread again.
Kind regards
Niels