Newsgroups: php.internals
Date: Sun, 24 Nov 2024 12:49:44 +0100
From: rob@bottled.codes ("Rob Landers")
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] Data Classes
Message-ID: <1fbc115a-dc22-4997-84cb-0ed7de2990dc@app.fastmail.com>
References: <18b85ba5-5f1c-489c-9096-3ae203977fbe@app.fastmail.com>

On Sat, Nov 23, 2024, at 23:10, Ilija Tovilo wrote:
> Hi Rob
>
> On Sat, Nov 23, 2024 at 2:12 PM Rob Landers <rob@bottled.codes> wrote:
> >
> > Born from the Records RFC (https://wiki.php.net/rfc/records) discussion, I would like to introduce to you a competing RFC: Data Classes (https://wiki.php.net/rfc/dataclass).
>
> As others have pointed out, your RFC is very similar to my proposal
> for struct. I don't quite understand the reason to compete and race
> each other to the finish line. Combined efforts are usually better.

This isn't a race or competition; I meant "competing with records", not "competing with Ilija", though I see how it could be interpreted that way.

To be honest, I don't like this RFC. I think records and/or structs are the right answer (dedicated behavior vs. trying to stuff more behavior into classes). As mentioned elsewhere in the thread, the behavior here is more of an emergent property than the result of actually thinking it through. So the similarity to your proposal is merely coincidental; it wasn't planned that way (nor did I catch it before submitting the RFC).

This mailing list is the only real interaction I have with other php-src devs (aside from the occasional PR to php-src), so please forgive me if I miss something. Yes, that isn't a real excuse, but not having other people to bounce ideas off of usually means I'll make an idiot out of myself eventually.

That being said, I've uncovered a ton of edge cases where everything in this RFC breaks down:
  • PHP references are "evil", as I've discovered.
  • "new" generates some strange opcodes (I'm currently investigating this) that make dealing with value types difficult. I'm trying to solve this in a way that doesn't require changing the generated opcodes, but that might be impossible. Such a solution would allow using "new" with records and solve the constructor problem in this RFC.
  • Did I mention PHP references are evil?
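
As one concrete illustration of the reference problem, here is a minimal sketch in current PHP (plain arrays only, not code from either RFC; the variable names are arbitrary). Arrays normally behave as values, but a reference taken to an element survives a by-value copy of the array, so writing to the "copy" also changes the original. The PHP manual's section on references documents this behavior, and it is the kind of case any value type built on the engine has to account for.

```php
<?php
// Normal case: arrays have value semantics (copy-on-write under the hood).
$plain = [1];
$plainCopy = $plain;
$plainCopy[0]++;          // separates $plainCopy; $plain is still [1]

// Reference case: a reference to an element is preserved by the copy.
$arr = [1];
$ref = &$arr[0];          // $ref and $arr[0] now share one reference slot
$arr2 = $arr;             // by-value assignment, yet element 0 stays a reference
$arr2[0]++;               // writes through the shared slot

var_dump($plain[0], $arr[0], $ref); // int(1), int(2), int(2)
```
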
In any case, I'd much rather help with your structs proposal, as the more I work on this, the more I don't like it.

>
> One of the bigger differences between our proposals is the addition of
> mutating methods in my proposal compared to yours. You show the
> following example in your RFC:
>
> ```php
> data class Rectangle {
>     public function __construct(public int $width, public int $height) {}
>
>     public function resize(int $width, int $height): static {
>         $this->height = $height;
>         $this->width = $width;
>         return $this;
>     }
> }
> ```
>
> The resize method here modifies the instance and thus implicitly
> creates a copy. That's _fine_ for such a small structure. However,
> note that this still leads to the performance issues we have
> previously discussed for growable data structures.
>
> ```php
> data class Vector {
>     public function append(mixed $value): static {
>         /* Internal implementation, $values is some underlying storage. */
>         $this->values[] = $value;
>         return $this;
>     }
> }
> ```
>
> Calling `$vector->append(42);` will increase the refcount of
> `$vector`, and cause separation on `$this->values[] = ...;`. If
> `$vector->values` is a big storage, cloning will be very expensive.
> Hence, appending becomes an O(n) operation (because each element in
> the vector is copied to the new structure), and hence appending to an
> array in a loop will tank your performance. That's the reason for the
> introduction of the `$vector->append!(42)` syntax in my proposal. It
> separates the value at call-site when necessary, and avoids separation
> on `$this` in methods altogether.

As I mentioned to Larry in the records discussion, the biggest problem with a "data class" is that it really needs dedicated syntax to be done properly (such as the syntax in structs). I also plan to remove some syntax features from records, since I got a lot of negative feedback about the syntax there (shout out to reddit). I don't think that belongs in a general class, but rather in a dedicated type. There really isn't a "one size fits all" solution here.
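
For comparison, a minimal sketch of how the Rectangle example behaves with ordinary classes today (PHP 8.1+; MutableRectangle and ImmutableRectangle are made-up names, not from either RFC). Objects are handles, so copying the variable shares the instance; the usual userland approximation of value semantics is an immutable class whose "mutating" method returns a new instance. That works, but every change allocates a new object and has to be written by hand for each method.

```php
<?php
// Ordinary class: variables hold handles, so $a and $b below are the same object.
final class MutableRectangle
{
    public function __construct(public int $width, public int $height) {}

    public function resize(int $width, int $height): static
    {
        $this->width = $width;
        $this->height = $height;
        return $this;
    }
}

// Userland approximation of value semantics: readonly properties (PHP 8.1+)
// plus a "resize" that returns a new instance instead of modifying $this.
final class ImmutableRectangle
{
    public function __construct(
        public readonly int $width,
        public readonly int $height,
    ) {}

    public function resize(int $width, int $height): static
    {
        return new static($width, $height);
    }
}

$a = new MutableRectangle(2, 3);
$b = $a;                  // copies the handle, not the object
$b->resize(4, 5);         // $a->width is now 4 as well

$c = new ImmutableRectangle(2, 3);
$d = $c->resize(4, 5);    // $c is untouched; $d is a new object
```
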


> There might be some general confusion on the performance issue. In one
> of your e-mails in the last thread, you have mentioned:
>
> > Like Ilija mentioned in their email, there are significant performance optimizations to be had here that are simply not possible using regular (readonly) classes. I didn't go into detail as to how it works because it feels like an implementation detail, but I will spend some time distilling this and its consequences, into the RFC, over the coming days. As a simple illustration, there can be significant memory usage improvements:
> >
> > 100,000 arrays: https://3v4l.org/Z4CcV
> > 100,000 readonly classes: https://3v4l.org/1vhNp
>
> First off, the array example only uses less memory because [1, 2] is a
> constant array. When you make it dynamic, they will become way less
> efficient than objects. https://3v4l.org/pETM9
>
> But this is not the point I was trying to make either. Rather, when it
> comes to immutable, growable data structures, every mutation becomes
> an extremely expensive operation because the entire data structure,
> including its underlying storage, needs to be copied. For example:
>
> https://3v4l.org/BEsYT
>
> ```php
> class Vector {
>     private $values;
>
>     public function populate() {
>         $this->values = range(1, 1_000_000);
>     }
>
>     public function appendMutable() {
>         $this->values[] = 100_000_001;
>     }
>
>     public function appendImmutable() {
>         $new = clone $this;
>         $this->values[] = 100_000_001;
>     }
> }
> ```
>
> > appendMutable(): float(8.106231689453125E-6)
> > appendImmutable(): float(0.012187957763671875)
>
> That's a factor of 1,500 difference for an array containing 1 million
> numbers. Obviously, concrete numbers will vary, but the problem grows
> the bigger the array becomes.
>
> Ilija
>

I'm not focused on actual performance (yet), but I understand your point. On that front, I've been playing around with an idea for "layered hashmaps", where a copy/clone is just a layer on top of the original (made immutable) hashmap, so copies cost next to nothing. There's still extra indirection involved, so it could potentially be worse 🤷. I would expect an average-case O(1) copy to be faster than a full copy, at least for large maps, but wall-clock time and Big O don't always correlate.

I also note that the benchmarks in the repo are based on operations, not time, so it is quite difficult to show that a feature that increases the number of operations decreases the overall time. In other words, you may add 100 operations with strong cache locality and remove 10 with poor cache locality: quality vs. quantity, an age-old problem. Anyway, there's no way to tell the exact performance characteristics until I finish the implementation. I'd love to discuss this more, as it's actually a pretty neat and interesting problem to solve in a performant way.
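
To make the layered-hashmap idea above a bit more concrete, here is a rough userland sketch (PHP 8.1+; FrozenLayer and LayeredMap are hypothetical names, and an engine-level version would look quite different). Copying freezes the current entries into a shared read-only layer, and each copy then writes into its own small top layer, so neither the copy nor a later write duplicates the existing entries. The trade-off is the extra indirection: reads walk the layers, and the chain grows with every copy. Deletes are left out for brevity (they would need tombstones).

```php
<?php
// Hypothetical userland sketch of "layered" copies (PHP 8.1+); not an engine design.

// A read-only snapshot of some entries, optionally stacked on an older snapshot.
final class FrozenLayer
{
    public function __construct(
        public readonly array $entries,
        public readonly ?FrozenLayer $parent,
    ) {}
}

final class LayeredMap
{
    private ?FrozenLayer $frozen = null; // shared with copies, never modified
    private array $own = [];             // writes visible to this map only

    public function set(string|int $key, mixed $value): void
    {
        $this->own[$key] = $value; // touches only this map's top layer
    }

    public function get(string|int $key): mixed
    {
        if (array_key_exists($key, $this->own)) {
            return $this->own[$key];
        }
        // Fall through the frozen layers, newest first: this is the extra indirection.
        for ($layer = $this->frozen; $layer !== null; $layer = $layer->parent) {
            if (array_key_exists($key, $layer->entries)) {
                return $layer->entries[$key];
            }
        }
        return null;
    }

    // O(1): freeze this map's pending writes into a new shared layer (the array is
    // handed over, not duplicated), then give the copy the same frozen chain.
    public function copy(): self
    {
        if (count($this->own) > 0) {
            $this->frozen = new FrozenLayer($this->own, $this->frozen);
            $this->own = [];
        }
        $copy = new self();
        $copy->frozen = $this->frozen;
        return $copy;
    }
}

$a = new LayeredMap();
foreach (range(1, 5) as $i) {
    $a->set($i, $i * 10);
}
$b = $a->copy();   // no entries are duplicated
$b->set(1, 999);   // only $b's top layer changes

var_dump($a->get(1), $b->get(1), $b->get(2)); // int(10), int(999), int(20)
```
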

It could also turn out like the zend_string refactor I spent a few months on over the summer, where performance isn't affected (at least in wall-clock time) and the benefits only show up when you perform lots of string modifications. Its other benefits won't make sense without an RFC I was working on, which is still in the draft stage. So, for now, that branch will just gather dust.

Things like this are largely why I didn't put much information about the performance characteristics of records in my RFC. It's really an implementation detail, and I didn't want people's decisions to be tied to an implementation detail that could change drastically. In other words, maybe the performance would be poor now, but I'm confident it can be improved later. IMHO, we should focus on whether we want the feature in the first place rather than worrying about how fast or slow it is. Many improvements aren't worth making for their own sake; it isn't until there is a problem to be solved (such as a poorly performing feature) that an improvement becomes worthwhile.

-- Rob