Newsgroups: php.internals
Date: Mon, 1 Apr 2024 21:56:06 -0300
Subject: Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
To: Ilija Tovilo
Cc: PHP internals
From: deleugyn@gmail.com (Deleu)


On Mon, Apr 1, 2024 at 9:20 PM Ilija Tovilo <tovilo.ilija@gmail.com> wrote:
Hi everyone!

I'd like to introduce an idea I've played around with for a couple of
weeks: Data classes, sometimes called structs in other languages (e.g.
Swift and C#).

In a nutshell, data classes are classes with value semantics.
Instances of data classes are implicitly copied when assigned to a
variable, or when passed to a function. When the new instance is
modified, the original instance remains untouched. This might sound
familiar: It's exactly how arrays work in PHP.

```php
$a = [1, 2, 3];
$b = $a;
$b[] = 4;
var_dump($a); // [1, 2, 3]
var_dump($b); // [1, 2, 3, 4]
```

You may think that copying the array on each assignment is expensive,
and you would be right. PHP uses a trick called copy-on-write, or CoW
for short. `$a` and `$b` actually share the same array until `$b[] = 4;`
modifies it. It's only at this point that the array is copied and
replaced in `$b`, so that the modification doesn't affect `$a`. As
long as a variable is the sole owner of a value, or none of the
variables modify the value, no copy is needed. Data classes use the
same mechanism.
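
A rough way to observe the copy-on-write behaviour described above in current PHP is to watch memory usage around the assignment and the first write. This is only a sketch: the exact byte counts depend on the PHP version and build, so none are shown as expected output.

```php
<?php
// Sketch: observe copy-on-write through memory usage.
// Exact numbers vary by PHP version; only the relative sizes matter.
$a = range(1, 100000);

$before = memory_get_usage();
$b = $a;                        // no copy yet: $a and $b share one array
$afterAssign = memory_get_usage();

$b[] = 100001;                  // first write to $b triggers the actual copy
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;
$writeCost  = $afterWrite - $afterAssign;

printf("assignment: %d bytes, first write: %d bytes\n", $assignCost, $writeCost);
var_dump(count($a), count($b)); // int(100000) and int(100001)
```

The assignment should cost next to nothing, while the first write pays for the whole array.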

But why value semantics in the first place? There are two major flaws
with by-reference semantics for data structures:

1. It's very easy to forget to clone data that is referenced somewhere
else before modifying it. This will lead to "spooky actions at a
distance". Having recently used JavaScript (where all data structures
have by-reference semantics) for an educational IR optimizer,
accidental mutations of shared arrays/maps/sets were my primary source
of bugs.
2. Defensive cloning (to avoid issue 1) will lead to useless work when
the value is not referenced anywhere else.
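
Both flaws can be reproduced with ordinary objects in current PHP. The `Settings` class and the two helper functions below are hypothetical names used only for illustration.

```php
<?php
// Hypothetical class, used only for illustration.
final class Settings
{
    /** @var array<string, mixed> */
    public array $values = [];
}

// Mutates its argument: objects are passed by handle, so the caller sees the change.
function withDebugEnabled(Settings $settings): Settings
{
    $settings->values['debug'] = true;
    return $settings;
}

// Defensive variant: always clones, even when nobody else holds the object.
function withDebugEnabledSafely(Settings $settings): Settings
{
    $copy = clone $settings;
    $copy->values['debug'] = true;
    return $copy;
}

$shared = new Settings();
$shared->values['debug'] = false;

$local = withDebugEnabled($shared);
var_dump($shared->values['debug']); // bool(true): the "spooky action at a distance"

$shared->values['debug'] = false;
$safe = withDebugEnabledSafely($shared);
var_dump($shared->values['debug']); // bool(false): original untouched, at the cost of a clone
```

The second function is safe, but it has no way of knowing whether the clone was actually needed.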

PHP offers readonly properties and classes to address issue 1.
However, they further promote issue 2 by making it impossible to
modify values without cloning them first, even if we know they are not
referenced anywhere else. Some APIs further exacerbate the issue by
requiring multiple copies for multiple modifications (e.g.
`$response->withStatus(200)->withHeader('X-foo', 'foo');`).
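
To make the cost of such chained calls concrete, here is a minimal sketch of a "wither"-style class that counts its own clones. The `Response` class, its default status and the `$clones` counter are assumptions made for this example and do not refer to any particular library.

```php
<?php
// Hypothetical immutable-style response, loosely modeled on "wither" APIs.
final class Response
{
    public static int $clones = 0;

    private int $status = 500;
    /** @var array<string, string> */
    private array $headers = [];

    public function __clone()
    {
        self::$clones++; // count every copy that is made
    }

    public function withStatus(int $status): self
    {
        $copy = clone $this;
        $copy->status = $status;
        return $copy;
    }

    public function withHeader(string $name, string $value): self
    {
        $copy = clone $this;
        $copy->headers[$name] = $value;
        return $copy;
    }

    public function getStatus(): int { return $this->status; }
    public function getHeaders(): array { return $this->headers; }
}

$response = (new Response())->withStatus(200)->withHeader('X-foo', 'foo');
var_dump(Response::$clones); // int(2): one clone per modification
```

Neither intermediate object is referenced anywhere else, yet each modification still pays for a copy.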

As you may have noticed, arrays already solve both of these issues
through CoW. Data classes allow implementing arbitrary data structures
with the same value semantics in core, extensions or userland. For
example, a `Vector` data class may look something like the following:

```php
data class Vector {
    private $values;

    public function __construct(...$values) {
        $this->values = $values;
    }

    public mutating function append($value) {
        $this->values[] = $value;
    }
}

$a = new Vector(1, 2, 3);
$b = $a;
$b->append!(4);
var_dump($a); // Vector(1, 2, 3)
var_dump($b); // Vector(1, 2, 3, 4)
```

An internal Vector implementation might offer a faster and stricter
alternative to arrays (e.g. Vector from php-ds).
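
Since `data class`, `mutating` and `append!()` are proposed syntax and do not parse in released PHP versions, the closest approximation today is probably an ordinary class plus an explicit `clone`. The sketch below assumes exactly that; the `toArray()` helper is added only so the result can be inspected.

```php
<?php
// Approximation in current PHP: an ordinary class plus an explicit clone.
// The proposed `data class`, `mutating` and `append!()` syntax is not used here.
final class Vector
{
    private array $values;

    public function __construct(...$values)
    {
        $this->values = $values;
    }

    public function append($value): void
    {
        $this->values[] = $value;
    }

    // Helper for inspection only; not part of the proposal.
    public function toArray(): array
    {
        return $this->values;
    }
}

$a = new Vector(1, 2, 3);
$b = clone $a;     // under the proposal, a plain `$b = $a;` would be enough
$b->append(4);     // the inner array is only copied here, via CoW

var_dump($a->toArray()); // [1, 2, 3]
var_dump($b->toArray()); // [1, 2, 3, 4]
```

The difference is that the caller has to remember the `clone`, which is exactly the burden the proposal aims to remove.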


Exciting times to be a PHP Developer!

Some other things to note about data classes:

* Data classes are ordinary classes, and as such may implement
interfaces, methods and more. I have not decided whether they should
support inheritance.

I'd argue in favor of not including inheritance in the first version. Removing inheritance later would be an impossible BC break, whereas leaving it out of the first stable release gives users a chance to evaluate whether it's something we would sorely miss.

* Mutating method calls on data classes use a slightly different
syntax: `$vector->append!(42)`. All methods mutating `$this` must be
marked as `mutating`. The reason for this is twofold: 1. It signals to
the caller that the value is modified. 2. It allows `$vector` to be
cloned before knowing whether the method `append` is modifying, which
hugely reduces implementation complexity in the engine.

I'm not sure I understood this one. Do you mean that the `!` modifier here (at the call site) helps the engine clone the variable before even checking whether `append()` has been tagged as mutating? From the outside it looks odd that a clone would happen ahead of time while we are talking about copy-on-write. Would this syntax break for non-mutating methods?

* Data classes customize identity (`===`) comparison, in the same way
arrays do. Two data objects are identical if all their properties are
identical (including order for dynamic properties).
* Sharing data classes by-reference is possible using references, as
you would for arrays.
* We may decide to auto-implement `__toString` for data classes,
amongst other things. I am still undecided whether this is useful for
PHP.
* Data classes protect from interior mutability. More concretely,
mutating nested data objects stored in a `readonly` property is not
legal, whereas it would be if they were ordinary objects.
* In the future, it should be possible to allow using data classes in
`SplObjectStorage`. However, because hashing is complex, this will be
postponed to a separate RFC.
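
For reference, this is how identity comparison and references behave for arrays and ordinary objects in current PHP, which is the baseline the bullets above compare against. The `Point` class is a hypothetical example.

```php
<?php
// Hypothetical value-like class, used only for illustration.
final class Point
{
    public int $x;
    public int $y;

    public function __construct(int $x, int $y)
    {
        $this->x = $x;
        $this->y = $y;
    }
}

// Arrays: `===` compares contents (keys, values, types and order).
var_dump([1, 2, 3] === [1, 2, 3]);                       // bool(true)
var_dump(['a' => 1, 'b' => 2] === ['b' => 2, 'a' => 1]); // bool(false): order differs

// Ordinary objects: `===` is true only for the same instance.
$p = new Point(1, 2);
$q = new Point(1, 2);
var_dump($p === $q); // bool(false) today; the proposal would compare properties
var_dump($p == $q);  // bool(true): loose comparison already compares properties

// Sharing an array by reference opts out of value semantics.
$list = [1, 2, 3];
$alias = &$list;
$alias[] = 4;
var_dump($list === [1, 2, 3, 4]); // bool(true)
```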

One known gotcha is that we cannot trivially enforce placement of
`mutating` on methods without a performance hit. It is the
responsibility of the user to correctly mark such methods.

Here's a fully functional PoC, excluding JIT:
https://github.com/php/php-src/pull/13800

Let me know what you think. I will start working on an RFC draft once
work on property hooks concludes.

Ilija

Looking forward to this!!!

--
Marco Deleu