Newsgroups: php.internals
From: tovilo.ilija@gmail.com (Ilija Tovilo)
Date: Tue, 2 Apr 2024 02:17:32 +0200
Subject: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
To: PHP internals

Hi everyone!

I'd like to introduce an idea I've played around with for a couple of weeks: Data classes, sometimes called structs in other languages (e.g. Swift and C#).

In a nutshell, data classes are classes with value semantics. Instances of data classes are implicitly copied when assigned to a variable, or when passed to a function. When the new instance is modified, the original instance remains untouched. This might sound familiar: It's exactly how arrays work in PHP.

```php
$a = [1, 2, 3];
$b = $a;
$b[] = 4;
var_dump($a); // [1, 2, 3]
var_dump($b); // [1, 2, 3, 4]
```

You may think that copying the array on each assignment is expensive, and you would be right. To avoid that cost, PHP uses a trick called copy-on-write, or CoW for short. `$a` and `$b` actually share the same array until `$b[] = 4;` modifies it. It's only at this point that the array is copied and replaced in `$b`, so that the modification doesn't affect `$a`. As long as a variable is the sole owner of a value, or none of the variables modify the value, no copy is needed. Data classes use the same mechanism.

But why value semantics in the first place? There are two major flaws with by-reference semantics for data structures:

1. It's very easy to forget to clone data that is referenced somewhere else before modifying it. This will lead to "spooky actions at a distance". Having recently used JavaScript (where all data structures have by-reference semantics) for an educational IR optimizer, accidental mutations of shared arrays/maps/sets were my primary source of bugs.
2. Defensive cloning (to avoid issue 1) will lead to useless work when the value is not referenced anywhere else.

PHP offers readonly properties and classes to address issue 1. However, they further promote issue 2 by making it impossible to modify values without cloning them first, even if we know they are not referenced anywhere else. Some APIs further exacerbate the issue by requiring multiple copies for multiple modifications (e.g.
`$response->withStatus(200)->withHeader('X-foo', 'foo');`).

As you may have noticed, arrays already solve both of these issues through CoW. Data classes allow implementing arbitrary data structures with the same value semantics in core, extensions or userland. For example, a `Vector` data class may look something like the following:

```php
data class Vector {
    private $values;

    public function __construct(...$values) {
        $this->values = $values;
    }

    public mutating function append($value) {
        $this->values[] = $value;
    }
}

$a = new Vector(1, 2, 3);
$b = $a;
$b->append!(4);
var_dump($a); // Vector(1, 2, 3)
var_dump($b); // Vector(1, 2, 3, 4)
```

An internal Vector implementation might offer a faster and stricter alternative to arrays (e.g. Vector from php-ds).

Some other things to note about data classes:

* Data classes are ordinary classes, and as such may implement interfaces, methods and more. I have not decided whether they should support inheritance.
* Mutating method calls on data classes use a slightly different syntax: `$vector->append!(42)`. All methods mutating `$this` must be marked as `mutating`. The reason for this is twofold:
  1. It signals to the caller that the value is modified.
  2. It allows `$vector` to be cloned before knowing whether the method `append` is mutating, which hugely reduces implementation complexity in the engine.
* Data classes customize identity (`===`) comparison, in the same way arrays do. Two data objects are identical if all their properties are identical (including order for dynamic properties).
* Sharing data classes by-reference is possible using references, as you would for arrays.
* We may decide to auto-implement `__toString` for data classes, amongst other things. I am still undecided whether this is useful for PHP.
* Data classes protect from interior mutability. More concretely, mutating nested data objects stored in a `readonly` property is not legal, whereas it would be if they were ordinary objects.
* In the future, it should be possible to allow using data classes in `SplObjectStorage`. However, because hashing is complex, this will be postponed to a separate RFC.

One known gotcha is that we cannot trivially enforce placement of `mutating` on methods without a performance hit. It is the responsibility of the user to correctly mark such methods.

Here's a fully functional PoC, excluding JIT:
https://github.com/php/php-src/pull/13800

Let me know what you think. I will start working on an RFC draft once work on property hooks concludes.

Ilija
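
P.S. For anyone who wants to see the contrast described above in PHP as it exists today, here is a minimal sketch. It does not use the proposed `data class` syntax, and the `Box` class with its `values` property is a hypothetical name made up for illustration only. It shows an ordinary object being mutated through a second variable (issue 1), a defensive `clone` that is paid for up front (issue 2), and an array getting value semantics for free.

```php
<?php
// Hypothetical class for illustration; not part of the proposal or the PoC.
final class Box
{
    public array $values = [];
}

// Ordinary objects: assignment copies the handle, not the object.
$a = new Box();
$a->values = [1, 2, 3];
$b = $a;
$b->values[] = 4;                  // also visible through $a
$sharedCount = count($a->values);  // 4

// Defensive clone: the copy is made up front, whether or not it is needed.
$c = clone $a;
$c->values[] = 5;                  // $a is unaffected
$afterClone = count($a->values);   // still 4

// Arrays: value semantics, with the copy deferred until the write.
$x = [1, 2, 3];
$y = $x;
$y[] = 4;                          // $x keeps its three elements

var_dump($sharedCount, $afterClone, count($c->values), count($x), count($y));
// int(4) int(4) int(5) int(3) int(4)
```

With a data class as sketched in this mail, the `$b = $a` case would presumably behave like the array case, without the explicit `clone`.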