Newsgroups: php.internals
From: dossche.niels@gmail.com (Niels Dossche)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC][Concept] Data classes (a.k.a. structs)
Date: Tue, 2 Apr 2024 20:14:52 +0200
Message-ID: <9aeebf8e-d6a3-4d55-a926-77433a0bf4e2@gmail.com>

On 02/04/2024 02:17, Ilija Tovilo wrote:
> Hi everyone!
>
> I'd like to introduce an idea I've played around with for a couple of
> weeks: Data classes, sometimes called structs in other languages (e.g.
> Swift and C#).
>
> In a nutshell, data classes are classes with value semantics.
> Instances of data classes are implicitly copied when assigned to a
> variable, or when passed to a function. When the new instance is
> modified, the original instance remains untouched. This might sound
> familiar: It's exactly how arrays work in PHP.
>
> ```php
> $a = [1, 2, 3];
> $b = $a;
> $b[] = 4;
> var_dump($a); // [1, 2, 3]
> var_dump($b); // [1, 2, 3, 4]
> ```
>
> You may think that copying the array on each assignment is expensive,
> and you would be right. PHP uses a trick called copy-on-write, or CoW
> for short. `$a` and `$b` actually share the same array until `$b[] =
> 4;` modifies it. It's only at this point that the array is copied and
> replaced in `$b`, so that the modification doesn't affect `$a`. As
> long as a variable is the sole owner of a value, or none of the
> variables modify the value, no copy is needed. Data classes use the
> same mechanism.
>
> But why value semantics in the first place?
> There are two major flaws
> with by-reference semantics for data structures:
>
> 1. It's very easy to forget cloning data that is referenced somewhere
> else before modifying it. This will lead to "spooky actions at a
> distance". Having recently used JavaScript (where all data structures
> have by-reference semantics) for an educational IR optimizer,
> accidental mutations of shared arrays/maps/sets were my primary source
> of bugs.
> 2. Defensive cloning (to avoid issue 1) will lead to useless work when
> the value is not referenced anywhere else.
>
> PHP offers readonly properties and classes to address issue 1.
> However, they further promote issue 2 by making it impossible to
> modify values without cloning them first, even if we know they are not
> referenced anywhere else. Some APIs further exacerbate the issue by
> requiring multiple copies for multiple modifications (e.g.
> `$response->withStatus(200)->withHeader('X-foo', 'foo');`).
>
> As you may have noticed, arrays already solve both of these issues
> through CoW. Data classes allow implementing arbitrary data structures
> with the same value semantics in core, extensions or userland. For
> example, a `Vector` data class may look something like the following:
>
> ```php
> data class Vector {
>     private $values;
>
>     public function __construct(...$values) {
>         $this->values = $values;
>     }
>
>     public mutating function append($value) {
>         $this->values[] = $value;
>     }
> }
>
> $a = new Vector(1, 2, 3);
> $b = $a;
> $b->append!(4);
> var_dump($a); // Vector(1, 2, 3)
> var_dump($b); // Vector(1, 2, 3, 4)
> ```
>
> An internal Vector implementation might offer a faster and stricter
> alternative to arrays (e.g. Vector from php-ds).
>
> Some other things to note about data classes:
>
> * Data classes are ordinary classes, and as such may implement
> interfaces, methods and more. I have not decided whether they should
> support inheritance.
> * Mutating method calls on data classes use a slightly different
> syntax: `$vector->append!(42)`. All methods mutating `$this` must be
> marked as `mutating`. The reason for this is twofold: 1. It signals to
> the caller that the value is modified. 2. It allows `$vector` to be
> cloned before knowing whether the method `append` is modifying, which
> hugely reduces implementation complexity in the engine.
> * Data classes customize identity (`===`) comparison, in the same way
> arrays do. Two data objects are identical if all their properties are
> identical (including order for dynamic properties).
> * Sharing data classes by-reference is possible using references, as
> you would for arrays.
> * We may decide to auto-implement `__toString` for data classes,
> amongst other things. I am still undecided whether this is useful for
> PHP.
> * Data classes protect from interior mutability. More concretely,
> mutating nested data objects stored in a `readonly` property is not
> legal, whereas it would be if they were ordinary objects.
> * In the future, it should be possible to allow using data classes in
> `SplObjectStorage`. However, because hashing is complex, this will be
> postponed to a separate RFC.
>
> One known gotcha is that we cannot trivially enforce placement of
> `modfying` on methods without a performance hit. It is the
> responsibility of the user to correctly mark such methods.
>
> Here's a fully functional PoC, excluding JIT:
> https://github.com/php/php-src/pull/13800
>
> Let me know what you think. I will start working on an RFC draft once
> work on property hooks concludes.
>
> Ilija

Hi Ilija

Thank you for this proposal, I like the idea of having value semantic objects available.
I pulled your branch and played with it a bit.

As already hinted in the thread, I also think inheritance may be dangerous in a first version.
I want to add to that: if you extend a data-class with a non-data-class, the data-class behaviour gets lost, which is logical in a sense but also surprised me in a way.

Also, FWIW, I'm not sure about the name "data" class, perhaps "value" class or something alike is what people may be more familiar with wrt semantics, although dataclass is also a known term.

I do have a question about iterator behaviour. Consider this code:

```
data class Test {
    public $a = 1;
    public $b = 2;
}

$test = new Test;
foreach ($test as $k => &$v) {
    if ($k === "b")
        $test->a = $test;
    var_dump($k);
}
```

This will reset the iterator of the object on separation, so we will get an infinite loop.
Is this intended? If so, is it because the right hand side is the original object while the left hand side gets the clone?
Is this consistent with how arrays separate? (Note: I haven't really looked at your code)

Kind regards
Niels