Newsgroups: php.internals
Xref: news.php.net php.internals:122531
Subject: Re: [PHP-DEV] [RFC[ Property accessor hooks, take 2
From: php-lists@koalephant.com (Stephen Reay)
To: Larry Garfield
Cc: php internals
Date: Wed, 28 Feb 2024 20:49:46 +0700
Message-ID: <1204BFC3-B976-4FEE-BE01-E668699C84E2@koalephant.com>
In-Reply-To: <2299271f-50ea-48c1-81fb-b64fa10c9bbb@app.fastmail.com>
References: <59619244-917d-4936-8f21-2854840a9bf8@rwec.co.uk> <2299271f-50ea-48c1-81fb-b64fa10c9bbb@app.fastmail.com>

> On 28 Feb 2024, at 06:17, Larry Garfield wrote:
>
> On Sun, Feb 25, 2024, at 10:16 PM, Rowan Tommins [IMSoP] wrote:
>> [Including my full previous reply, since the list and gmail currently aren't being friends. Apologies that this leads to rather a lot of reading in one go...]
>
> Eh, I'd prefer a few big emails that come in slowly to lots of little emails that come in fast. :-)
>
>>> On 21/02/2024 18:55, Larry Garfield wrote:
>>>> Hello again, fine Internalians.
>>>>
>>>> After much on-again/off-again work, Ilija and I are back with a more polished property access hooks/interface properties RFC.
>>>
>>>
>>> Hello, and a huge thanks to both you and Ilija for the continued work on this. I'd really like to see this feature make it into PHP, and agree with a lot of the RFC.
>>>
>>>
>>> My main concern is the proliferation of things that look the same but act differently, and things that look different but act the same:
>
> *snip*
>
>>> - a and b are both what we might call "traditional" properties, and equivalent to each other; a uses legacy syntax which we haven't removed for some reason
>
> I don't know why we haven't removed `var` either. I can't recall the last time I saw it in real code. But that's out of scope here.
>
> *snip*
>
>> I think there's some really great functionality in the RFC, and would love for it to succeed in some form, but I think it would benefit from removing some of the "magic".
>>
>>
>> Regards,
>>
>> --
>> Rowan Tommins
>> [IMSoP]
>
>
> I'm going to try and respond to a couple of different points together here, including from later in the thread, as it's just easier.
>
> == Re, design philosophy:
>
>> In C#, all "properties" are virtual - as soon as you have any non-default "get", "set" or "init" definition, it's up to you to declare a separate "field" to store the value in. Swift's "computed properties" are similar: if you have a custom getter or setter, there is no backing store; to add behaviour to a "stored property", you use the separate "property observer" hooks.
>>
>> Kotlin's approach is philosophically the opposite: there are no fields, only properties, but properties can access a hidden "backing field" via the special keyword "field". Importantly, omitting the setter doesn't make the property read-only, it implies set(value) { field = value }
>
> A little history here to help clarify how we ended up where we are: The original RFC as we designed it modeled very closely on Swift, with 4 hooks. Using get/set at all would create a virtual property and you were on your own, while the beforeSet/afterSet hooks would not. We ran that design by some PHP Foundation sponsors a year ago (I don't actually know who, Roman did it for us), and the general feedback was "we like the idea, but woof this is complicated with all these hooks and having to make my own backing property for all these little things. Couldn't this be simplified?" We thought a bit more, and I off-handedly suggested to Ilija "I mean, would it be possible to just detect if a get/set hook is using a backing store and make it automatically? Then we could get rid of the before/after hooks." He gave it a quick try and found that was straightforward, so we pivoted to that simplified version. We then realized that we had... mostly just recreated Kotlin's design, so shrugged happily and went on with life.
>
> As noted in an earlier email, C#, Kotlin, and Swift all have different stances on the variable name for the incoming value. We originally modeled on Swift so had that model (optional newVal name), and also because we liked how compact it was. When we switched to the simplified, incidentally Kotlin-esque approach, we just kept the optional variable as it works.
>
> I think where that ended up is pretty nice, personally, even if it is not a direct map of any particular other language.
>
> == Re asymmetric typing:
>
> This is a capability already present today when using a setter method.
>
> class Person {
>     private $name;
>
>     // Accepts the wider type, but always stores a UnicodeString.
>     public function setName(UnicodeString|string $name)
>     {
>         $this->name = $name instanceof UnicodeString ? $name : new UnicodeString($name);
>     }
> }
>
> And widening the parameter type in a child class is also entirely legal. As the goal of the RFC is, essentially, "make most common getter/setter patterns easy to add to a property without making an API-breaking method, so people don't have to add redundant just-in-case getters and setters all the time," covering an easy-to-cover use case seems like a good thing to do.
>
> It also ties into the question of the explicit/implicit name, for the reason you mentioned earlier (unspecified means mixed), not by intent. More on that in another section.
>
> == Re virtual properties:
>
> Ilija and I talked this through, and there's pros and cons to a `virtual` keyword. Ilija also suggested a `backed` keyword, which forces a backed property to exist even if it's not used in the hook itself.
>
> * Adding `virtual` adds more work for the developer, but more clarity. It would also mean $this->$propName or $this->{__PROPERTY__} would work "as expected", since there's no auto-detection for virtual-ness. On the downside, if you have a could-be-virtual property but never actually use the backing value, you have an extra backing value hanging around in memory that is inaccessible normally, but will still show up in some serialization formats, which could be unexpected. If you omit one of the hooks and forget to mark it virtual, you'll still get the default of the other operation, which could be unexpected. (Mostly this would be for a virtual-get that accidentally has a default setter because you forgot to mark it `virtual`.)
> * Doing autodetection as now, but with an added "make a backing value anyway" flag would resolve the use case of "My set hook just calls a method, and that method sets the property, but since the hook doesn't mention the property it doesn't get created" problem. It would also allow for $this->$propName to work if a property is explicitly backed. On the flipside, it's one more thing to think about, and the above example it solves would be trivially solved by having the method just return the value to set and letting the set hook do the actual write, which is arguably better and more reliable code anyway.
> * The status quo (auto-detection based on the presence of $this->propName). This has the advantage it "just works" in the 95% case, without having to think about anything extra. The downside is it does have some odd edge cases, like needing $this->propName to be explicitly used.
>
> I don't think any is an obvious winner. My personal preference would be for status quo (auto-detect) or explicit-virtual always. I could probably live with either, though I think I'd personally favor status quo. Thoughts from others?
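
To make the status-quo rule concrete, here is a minimal sketch of how I understand auto-detection, assuming the `get =>` hook syntax used elsewhere in this thread. The class `Person` and the properties `$name` and `$initial` are hypothetical names for illustration only, not taken from the RFC, and the snippet needs a PHP build that actually implements hooks.

```php
<?php
declare(strict_types=1);

// Hypothetical class for illustration only; not taken from the RFC.
final class Person
{
    // Backed: the get hook mentions $this->name, so (under auto-detection)
    // storage is created and plain writes use the default set behaviour.
    public string $name {
        get => strtoupper($this->name);
    }

    // Virtual: the hook never mentions $this->initial, so (under
    // auto-detection) there is no backing value and no default setter.
    public string $initial {
        get => substr($this->name, 0, 1);
    }

    public function __construct(string $name)
    {
        // No set hook on $name, so this writes straight to the backing value.
        $this->name = $name;
    }
}

$p = new Person('rowan');
echo $p->name, "\n";    // goes through the get hook on $name
echo $p->initial, "\n"; // computed from $name; nothing is stored for $initial
```

As I read the RFC, writing to `$p->initial` here would be an error, because a virtual property with no set hook has nothing to write to. With an explicit `virtual` keyword the difference between the two properties would be visible in the declaration rather than inferred from the hook body.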
>

I agree that a flag to make the field *virtual* (thus disabling the backing store) makes more sense than a flag to make it backed; it's also easier to understand when comparing hooked properties with regular properties (essentially, backed is the default, you have to opt in to it being virtual). I don't think the edge cases of "auto" make it worthwhile just to not need "virtual".

> == Re reference-get
>
> Allowing backed properties to have a reference return creates a situation where any writes would then bypass the set hook, and thus any validation implemented there. That is, it makes the validation unreliable. A major footgun. The question is, do we favor caveat-emptor flexibility or correct-by-construction safety? Personally I always lean toward the latter, though PHP in general is... schizophrenic about it, I'd say. :-)
>
> At this point, we'd much rather leave it blocked to avoid the issue; it's easier to enable that loophole in the future if we really want it than to get rid of it if it turns out to have been a bad idea.
>
> There is one edge case that *might* make sense: If there is no set hook defined, then there's no set hook to worry about bypassing. So it may be safe to allow &get on backed properties IFF there is no set hook. I worry that is "one more quirky edge case", though, so as above it may be better to skip for now as it's easier to add later than remove. But if the consensus is to do that, we're open to it. (Question for everyone.)
>

I don't have a strong feeling about this, but in general I usually tend to prefer options that are consistent, and give power/options to the developer. If references are opt-in anyway, I see that as accepting the trade-offs. If a developer doesn't want to allow by-ref modifications of the property, why would they make it referenceable in the first place? This sounds a bit like disallowing regular public properties because they might be modified outside the class - that's kind of the point, surely.

> == Re
>
> == Re arrays
>
>> Regarding arrays, have you considered allowing array-index writes if an &get hook is defined? i.e. "$x->foo['bar'] = 42;" could be treated as semantically equivalent to "$_temp =& $x->foo; $_temp['bar'] = 42; unset($_temp);"
>
> That's already discussed in the RFC:
>
>> The simplest approach would be to copy the array, modify it accordingly, and pass it to set hook. This would have a large and obscure performance penalty, and a set implementation would have no way of knowing which values had changed, leading to, for instance, code needing to revalidate all elements even if only one has changed.
>
> Unless we were OK with that bypassing the set hook entirely if defined, which, as noted above, means any safety guarantees provided by a set hook are bypassed, leading to untrustworthy code.
>
> == Re hook shorthands and return values
>
> Ilija and I have been discussing this for a bit, and we've both budged a little. :-) Here's our counter-proposal:
>
> - Drop the "top level" shorthand, for get-only hooks.
> - Keep the => shorthand for the get hook itself.
> - For a set hook, the {} form has no return value; set the value yourself however you want.
> - For a set hook, the => form implies a backed value and will set the property to whatever value that evaluates to.
>
> So these are equivalent:
>
> public $foo { set { $this->foo = $value; } }
> public $foo { set => $value; }
>
> These are equivalent:
>
> public string $foo {
>     get {
>         return strtoupper($this->foo);
>     }
> }
> public string $foo { get => strtoupper($this->foo); }
>
> And this goes away:
>
> public string $foo => strtoupper($this->foo);
>
> That covers the common cases with an arrow-function-like syntax that behaves as you'd expect (it returns things), and allows a longer version with arbitrarily complex logic if desired. It also means that each syntax variant does mean something importantly different.
>
> Would that be an acceptable compromise? (Question for everyone.)
>

I think the examples given are clear, and the lack of the top-level short closure-esque version makes it more obvious. Forgive me, I must have missed some of the previous comments - is there a reason the 'full' setter can't return a value, for the sake of consistency? I understand that you don't want "return to set" to be the *only* option, for the sake of e.g. change/audit logging type functionality (i.e. set and then some action to record that the change was made), but it seems a little odd and inconsistent to me that the return value of a short closure would be used when the return value of the long version isn't. This isn't really a major issue, I'm just curious if there was some explanation about it?

> == Re the $value variable in set
>
> Honestly, Rowan's earlier point here is the strongest argument in favor for me of the current RFC approach. Anywhere else in PHP, something that looks like a parameter and has no type, like `($param)`, means its type is `mixed`. It would be weird and confusing to be different here. That's above and beyond the issue of forcing people to retype something obvious every time. (I cite again, recent PHP's trend toward removing needless boilerplate, which is very good.) Requiring that the type be specified, for consistency, makes little sense if the type is not allowed to vary. You're just repeating a string from earlier on the same line, for no particular benefit.
>
> I genuinely don't understand the pushback on $value. It's something you learn once and never have to think about again. It's consistent.
>
> Ilija jokingly suggested making it always $value, unconditionally, and allowing only the type to be specified if widening:
>
> public int $foo { set(int|float) => floor($value); }
>
> Though I suspect that won't go over well, either. :-)
>
> So what makes the most sense to me is to keep $value optional, but IF you specify an alternate name, you must also specify a type (which may be wider). So these are equivalent:
>
> public int $foo { set (int $value) => $value + 1; }
> public int $foo { set => $value + 1; }
>
> And only those forms are legal. But you could also do this, if the situation called for it:
>
> public int $foo { set(int|float $num) => floor($num) + 1; }
>
> This "all or nothing" approach seems like it strikes the best balance, gives the most flexibility where needed while still having the least redundancy when not needed, and when a name/type is provided, its behavior is the same as for a method being inherited.
>
> Does that sound acceptable? (Again, question for everyone.)
>

My only question with this is the same as I had in an earlier reply (and I'm not sure it was ever answered directly?), and you allude to this yourself: everywhere *else*, `($var)` means a parameter with type `mixed`. Why is the type *required* here, when you've specifically said you want to avoid boilerplate? If we're going to assume people can understand that `(implicit property-type $value)` is implicit, surely we can also assume that they will understand "specifying a parameter without a type" means the parameter has no type (i.e. is `mixed`). Again, for myself I'd be likely to type it (or regular parameters, properties, etc) as `mixed` if that's what I want *anyway*, but the inconsistency here seems odd, unless there's some until-now unknown drive to deprecate type-less parameters/properties/etc.

> The alternative that gives the most future-flexibility is to do neither: The variable is called $value, period, you can't change it, and you can't change the type, either. There is no () after set, ever. Punt both of those to a later follow-up. I'd prefer to include both now, but including neither now is the next-safer option.
>
>
> ## Regarding $field
>
> Sigh, now y'all like it. :-P Most of the feedback on this has been negative, so I'm inclined to leave it out at this point, unless there's a major swing in feedback to bring it back. But the RFC seems more likely to pass without it than with right now.
>
> --Larry Garfield
>

Cheers

Stephen