Message-ID: <46bd4098-2936-4e46-98e9-fe55118325c2@bastelstu.be>
Date: Fri, 5 Jul 2024 21:49:17 +0200
Subject: Re: [PHP-DEV] [RFC] Lazy Objects
To: Nicolas Grekas <nicolas.grekas+php@gmail.com>,
 Benjamin Eberlei <kontakt@beberlei.de>, Rob Landers <rob@bottled.codes>,
 Valentin Udaltsov <udaltsov.valentin@gmail.com>,
 Marco Pivetta <ocramius@gmail.com>
Cc: Arnaud Le Blanc <arnaud.lb@gmail.com>,
 PHP Internals List <internals@lists.php.net>
From: tim@bastelstu.be (Tim Düsterhus)

Hi

On 7/2/24 16:48, Nicolas Grekas wrote:
> Thanks for the detailed feedback again, it's very helpful!
> Let me try to answer many emails at once, in chronological order:

Note that this kind of bulk reply makes it very hard for me to keep track 
of mailing list threads. It breaks threading, which makes it much harder 
for me to find the original context of a quoted part, especially since 
you did not include the author / date for the quotes.

That said, I've taken a look at the differences since my email and have 
also given the entire RFC another read.

> don't touch `readonly` because of lazy objects: this feature is too niche
>> to cripple a major-major feature like `readonly`. I would suggest deferring
>> until after the first bits of this RFC landed.
>>
> 
> Following Marco's advice, we've decided to remove all the flags related to
> the various ways to handle readonly. This also removes the secondary vote.
> The behavior related to readonly properties is now that they are skipped if
> already initialized when calling resetAsLazy* methods, throw in the
> initializer as usual, and are resettable only if the class is not final, as
> already allowed in userland (and as explained in the RFC).

The 'readonly' section still mentions 'makeInstanceLazy', which likely 
is a left-over from a previous version of the RFC. You should have 
another look and clean up the naming there.

>>> There are not many reasons to do that. The only indented use-case that
>>> doesn't involve an object freshly created with
>>> ->newInstanceWithoutConstructor() is to let an object manage its own
>>> laziness by making itself lazy in its constructor:
>>>
>>
>> Okay. But the RFC (and your email) does not explain why I would want do
>> that. It appears that much of the RFC's complexity (e.g. around readonly
>> properties and destructors) stems from the wish to support turning an
>> existing object into a lazy object. If there is no strong reason to
>> support that, I would suggest dropping that. It could always be added in
>> a future PHP version.
>>
> 
> This capability is needed for two reasons: 1. completeness and 2. feature
> parity with what can be currently done using magic methods (so that it's
> already used to solve real-world problems).

Many things are already possible in userland. That does not always mean 
that the cost-benefit ratio is appropriate for inclusion in core. I can 
get behind the two examples in the “About Lazy-Loading Strategies” 
section, but I'm afraid I still can't wrap my head around why I would 
want an object that makes itself lazy in its own constructor: I have not 
yet seen a real-world example.

> True, thanks for raising this point. After brainstorming with Arnaud, we
> improved this behavior by:
> 1. allowing only parent classes, not child classes
> 2. requiring that all properties from a real instance have a corresponding
> one on the proxy OR that the extra properties on the proxy are skipped/set
> before initialization.
> 
> This means that it's now possible for a child class to add a property,
> private or not. There's one requirement: the property must be skipped or
> set before initialization.
> 
> For the record, with magic methods, we currently have no choice but to
> create an inheritance proxy. This means the situation of having Proxy
> extend Real like in your example is the norm. While doing so, it's pretty
> common to attach some interface so that we can augment Real with extra
> capabilities (let's say Proxy implements LazyObjectInterface). Being able
> to use class Real as a backing store for Proxy gives us a very smooth
> upgrade path (the implementation of the laziness can remain an internal
> detail), and it's also sometimes the only way to leverage a factory that
> returns Real, not Proxy.

I'm not entirely convinced that this is sound now, but I'm not in a 
state to think this through in detail.

I have one question regarding the updated initialization sequence. The 
RFC writes:

> Properties that are declared on the real instance are uninitialized on the proxy instance (including overlapping properties used with ReflectionProperty::skipLazyInitialization() or setRawValueWithoutLazyInitialization()) to synchronize the state shared by both instances.

I do not understand this. Specifically, I do not understand the "to 
synchronize the state" bit. My understanding is that the proxy will 
always forward the property access, so there effectively is no state on 
the proxy?! A more expansive explanation would be helpful, possibly with 
an example that shows what would break if this did not happen.

> That is very true. I had a look at the userland implementation and indeed,
> we keep the wrapper while cloning the backing instance (it's not that we
> have the choice, the engine doesn't give us any other options).
> RFC updated.
> 
> We also updated the behavior when an uninitialized proxy is cloned: we now
> postpone calling $real->__clone to the moment where the proxy clone is
> initialized.

Do I understand it correctly that the initializer of the cloned proxy is 
effectively replaced by the following:

     function (object $clonedProxy) use ($originalProxy) {
         return clone $originalProxy->getRealObject();
     }

? Then I believe this is unsound. Consider the following:

     $myProxy = $r->newLazyProxy(...);
     $clonedProxy = clone $myProxy;
     $r->initialize($myProxy);
     $myProxy->someProp++;
     var_dump($clonedProxy->someProp);

The clone was created before `someProp` was modified, but it outputs the 
value after modification!

Also: What happens if the cloned proxy is initialized *before* the 
original proxy? There is no real object to clone.

I believe the correct behavior would be: Just clone the proxy and keep 
the same initializer. Then both proxies are actually fully independent 
after cloning, as I would expect from the clone operation.
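
To make the difference concrete, here is a rough userland model of the 
two cloning strategies. It is only a sketch: `ModelProxy`, `Counter` and 
both clone methods are made-up names for illustration, and it does not 
claim to match how the engine implements lazy proxies.

```php
<?php
// Userland model only: ModelProxy and Counter are hypothetical names,
// not part of the RFC or of the engine.
final class Counter
{
    public int $someProp = 0;
}

final class ModelProxy
{
    private ?Counter $real = null;

    public function __construct(private Closure $initializer) {}

    // Creates the real object on first use, like initializing a proxy.
    public function real(): Counter
    {
        return $this->real ??= ($this->initializer)();
    }

    // Strategy A: the clone's initializer clones the original's real object.
    public function cloneViaOriginal(): self
    {
        $original = $this;
        return new self(static fn (): Counter => clone $original->real());
    }

    // Strategy B: the clone simply keeps the same initializer.
    public function cloneWithSameInitializer(): self
    {
        return new self($this->initializer);
    }
}

$factory = static fn (): Counter => new Counter();

// Strategy A: the clone observes a change made after it was created.
$proxyA = new ModelProxy($factory);
$cloneA = $proxyA->cloneViaOriginal();
$proxyA->real()->someProp++;
$leaked = $cloneA->real()->someProp;

// Strategy B: the clone is fully independent of the original.
$proxyB = new ModelProxy($factory);
$cloneB = $proxyB->cloneWithSameInitializer();
$proxyB->real()->someProp++;
$independent = $cloneB->real()->someProp;

var_dump($leaked, $independent); // int(1), int(0)
```

In this model, initializing the clone first under strategy A forces the 
original to initialize as a side effect. That is just what the sketch 
happens to do, not necessarily what the RFC intends.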

>    Any access to a non-existant (i.e. dynamic) property will trigger
>> initialization and this is not preventable using
>> 'skipLazyInitialization()' and 'setRawValueWithoutLazyInitialization()'
>> because these only work with known properties?
>>
>> While dynamic properties are deprecated, this should be clearly spelled
>> out in the RFC for voters to make an informed decision.
> 
> 
> Absolutely. From a behavioral PoV, dynamic vs non-dynamic properties
> doesn't matter: both kinds are uninitialized at this stage and the engine
> will trigger object handlers in the same way (it will just not trigger the
> same object handlers).
> 

Unless I missed it, you didn't update the RFC to mention this. Please do 
so, I find it important to have a record of all details that were 
discussed (e.g. for the documentation or when evaluating bug reports).

>    > If the object is already lazy, a ReflectionException is thrown with
>> the message “Object is already lazy”.
>>
>> What happens when calling the method on a *initialized* proxy object?
>> i.e. the following:
>>
>>       class Obj { public function __construct(public string $name) {} }
>>       $obj = new Obj('obj1');
>>       $r->resetAsLazyProxy($obj, ...);
>>       $r->initialize($obj);
>>       $r->resetAsLazyProxy($obj, ...);
>>
>> What happens when calling it for the actual object of an initialized
>> proxy object?
> 
> 
> Once initialized, a lazy object should be indistinguishable from a non-lazy
> one.
> This means that the second call to resetAsLazyProxy will just do that:
> reset the object like it does for any regular object.
> 
> 
> 
>> It's probably not possible to prevent this, but will this
>> allow for proxy chains? Example:
>>
>>       class Obj { public function __construct(public string $name) {} }
>>       $obj1 = new Obj('obj1');
>>       $r->resetAsLazyProxy($obj1, function () use (&$obj2) {
>>           $obj2 = new Obj('obj2');
>>           return $obj2;
>>       });
>>       $r->resetAsLazyProxy($obj2, function () {
>>           return new Obj('obj3');
>>       });
>>       var_dump($obj1->name); // what will this print?
> 
> 
> This example doesn't work because $obj2 doesn't exist when trying to make
> it lazy but you probably mean this instead?

Ah, yes, you are right. An initialization is missing between the two 
`reset` calls (like in the previous example). My question was 
specifically about resetting an initialized proxy, so your adjusted 
example is *not quite* what I was looking for, but the results should 
probably be the same?

> 
>       class Obj { public function __construct(public string $name) {} }
>>       $obj1 = new Obj('obj1');
>>       $obj2 = new Obj('obj2');
>>       $r->resetAsLazyProxy($obj1, function () use ($obj2) {
>>           return $obj2;
>>       });
>>       $r->resetAsLazyProxy($obj2, function () {
>>           return new Obj('obj3');
>>       });
>>       var_dump($obj1->name); // what will this print?
> 
> 
> This will print "obj3": each object is separate from the other from a
> behavioral perspective, but with such a chain, accessing $obj1 will trigger
> its initializer and will then access $obj2->name, which will trigger the
> second initializer then access $obj3->name, which contains "obj3".
> (I just confirmed with the implementation I have, which is from a previous
> API flavor, but the underlying mechanisms are the same).

Okay, that works as expected then.
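
For the record, the chained forwarding described in the quote can be 
modelled in userland with magic methods. This is only a sketch under the 
assumption that a proxy forwards every property read to whatever its 
initializer returned. `ChainProxy` is a hypothetical class, not the RFC's 
API, and the `$calls` log exists only to show the order of events.

```php
<?php
// Userland model only: ChainProxy is a hypothetical stand-in for a
// lazy proxy and is not part of the RFC.
final class Obj
{
    public function __construct(public string $name) {}
}

final class ChainProxy
{
    private ?object $real = null;

    public function __construct(private Closure $initializer) {}

    // Forward property reads to the real object, creating it on first use.
    public function __get(string $property): mixed
    {
        $this->real ??= ($this->initializer)();
        return $this->real->$property;
    }
}

$calls = [];

// Second link of the chain: resolves to a plain Obj named "obj3".
$obj2 = new ChainProxy(function () use (&$calls): Obj {
    $calls[] = 'second';
    return new Obj('obj3');
});

// First link of the chain: resolves to the (still lazy) second link.
$obj1 = new ChainProxy(function () use (&$calls, $obj2): ChainProxy {
    $calls[] = 'first';
    return $obj2;
});

$name = $obj1->name;

var_dump($name);  // string(4) "obj3"
var_dump($calls); // initializers ran in the order: first, second
```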

> Please let me know if any topics remain unanswered.

I've indeed found two more questions.

1.

Just to confirm my understanding: The RFC mentions that the initializer 
of a proxy receives the proxy object as the first parameter. It further 
mentions that making changes is legal (but likely useless).

My understanding is that attempting to read a property of the proxy 
object passed to the initializer will most likely fail, because it is 
still uninitialized? Or are the properties of the proxy object 
initialized with their default value before calling the initializer?

For ghost objects the behavior is clear, just not for proxies.

2.

> Properties are not initialized to their default value yet (they are
> initialized before calling the initializer).

I see that you removed the bit about this not being observable. What is 
the reason for removing that? One possible reason that comes to mind is 
a default value that refers to a non-existing constant: it would be 
observable because the initialization emits an error. Are there any 
other reasons?
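
As a small illustration of the constant case, here is how it looks with a 
regular (non-lazy) object. `WithBrokenDefault` and `MISSING_CONSTANT` are 
made-up names. The sketch only shows that evaluating a default value can 
fail visibly. It says nothing about when exactly a lazy object would 
perform that evaluation.

```php
<?php
// Hypothetical class; MISSING_CONSTANT is deliberately never defined.
final class WithBrokenDefault
{
    public string $label = MISSING_CONSTANT;
}

$message = null;
try {
    // Declaring the class is fine; the default value is only evaluated
    // once the property defaults are actually needed.
    new WithBrokenDefault();
} catch (Error $e) {
    $message = $e->getMessage();
}

var_dump($message); // an "Undefined constant" error message
```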

Best regards
Tim Düsterhus