[RFC] Short Closures 2, aka auto-capture take 3

3 years ago by Marco Pivetta

Hey Larry,

Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture compactness
of short-closures. That RFC didn't fully go to completion due to concerns
over the performance impact, which Nuno and I didn't have bandwidth to
resolve.

Arnaud Le Blanc has now picked up the flag with an improved implementation
that includes benchmarks showing an effectively net-zero performance
impact, aka, good news as it avoids over-capturing.

The RFC has therefore been overhauled accordingly and is now ready for
consideration.

https://wiki.php.net/rfc/auto-capture-closure

Couple questions:

nesting these functions within each other

What happens when/if we nest these functions? Take this minimal example:

$a = 'hello world';

(fn () {
    (fn () {
        echo $a;
    })();
})();

capturing `$this`

In the past (also present), I had to type static fn () => ... or static function () { ... all over the place, to avoid implicitly binding $this
to a closure, causing hidden memory leaks.

Assuming the following:

- these new closures could capture $this automatically, once detected
- these new closures can optimize away unnecessary variables that aren't
  captured

Would that allow us to get rid of static fn () { declarations, when
creating one of these closures in an instance method context?

Greets,

Marco Pivetta

https://twitter.com/Ocramius

https://ocramius.github.io/
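
A minimal sketch of the hidden retention Marco describes, using only syntax that exists today (no auto-capture blocks). The class name Handler and its two methods are hypothetical, made up for illustration; WeakReference is the standard class available since PHP 7.4. It assumes plain reference counting with no cycles involved.

```php
<?php
declare(strict_types=1);

// Hypothetical class, for illustration only.
final class Handler
{
    public function bound(): Closure
    {
        // Non-static: $this is bound even though the body never uses it.
        return fn (): int => 42;
    }

    public function unbound(): Closure
    {
        // static: no $this is bound to the closure.
        return static fn (): int => 42;
    }
}

$handler = new Handler();
$ref = WeakReference::create($handler);
$boundClosure = $handler->bound();
unset($handler);
// The closure still references the object, so it has not been freed.
$keptAlive = $ref->get() !== null;

$handler = new Handler();
$ref = WeakReference::create($handler);
$staticClosure = $handler->unbound();
unset($handler);
// Nothing else references the object, so it is gone.
$released = $ref->get() === null;

var_dump($keptAlive, $released);
```

Both flags should come out true: the non-static closure keeps its object alive, the static one does not. Whether that retention counts as a leak depends on how long the closure itself lives.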

3 years ago by Arnaud Le Blanc

Hi,

On Thursday, 9 June 2022 18:46:53 CEST Marco Pivetta wrote:

nesting these functions within each other

What happens when/if we nest these functions? Take this minimal example:
$a = 'hello world';

(fn () {
    (fn () {
        echo $a;
    })();
})();

Capture bubbles up. When an inner function uses a variable, the outer function
in fact uses it too, so it's captured by both functions, by-value.

This example prints "hello world": The inner function captures $a from the
outer function, which captures $a from its declaring scope.

This is equivalent to

(function () use ($a) {
    (function () use ($a) {
        echo $a;
    })();
})();

capturing $this

In the past (also present), I had to type static fn () => ... or static function () { ... all over the place, to avoid implicitly binding $this
to a closure, causing hidden memory leaks.

Assuming following:

these new closures could capture $this automatically, once detected

these new closures can optimize away unnecessary variables that aren't
captured

Would that allow us to get rid of static fn () { declarations, when
creating one of these closures in an instance method context?

It would be great to get rid of this, but ideally this would apply to Arrow
Functions and Anonymous Functions as well. This could be a separate RFC.

--
Arnaud Le Blanc
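
The by-value part of Arnaud's answer can be checked with the explicit use() form he gives as the equivalent, which runs on current PHP. This is only a sketch: the variable names $outer, $inner and $seen are introduced here for illustration, and the closures return the value instead of echoing it so that it can be inspected.

```php
<?php
declare(strict_types=1);

$a = 'hello world';

// The outer closure copies $a when it is created; the inner closure
// copies it again from the outer closure's scope.
$outer = function () use ($a): Closure {
    return function () use ($a): string {
        return $a;
    };
};

$inner = $outer();

// Reassigning the original variable afterwards does not affect either copy.
$a = 'changed later';

$seen = $inner();
echo $seen, PHP_EOL; // hello world
```

Because each level holds its own copy, the later reassignment of $a in the declaring scope is invisible to both closures.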

3 years ago by Nikita Popov

Hi,

On Thursday, 9 June 2022 18:46:53 CEST Marco Pivetta wrote:
nesting these functions within each other

What happens when/if we nest these functions? Take this minimal example:
$a = 'hello world';

(fn () {
    (fn () {
        echo $a;
    })();
})();
Capture bubbles up. When an inner function uses a variable, the outer
function
in fact uses it too, so it's captured by both functions, by-value.

This example prints "hello world": The inner function captures $a from the
outer function, which captures $a from its declaring scope.

This is equivalent to
(function () use ($a) {
    (function () use ($a) {
        echo $a;
    })();
})();
capturing $this

In the past (also present), I had to type static fn () => ... or
static function () { ... all over the place, to avoid implicitly binding
$this
to a closure, causing hidden memory leaks.

Assuming following:

these new closures could capture $this automatically, once detected

these new closures can optimize away unnecessary variables that aren't
captured

Would that allow us to get rid of static fn () { declarations, when
creating one of these closures in an instance method context?

It would be great to get rid of this, but ideally this would apply to
Arrow
Functions and Anonymous Functions as well. This could be a separate RFC.

I've tried this in the past, and this is not possible due to implicit $this
uses. See
https://wiki.php.net/rfc/arrow_functions_v2#this_binding_and_static_arrow_functions
for a brief note on this. The tl;dr is that if your closure does "fn() =>
Foo::bar()" and Foo happens to be a parent of your current scope and bar()
a non-static method, then this performs a scoped instance call that
inherits $this. Not binding $this here would result in an Error exception,
but the compiler doesn't have any way to know that $this needs to be bound.

Regards,
Nikita
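
A small sketch of the implicit $this use Nikita describes, assuming PHP 8.0 or later (where a failed static call to a non-static method throws an Error). The classes Base and Child and their methods are hypothetical names chosen for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical classes, for illustration only.
class Base
{
    public function describe(): string
    {
        return 'called on ' . static::class;
    }
}

class Child extends Base
{
    public function bound(): Closure
    {
        // Looks like a static call, but Base is a parent of the current
        // scope, so this is a scoped instance call that inherits $this.
        return fn (): string => Base::describe();
    }

    public function unbound(): Closure
    {
        // Same body, but static: there is no $this to inherit.
        return static fn (): string => Base::describe();
    }
}

$child = new Child();
$boundResult = ($child->bound())();

try {
    ($child->unbound())();
    $unboundResult = 'no error';
} catch (Error $e) {
    $unboundResult = get_class($e) . ': ' . $e->getMessage();
}

echo $boundResult, PHP_EOL, $unboundResult, PHP_EOL;
```

The two closure bodies are textually identical, which is the problem: nothing in the body tells the compiler that the first one needs $this.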

3 years ago by Marco Pivetta

Hey Nikita,

Do you have another example? Calling instance methods statically is...
well... deserving a hard crash :|

Marco Pivetta

https://twitter.com/Ocramius

https://ocramius.github.io/

3 years ago by Nikita Popov

Hey Nikita,

Do you have another example? Calling instance methods statically is...
well... deserving a hard crash :|

Maybe easier to understand if you replace Foo::bar() with parent::bar()?
That's the most common spelling for this type of call.

I agree that the syntax we use for this is unfortunate (because it is
syntactically indistinguishable from a static method call, which it is
not), but that's what we have right now, and we can hardly just stop
supporting it.

Regards,
Nikita

3 years ago by Marco Pivetta

Hey Nikita,

On Thu, Jun 9, 2022 at 8:15 PM Arnaud Le Blanc arnaud.lb@gmail.com
wrote:

Would that allow us to get rid of static fn () { declarations, when
creating one of these closures in an instance method context?

It would be great to get rid of this, but ideally this would apply to
Arrow
Functions and Anonymous Functions as well. This could be a separate RFC.

I've tried this in the past, and this is not possible due to implicit
$this uses. See
https://wiki.php.net/rfc/arrow_functions_v2#this_binding_and_static_arrow_functions
for a brief note on this. The tl;dr is that if your closure does "fn() =>
Foo::bar()" and Foo happens to be a parent of your current scope and bar()
a non-static method, then this performs a scoped instance call that
inherits $this. Not binding $this here would result in an Error exception,
but the compiler doesn't have any way to know that $this needs to be bound.

Regards,
Nikita

Hey Nikita,

Do you have another example? Calling instance methods statically is...
well... deserving a hard crash :|

Maybe easier to understand if you replace Foo::bar() with parent::bar()?
That's the most common spelling for this type of call.

I agree that the syntax we use for this is unfortunate (because it is
syntactically indistinguishable from a static method call, which it is
not), but that's what we have right now, and we can hardly just stop
supporting it.

Dunno, it's a new construct, so perhaps we could do something about it.
I'm not suggesting we change the existing fn or function declarations,
but in this case, we're introducing a new construct, and some work already
went in to do the eager discovery of by-val variables.

Heck, variable variables already wouldn't work here, according to this RFC
:D

Marco Pivetta

https://twitter.com/Ocramius

https://ocramius.github.io/

3 years ago by Nikita Popov

Hey Nikita,

On Thu, Jun 9, 2022 at 8:15 PM Arnaud Le Blanc arnaud.lb@gmail.com
wrote:

Would that allow us to get rid of static fn () { declarations, when
creating one of these closures in an instance method context?

It would be great to get rid of this, but ideally this would apply to
Arrow
Functions and Anonymous Functions as well. This could be a separate
RFC.

I've tried this in the past, and this is not possible due to implicit
$this uses. See
https://wiki.php.net/rfc/arrow_functions_v2#this_binding_and_static_arrow_functions
for a brief note on this. The tl;dr is that if your closure does "fn() =>
Foo::bar()" and Foo happens to be a parent of your current scope and bar()
a non-static method, then this performs a scoped instance call that
inherits $this. Not binding $this here would result in an Error exception,
but the compiler doesn't have any way to know that $this needs to be bound.

Regards,
Nikita

Hey Nikita,

Do you have another example? Calling instance methods statically is...
well... deserving a hard crash :|

Maybe easier to understand if you replace Foo::bar() with parent::bar()?
That's the most common spelling for this type of call.

I agree that the syntax we use for this is unfortunate (because it is
syntactically indistinguishable from a static method call, which it is
not), but that's what we have right now, and we can hardly just stop
supporting it.

Dunno, it's a new construct, so perhaps we could do something about it.
I'm not suggesting we change the existing fn or function declarations,
but in this case, we're introducing a new construct, and some work already
went in to do the eager discovery of by-val variables.

Heck, variable variables already wouldn't work here, according to this RFC
:D

We're not introducing a new construct. We're just extending existing fn()
functions to accept {} blocks, with exactly the same semantics as before. I
would find it highly concerning if fn() => X and fn() => { return X; } had
differences in capture semantics. Those two expressions should be strictly
identical -- the former should be desugared to the latter.

Nikita

3 years ago by michal.brzuchalski@gmail.com

Hi Larry,

On Thu, 9 Jun 2022 at 18:34, Larry Garfield larry@garfieldtech.com wrote:

Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture compactness
of short-closures. That RFC didn't fully go to completion due to concerns
over the performance impact, which Nuno and I didn't have bandwidth to
resolve.

Arnaud Le Blanc has now picked up the flag with an improved implementation
that includes benchmarks showing an effectively net-zero performance
impact, aka, good news as it avoids over-capturing.

The RFC has therefore been overhauled accordingly and is now ready for
consideration.

https://wiki.php.net/rfc/auto-capture-closure

Nice work. Well-described behaviors.

One question, more about future scope and related functionality:
could a future RFC for "short methods", as described in one of your earlier
declined RFCs (https://wiki.php.net/rfc/short-functions), be revived without
conflicts in the scope of methods?

class Foo {
    public string $firstName = 'John';
    public function getFirstName(): string => $this->firstName;
}

I'm asking whether I understand the scopes of this and the previous RFC
correctly, and whether they leave the door open for future "short methods".

Cheers,
Michał Marcin Brzuchalski

3 years ago by Larry Garfield

The short-functions RFC is entirely separate. The syntax choices in both that RFC and this one were made to ensure that they don't conflict with each other, and the resulting syntax meaning is consistent across the language. The implementations are independent and should in no way conflict.

(The short-functions RFC would have enabled short-methods too. It was purely a syntax sugar with no additional behavior.)

That's assuming the attitude toward the short-function RFC ever changes enough in the future to make it worth trying again...

--Larry Garfield

3 years ago by Rowan Tommins

Last year, Nuno Maduro and I put together an RFC for combining the multi-line capabilities of long-closures with the auto-capture compactness of short-closures ... Arnaud Le Blanc has now picked up the flag with an improved implementation ... The RFC has therefore been overhauled accordingly and is now ready for consideration.

https://wiki.php.net/rfc/auto-capture-closure

First of all, thanks to all three of you for the work on this. Although
I'm not quite convinced yet, I know a lot of people have expressed
desire for this feature over the years.

My main concern is summed up accidentally by your choice of subject line
for this thread: is the proposal to add short closure syntax or is it
to add auto-capturing closures?

They may sound like the same thing, but to me "short closure syntax"
(and a lot of the current RFC) implies that the new syntax is better for
nearly all closures, and that once it is introduced, the old syntax
would only really be there for compatibility - similar to how the []
syntax replaces array() and list(). If that is the aim, it's not enough
to assert that "the majority" of closures are very short; the syntax
should stand up even when used for, say, a middleware handler in a
micro-framework. As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.

On the other hand, "auto-capturing" could be seen as a feature in its
own right; something that users will opt into when it makes sense, while
continuing to use explicit capture in others. If that is the aim, the
proposed syntax is decidedly sub-optimal: to a new user, there is no
obvious reason why "fn" and "function" should imply different semantics,
or which one is which. A dedicated syntax such as use(*) or use(...)
would be much clearer. We could even separately propose that "fn" and
"function" be interchangeable everywhere, allowing combinations such as
"fn() use(...) { return $x; }" and "function() => $x;"

To go back to the point about variable scope: right now, if you're in a
function, all variables are scoped to that function. With a tiny handful
of exceptions (e.g. superglobals), access to variables from any other
scope is always explicit - via parameters, "global", "use", "$this", and
so on. If we think that should change, we should make that decision
explicitly, not treat it as a side-effect of syntax.

I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time. And users don't "learn
to recognize" that a loop doesn't hide all variables from the parent
scope; it would be very peculiar if it did.

This is also where comparison to other languages falls down: most
languages which capture implicitly for closures also merge scopes
implicitly at other times - e.g. global variables in functions; instance
properties in methods; or nested block scopes. Generally they also have
a way to opt out of those, and mark a variable as local to a function or
block; PHP does not, because it has always required an opt in.

Which leads me back to my constructive suggestion: let's introduce a
block scoping syntax (e.g. "let $foo;") as a useful feature in its own
right, before we introduce short closures.

As proposed, users will need to have some idea of what "live variable
analysis" means, or add dummy assignments, if they want to be sure a
variable is actually local. With a block scoping keyword, they can mark
local variables explicitly, as they would in other languages.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Deleu

On Sat, Jun 11, 2022 at 11:14 PM Rowan Tommins rowan.collins@gmail.com
wrote:

Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture compactness
of short-closures ... Arnaud Le Blanc has now picked up the flag with an
improved implementation ... The RFC has therefore been overhauled
accordingly and is now ready for consideration.

https://wiki.php.net/rfc/auto-capture-closure

They may sound like the same thing, but to me "short closure syntax"
(and a lot of the current RFC) implies that the new syntax is better for
nearly all closures, and that once it is introduced, the old syntax
would only really be there for compatibility - similar to how the []
syntax replaces array() and list(). If that is the aim, it's not enough
to assert that "the majority" of closures are very short; the syntax
should stand up even when used for, say, a middleware handler in a
micro-framework. As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.

The RFC does mention that the existing Anonymous Function Syntax remains
untouched and will not be deprecated. Whether the new syntax is better for
nearly all closures will be a personal choice. If the new syntax doesn't
suit, say, a middleware handler, then we still can:

- reach for the old syntax
- use invocable classes
- call another method or function which creates a brand new scope and then
  returns a function/callable.

On the other hand, "auto-capturing" could be seen as a feature in its
own right; something that users will opt into when it makes sense, while
continuing to use explicit capture in others. If that is the aim, the
proposed syntax is decidedly sub-optimal: to a new user, there is no
obvious reason why "fn" and "function" should imply different semantics,
or which one is which. A dedicated syntax such as use(*) or use(...)
would be much clearer. We could even separately propose that "fn" and
"function" be interchangeable everywhere, allowing combinations such as
"fn() use(...) { return $x; }" and "function() => $x;"

The previous discussions talked about use() or use(...), and most people I
know who would love this RFC to pass would also dislike that alternative.
It lacks the greatest asset of short closures: aesthetics. Maybe my
personal bubble is not statistically relevant, but this is where PHP
Internals falls short: it does not survey actual users of the language to
help with such matters. All I can say is that use() is not a replacement
for the RFC.

To go back to the point about variable scope: right now, if you're in a
function, all variables are scoped to that function. With a tiny handful
of exceptions (e.g. superglobals), access to variables from any other
scope is always explicit - via parameters, "global", "use", "$this", and
so on. If we think that should change, we should make that decision
explicitly, not treat it as a side-effect of syntax.

Any attempt to make it explicit defeats the purpose of the RFC. The
auto-capturing means we don't have to write awkward code to access
variables. The only way we have to avoid awkward syntax (such as use
($var1, $var2)) is to declare an entire new invocable class and send the
parameters via the constructor. When many variables are involved, that may
still be a great option, but doing that just for 1 variable and 2 lines is
quite... sad. When I think of new accessors for this particular case, they
would either be innovative or verbose. If they are verbose, we already have
a syntax for that. If they are innovative, it would be an awkward
out-of-place situation that doesn't happen elsewhere in the language. Or I
lack the imagination to see a different result.

Ultimately, I see fn() as "an opt-in to not create a separate scope for a
function". PHP has several language constructs that may or may not create a
separate scope.

- Delimited scope: function, method, class, procedural file
- Shared scope: if, for, foreach, include, require and fn

--

To unsubscribe, visit: https://www.php.net/unsub.php

--
Marco Aurélio Deleu
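
Two of the fallbacks Deleu lists (an invocable class, and a function that creates a brand new scope and returns a callable) can be sketched in current PHP as follows. The names Multiplier and makeMultiplier are hypothetical, and constructor promotion assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical invocable class: state is passed in explicitly
// through the constructor instead of being captured.
final class Multiplier
{
    public function __construct(private int $factor)
    {
    }

    public function __invoke(int $value): int
    {
        return $value * $this->factor;
    }
}

// Hypothetical factory: the function body is a fresh scope, so the
// arrow function can only capture what was passed in as a parameter.
function makeMultiplier(int $factor): Closure
{
    return fn (int $value): int => $value * $factor;
}

$unrelated = 'not visible inside either callable';

$viaClass = array_map(new Multiplier(3), [1, 2, 3]);
$viaFactory = array_map(makeMultiplier(3), [1, 2, 3]);

print_r($viaClass);
print_r($viaFactory);
```

Both approaches keep $unrelated out of reach of the callable, at the cost of more ceremony than an inline closure.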

3 years ago by Larry Garfield

On Sat, Jun 11, 2022 at 11:14 PM Rowan Tommins rowan.collins@gmail.com
wrote:

Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture compactness
of short-closures ... Arnaud Le Blanc has now picked up the flag with an
improved implementation ... The RFC has therefore been overhauled
accordingly and is now ready for consideration.

https://wiki.php.net/rfc/auto-capture-closure

They may sound like the same thing, but to me "short closure syntax"
(and a lot of the current RFC) implies that the new syntax is better for
nearly all closures, and that once it is introduced, the old syntax
would only really be there for compatibility - similar to how the []
syntax replaces array() and list(). If that is the aim, it's not enough
to assert that "the majority" of closures are very short; the syntax
should stand up even when used for, say, a middleware handler in a
micro-framework. As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.

The RFC does mention that the existing Anonymous Function Syntax remains
untouched and will not be deprecated. Whether the new syntax is better for
nearly all closures will be a personal choice. If the new syntax doesn't
suit, say, a middleware handler, then we still can:

reach for the old syntax

use invocable classes

call another method or function which creates a brand new scope and then
returns a function/callable.

Correct. If this RFC passes, there will be three equally supported syntaxes for creating closures:

function ($a) use ($b) {
    return $a * $b;
};

fn ($a) => $a * $b;

fn ($a) {
    return $a * $b;
};

Which one is appropriate in a given situation is left up to developer judgement.

My own personal position would be

1. use fn => where possible
2. use fn {} if going multi-line.
3. if the body of the closure is more than ~3 lines and is not virtually the entire wrapping scope, it should be its own named function/method anyway, and the new first-class-callable syntax makes that nice and easy to use.

That is, I would probably not use the manual-capture syntax very often at all. However, if someone disagrees with me on case 3 it's still there for them if that's easier in context.

Whether the new syntax is viewed as "adding auto-capture to long closures" or "adding multi-line support to short closures" is, in the end, a mostly academic distinction with no practical difference. The resulting syntax is smack in the middle of the two existing ones. The original RFC from a year ago approached it from the perspective of the first; The rewritten RFC leans on the second perspective. The use of both names is mostly a historical artifact of reusing the old URL. :-) The net result is the same.

--Larry Garfield
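
For comparison, the options Larry mentions that already run today can be put side by side. This sketch leaves out the proposed fn {} block form, since it is not valid in any released PHP version; the function name triple is hypothetical, and the first-class callable syntax assumes PHP 8.1 or later.

```php
<?php
declare(strict_types=1);

$b = 3;

// Long closure: capture is explicit via use().
$long = function (int $a) use ($b): int {
    return $a * $b;
};

// Arrow function: $b is captured automatically, by value.
$short = fn (int $a): int => $a * $b;

// Named function turned into a closure with first-class callable syntax.
// Hypothetical helper, for illustration only.
function triple(int $a): int
{
    return $a * 3;
}

$named = triple(...);

$results = [$long(2), $short(2), $named(2)];
print_r($results);
```

All three produce the same value here; they differ only in how the multiplier reaches the function body.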

3 years ago by Rowan Tommins

The RFC does mention that the existing Anonymous Function Syntax
remains untouched and will not be deprecated. Whether the new syntax
is better for nearly all closures will be a personal choice.

I honestly don't think this is how it will be perceived. If this syntax
is approved, people will see "fn" as the "new, better way" and
"function" as the "old, annoying way".

To put it a different way: imagine we had no closure support at all, and
decided that we needed two flavours, one with explicit capture and one
with implicit capture. Would we choose "function" and "fn" as keywords?

The previous discussions talked about use() or use(...) and most
people I know that would love this RFC to pass would also dislike that
alternative. It does not have the greatest asset for short closure:
aesthetics. [...] All I can say is that use() is not a replacement
for the RFC.

I think you're trying to have it both ways here: if you really believed
that the two syntaxes were going to live side by side, there would be no
reason for "aesthetics" to be any more important for one than the other.

Some people are of the opinion that automatic capture should always have
been the default, and the current syntax is a mistake. I'm fine with
that opinion, but I want people to be honest about it rather than
pretending they're just adding a new option for a narrow use case.

Any attempt to make it explicit defeats the purpose of the RFC.

That depends what you think the purpose of the RFC is, which is what I
want people to be honest about.

If the purpose is to replace long lists of captured variables, an
explicit "capture all" syntax like "use(*)" achieves that purpose
perfectly fine.

Ultimately, I see fn() as "an opt-in to not create a separate scope
for a function".

I disagree with both parts of this:

1. I don't think users will see "fn" as an "opt-in", they'll see it as
   "the new normal", and "function" as a rare "opt-out" or a "legacy version".
2. It does still create a separate scope, it just creates a nested
   scope, which combines two sets of variables, in a way that PHP currently
   never does.

Regards,

--
Rowan Tommins
[IMSoP]
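
The difference between a shared scope and a nested one can be observed with today's single-expression arrow functions, which already auto-capture by value. This is a sketch under that assumption, not a statement about the final RFC semantics; the variable names are made up for illustration.

```php
<?php
declare(strict_types=1);

$count = 1;

// The arrow function receives a copy of $count; the assignment
// inside it only changes that copy.
$increment = fn (): int => $count = $count + 1;
$returned = $increment();
$afterClosure = $count;

// A foreach body really does share the enclosing scope, so the same
// assignment changes the outer variable.
foreach ([1] as $unused) {
    $count = $count + 1;
}
$afterLoop = $count;

var_dump($returned, $afterClosure, $afterLoop);
```

The closure returns 2 while the outer $count stays at 1, whereas the loop moves the outer $count to 2. In that sense fn reads from the parent scope but does not write to it.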

3 years ago by Deleu

On Sun, Jun 12, 2022 at 2:29 PM Rowan Tommins rowan.collins@gmail.com
wrote:

The RFC does mention that the existing Anonymous Function Syntax
remains untouched and will not be deprecated. Whether the new syntax
is better for nearly all closures will be a personal choice.

I honestly don't think this is how it will be perceived. If this syntax
is approved, people will see "fn" as the "new, better way" and
"function" as the "old, annoying way".

And to me that's not an argument to deny what people want.

To put it a different way: imagine we had no closure support at all, and
decided that we needed two flavours, one with explicit capture and one
with implicit capture. Would we choose "function" and "fn" as keywords?

I often don't indulge such hypotheticals because we will never truly be
able to make progress based on such an assumption. A breaking change that
changes how closures work is just not gonna happen. Given the current state
in the world we're in, what can we do to have a better DX on anonymous
functions?

The previous discussions talked about use() or use(...) and most
people I know that would love this RFC to pass would also dislike that
alternative. It does not have the greatest asset for short closure:
aesthetics. [...] All I can say is that use() is not a replacement
for the RFC.

I think you're trying to have it both ways here: if you really believed
that the two syntaxes were going to live side by side, there would be no
reason for "aesthetics" to be any more important for one than the other.

Some people are of the opinion that automatic capture should always have
been the default, and the current syntax is a mistake. I'm fine with
that opinion, but I want people to be honest about it rather than
pretending they're just adding a new option for a narrow use case.

Honestly I don't think it was a mistake. It was designed more than a decade
ago and there was no way of predicting the future. I've seen code written
10 to 20 years ago and I've seen code written in the last 5 years. I think
the best decision was made at the time, and the world of development
has changed enough for us to make different decisions now.

It's not that I'm trying to have it both ways, I'm just not assuming my
view is the right one. I do believe that if such an RFC is approved, I will
almost never reach for function () use () anymore because I will prefer
the short syntax. If I need a new scope I will reach for an invocable
class. But that doesn't mean other teams/projects/people are forced to
agree or follow the same practices as me or my team.

Any attempt to make it explicit defeats the purpose of the RFC.

That depends what you think the purpose of the RFC is, which is what I
want people to be honest about.

If the purpose is to replace long lists of captured variables, an
explicit "capture all" syntax like "use(*)" achieves that purpose
perfectly fine.

If someone decides to implement function () use (*) on a separate RFC, I
would abstain from that because it's not something I'm interested in using
and it doesn't address the aesthetic issue we have today. I just don't like
it being considered an alternative to the current RFC because it's not. The
purpose is to replace long lists of captured variables while addressing the
aesthetic issue caused by use (), which is the only place in the language
we use this construct.

It seems to me that you agree that there is a chance the proposed syntax is
going to be perceived as better and people will not want to use the old
syntax anymore and that makes you not want to accept the RFC.

Regards,

--
Rowan Tommins
[IMSoP]

--
Marco Aurélio Deleu

3 years ago by Rowan Tommins

I honestly don't think this is how it will be perceived. If this syntax
is approved, people will see "fn" as the "new, better way" and
"function" as the "old, annoying way".

And to me that's not an argument to deny what people want.

I never said it was. I said that if that is what we expect, we should design the feature with that in mind, rather than relying on the older syntax as a crutch.

Given the current state in the world we're in, what can we
do to have a better DX on anonymous functions?

I already gave my answer to that: either add implicit capture as an opt-in to the current syntax; or add block scope and treat short closures as consistent with that.

Honestly I don't think it was a mistake. It was designed more than a decade
ago and there was no way of predicting the future. I've seen code written
20~10 years ago and I've seen code written 5~0 years ago. I think the best
decision was taken at the time it was taken and the world of development
has changed enough for us to make different decisions now.

I've seen that argument before, but it's not clear to me that anything has changed. Anonymous functions are used for roughly the same things that they always were, so why are the arguments made when they were added, and again when short closures were discussed previously, no longer valid?

If someone decides to implement function () use (*) on a separate RFC, I
would abstain from that because it's not something I'm interested in using

That's fair enough. Just remember that that is your opinion of what is important, and others may have different views.

aesthetic issue caused by use (), which is the only place in the language
we use this construct.

That's like saying we only use the word "class" when declaring classes. It has slightly different syntax, but "use" is exactly the same principle as importing variables into scope with "global", or declaring them "static". It's entirely in keeping with how scope works in PHP.

It seems to me that you agree that there is a chance the proposed syntax is
going to be perceived as better and people will not want to use the old
syntax anymore and that makes you not want to accept the RFC.

No, it makes me want to make the new syntax as useful as possible.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Deleu

On Sun, Jun 12, 2022 at 6:55 PM Rowan Tommins rowan.collins@gmail.com
wrote:

It seems to me that you agree that there is a chance the proposed syntax is
going to be perceived as better and people will not want to use the old
syntax anymore and that makes you not want to accept the RFC.

No, it makes me want to make the new syntax as useful as possible.

On the sentiment, we can agree. It just happens that from where I'm
standing, any change to the proposed syntax will make it less useful.

--
Marco Aurélio Deleu

3 years ago by Larry Garfield

Last year, Nuno Maduro and I put together an RFC for combining the multi-line capabilities of long-closures with the auto-capture compactness of short-closures ... Arnaud Le Blanc has now picked up the flag with an improved implementation ... The RFC has therefore been overhauled accordingly and is now ready for consideration.

https://wiki.php.net/rfc/auto-capture-closure

First of all, thanks to all three of you for the work on this. Although
I'm not quite convinced yet, I know a lot of people have expressed
desire for this feature over the years.

To go back to the point about variable scope: right now, if you're in a
function, all variables are scoped to that function. With a tiny handful
of exceptions (e.g. superglobals), access to variables from any other
scope is always explicit - via parameters, "global", "use", "$this", and
so on. If we think that should change, we should make that decision
explicitly, not treat it as a side-effect of syntax.

I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time. And users don't "learn
to recognize" that a loop doesn't hide all variables from the parent
scope; it would be very peculiar if it did.

There are languages that do, however. Some languages have block-scoped variables by default (such as Rust), or are partially block-scoped depending on details. PHP is not one of them, but to someone coming from a language that is, PHP's way of doing things is just as weird and requires learning. The point here is that "which things create a scope and which don't" is not "intuitive" in any language. Scoping rules are always language-idiomatic, and may or may not be internally consistent, which is the important part.

PHP is fairly internally consistent: functions and classes create a scope, nothing else does. This RFC doesn't change that one way or another, so it's not really any harder to learn. Plus, as noted, the fn keyword becomes consistently the flag saying "auto-capture happens here, FYI", which is already the case as of 7.4.
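The scoping rules described above can be sketched with syntax that already works in PHP 7.4 and later. The variable names are hypothetical and chosen only for illustration; the sketch does not use the syntax proposed in the RFC.

```php
<?php
declare(strict_types=1);

$total = 10;

// A loop does not create a scope: it reads and writes $total directly,
// and the loop variable $item is still defined after the loop ends.
foreach ([1, 2, 3] as $item) {
    $total += $item;
}
$itemOutlivesLoop = isset($item);

// A long closure starts with an empty scope: without use(), $total is not visible.
$long = function () {
    return isset($total);
};

// An arrow function captures the variables it uses from the declaring scope, by value.
$short = fn () => $total + 1;

$seenByLong = $long();
$seenByShort = $short();
```

Here `fn` is the only marker that tells the reader `$total` comes from the enclosing scope, which is the role the message above assigns to the keyword.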

Which leads me back to my constructive suggestion: let's introduce a
block scoping syntax (e.g. "let $foo;") as a useful feature in its own
right, before we introduce short closures.

As proposed, users will need to have some idea of what "live variable
analysis" means, or add dummy assignments, if they want to be sure a
variable is actually local. With a block scoping keyword, they can mark
local variables explicitly, as they would in other languages.

That may be a useful feature on its own, especially for longer loops. I'm definitely open to discussing that. I don't think that is a prerequisite for a nicer lambda syntax, however, as I don't think the confusion potential is anywhere near as large as you apparently fear it is.

--Larry Garfield

3 years ago by MKS Archive

Last year, Nuno Maduro and I put together an RFC for combining the multi-line capabilities of long-closures with the auto-capture compactness of short-closures ... Arnaud Le Blanc has now picked up the flag with an improved implementation ... The RFC has therefore been overhauled accordingly and is now ready for consideration.

https://wiki.php.net/rfc/auto-capture-closure

First of all, thanks to all three of you for the work on this. Although
I'm not quite convinced yet, I know a lot of people have expressed
desire for this feature over the years.

To go back to the point about variable scope: right now, if you're in a
function, all variables are scoped to that function. With a tiny handful
of exceptions (e.g. superglobals), access to variables from any other
scope is always explicit - via parameters, "global", "use", "$this", and
so on. If we think that should change, we should make that decision
explicitly, not treat it as a side-effect of syntax.

I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time. And users don't "learn
to recognize" that a loop doesn't hide all variables from the parent
scope; it would be very peculiar if it did.

There are languages that do, however. Some languages have block-scoped variables by default (such as Rust), or are partially block-scoped depending on details. PHP is not one of them, but to someone coming from a language that is, PHP's way of doing things is just as weird and requires learning.

Having worked in Go for several years now, I'd say one of its biggest foot guns, one I consistently run into in code reviews and even in my own code, is block-level scoping: a variable inside a block shadows the same-named variable outside the block, and the inner variable is updated when the intention was to update the outer one.

In short, block level scoping is a convenience that does more harm than good. At least in my experience.

Thus I would highly recommend not adding block level variable scope to PHP where a block in the middle of a function shadows a variable outside the block, and that variable is used below the block, such as for a return value.

#jmtcw #fwiw

-Mike

3 years ago by Rowan Tommins

... users don't "learn to recognize" that a loop doesn't hide all variables from the parent
scope; it would be very peculiar if it did.
There are languages that do, however. Some languages have block-scoped variables by default (such as Rust), or are partially block-scoped depending on details.

That's not what the RFC example implies, though; it implies that someone
might expect $guests and $guestsIds to not be usable inside the
foreach loop, because they were declared outside it. I don't know of
any language where entering a loop creates a completely empty symbol
table, do you?

Whether or not $guest, having been declared inside the loop, is
visible after the loop is a completely different question, and one
that doesn't apply to closures - the content of the closure hasn't been
executed yet, so it is inevitably a black box to the code after it.

PHP is fairly internally consistent: functions and classes create a scope, nothing else does. This RFC doesn't change that one way or another...

The RFC fundamentally changes the rule that a function always creates a
new, empty scope. Every variable that is not local to that scope has
to be explicitly imported, one way or another.

Every language I know where scopes do not start out empty has keywords
for marking which variables are definitely local to that block. That's
why I think a "var" or "let" equivalent is a natural accompaniment to
changing PHP's rules in this way.

Plus, as noted, the fn keyword becomes consistently the flag saying "auto-capture happens here, FYI", which is already the case as of 7.4.

It's a cute idea, but I don't think "if you miss most of the letters out
of a word, it means this special thing" is at all memorable. I've never
heard the syntax added in 7.4 called "fn functions", but I've frequently
heard it called "arrow functions", because what stands out to people is
the "=>". The keyword is only there because ($a)=>$b on its own would
collide with array syntax.
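A small sketch of that collision, using hypothetical variable names: inside array brackets, `($a) => $b` already parses as a key/value pair, so a bare arrow without a leading keyword would be ambiguous there.

```php
<?php
declare(strict_types=1);

$a = 'key';
$b = 'value';

// Inside array brackets, "($a) => $b" is a key/value pair.
$pair = [($a) => $b];

// With the leading "fn", the same shape is an arrow function stored at index 0.
$closures = [fn ($a) => $b];
$result = $closures[0]('ignored');
```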

I would much rather see "fn" and "function" become synonyms, so that
"public fn foo() {}" is a valid method declaration, and "function() =>
$foo" is a valid arrow function.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Arnaud Le Blanc

On Saturday, 11 June 2022 23:14:28 CEST, Rowan Tommins wrote:

My main concern is summed up accidentally by your choice of subject line
for this thread: is the proposal to add short closure syntax or is it
to add auto-capturing closures?

The proposal is to extend the Arrow Functions syntax so that it allows
multiple statements. I wanted to give a name to the RFC, so that we could
refer to the feature by that name instead of the longer "auto-capture multi-
statement closures". But the auto-capture behavior is an important aspect we
want to inherit from Arrow Functions.

As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.

Currently the use() syntax co-exists with auto-capture, but we could change
it so that an explicit use() list disables auto-capture instead:

fn () use ($a) { } // Would capture $a and disable auto-capture
fn () use () { }   // Would capture nothing and disable auto-capture

On the other hand, "auto-capturing" could be seen as a feature in its
own right; something that users will opt into when it makes sense, while
continuing to use explicit capture in others. If that is the aim, the
proposed syntax is decidedly sub-optimal: to a new user, there is no
obvious reason why "fn" and "function" should imply different semantics,
or which one is which. A dedicated syntax such as use(*) or use(...)
would be much clearer. We could even separately propose that "fn" and
"function" be interchangeable everywhere, allowing combinations such as
"fn() use(...) { return $x; }" and "function() => $x;"

Unfortunately, Arrow Functions already auto-capture today, so requiring a
use(*) to enable auto-capture would be a breaking change.

I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time.

Do you have an example where this would be a problem?

This is also where comparison to other languages falls down: most
languages which capture implicitly for closures also merge scopes
implicitly at other times - e.g. global variables in functions; instance
properties in methods; or nested block scopes. Generally they also have
a way to opt out of those, and mark a variable as local to a function or
block; PHP does not, because it has always required an opt in.

These languages capture/inherit in a read-write fashion. Being able to scope a
variable (opt out of capture) is absolutely necessary otherwise there is only
one scope.

In these languages it is easy to accidentally override/bind a variable from
the parent scope by forgetting a variable declaration.

Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.
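A minimal sketch of this point, using arrow functions as they exist since PHP 7.4 and a hypothetical `$counter` variable: writing to an auto-captured variable only changes the closure's own copy, whereas an explicit by-reference capture does reach the parent scope.

```php
<?php
declare(strict_types=1);

$counter = 0;

// Auto-capture (arrow function): the closure works on its own copy.
$byValue = fn () => ++$counter;
$first = $byValue();   // increments the copy, not the outer variable
$second = $byValue();  // each call starts again from the captured value
$afterByValue = $counter;

// Explicit by-reference capture: the closure is bound to the outer variable.
$byReference = function () use (&$counter) {
    return ++$counter;
};
$byReference();
$afterByReference = $counter;
```

Only the second form can rebind a variable in the declaring scope, and it has to be requested explicitly.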

Which leads me back to my constructive suggestion: let's introduce a
block scoping syntax (e.g. "let $foo;") as a useful feature in its own
right, before we introduce short closures.

I like this, especially if it also allows specifying a type. However, I don't
think it's needed before this RFC.

As proposed, users will need to have some idea of what "live variable
analysis" means, or add dummy assignments, if they want to be sure a
variable is actually local. With a block scoping keyword, they can mark
local variables explicitly, as they would in other languages.

Live-variable analysis is mentioned as part of the implementation details. It
should not be necessary to understand these details to understand the behavior
of auto-capture.

I've updated the "Auto-capture semantics" section of the RFC.

Regards,

Arnaud Le Blanc

3 years ago by Rowan Tommins

The proposal is to extend the Arrow Functions syntax so that it allows
multiple statements.

That's one perspective. The other perspective is that the proposal is to
extend closure syntax to support automatic capture.

Currently the use() syntax co-exists with auto-capture, but we could change
it so that an explicit use() list disables auto-capture instead:
fn () use ($a) { } // Would capture $a and disable auto-capture
fn () use () { }   // Would capture nothing and disable auto-capture

That's an interesting idea. I was coming from the other direction, but
it might make sense I guess.

By the way, the current RFC implies you could write this:

fn() use (&$myRef, $a) { $myRef = $a * $b; }

It's clear that $myRef is captured by reference, and $a by value; but
what about $b? Is it local to the closure as it would be in a "long"
closure, or implicitly captured by value as it would be with no "use"
statement?

It might be best for such mixtures to raise an error.

Unfortunately, Arrow Functions already auto-capture today, so requiring a
use(*) to enable auto-capture would be a breaking change.

I'm not suggesting any change to arrow functions, just the ability to
write "use(*)" (or "use(...)") in all the places you can write
"use($foo)" today.

I don't think that introduces any problems, if you think of "fn" as an
alternative spelling of "function", and "=>" as expanding to "use(*) {
return"

I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time.
Do you have an example where this would be a problem?

I didn't say anything was a problem; I just said that the comparison
didn't make sense, because the scenarios are so different.

Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.

Live-variable analysis is mentioned as part of the implementation details. It
should not be necessary to understand these details to understand the behavior
of auto-capture.

As noted in my other e-mail, by-value capture can still have side
effects, so users may still want to ensure that their code is free of
such side effects.

Currently, the only way to do so is to understand the "implementation
details" of which variables will be captured, and perhaps add dummy
statements like "$foo = null;" or "unset($foo);" to make sure of it.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Larry Garfield

The proposal is to extend the Arrow Functions syntax so that it allows
multiple statements.

That's one perspective. The other perspective is that the proposal is to
extend closure syntax to support automatic capture.

As noted before, this is a distinction without a difference. The proposed syntax brings in one aspect of short-closures and one aspect of long-closures. Which you consider it "coming from" as a starting point is, in practice, irrelevant.

By the way, the current RFC implies you could write this:

fn() use (&$myRef, $a) { $myRef = $a * $b; }

It's clear that $myRef is captured by reference, and $a by value; but
what about $b? Is it local to the closure as it would be in a "long"
closure, or implicitly captured by value as it would be with no "use"
statement?

It might be best for such mixtures to raise an error.

The RFC already covers that. $b will be auto-captured by value from scope if it exists. See the "Explicit capture" section and its example.

Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.

Live-variable analysis is mentioned as part of the implementation details. It
should not be necessary to understand these details to understand the behavior
of auto-capture.

As noted in my other e-mail, by-value capture can still have side
effects, so users may still want to ensure that their code is free of
such side effects.

Currently, the only way to do so is to understand the "implementation
details" of which variables will be captured, and perhaps add dummy
statements like "$foo = null;" or "unset($foo);" to make sure of it.

There are two different issues you're raising here that almost seem to be contradictory.

1. Auto-capture could still over-capture without people realizing it. Whether this is actually an issue in practice (or would be) is hard to say with certainty; I'm not sure if it's possible to make an educated guess based on a top-1000 analysis, so we're all trying to predict the future. Note, however, that this risk is already present for short-closures, as the capture logic is the same.

   Arguably it's less of an issue only because short-closures are, well, short, so less likely to reuse variables unintentionally. However, based on my top-1000 survey, even today the vast majority of long-closures are only 2-4 lines long. I don't believe that makes it 2-4 times more likely, as it's still trivial for a developer to look at a 2 line closure and say "oh, I'm reusing that variable name, maybe that's not as clear as it could be."

2. The syntactic indicator that "auto capture will happen". The RFC says "fn". You're recommending "use(*)". However, changing the indicator syntax would do nothing to improve point 1. It's just a longer indicator to use the same logic, especially as it would also require the full "function" word. (Making fn and function synonyms sounds like it would have a lot more knock-on effects that feel very out of scope at present.)

--Larry Garfield

3 years ago by Rowan Tommins

That's one perspective. The other perspective is that the proposal is to
extend closure syntax to support automatic capture.
As noted before, this is a distinction without a difference.

It's a difference in focus which is very evident in some of the comments
on this thread. For instance, Arnaud assumed that adding "use(*)" would
require a change to arrow functions, whereas that never even occurred to
me, because we're looking at the feature through a different lens.

By the way, the current RFC implies you could write this:

fn() use (&$myRef, $a) { $myRef = $a * $b; }
The RFC already covers that. $b will be auto-captured by value from scope if it exists. See the "Explicit capture" section and its example.

So it does. I find that extremely confusing; I think it would be clearer
to error for that case, changing the proposal to:

Short Closures support explicit by-reference capture with the |use|
keyword. Combining a short closure with explicit by-value capture
produces an error.

And the example to:

$a = 1;
fn () use (&$b) {
    return $a + $b; // $a is auto-captured by value
                    // $b is explicitly captured by reference
}

Clearer syntax for this has been cited previously as an advantage of
use(*) or use(...):

$a = 1;
function () use (&$b, ...) { // read as "use $b by reference, everything else by value"
    return $a + $b;
}

Auto-capture could still over-capture without people realizing it. Whether this is actually an issue in practice (or would be) is hard to say with certainty; I'm not sure if it's possible to make an educated guess based on a top-1000 analysis, so we're all trying to predict the future.

I tried to make very explicit what I was and was not disputing:

Whether the risk of these side effects is a big problem is up for
debate, but it's wrong to suggest they don't exist.

The RFC seems to be implying that the implementation removes the side
effects, but it does not; it is users paying attention to their code
that will remove the side effects.

Arguably it's less of an issue only because short-closures are, well, short, so less likely to reuse variables unintentionally.

Our current short closures aren't just a single statement, they're a
single expression, and that's a really significant difference, because
it means to all intents and purposes they have no local scope. (You
can create and use a local variable within one expression, but it
requires the kind of twisted code that only happens in code golf.)

If there are no local variables, there is nothing to be accidentally
captured. That's why the current implementation doesn't bother
optimising which variables it captures - it's pretty safe to assume that
all variables in the expression are either parameters or captured.

The syntactic indicator that "auto capture will happen". The RFC says "fn". You're recommending "use(*)". However, changing the indicator syntax would do nothing to improve point 1.

The reason I think it would be better is because it is a more
intentional syntax: the author of the code is more likely to think
"I'm using an auto-capture closure, rather than an explicit-capture
closure, what effect will that have?" and readers of the code are more
likely to think "hm, this is using auto-capture, I wonder which
variables are local, and which are captured?"

Of course they can still guess wrong, but I don't think "fn" vs
"function" is a strong enough clue.

(Making fn and function synonyms sounds like it would have a lot more knock-on effects that feel very out of scope at present.)

Off the top of my head, I can't think of any, but I admit I haven't
tried hacking it into the parser to see if anything explodes.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Mike Schinkel

That's one perspective. The other perspective is that the proposal is to
extend closure syntax to support automatic capture.
As noted before, this is a distinction without a difference.

It's a difference in focus which is very evident in some of the comments on this thread. For instance, Arnaud assumed that adding "use(*)" would require a change to arrow functions, whereas that never even occurred to me, because we're looking at the feature through a different lens.

By the way, the current RFC implies you could write this:

fn() use (&$myRef, $a) { $myRef = $a * $b; }
The RFC already covers that. $b will be auto-captured by value from scope if it exists. See the "Explicit capture" section and its example.

So it does. I find that extremely confusing; I think it would be clearer to error for that case, changing the proposal to:

Short Closures support explicit by-reference capture with the |use| keyword. Combining a short closure with explicit by-value capture produces an error.

And the example to:

$a = 1;
fn () use (&$b) {
    return $a + $b; // $a is auto-captured by value
                    // $b is explicitly captured by reference
}

Clearer syntax for this has been cited previously as an advantage of use(*) or use(...):

$a = 1;
function () use (&$b, ...) { // read as "use $b by reference, everything else by value"
    return $a + $b;
}

Auto-capture could still over-capture without people realizing it. Whether this is actually an issue in practice (or would be) is hard to say with certainty; I'm not sure if it's possible to make an educated guess based on a top-1000 analysis, so we're all trying to predict the future.

I tried to make very explicit what I was and was not disputing:

Whether the risk of these side effects is a big problem is up for debate, but it's wrong to suggest they don't exist.

The RFC seems to be implying that the implementation removes the side effects, but it does not; it is users paying attention to their code that will remove the side effects.

Arguably it's less of an issue only because short-closures are, well, short, so less likely to reuse variables unintentionally.

Our current short closures aren't just a single statement, they're a single expression, and that's a really significant difference, because it means to all intents and purposes they have no local scope. (You can create and use a local variable within one expression, but it requires the kind of twisted code that only happens in code golf.)

If there are no local variables, there is nothing to be accidentally captured. That's why the current implementation doesn't bother optimising which variables it captures - it's pretty safe to assume that all variables in the expression are either parameters or captured.

The syntactic indicator that "auto capture will happen". The RFC says "fn". You're recommending "use(*)". However, changing the indicator syntax would do nothing to improve point 1.

The reason I think it would be better is because it is a more intentional syntax: the author of the code is more likely to think "I'm using an auto-capture closure, rather than an explicit-capture closure, what effect will that have?" and readers of the code are more likely to think "hm, this is using auto-capture, I wonder which variables are local, and which are captured?"

Of course they can still guess wrong, but I don't think "fn" vs "function" is a strong enough clue.

"Strong enough" is an opinion, and it seems all who have commented have differing ones of those.

But maybe a memory device would help address (some of?) your concerns:

fn() — It is SHORT and implicit. Short is CONVENIENT. Thus Short auto-captures variables because that is the most Convenient thing to do.
function() — It is LONG. Long is more EXPLICIT. Thus Long requires Explicitly declaring variables, which is also more rigorous and robust.

Or for the TL;DR crowd:

fn() => SHORT => CONVENIENT => Auto-captures
function() => LONG => EXPLICIT => Requires declaration

Hope this helps. #fwiw

-Mike

3 years ago by Arnaud Le Blanc

On Monday, 13 June 2022 15:36:26 CEST, Rowan Tommins wrote:

Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.

Live-variable analysis is mentioned as part of the implementation details.
It should not be necessary to understand these details to understand the
behavior of auto-capture.

As noted in my other e-mail, by-value capture can still have side
effects, so users may still want to ensure that their code is free of
such side effects.

My choice of words in this reply was inaccurate when I said "In these
languages it is easy to accidentally override/bind a variable from
the parent scope by forgetting a variable declaration.", since "override" can
be interpreted in different ways.

What I meant here is that it is not possible to accidentally bind a variable
on the parent scope. This is actually impossible unless you explicitly capture
a variable by-reference. Do you agree with this?

Possible side-effects via object mutations are documented in the "No
unintended side-effects" section of the RFC. This assumes that property
assignments or method calls to captured objects would be intended, since these
assignments/calls would result in an error if the variable was not defined and
not captured. Do you have examples where assignments/calls would
unintentionally cause a side effect, with code you would actually write?

As noted in my other e-mail, by-value capture can still have side
effects, so users may still want to ensure that their code is free of
such side effects.

There are two ways for a closure to have a side-effect (already documented in
the RFC):

1. The closure explicitly captures a variable by reference, and binds it.
2. The closure mutates a value accessed through a captured variable. Mutable
   values include objects and resources, but NOT scalars or arrays (since they
   are copy-on-write).

In the first case, this is entirely explicit.

In the second case, the only thing you need to understand is that if you
access a variable you did not define, the variable is either undefined or
comes from the declaring scope. Accessing undefined variables is an error, so
it must come from the declaring scope.
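A minimal sketch of the second case, written with today's long-closure syntax and hypothetical variable names: a by-value capture of an object handle still points at the same object, while a by-value capture of an array gives the closure its own copy.

```php
<?php
declare(strict_types=1);

$settings = new stdClass();
$settings->retries = 1;
$list = [1, 2, 3];

$mutate = function () use ($settings, $list) {
    $settings->retries = 5; // same object as in the declaring scope
    $list[] = 4;            // modifies the closure's copy only
    return count($list);
};

$countInsideClosure = $mutate();
```

After the call, the object in the declaring scope has changed but the array has not, which matches the distinction drawn in the list above.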

Your example uses isset(), which is valid code in most circumstances, but as
you said it's not particularly good code. Do you have other examples that come
to mind?

Currently, the only way to do so is to understand the "implementation
details"

I'm willing to make changes if that's true, because I definitely don't want
this to be the case.

3 years ago by Rowan Tommins

The closure mutates a value accessed through a captured variable. Mutable
values include objects and resources, but NOT scalars or arrays (since they
are copy-on-write).

It's not something that is used very often, so is often forgotten or
ignored, but there is technically an edge case where arrays are mutable:
they can contain references, and the references remain "live" even when
the array itself is passed by value.

So although unlikely, it is possible for a by-value closure to
over-write variables in other scopes: https://3v4l.org/dPZlI

// plain variable
$a = 42;
// array containing a reference
$b = [
    'a' => &$a
];

// capture the array by-value
$f = function() use($b) {
    // update the reference from inside the closure
    $b['a'] = 69;
};

// call it
$f();
// observe that both the array and the plain variable now have the new value
var_dump($a, $b);

--
Rowan Tommins
[IMSoP]

3 years ago by Rowan Tommins

(Sorry for double reply, hit send too soon)

Your example uses isset(), which is valid code in most circumstances, but as
you said it's not particularly good code. Do you have other examples that come
to mind?

There is plenty of code out there in the real world that is not
particularly good, so I think it's a realistic example to think about.

The question is, do we think the language can and should help people
avoid that mistake?

Would the explicitness of "use(...)" make it more likely someone would
spot it?
Would people be more likely to write "let $guest=..." (or whatever
block-scope keyword we choose) than add "$guest = null;" at the
beginning of the closure?

I'm not totally sure, but we should always consider the impact on
less-expert users, not just the power-users who are the ones often
asking for new features.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Dan Ackroyd

Hi Arnaud,

Arnaud Le Blanc arnaud.lb@gmail.com wrote:

Following your comment, I have clarified a few things in the "Auto-capture
semantics" section. This includes a list of ways in which these effects can be
observed. These are really marginal cases that are not relevant for most
programs.

Cool, thanks.

Unfortunately, Arrow Functions already auto-capture today, so requiring a
use(*) to enable auto-capture would be a breaking change.

I think there are two things that make this conversation more
adversarial than it would otherwise be:

1. Some people really want implicit auto-capture, while others are
   deeply fearful of it. That comes more from the experience people have
   from writing/reading different types of code, leading them to have
   different aesthetic preferences. Trying to persuade people that their
   lived experience is wrong is hard.
2. The current situation of having to list all variables is kind of
   annoying when it doesn't provide much value, e.g. for stuff like:

function getCallback($foo, $bar, $quux)
{
    return function($x) use ($foo, $bar, $quux)
    {
        return $quux($foo, $bar, $x);
    };
}

Where the code that returns the closure is trivial, having to list out
the full set of captured variables does feel tedious, and doesn't provide
any value.
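
For comparison, when the body is a single expression the existing arrow function syntax already avoids the list. A rough sketch follows; the function names getCallbackLong/getCallbackShort and the $sum callable are made up for illustration:

```php
<?php
// Explicit capture: every variable used from the parent scope is listed.
function getCallbackLong(int $foo, int $bar, callable $quux): Closure
{
    return function ($x) use ($foo, $bar, $quux) {
        return $quux($foo, $bar, $x);
    };
}

// Arrow function: $foo, $bar and $quux are auto-captured by value.
function getCallbackShort(int $foo, int $bar, callable $quux): Closure
{
    return fn ($x) => $quux($foo, $bar, $x);
}

// Hypothetical callable used only to exercise both versions.
$sum = fn (int $a, int $b, int $c): int => $a + $b + $c;

var_dump(getCallbackLong(1, 2, $sum)(3));  // int(6)
var_dump(getCallbackShort(1, 2, $sum)(3)); // int(6)
```

The tedium described here only bites once the body needs more than one statement, which arrow functions cannot express today.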

I realise it's annoying when people suggest expanding the scope of an
RFC, however...how would you feel about adding support for use(*) to
the current 'long closures'?

That way, people could choose between:

Explicit capture of individual variables: function($x) use ($foo,
$bar, $quux) {...}
Explicit capture of all relevant variables: function($x) use (*) {...}
Implicit capture of all relevant variables, and fewer letters: fn($x) {...}

People who don't want implicit capture would be able to tell their code
quality analysis tools to warn on any use of short closures (or
possibly better, warn when a variable has been captured). People who
do want implicit capture can use the short closures which always have
implicit capture.

cheers
Dan
Ack

3 years ago by Dan Ackroyd

Hi Larry, Arnaud,

Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.

Separating off some pedantism from the hopefully constructive comment,
I think some of the words in the RFC are a bit inaccurate:

A by-value capture means that it is not possible to modify any variables from the outer scope:

Because variables are bound by-value, the confusing behaviors often associated with closures do not exist.

Because variables are captured by-value, Short Closures can not have unintended side effects.

Those statements are true for scalar values. They are not true for objects:

class Foo
{
    function __construct(public string $value) {}

    function __toString()
    {
        return $this->value;
    }
}

$a = new Foo('bar');
$f = fn() {
    $a->value = 'explicit scope is nice';
};

print $a; // prints "bar"
$f();
print $a; // prints "explicit scope is nice"

Yes, I know you can avoid these types of problems by avoiding
mutability, and/or avoiding capturing variables that represent
services, but sometimes those things are needed.

When you are capturing objects that can have side effects, making that
capture be explicit is quite nice (imo). I think the different
emphasis on capturing scalar values or objects might come down to a
difference in style of how different people use closures.

cheers
Dan
Ack

3 years ago by Arnaud Le Blanc

Hi Dan,

On Monday 13 June 2022 19:49:10 CEST Dan Ackroyd wrote:

Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.

Separating off some pedantism from the hopefully constructive comment,

I think some of the words in the RFC are a bit inaccurate:

A by-value capture means that it is not possible to modify any variables
from the outer scope:

Because variables are bound by-value, the confusing behaviors often
associated with closures do not exist.

Because variables are captured by-value, Short Closures can not have
unintended side effects.

Those statements are true for scalar values. They are not true for objects:

This is shown in the "No unintended side-effects" section of the RFC.

I agree that the choice of words is inaccurate, as "modify any variable" could
be interpreted not only as "bind a variable", but also as "mutate a value".

The section you have quoted is meant to show how by-value capture, which is
the default capture mode in all PHP closures, is less error prone than
by-variable/by-reference capture, by a huge margin, especially since variable
bindings do not have side-effects unless a variable was explicitly captured
by-reference. Do you agree with this?

The "No unintended side-effects" section assumes that property assignments to
captured variables are intended side-effects. In your example, the programmer
intended to have a side effect because $a can only come from the declaring
scope (the code would result in an error otherwise) :

$a = new Foo('bar');
$f = fn() {
    $a->value = 'explicit scope is nice';
};

Do you have an example where the intent would be less obvious? With code you
would actually write?

Cheers,

Arnaud Le Blanc

3 years ago by Guilliam Xavier

Because variables are captured by-value, Short Closures can not have
unintended side effects.

Those statements are true for scalar values. They are not true for objects:

This is shown in the "No unintended side-effects" section of the RFC.

I'm confused by the last example:

$fn2 = function () use (&$a) { /* code with $a AND $b */ }

Isn't that missing a ", $b" in the use?

And like others, I also find that allowing mixing explicit by-value
capture with auto-capture is not really needed and even confusing; if
you "expect that explicitly capturing by value will be rare in
practice" you might as well forbid it?

Maybe you don't even need to add explicit [by-reference] capture to
short closures at all, but rather extend long closures so that we
can write things like:

$val1 = rand(); $val2 = rand(); $ref = null;
$fn1 = function () use (...) { /* do something with $val1 and $val2 */ };
$fn2 = function () use (&$ref, ...) { $ref = $val1 + $val2; };

(and even if not, at least mention in the RFC that it has been considered)?

By the way, what about arrow functions? e.g.

$fn = fn () use (&$ref) => $ref = $val1 + $val2; // assigns and returns

Would that be allowed? Is it really desirable?

Regards,

--
Guilliam Xavier

3 years ago by Larry Garfield

Because variables are captured by-value, Short Closures can not have
unintended side effects.

Those statements are true for scalar values. They are not true for objects:

This is shown in the "No unintended side-effects" section of the RFC.

I'm confused by the last example:
$fn2 = function () use (&$a) { /* code with $a AND $b */ }
Isn't that missing a ", $b" in the use?

And like others, I also find that allowing mixing explicit by-value
capture with auto-capture is not really needed and even confusing; if
you "expect that explicitly capturing by value will be rare in
practice" you might as well forbid it?

Arnaud and I discussed it, and we're going to drop the mix-autocapture-and-manual functionality. I was tepid on it to begin with, and it can be confusing. RFC will be updated soon.

By the way, what about arrow functions? e.g.
$fn = fn () use (&$ref) => $ref = $val1 + $val2; // assigns and returns
Would that be allowed? Is it really desirable?

I don't think it's really desirable. By-ref capture is unusual, probably even more so in one-line closures (though I've not checked that specifically), references are usually a bad idea anyway, and in those unusual cases the long form is still there if you want to control everything.

--Larry Garfield

3 years ago by Björn Larsson via internals

On 2022-06-13 at 14:57, Arnaud Le Blanc wrote:

On Saturday 11 June 2022 23:14:28 CEST Rowan Tommins wrote:

My main concern is summed up accidentally by your choice of subject line
for this thread: is the proposal to add short closure syntax or is it
to add auto-capturing closures?

The proposal is to extend the Arrow Functions syntax so that it allows
multiple statements. I wanted to give a name to the RFC, so that we could
refer to the feature by that name instead of the longer "auto-capture multi-
statement closures". But the auto-capture behavior is an important aspect we
want to inherit from Arrow Functions.

As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.

Currently the use() syntax co-exists with auto-capture, but we could change
it so that an explicit use() list disables auto-capture instead:

fn () use ($a) { } // Would capture $a and disable auto-capture
fn () use () { }   // Would capture nothing and disable auto-capture

I like this idea very much. In the RFC two variables are captured
explicitly and one implicitly.

$c = 1;
fn () use ($a, &$b) { return $a + $b + $c; }

I don't see the use case / value for not specifying $c while specifying
$a. I think it's much clearer when capturing variables to either implicitly
capture everything or list the ones that should be captured. One
only needs to think about which variables are listed, not the ones
that might be implicitly captured.

Of course, variables captured by reference will always have to be listed,
and if that is combined with capturing variables by value, those also need
to be listed.

Then there is this other proposal to enhance traditional anonymous
functions by allowing the syntax use(*), meaning capture everything.
Even if it's outside the scope of this RFC it could be mentioned in
"What about Anonymous Functions?" or "Future scope".

r//Björn L

3 years ago by Larry Garfield

Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture
compactness of short-closures. That RFC didn't fully go to completion
due to concerns over the performance impact, which Nuno and I didn't
have bandwidth to resolve.

Arnaud Le Blanc has now picked up the flag with an improved
implementation that includes benchmarks showing an effectively net-zero
performance impact, aka, good news as it avoids over-capturing.

The RFC has therefore been overhauled accordingly and is now ready for
consideration.

https://wiki.php.net/rfc/auto-capture-closure

A little data:

I used Nikita's project analyzer on the top 1000 projects to get a rough sense of how long-closures are used now. All usual caveats apply about such survey data. I was specifically looking at how many use statements a closure typically had, and how many statements it typically had. Mainly, I am interested in how common "really long closures where the developer is likely to lose track of what is and isn't closed over" are.

Total closures: 20052
Total used variables: 11534

Avg capture per closure: 0.575
Avg statements per closure: 1.876

Used variable distribution (# of use variables => how many times that happens):
0 => 12833
1 => 4585
2 => 1667
3 => 591
4 => 198
5 => 98
6 => 43
7 => 16
8 => 9
9 => 6
10 => 2
11 => 4

Statement count distribution (# of statements => how many times that happens):
0 => 266
1 => 13134
2 => 2885
3 => 1598
4 => 818
5 => 429
6 => 284
7 => 176
8 => 125
9 => 88
10 => 48
11 => 58
12 => 25
13 => 27
14 => 14
15 => 16
16 => 13
17 => 7
18 => 3
19 => 7
20 => 4
21 => 5
22 => 3
23 => 2
24 => 3
26 => 2
27 => 1
29 => 1
30 => 1
35 => 1
36 => 1
42 => 1
44 => 1
48 => 1
59 => 1
69 => 1
103 => 1
122 => 1
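
The totals and averages can be cross-checked against the two distributions above. This is only a sketch of that arithmetic; the helper name histogramMean is an assumption, and the data is copied verbatim from the lists:

```php
<?php
// Distributions copied from the lists above: value => number of closures.
$captures = [
    0 => 12833, 1 => 4585, 2 => 1667, 3 => 591, 4 => 198, 5 => 98,
    6 => 43, 7 => 16, 8 => 9, 9 => 6, 10 => 2, 11 => 4,
];
$statements = [
    0 => 266, 1 => 13134, 2 => 2885, 3 => 1598, 4 => 818, 5 => 429,
    6 => 284, 7 => 176, 8 => 125, 9 => 88, 10 => 48, 11 => 58,
    12 => 25, 13 => 27, 14 => 14, 15 => 16, 16 => 13, 17 => 7,
    18 => 3, 19 => 7, 20 => 4, 21 => 5, 22 => 3, 23 => 2, 24 => 3,
    26 => 2, 27 => 1, 29 => 1, 30 => 1, 35 => 1, 36 => 1, 42 => 1,
    44 => 1, 48 => 1, 59 => 1, 69 => 1, 103 => 1, 122 => 1,
];

// Weighted mean of a "value => count" histogram (hypothetical helper).
function histogramMean(array $histogram): float
{
    $weighted = 0;
    foreach ($histogram as $value => $count) {
        $weighted += $value * $count;
    }
    return $weighted / array_sum($histogram);
}

printf("closures: %d\n", array_sum($statements));              // closures: 20052
printf("avg captures: %.3f\n", histogramMean($captures));      // avg captures: 0.575
printf("avg statements: %.3f\n", histogramMean($statements));  // avg statements: 1.876
```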

Analysis:

The bulk of closures close over nothing, so are irrelevant for us.
The bulk of closures use only one statement. That means they could easily be short-lambdas today, and are likely just pre-7.4 code that no one has bothered to update.
The overwhelming majority of the rest are 2-3 lines long. The dropoff after that is quite steep. (Approximately halving each time, with a few odd exceptions.)
Similarly, most use clauses contain 1-2 variables, and the dropoff after that is also quite steep.
There's some nitwit out there writing 122 line closures, and closing over 11 variables explicitly. Fortunately it looks like an extremely small number of nitwits. :-)

The primary target of this RFC is people writing 2-4 line closures that import 1-2 variables, both easily small enough that there should be very little risk of developers getting confused by their own code. Based on the data above, I conclude that group is very much the typical case for closures already, and thus the risk of this syntax resulting in harder to follow code where developers get confused about what is imported and what isn't is very low.

--Larry Garfield

3 years ago by Mark Baker

A little data:

I used Nikita's project analyzer on the top 1000 projects to get a rough sense of how long-closures are used now. All usual caveats apply about such survey data. I was specifically looking at how many use statements a closure typically had, and how many statements it typically had. Mainly, I am interested in how common "really long closures where the developer is likely to lose track of what is and isn't closed over" are.

Total closures: 20052
Total used variables: 11534

Did many of those closures use "pass by reference" in the use clause,
because that's one real differentiator between traditional closures and
short lambdas. There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.

--
Mark Baker

3 years ago by Larry Garfield

A little data:

I used Nikita's project analyzer on the top 1000 projects to get a rough sense of how long-closures are used now. All usual caveats apply about such survey data. I was specifically looking at how many use statements a closure typically had, and how many statements it typically had. Mainly, I am interested in how common "really long closures where the developer is likely to lose track of what is and isn't closed over" are.

Total closures: 20052
Total used variables: 11534

Did many of those closures use "pass by reference" in the use clause,
because that's one real differentiator between traditional closures and
short lambdas. There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.

New run to check for that:

Total used variables: 11534
ByRef used variables: 1833

So around 16% of used variables are by-ref, and thus would need to be explicitly used even with the new syntax.

There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.

I... don't see what relevance that has? The potential for confusion is at the definition point, not call point. If a closure is used inline then those are the same place, but if they're not, it's only the definition point that is relevant at the moment.

--Larry Garfield

3 years ago by Arnaud Le Blanc

On Sunday 12 June 2022 19:54:06 CEST Mark Baker wrote:

Did many of those closures use "pass by reference" in the use clause,
because that's one real differentiator between traditional closures and
short lambdas. There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.

Please note that auto-capture binds variables at function declaration. This is
the case in Arrow Functions, and is inherited by this RFC.
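
A short sketch of what binding at declaration means, using an existing arrow function and, for contrast, a long closure with a by-reference capture (variable names are illustrative):

```php
<?php
$greeting = 'hello';

// Arrow function: $greeting is captured by value when the closure is created.
$arrow = fn () => $greeting;

// Long closure with a by-reference capture, for contrast.
$byRef = function () use (&$greeting) {
    return $greeting;
};

// Changing the variable after both closures have been declared.
$greeting = 'goodbye';

var_dump($arrow()); // string(5) "hello"
var_dump($byRef()); // string(7) "goodbye"
```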

3 years ago by Robert Landers

On Sunday 12 June 2022 19:54:06 CEST Mark Baker wrote:

Did many of those closures use "pass by reference" in the use clause,
because that's one real differentiator between traditional closures and
short lambdas. There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.

Please note that auto-capture binds variables at function declaration. This is
the case in Arrow Functions, and is inherited by this RFC.

From a maintainer and code review aspect, I prefer the longer syntax
because it is 100% clear on which variables are being closed over and
utilized in the anonymous function. fn($x) => $x + $y is pretty clear
that $y is being pulled in from an outer scope but if you start
getting into longer ones, it can get non-obvious pretty quickly...

$func = fn($x) {
    $y[] = $x;
    // do some stuff
    return $y;
};

If $y is pulled from the outside scope, it may or may not be
intentional but hopefully, it is an array. If anyone uses the name $y
outside the lambda, this code may subtly break.

That being said, I'd love this RFC broken into two RFCs, one for
generic auto-capturing and one for multi-line fn functions (just to
reduce some typing when refactoring). There are times when
auto-capturing can be useful for all lambdas, especially when writing
some custom middleware.

3 years ago by Arnaud Le Blanc

On Monday 13 June 2022 12:28:17 CEST Robert Landers wrote:

From a maintainer and code review aspect, I prefer the longer syntax
because it is 100% clear on which variables are being closed over and
utilized in the anonymous function. fn($x) => $x + $y is pretty clear
that $y is being pulled in from an outer scope but if you start
getting into longer ones, it can get non-obvious pretty quickly...

$func = fn($x) {
    $y[] = $x;
    // do some stuff
    return $y;
};

If $y is pulled from the outside scope, it may or may not be
intentional but hopefully, it is an array. If anyone uses the name $y
outside the lambda, this code may subtly break.

This is true for any function that uses the array-append operator on an
undefined variable.
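
To illustrate with code that runs today (the function name collect and the closure $append are hypothetical, and the explicit use() stands in for a capture):

```php
<?php
// Hypothetical helper, for illustration only.
function collect(int $x): array
{
    $y[] = $x; // $y is undefined here: PHP silently creates a new array
    return $y;
}

var_dump(collect(5) === [5]); // bool(true)

// If a same-named variable of another type comes from the outer scope,
// the same append fails at runtime instead.
$y = 'not an array';
$append = function (int $x) use ($y) {
    $y[] = $x; // throws Error: [] operator not supported for strings
    return $y;
};

try {
    $append(5);
} catch (Error $e) {
    echo get_class($e), "\n"; // Error
}
```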

3 years ago by Rowan Tommins

The primary target of this RFC is people writing 2-4 line closures that import 1-2 variables

The first half of that sentence I was expecting - although as I've
already said, I think the chosen syntax suggests strongly that the RFC
is really targeting all closures, not any subset of them.

The second half makes much less sense. If you are only importing 1 or 2
variables, is writing their names really that big a burden?

Several of the conversations I've had on this in the past have been very
explicitly about the burden of large numbers of captures; if that's
really as rare as you suggest, it makes me wonder why we're even bothering.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Larry Garfield

The primary target of this RFC is people writing 2-4 line closures that import 1-2 variables

The first half of that sentence I was expecting - although as I've
already said, I think the chosen syntax suggests strongly that the RFC
is really targeting all closures, not any subset of them.

The second half makes much less sense. If you are only importing 1 or 2
variables, is writing their names really that big a burden?

Several of the conversations I've had on this in the past have been very
explicitly about the burden of large numbers of captures; if that's
really as rare as you suggest, it makes me wonder why we're even bothering.

Regards,

Disclaimer: My own view, I cannot speak for Nuno or Arnaud.

If you're capturing a very large number of variables, then I would view that as a code smell. "Very large" is subjective, of course, and there's some context to it.

The two main use cases I see myself using are

A) 2-3 liners that use 1-3 variables from scope, so it's dead obvious what they are. In this case, the extra use clause doesn't really add much beyond visible noise.

B) An entire method body is a closure that is being returned, or inlined into an inTransaction() call or something like that. In this case, basically all method parameters would be captured, and it would be on the very previous line, so no matter how many there are (more than ~5 is probably a problem with the method, not with the closure), they're redundant and don't tell you anything that isn't already self-evident.

So the burden is in having to think about redundant syntax at all, plus having more redundant text that has to be read in the future. Even with use(*) or use(...) or whatever, that's better than the status quo but is still just more boilerplate that would have to be added/removed when switching from a one line short lambda (side note: This is the term I always use; I basically never use "arrow function". I don't know how typical that is) to a 2-line closure when refactoring.

--Larry Garfield

3 years ago by Dan Ackroyd

That RFC didn't fully go to completion due to concerns over the performance impact

I don't believe that is an accurate summary. There were subtle issues
in the previous RFC that should have been addressed. Nikita Popov
wrote in https://news-web.php.net/php.internals/114239

I'm generally in favor of supporting auto-capture for multi-line closures.

There are some caveats though, which this RFC should address:

Subtle side-effects, visible in debugging functionality, or through destructor
effects (the fact that a variable is captured may be observable). I think if
nothing else, the RFC should at least make clear that this behavior
is explicitly unspecified, and a future implementation may no longer capture
variables where any path from entry to read passes a write.

To be clear, I don't fully understand all those issues myself (and I
have just enough knowledge to know to be scared to look at that part
of the engine) but my understanding is that the concerns are not about
just performance, they are deep concerns about the behaviour.

It would produce a better discussion if the RFC document either said
how those issues are resolved, or detail how they are still
limitations on the implementation.

It also probably would have been better (imo) to create a new RFC
document. The previous RFC went to vote, even if the vote was
cancelled. Disk space is cheap. Having different (though similar) RFCs
under the same URL makes it confusing when trying to understand what
happened to particular RFCs.

cheers
Dan
Ack

3 years ago by Arnaud Le Blanc

On Sunday 12 June 2022 20:05:02 CEST Dan Ackroyd wrote:

That RFC didn't fully go to completion due to concerns over the
performance impact

I don't believe that is an accurate summary. There were subtle issues
in the previous RFC that should have been addressed. Nikita Popov
wrote in https://news-web.php.net/php.internals/114239

It would produce a better discussion if the RFC document either said
how those issues are resolved, or detail how they are still
limitations on the implementation.

To be clear, I don't fully understand all those issues myself (and I
have just enough knowledge to know to be scared to look at that part
of the engine) but my understanding is that the concerns are not about
just performance, they are deep concerns about the behaviour.

Thank you for pointing this out. Nikita was referring to side-effects of
capturing too many variables, and suggested making the capture analysis
behavior explicitly unspecified in the RFC so that it could be changed
(optimized) later.

The new version of the RFC does the optimization.

Following your comment, I have clarified a few things in the "Auto-capture
semantics" section. This includes a list of ways in which these effects can be
observed. These are really marginal cases that are not relevant for most
programs.

Cheers

Arnaud Le Blanc

3 years ago by Rowan Tommins

Following your comment, I have clarified a few things in the "Auto-capture
semantics" section. This includes a list of ways in which these effects can be
observed. These are really marginal cases that are not relevant for most
programs.

I'm not sure I agree that all of these are marginal, or with the way
you've characterised them...

Note that destructor timing is undefined in PHP, especially when
reference cycles exist.

Outside of reference cycles, which are pretty rare and generally easy to
avoid, PHP's destructors are entirely deterministic. Unlike in fully
garbage-collected languages, you can use a plain object to implement an
"RAII" pattern - e.g. the constructor locks a file and the destructor
unlocks it; or the constructor starts a transaction, and the destructor
rolls it back if not yet committed.

A related case is resource lifetime: file and network handles are
guaranteed to be closed when they go out of scope, and accidentally
taking an extra copy of their "value" can prevent that.
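
A small sketch of that determinism, and of how an extra copy held by a closure delays the release. The Guard class and its log are hypothetical, and the explicit use() stands in for any capture:

```php
<?php
// Hypothetical RAII-style guard; the class and its log are illustrative only.
final class Guard
{
    public static array $log = [];

    public function __construct(private string $name)
    {
        self::$log[] = "acquire {$this->name}";
    }

    public function __destruct()
    {
        self::$log[] = "release {$this->name}";
    }
}

$a = new Guard('a');
$a = null;                         // last reference gone: destructor runs here

$b = new Guard('b');
$holder = function () use ($b) {}; // the closure keeps its own reference
$b = null;                         // not released yet: $holder still refers to it
Guard::$log[] = 'b set to null';
$holder = null;                    // closure destroyed, so the guard is released now

print_r(Guard::$log);
// Order: acquire a, release a, acquire b, b set to null, release b
```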

It ends up capturing the same variables that would have been captured
by a manually curated "use" list.

This slightly muddles two different questions:

1. Given a well-written closure, where all variables are either clearly
   local or clearly intended to be captured, does the implementation do a
   good job of distinguishing them?
2. Given a badly-written closure, where variables are accidentally
   ambiguous, what side-effects might the user experience?

The answer to question 1 seems to be yes, the implementation does a good
job, and that's good news, and thank you for working on it.

That is not the same, however, as saying that question 2 is never
relevant. Consider the following, adapted from an example in the RFC:

$filter = fn ($user) {
    if ( $user->id !== -1 ) {
        $guest = $repository->findByUserId($user->id);
    }
    return isset($guest) && in_array($guest->id, $guestsIds);
};

This is not particularly great code, but it works ... unless the parent
scope happens to have a variable named $guest, which will then be bound
to the closure, since there is a path where it is read before being
written. In this case, side effects include:

The behaviour will change based on the captured value of $guest
Any resources held by that value will be held until $filter is
destructed, rather than when $guest is destructed

Whether the risk of these side effects is a big problem is up for
debate, but it's wrong to suggest they don't exist.
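
The first of these side effects can be emulated with today's syntax by listing $guest in use(), which is roughly what an accidental auto-capture would amount to. This is a sketch under that assumption; $findGuest and the sample data are stand-ins for the RFC example's repository:

```php
<?php
// Hypothetical stand-ins for the RFC example's $repository and $guestsIds.
$guestsIds = [7];
$findGuest = fn (int $id): object => (object) ['id' => $id];

// $guest is not captured, so it is always local to the closure.
$filterLocal = function (object $user) use ($findGuest, $guestsIds): bool {
    if ($user->id !== -1) {
        $guest = $findGuest($user->id);
    }
    return isset($guest) && in_array($guest->id, $guestsIds, true);
};

// Emulating an accidental capture by listing $guest in use().
$guest = (object) ['id' => 7]; // unrelated variable in the parent scope
$filterCaptured = function (object $user) use ($findGuest, $guestsIds, $guest): bool {
    if ($user->id !== -1) {
        $guest = $findGuest($user->id);
    }
    return isset($guest) && in_array($guest->id, $guestsIds, true);
};

$anonymous = (object) ['id' => -1];
var_dump($filterLocal($anonymous));    // bool(false)
var_dump($filterCaptured($anonymous)); // bool(true)
```

The two closures have identical bodies; only the presence of $guest in the parent scope changes the result.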

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Larry Garfield

Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture
compactness of short-closures. That RFC didn't fully go to completion
due to concerns over the performance impact, which Nuno and I didn't
have bandwidth to resolve.

Arnaud Le Blanc has now picked up the flag with an improved
implementation that includes benchmarks showing an effectively net-zero
performance impact, aka, good news as it avoids over-capturing.

The RFC has therefore been overhauled accordingly and is now ready for
consideration.

https://wiki.php.net/rfc/auto-capture-closure

The conversation has died down, so we'll be opening the vote for this tomorrow.

Two changes of note since the discussion started:

The option to mix explicit capture and implicit capture has been removed as too confusing/unpredictable. Either trust the engine to capture the right things (the new syntax proposed here) or explicitly list everything (the existing syntax we've had since 5.3.)
We added a section discussing the use(*) syntax alternative, and why it wasn't, er, used. (Pun only sort of intended.)

--Larry Garfield

3 years ago by Björn Larsson via internals

On 2022-06-29 at 19:30, Larry Garfield wrote:

Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture
compactness of short-closures. That RFC didn't fully go to completion
due to concerns over the performance impact, which Nuno and I didn't
have bandwidth to resolve.

Arnaud Le Blanc has now picked up the flag with an improved
implementation that includes benchmarks showing an effectively net-zero
performance impact, aka, good news as it avoids over-capturing.

The RFC has therefore been overhauled accordingly and is now ready for
consideration.

https://wiki.php.net/rfc/auto-capture-closure

The conversation has died down, so we'll be opening the vote for this tomorrow.

Two changes of note since the discussion started:

The option to mix explicit capture and implicit capture has been removed as too confusing/unpredictable. Either trust the engine to capture the right things (the new syntax proposed here) or explicitly list everything (the existing syntax we've had since 5.3.)

We added a section discussing the use(*) syntax alternative, and why it wasn't, er, used. (Pun only sort of intended.)

--Larry Garfield

Hi,

Would it be an option to include a "Future scope" with the features:

Explicit capture that list only the variables to be captured by value
or reference, nothing else.
Extending the traditional anonymous function with use(*) for capturing
everything.

Anyway, hope this passes for PHP 8.2!

Regards //Björn Larsson

3 years ago by Arnaud Le Blanc

Hi Björn,

On Wed, Jun 29, 2022 at 8:09 PM Björn Larsson via internals <
internals@lists.php.net> wrote:

Would it be an option to include a "Future scope" with the features:

Explicit capture that list only the variables to be captured by value
or reference, nothing else.

Extending the traditional anonymous function with use(*) for capturing
everything.

Anyway, hope this passes for PHP 8.2!

Thank you for the suggestion. The RFC now includes a Future Scope section.
The extension of traditional anonymous functions is discussed separately in
the "Alternative implementations" section.

Regards,

3 years ago by Dan Ackroyd

The conversation has died down, so we'll be opening the vote for this tomorrow.

I think I've just thought of a problem with the optimization bit of
'not capturing variables if they are written to before being used
inside the closure'.

Imagine some code that looks like this:

// Acquire some resource e.g. an exclusive lock.
$some_resource = acquire_some_resource();

$fn = fn () {
    // Free that resource
    $some_resource = null;
};

// do some stuff that assumes the exclusive
// lock is still active.

// call the callback that we 'know' frees the resource
$fn();

That's a not unreasonable piece of code to write even if it's of a
style many people avoid. I believe in C++ it's called "Resource
acquisition is initialization", though they're trying to change the
name to "Scope-Bound Resource Management" as that is a better
description of what it is.

With the optimization in place, that code would not behave
consistently with how the rest of PHP works, where the lifetime of an
object is reasonably well defined with "The destructor method will be
called as soon as there are no other references to a particular
object,".

From the RFC:

This approach would result in a waste of memory or CPU usage.

For the record, all of my previous concerns about scoping rules have
been about making code hard to reason about, and behave sanely. Memory
itself is cheap.

Although not having that optimization might mean that some variables
last longer than they should, that is at least explainable*. Having
variables not last as long as they should (because of an optimization)
is harder to explain, and harder to explain how to work around.

cheers
Dan
Ack

* either use long closures or change your variable name if you don't
want it captured.

3 years ago by Rowan Tommins — view source

Imagine some code that looks like this:

// Acquire some resource e.g. an exclusive lock.
$some_resource = acquire_some_resource();

$fn = fn () {
    // Free that resource
    $some_resource = null;
};

// do some stuff that assumes the exclusive
// lock is still active.

// call the callback that we 'know' frees the resource
$fn();

That's a not unreasonable piece of code to write

For that to work, it would require the variable to be captured by
reference, not value. Writing to a variable captured by value, like
writing to a parameter passed by value, is just writing to a local variable.

In fact, the "optimisation" is in my opinion a critical part of the
semantics, to avoid the opposite problem:

// Acquire some resource e.g. an exclusive lock.
$some_resource = acquire_some_resource();

$fn = fn () {
    // Use a variable that happens to have the same name
    // A naive implementation would see $some_resource mentioned, and capture it
    // Over-writing the local variable here makes no difference; the closure
    // still holds the value for next time
    $some_resource = 'hello';
};

// Free what we believe is the last pointer, to trigger the destructor
unset($some_resource);

// If $some_resource gets captured, it can only be released by destroying the closure
unset($fn);

Regards,

--
Rowan Tommins
[IMSoP]
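
As a rough illustration of the by-value versus by-reference distinction described above, here is a minimal sketch using the explicit use syntax that PHP already has. The variable names are made up for the example; it does not use the syntax proposed in the RFC.

```php
<?php
// Capture by value: the closure receives its own copy of $outer.
$outer = 'original';
$byValue = function () use ($outer) {
    $outer = 'changed inside'; // writes to the closure's local variable only
    return $outer;
};
$insideValue = $byValue();
$afterByValue = $outer; // the outer variable is untouched

// Capture by reference: the closure shares the variable with the outer scope.
$byReference = function () use (&$outer) {
    $outer = 'changed inside'; // writes through to the outer variable
};
$byReference();
$afterByReference = $outer;

var_dump($insideValue, $afterByValue, $afterByReference);
```

Only the by-reference closure changes what the outer scope sees, which is why the original example cannot free the resource with a by-value capture.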

3 years ago by Robert Landers — view source

For that to work, it would require the variable to be captured by
reference, not value.

I think their suggested code would work (at least currently in PHP) by
the simple fact they would increase the reference count on that
object/resource until they set it as null. However, with the
"optimization," the reference count will never be incremented and thus
fail to work as defined.

3 years ago by Rowan Tommins — view source

I think their suggested code would work (at least currently in PHP) by
the simple fact they would increase the reference count on that
object/resource until they set it as null. However, with the
"optimization," the reference count will never be incremented and thus
fail to work as defined.

No, the captured value is tied to the lifetime of the closure itself,
not the variable inside the closure.

$some_resource = acquire_some_resource();
// refcount=1 (outer $some_resource)

$fn = function() use ($some_resource) {
    $some_resource = null;
};
// refcount=2 (outer $some_resource, closure $fn)

$fn();
// during execution, refcount is 3 (outer $some_resource, closure $fn, local $some_resource)
// once the local variable is written to, the refcount goes back to 2 (outer $some_resource, closure $fn)

unset($some_resource);
// refcount=1 (closure $fn)

$fn();
// the captured variable always starts with its original value, regardless of how many times you execute the function
// during execution, refcount is now 2 (closure $fn, local $some_resource)
// after execution, refcount is still 1 (closure $fn)

unset($fn);
// only now does the refcount go down to 0 and trigger the destructor

The only way for it to work would be using capture by reference (not
supported by the proposed short syntax):

$some_resource = acquire_some_resource();
// refcount=1: simple variable

$fn = function() use (&$some_resource) {
    $some_resource = null;
};
// refcount=1: a reference set with 2 members (outer $some_resource, closure $fn)

$fn();
// during execution, we have a reference set with 3 members (outer $some_resource, closure $fn, local $some_resource)
// the assignment assigns to this reference set, changing the value referenced by all 3 members
// refcount on the resource drops from 1 to 0, triggering the destructor

$fn();
// because it was captured by reference, the initial value of $some_resource in the closure has now changed

Regards,

--
Rowan Tommins
[IMSoP]
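
The refcount walk-through above can be checked with a small runnable sketch. TrackedResource is a hypothetical stand-in for acquire_some_resource(); it records destructor calls in a log instead of echoing, so the ordering can be inspected afterwards.

```php
<?php
// Hypothetical stand-in for a lock or file handle; records when it is destroyed.
final class TrackedResource
{
    public static array $log = [];

    public function __destruct()
    {
        self::$log[] = 'released';
    }
}

// By-value capture: the closure holds its own handle to the object.
$resource = new TrackedResource();
$fn = function () use ($resource) {
    $resource = null; // only clears the local copy for this call
};
$fn();
TrackedResource::$log[] = 'after call';
unset($resource);
TrackedResource::$log[] = 'after unset of outer variable';
unset($fn); // last handle gone: the destructor runs here
TrackedResource::$log[] = 'after unset of closure';

$byValueLog = TrackedResource::$log;
TrackedResource::$log = [];

// By-reference capture: the assignment inside the closure clears the shared variable.
$resource = new TrackedResource();
$fn = function () use (&$resource) {
    $resource = null; // the destructor runs here
};
TrackedResource::$log[] = 'before call';
$fn();
TrackedResource::$log[] = 'after call';

$byReferenceLog = TrackedResource::$log;

print_r($byValueLog);
print_r($byReferenceLog);
```

With by-value capture the 'released' entry only appears once the closure itself is destroyed; with by-reference capture it appears during the call.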

3 years ago by Dan Ackroyd — view source

Hi Rowan,

Rowan wrote:

For that to work, it would require the variable to be captured by
reference, not value.
...
The only way for it to work would be using capture by reference (not
supported by the proposed short syntax):

I wrote about this before. Some of the words in the RFC are, in my
opinion, quite inaccurate:

Danack wrote in https://news-web.php.net/php.internals/117938 :

Those statements are true for scalar values. They are not true for objects:

With automatic capturing of variables, for the code example I gave the
user would want the variable to be captured, and to them it looks like
it should be, but because of an optimization it is not.

When the code doesn't work as they expect it to, the programmer is
likely to add a var_dump to try to see what is happening. Which makes
it look like their code 'should' work, as their resource object is
still alive.

In fact, the "optimisation" is in my opinion a critical part of the
semantics, to avoid the opposite problem:

As I said, I think that problem is a lot easier to explain "either use
long closures or change your variable name if you don't want it
captured." than trying to explain "yes, the variable is referenced
inside the closure, but it's not captured because you aren't reading
from it".

cheers
Dan
Ack

For this code, comment the var_dump in/out to affect the lifetime of the object.

class ResourceType
{
    public function __destruct() {
        echo "Resource is released.\n";
    }
}

function get_callback()
{
    $some_resource = new ResourceType();
    $fn = fn() {
        // why is my lock released?
        var_dump($some_resource);
        // "Free that resource"
        $some_resource = null;
    };
    return $fn;
}

$fn = get_callback();
echo "Before callback\n";
$fn();
echo "After callback\n";

// Without var_dump
Resource is released.
Before callback
After callback

// With var_dump
Before callback
object(ResourceType)#1 (0) {
}
After callback
Resource is released.

3 years ago by Robert Landers — view source

Rowan wrote:

No, the captured value is tied to the lifetime of the closure itself,
not the variable inside the closure.

With the "optimization," it won't be captured at all by the closure,
possibly causing some resources to go out of scope early. Are
optimizations going to be applied to single-line arrow functions (I
didn't see that in the RFC, but I admittedly didn't look that hard and
I vaguely remember reading something about it in one of these
threads)? If so, it will probably change some behaviors in existing
applications if they were relying on it. Perhaps static analysis tools
can detect this and inform the developer.

Here's Dan's code: https://3v4l.org/99XUN#v8.1.7 that he just sent,
modified to not capture the $some_resource and you can see that it is
indeed released earlier than if it were captured.

3 years ago by Guilliam Xavier — view source

Hi Rowan,

Rowan wrote:

For that to work, it would require the variable to be captured by
reference, not value.
...
The only way for it to work would be using capture by reference (not
supported by the proposed short syntax):

I wrote about this before. Some of the words in the RFC are, in my
opinion, quite inaccurate:

Danack wrote in https://news-web.php.net/php.internals/117938 :

Those statements are true for scalar values. They are not true for objects:

But the RFC has been updated since (notably the DateTime example); do
you find the current wording still inaccurate?

With automatic capturing of variables, for the code example I gave the
user would want the variable to be captured, and to them it looks like
it should be, but because of an optimization it is not.

Am I missing something here? To me, it has been explained (and shown)
by Rowan (and me) that the code example you gave would not work as
expected even without the optimization (for it to work it would need
to either capture by reference, or use e.g.
$some_resource->close(); [or close($some_resource);] instead of a
destructor); but maybe we don't "expect" the same behavior in the
first place?

When the code doesn't work as they expect it to, the programmer is
likely to add a var_dump to try to see what is happening. Which makes
it look like their code 'should' work, as their resource object is
still alive.

This indeed seems a valid point (that adding a
var_dump($some_resource); before the $some_resource = null;
changes it from "not captured" to "captured", with an effect on its
lifetime). But are there "real" cases where it would actually
matter?

In fact, the "optimisation" is in my opinion a critical part of the
semantics, to avoid the opposite problem:

As I said, I think that problem is a lot easier to explain "either use
long closures or change your variable name if you don't want it
captured." than trying to explain "yes, the variable is referenced
inside the closure, but it's not captured because you aren't reading
from it".

Same as above.

Rowan wrote:

No, the captured value is tied to the lifetime of the closure itself,
not the variable inside the closure.

With the "optimization," it won't be captured at all by the closure,
possibly causing some resources to go out of scope early.

And it has been explained that conversely, capturing it would possibly
cause some resources to "remain in scope" late.

Are
optimizations going to be applied to single-line arrow functions (I
didn't see that in the RFC, but I admittedly didn't look that hard and
I vaguely remember reading something about it in one of these
threads)?

Seems so: https://github.com/php/php-src/pull/8330/files#diff-85701127596aca0e597bd7961b5d59cdde4f6bb3e2a109a22be859ab7568b4d2R7318-R7320

If so, it will probably change some behaviors in existing
applications if they were relying on it. Perhaps static analysis tools
can detect this and inform the developer.

Here too, do you have a "real" case where it would actually matter?

Here's Dan's code: https://3v4l.org/99XUN#v8.1.7 that he just sent,
modified to not capture the $some_resource and you can see that it is
indeed released earlier than if it were captured.

And here it is "un-modified": https://3v4l.org/gZai2 where you see
that calling $fn() (which internally nullifies its local copy of
$some_resource) does not release; is it really what you expect? are
you creating the closure only to extend the lifetime of
$some_resource?

Regards,

--
Guilliam Xavier

3 years ago by Robert Landers — view source

On Thu, Jun 30, 2022 at 5:47 PM Guilliam Xavier
guilliam.xavier@gmail.com wrote:

And here it is "un-modified": https://3v4l.org/gZai2 where you see
that calling $fn() (which internally nullifies its local copy of
$some_resource) does not release; is it really what you expect? are
you creating the closure only to extend the lifetime of $some_resource?

Personally, not that I'm aware of, which is the point. This may subtly
change code that works just fine today and it will be hard to track it
down. Though perhaps static analysis tools and IDEs will help track it down by
pointing out automatically captured vs. non-captured variables.

Ah, I see Arnaud just confirmed that it won't be applied to existing
arrow functions. Perhaps this is a moot point and it will be just
another quirk to be aware of when writing PHP. I was just worried
about it being applied to any existing code.

3 years ago by Arnaud Le Blanc — view source

Hi,

On Thursday 30 June 2022 16:18:44 CEST Robert Landers wrote:

Are
optimizations going to be applied to single-line arrow functions (I
didn't see that in the RFC, but I admittedly didn't look that hard and
I vaguely remember reading something about it in one of these
threads)? If so, it will probably change some behaviors in existing
applications if they were relying on it. Perhaps static analysis tools
can detect this and inform the developer.

It is not planned to change the behavior of arrow functions in this RFC. This
optimization is less important for arrow functions because they don't usually
assign variables.

This could be a follow up RFC though.

3 years ago by Guilliam Xavier — view source

Hi,

On Thursday 30 June 2022 16:18:44 CEST Robert Landers wrote:

Are
optimizations going to be applied to single-line arrow functions (I
didn't see that in the RFC, but I admittedly didn't look that hard and
I vaguely remember reading something about it in one of these
threads)? If so, it will probably change some behaviors in existing
applications if they were relying on it. Perhaps static analysis tools
can detect this and inform the developer.

It is not planned to change the behavior of arrow functions in this RFC. This
optimization is less important for arrow functions because they don't usually
assign variables.

Ah? Sorry, I had interpreted
https://github.com/php/php-src/pull/8330/files#diff-85701127596aca0e597bd7961b5d59cdde4f6bb3e2a109a22be859ab7568b4d2R7318-R7320
as "capture the minimal set of variables for both arrow functions
and short closures", but I was wrong?

I don't see a test like this:

class C {
    public function __destruct() { echo 'destructed', PHP_EOL; }
}
$x = new C();
$fn = fn ($a, $b) => (($x = $a ** 2) + ($y = $b ** 2)) * ($x - $y);
echo '- unsetting $x', PHP_EOL;
unset($x);
echo '- calling $fn', PHP_EOL;
var_dump($fn(3, 2));
echo '- unsetting $fn', PHP_EOL;
unset($fn);
echo '- DONE.', PHP_EOL;

with current output (https://3v4l.org/ve3BL#v8.1.7):

- unsetting $x
- calling $fn
int(65)
- unsetting $fn
destructed
- DONE.

where the optimization would make the "destructed" line move up to
just after "- unsetting $x"

--
Guilliam Xavier

3 years ago by Arnaud Le Blanc — view source

On Thursday 30 June 2022 18:29:44 CEST Guilliam Xavier wrote:

Ah? Sorry, I had interpreted
https://github.com/php/php-src/pull/8330/files#diff-85701127596aca0e597bd7961b5d59cdde4f6bb3e2a109a22be859ab7568b4d2R7318-R7320
as "capture the minimal set of variables for both arrow functions and
short closures", but I was wrong?

No, you are right, the PR changes arrow functions too. But in the RFC we
decided to not touch the arrow functions for now.

3 years ago by Rowan Tommins — view source

On Thursday 30 June 2022 18:29:44 CEST Guilliam Xavier wrote:

Ah? Sorry, I had interpreted
https://github.com/php/php-src/pull/8330/files#diff-85701127596aca0e597bd7961b5d59cdde4f6bb3e2a109a22be859ab7568b4d2R7318-R7320
as "capture the minimal set of variables for both arrow functions and
short closures", but I was wrong?

No, you are right, the PR changes arrow functions too. But in the RFC we
decided to not touch the arrow functions for now.

Personally, I would be in favour of leaving the change in for arrow
functions as well. The fact that a variable of the same name, whose
value is never actually used, is captured by the closure, is to me a
bug, not a feature.

It's hard to even contrive an example where this is observable, so I
highly doubt anyone is relying on it.

Regards,

--
Rowan Tommins
[IMSoP]

3 years ago by Arnaud Le Blanc — view source

Hi Rowan,

Since this still has a small chance of breaking existing code, we preferred
to exclude arrow functions from this change, for now. We have added this in
Future Scope. This is something we could do in a follow-up RFC.

Regards,

3 years ago by Guilliam Xavier — view source

On Thursday 30 June 2022 18:29:44 CEST Guilliam Xavier wrote:

Ah? Sorry, I had interpreted
https://github.com/php/php-src/pull/8330/files#diff-85701127596aca0e597bd7961b5d59cdde4f6bb3e2a109a22be859ab7568b4d2R7318-R7320
as "capture the minimal set of variables for both arrow functions and
short closures", but I was wrong?

No, you are right, the PR changes arrow functions too. But in the RFC we
decided to not touch the arrow functions for now.

Ah, I see that you have updated the PR indeed:
https://github.com/php/php-src/pull/8330/commits/5bb0a1c8d032666079db5dab94b4b22b2afa9dac
(and thanks for the test).

PS: so the link I gave before is now outdated ^^' I should have given
https://github.com/php/php-src/pull/8330/commits/9dec265adba44dcf9d2cadc05dd5ad842fc4ae66#diff-85701127596aca0e597bd7961b5d59cdde4f6bb3e2a109a22be859ab7568b4d2R7318-R7320

--
Guilliam Xavier

3 years ago by Arnaud Le Blanc — view source

Hi Guilliam,

Thank you for noticing.

The PR is now fully in sync with the RFC (no arrow function changes, no
explicit use support).

The RFC also now clarifies that arrow functions are not changed by the RFC.

3 years ago by Rowan Tommins — view source

With automatic capturing of variables, for the code example I gave the
user would want the variable to be captured, and to them it looks like
it should be, but because of an optimization it is not.

Please look again at the detailed explanation I gave, and the examples
that Guilliam posted. Your example can only work if the variable is
captured by reference, because it requires the statement inside the
closure to have an effect on a variable outside the closure. No
version of auto-capture has ever proposed capturing by reference.

If instead of $some_resource = null; you wrote
$some_container->some_resource = null; then that would have an effect on
the object, but the "optimisation" would be irrelevant because the use
of $some_container itself is not an assignment.

As I said, I think that problem is a lot easier to explain "either use
long closures or change your variable name if you don't want it
captured." than trying to explain "yes, the variable is referenced
inside the closure, but it's not captured because you aren't reading
from it".

Right now, assigning (or unsetting) a variable is the only way to
force it to be local. That's why I said I would be more likely to
support this feature alongside a "var" or "let" keyword to make such
variables explicit. Not being able to have local variables at all
other than by very careful variable naming is a terrible idea.

Just to re-iterate, here's your new example with explicit capture, to
demonstrate that the closure does not and cannot free the resource:
https://3v4l.org/WrTb5

class ResourceType
{
    public function __destruct() {
        echo "Resource is released.\n";
    }
}

function get_callback()
{
    $some_resource = new ResourceType();
    $fn = function() use ($some_resource) {
        // this line does nothing
        // it overwrites a local variable which is never read
        // next time the closure runs, it will start again as the captured value
        $some_resource = null;
    };
    return $fn;
}

$fn = get_callback();
echo "Before callback\n";
$fn();
echo "After callback\n";
unset($some_resource);
echo "After destroying outer var\n";
// the captured reference is still live here, no matter how many times we call $fn()
// only destroying the closure frees it
unset($fn);
echo "After destroying closure\n";

One way of thinking of it is that assignments inside a closure are
assignments to a local variable, which "shadow" any captured variable
with the same name. If all you do with a variable is shadow it, then it
is illogical to consider it "used" in that function.

Are
optimizations going to be applied to single-line arrow functions (I
didn't see that in the RFC, but I admittedly didn't look that hard and
I vaguely remember reading something about it in one of these
threads)?

I would expect so, yes. It could be considered a bug that the arrow
function implementation currently "over-captures" variables, and it only
wasn't a higher priority in Nikita's RFC because it is extremely rare
that a single expression closure would have any local variables. Indeed,
that lack of local scope is one of the big reasons why I and others
supported that RFC, because it avoids all the confusion evident in
today's messages.

Regards,

--
Rowan Tommins
[IMSoP]
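
The $some_container->some_resource = null case mentioned above can be sketched with explicit capture. ResourceContainer and LoggedResource are hypothetical classes invented for the example; the closure reads $container in order to write to its property, so the container counts as used rather than shadowed.

```php
<?php
// Hypothetical resource that logs its own destruction.
final class LoggedResource
{
    public static array $events = [];

    public function __destruct()
    {
        self::$events[] = 'released';
    }
}

// Hypothetical holder object; the only handle to the resource lives in its property.
final class ResourceContainer
{
    public ?LoggedResource $resource = null;
}

$container = new ResourceContainer();
$container->resource = new LoggedResource();

$free = function () use ($container) {
    // $container is read here (to reach the property), not assigned to,
    // so clearing the property affects the shared container object.
    $container->resource = null;
};

LoggedResource::$events[] = 'before call';
$free();
LoggedResource::$events[] = 'after call';

$events = LoggedResource::$events;
print_r($events);
```

Here the resource is released during the call even though the capture is by value, because the assignment targets a property of the captured object rather than the captured variable.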

3 years ago by Guilliam Xavier — view source

On Thu, Jun 30, 2022 at 11:20 AM Robert Landers
landers.robert@gmail.com wrote:

For that to work, it would require the variable to be captured by
reference, not value.

I think their suggested code would work (at least currently in PHP) by
the simple fact they would increase the reference count on that
object/resource until they set it as null. However, with the
"optimization," the reference count will never be incremented and thus
fail to work as defined.

No offense, but why don't you just try it? Please see equivalents of:

Dan's code: https://3v4l.org/51jXY => doesn't "work"
Dan's code with capture by reference (as said by Rowan):
https://3v4l.org/JoUVi => "works"
Rowan's code: https://3v4l.org/7ZVv3 => shows the "problem" with capture

PS: I see that Rowan just replied with refcount explanations. I agree
(but am sending this anyway)

Regards,

--
Guilliam Xavier

3 years ago by Arnaud Le Blanc — view source

Hi,

On Thursday 30 June 2022 00:31:44 CEST Dan Ackroyd wrote:

The conversation has died down, so we'll be opening the vote for this
tomorrow.
I think I've just thought of a problem with the optimization bit of
'not capturing variables if they are written to before being used
inside the closure'.

Imagine some code that looks like this:

// Acquire some resource e.g. an exclusive lock.
$some_resource = acquire_some_resource();

$fn = fn () {
    // Free that resource
    $some_resource = null;
};

// do some stuff that assumes the exclusive
// lock is still active.

// call the callback that we 'know' frees the resource
$fn();

That's a not unreasonable piece of code to write even if it's of a
style many people avoid. I believe in C++ it's called "Resource
acquisition is initialization", though they're trying to change the
name to "Scope-Bound Resource Management" as that is a better
description of what it is.

I feel that the RAII pattern aka SBRM / Scope-Bound Resource Management is not
relevant in a PHP context, and I don't believe that it's commonly used in PHP
or in garbage-collected languages.

Also, in this particular code example, using an explicit fclose() would be
better in every way, including legibility and reliability, so this doesn't
appear to be realistic code.

Because of this, I don't think that we should be taking decisions on this
feature based on this use case.

I've used the RAII pattern in PHP to manage temporary files, as a best-effort
way to remove them (in a destructor) when they are not used anymore. However I
would not rely on this for anything more critical or anything that requires
predictability in resource release timing.

RAII is useful in C++ because memory is managed manually. This is not the case
in PHP.

It's also useful in C++ to manage other kinds of resources such as file
pointers or locks. In PHP it would be dangerous because you don't
realistically control the lifetime of values, so you also don't control the
timing at which the resources are closed. It's too easy to extend the lifetime
of a value accidentally.

One way the lifetime of a value could be extended is via a reference cycle.
These are easy to introduce and difficult to prevent or observe (e.g. in a
test or in an assertion). Another way would be by referencing the value
somewhere else. You cannot guarantee that the lifetime of a value is
unaffected after passing it to a function.
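To illustrate the second case, here is a small sketch in which a callee quietly keeps what it was given. LockGuard and Registry are hypothetical names invented for this example:

```php
<?php
// Hypothetical lock guard: "releases" the lock in its destructor.
final class LockGuard
{
    public static bool $held = false;

    public function __construct()
    {
        self::$held = true;
    }

    public function __destruct()
    {
        self::$held = false;
    }
}

// Hypothetical callee that stores every object passed to it.
final class Registry
{
    /** @var list<object> */
    public static array $seen = [];

    public static function remember(object $o): void
    {
        self::$seen[] = $o;
    }
}

function criticalSection(): void
{
    $guard = new LockGuard();
    Registry::remember($guard); // silently extends the guard's lifetime
    // ... work under the lock ...
} // $guard goes out of scope here, but the object is still referenced

criticalSection();
$heldAfterScope = LockGuard::$held;   // true: the destructor has not run

Registry::$seen = [];
$heldAfterClear = LockGuard::$held;   // false: released only now

var_dump($heldAfterScope, $heldAfterClear);
```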

In C++ it's different because no code would implicitly keep a reference to a
variable passed to it unless it was part of that code's contract, or unless
the variable was refcounted.

Another factor that makes RAII unviable in PHP is that the order of
destructor calls is unspecified. Currently, if multiple objects go out of
scope at the same time, their destructors happen to be called in FIFO order,
whereas the RAII pattern needs the reverse (LIFO) order [0][1].
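A short script along these lines can be used to observe the order on a given PHP version. The Noisy class is a hypothetical name, and since the order is unspecified the script only prints what it sees rather than promising a result:

```php
<?php
// Hypothetical class that records the order in which destructors run.
final class Noisy
{
    /** @var list<string> */
    public static array $log = [];

    public function __construct(private string $name)
    {
    }

    public function __destruct()
    {
        self::$log[] = $this->name;
    }
}

function scope(): void
{
    $first = new Noisy('first');
    $second = new Noisy('second');
} // both objects become unreachable here

scope();

// Prints the observed order. RAII-style code would need "second, first".
echo implode(', ', Noisy::$log), PHP_EOL;
```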

I think that RAII can only realistically be used in a non-managed,
non-refcounted, non-GC language. GC or reference counting should not be used
to manage anything other than memory allocation.

Other languages typically have other ways to explicitly manage the lifetime of
resources. Go has defer [2], Python has context managers / with [3], and C#
has using [4]. with and using can be implemented in userland in PHP.
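As a rough idea of what such a userland implementation could look like, here is a sketch built on try/finally. The using() function and its signature are assumptions made for this example, not an existing API:

```php
<?php
/**
 * Hypothetical userland "using": runs $body with $resource and always
 * calls $release afterwards, even if $body throws.
 */
function using(mixed $resource, callable $release, callable $body): mixed
{
    try {
        return $body($resource);
    } finally {
        $release($resource);
    }
}

$path = tempnam(sys_get_temp_dir(), 'using');
file_put_contents($path, "first line\nsecond line\n");

$handle = fopen($path, 'r');
// Read one line, then let using() close the handle explicitly.
$line = using($handle, 'fclose', fn ($h) => fgets($h));

$stillOpen = is_resource($handle); // false once fclose() has run
unlink($path);

var_dump($line, $stillOpen);
```

The release point is fixed by the finally block, so it does not depend on refcounts or on when the garbage collector runs.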

Because of all these reasons, I don't think that RAII in PHP is practical or
actually used. So I don't think that we should be taking decisions on Short
Closures based on this use case.

With the optimization in place, that code would not behave
consistently with how the rest of PHP works

There is no circumstance in PHP in which the presence of the statement
$a = null would extend the lifetime of the value bound to $a.

[0] Destructor order PHP: https://3v4l.org/iGAPj
[1] Destructor order C++: https://godbolt.org/z/f78Pa9j69
[2] https://go.dev/doc/effective_go#defer
[3] https://docs.python.org/3/reference/compound_stmts.html#with
[4] https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/using-statement

Cheers,

Arnaud Le Blanc

3 years ago by Rowan Tommins

I feel that the RAII pattern, aka SBRM / Scope-Bound Resource Management, is
not relevant in the PHP context, and I don't believe that it's commonly used
in PHP or in garbage-collected languages.

I've used a simple version of the pattern effectively to implement
transactions: if the Transaction object goes out of scope without being
explicitly committed or rolled back, it assumes the program hit an
unexpected error condition and rolls back.
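A minimal sketch of that transaction guard, assuming a hypothetical Transaction class and using an ArrayObject log as a stand-in for a real database connection:

```php
<?php
// Hypothetical Transaction guard; $log stands in for a real connection.
final class Transaction
{
    private bool $finished = false;

    public function __construct(private ArrayObject $log)
    {
        $this->log->append('BEGIN');
    }

    public function commit(): void
    {
        $this->log->append('COMMIT');
        $this->finished = true;
    }

    public function rollBack(): void
    {
        $this->log->append('ROLLBACK');
        $this->finished = true;
    }

    public function __destruct()
    {
        // Neither committed nor rolled back: assume an unexpected exit.
        if (!$this->finished) {
            $this->rollBack();
        }
    }
}

function transfer(ArrayObject $log, bool $fail): void
{
    $tx = new Transaction($log);
    if ($fail) {
        throw new RuntimeException('something went wrong');
    }
    $tx->commit();
}

$log = new ArrayObject();
transfer($log, false);
try {
    transfer($log, true);
} catch (RuntimeException $e) {
    // $tx was destroyed while the exception unwound transfer().
}

echo implode(' ', $log->getArrayCopy()), PHP_EOL; // BEGIN COMMIT BEGIN ROLLBACK
```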

One way the lifetime of a value could be extended is via a reference cycle.
These are easy to introduce and difficult to prevent or observe (e.g. in a
test or in an assertion).

I would expect reference cycles to be pretty rare in most code,
particularly when you're dealing with a value with a short lifetime as
is involved in most RAII scenarios.

The worst-case release of the cycle can also be made predictable by
running gc_collect_cycles().
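For example, in the following sketch (Node is a hypothetical class) two objects that reference each other survive the end of their scope, and their destructors only run once a collection is forced:

```php
<?php
// Hypothetical class that counts how many instances were destroyed.
final class Node
{
    public static int $destroyed = 0;
    public ?Node $other = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

function makeCycle(): void
{
    $a = new Node();
    $b = new Node();
    $a->other = $b;
    $b->other = $a; // reference cycle: the refcounts never reach zero
}

gc_enable(); // make sure the cycle collector is active
makeCycle();
$afterScope = Node::$destroyed;   // 0: scope exit alone did not free them

gc_collect_cycles();              // force a collection run
$afterCollect = Node::$destroyed; // 2: both destructors have now run

var_dump($afterScope, $afterCollect);
```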

Another way would be by referencing the value
somewhere else. You cannot guarantee that the lifetime of a value is
unaffected after passing it to a function.

Surely the only way to avoid that is with something like Rust's "borrow
checker"? Otherwise, any function that has a reference to something can
extend the lifetime of that reference by storing it inside some other
structure with a longer lifetime. Manually freeing the underlying
resource then just leads to a "use after free" error.

Another factor that makes RAII unviable in PHP is that the order of
destructor calls is unspecified. Currently, if multiple objects go out of
scope at the same time, their destructors happen to be called in FIFO order,
whereas the RAII pattern needs the reverse (LIFO) order [0][1].

I can imagine this would be a problem for some advanced uses of the
pattern, but for a simple "acquire lock, release on scope exit" or
"start transaction, rollback on unexpected scope exit", it's generally
not relevant.

Other languages typically have other ways to explicitly manage the lifetime of
resources. Go has defer [2], Python has context managers / with [3], and C#
has using [4]. with and using can be implemented in userland in PHP.

My understanding is that C#'s "using" is indeed about deterministic
destruction, but Python's "with" is a more powerful
inversion-of-control mechanism. I would actually really love to have
some version of Python's context managers in PHP, and think it would be
a better alternative to closures in a lot of cases.

For instance, a motivation cited in support of auto-capture is something
like this:

function doSomething($a, $b, $c) {
   return $db->doInTransaction(fn() {
       // use $a, $b, and $c
       // roll back on exception, commit otherwise
       return $theActualResult;
   });
}

But this is actually quite a "heavy" implementation: we create a
Closure, capture values, enter a new stack frame, and have two return
statements, just to wrap the code in try...catch...finally boilerplate.
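For reference, a sketch of what such a wrapper might look like with today's explicit-capture closures. The Db class, its log, and the doInTransaction() body are assumptions made for this example, not a real library:

```php
<?php
// Hypothetical connection wrapper; $log stands in for a real database.
final class Db
{
    /** @var list<string> */
    public array $log = [];

    public function doInTransaction(callable $work): mixed
    {
        $this->log[] = 'BEGIN';
        try {
            $result = $work();
            $this->log[] = 'COMMIT';
            return $result;
        } catch (Throwable $e) {
            $this->log[] = 'ROLLBACK';
            throw $e;
        }
    }
}

function doSomething(Db $db, int $a, int $b, int $c): int
{
    // Explicit capture, as required by current multi-statement closures.
    return $db->doInTransaction(function () use ($a, $b, $c) {
        $theActualResult = $a + $b + $c;
        return $theActualResult;
    });
}

$db = new Db();
$result = doSomething($db, 1, 2, 3);

var_dump($result, $db->log);
```

The two return statements and the extra call frame mentioned above are both visible here: one return inside the closure, one in doSomething().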

The equivalent with a context manager would look something like this:

function doSomething($a, $b, $c) {
   with ( $db->startTransaction() as $transaction ) {
       // use $a, $b, and $c
       // roll back on exception, commit otherwise
       return $theActualResult;
   }
}

Here, the with statement doesn't create a new stack frame, it just
triggers a series of callbacks for the boilerplate at the start and end
of the block. No variables need to be captured, because they are all
still available, and "return" returns from the doSomething() function,
not the transaction wrapper.

The explanation of how Python's implementation works and why is an
interesting read: https://peps.python.org/pep-0343/

Regards,

--
Rowan Tommins
[IMSoP]
